  1. May 2026
    1. Reviewer #3 (Public review):

      Solyga, Zelechowski, and Keller present a concise report of an innovative study demonstrating clear visuomotor mismatch responses in ambulating humans, using a mobile EEG setup and virtual reality. Human subjects walked around a virtual corridor while EEGs were recorded. Occasionally, motion and visual flow were uncoupled, and this evoked a mismatch response that was strongest in occipitally placed electrodes and had a considerable signal to noise ratio. It was robust across participants and could not be explained by the visual stimulus alone.

      This is an important extension of their prior work in mice, and represents an elegant translation of those previous findings to humans, where future work can inform theories of e.g. psychiatric diseases that are believed to involve disordered predictive processing. For the most part, the authors are appropriately circumspect in their interpretations and discussions of the implications. The paper in its current form represents an important addition to the literature.

      The authors have included analyses of the auditory mismatch using temporal electrodes, referenced to Cz (which should therefore exhibit a mismatch positivity). These added data clearly and convincingly show that the sensorimotor mismatch is, indeed, stronger than the passive auditory MMN.

      - The reference electrode placed at Cz makes it difficult to interpret relative differences between frontal and occipital electrode responses, as the occipital electrodes are placed farther away from the Cz reference than the frontal electrodes. Similarly, signal occurring cortically near the Cz reference might only appear as though it is occipitally distributed in this montage. It is common in EEG research to re-montage the data to an averaged common reference in order to better interpret the scalp distributions. As the electrode coverage was sparse for some subjects, this could be challenging, and this reviewer does not feel that it is necessary to do this analysis step, or even to drastically rewrite the body of the paper. We only request that some discussion, however brief, is included in the discussion section or the methods that recommends denser electrode coverage in the future to better interpret scalp distributions and potential meso-scale sources.

      - This is just a suggestion. The authors are encouraged to analyse (and report) time-frequency power and phase locking for these mismatch responses, as is common in much of the literature (see Roach et al 2008 Schizophrenia Bulletin). This is not to say that doing so will yield insights into oscillations per se, but converting the data to the time-frequency domain provides another perspective that has some advantages. It fosters translation to rodent models, as ERP peaks do not map well between species, but e.g. delta-theta power does (see Lee et al 2018 Neuropsychopharmacology; Javitt et al 2018 Schizophrenia Research; Gallimore et al 2023 Cereb Ctx). Further, ERP peaks can be influenced by the actual neuroanatomy of an individual (especially for quantifying V1 responses). Time-frequency analyses may aid in interpreting the "early negative deflection with a peak latency of 48 ms" finding as well. As it stands, the report is complete, and it would be acceptable if the authors chose to save this type of analysis for a future publication.
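
For readers unfamiliar with the time-frequency measures mentioned in the last point, the following is a minimal sketch of how power and phase locking (inter-trial coherence) at a single frequency could be computed from epoched data. It is not the analysis used in the manuscript or in the cited papers. The function name, the complex Morlet wavelet, and the `n_cycles` default are all assumptions made for illustration.

```python
import numpy as np

def morlet_power_and_itc(trials, fs, freq, n_cycles=5.0):
    """Return (power, itc) over time at one frequency.

    trials: array of shape (n_trials, n_samples), one epoch per row.
    fs: sampling rate in Hz; freq: frequency of interest in Hz.
    Hypothetical helper for illustration only.
    """
    trials = np.asarray(trials, dtype=float)
    # Complex Morlet wavelet: a Gaussian-windowed complex sinusoid.
    sigma_t = n_cycles / (2.0 * np.pi * freq)
    half = int(round(4.0 * sigma_t * fs))
    t = np.arange(-half, half + 1) / fs
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.sum(np.abs(wavelet))
    # Convolve each trial with the wavelet to obtain a complex time course.
    analytic = np.array([np.convolve(tr, wavelet, mode="same") for tr in trials])
    # Power: squared magnitude, averaged over trials.
    power = np.mean(np.abs(analytic) ** 2, axis=0)
    # Phase locking: length of the mean unit phase vector across trials (0 to 1).
    unit = analytic / np.maximum(np.abs(analytic), 1e-12)
    itc = np.abs(np.mean(unit, axis=0))
    return power, itc
```

Values of `itc` near 1 indicate that the phase at that frequency is consistent across trials, whereas values near 0 indicate random phase, independent of how large the power is.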

    2. Author response:

      The following is the authors’ response to the original reviews.

      We thank you for the time you took to review our work and for your feedback! The main changes to the manuscript are:

      (1) We have performed additional experiments to increase the number of recordings from frontal and occipital electrodes (previously 51 (occipital: O1+O2) and 26 (frontal: Fp1+Fp2), now 133 and 102). The additional data have strengthened many of our results, including for example the trend for a latency difference between occipital and frontal electrodes that was likely underpowered and is now significant (Figure 3E). We have updated all relevant figures to include the additional data (Figures 2–6, Figure S4, Figure S5). None of the main conclusions have changed.

      (2) As suggested by reviewer 1, we have conducted additional experiments to rule out the possibility that the observed effects were driven by the temporal order of open and closed loop sessions (new Figure S6). We also found another 9 participants who were willing to go on the ‘vomit comet’ of six degrees of freedom (6DOF) playback (previously 5, now 14). These data have further strengthened our conclusion that playback halt responses in 4DOF and 6DOF playback are not substantially different (Figure S4).

      (3) To address the point of reviewers 2 and 3, that mismatch negativity (MMN) responses would be larger on temporal electrodes, we conducted additional experiments in which we also recorded from temporal electrodes T3–T6. We have now added a comparison of visuomotor mismatch and MMN responses on T3–T6 electrodes as Figures S8–S9. On all electrodes, visuomotor mismatch responses were larger than MMN responses.

      (4) As suggested by reviewer 1, we have added an analysis of the experience-dependent changes in mismatch responses comparing frontal and occipital responses early and late in the session (new Figure 4).

      (5) As suggested by reviewer 2, we conducted additional experiments in an independent cohort of participants (note, without concurrent EEG) to measure eye movements triggered by visuomotor mismatches. We found eye-movement speed and blink/eye-closure changes, but these had longer latency than visuomotor mismatch responses (Figure S7).

      (6) Finally, as suggested by reviewers 2 and 3, we applied independent component analysis (ICA) and time–frequency analyses to the EEG data. We show these results and explain why they are not applicable or useful in our case in the responses below.

      Please note, during the revision, we found that a part of our analysis used a bandpass of 0.2-100 Hz while a 1-100 Hz bandpass filter was used elsewhere. This has now been standardized to a 1-100 Hz bandpass filter, and the corresponding methods were updated. This resulted in no relevant changes to the figures. Additionally, the 50 Hz band-stop filter was erroneously described in the methods as 49-51 Hz. The filter used was 40-60 Hz, and the methods have been updated to reflect this.
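
As a rough illustration of the filter settings described above, the sketch below applies a 1-100 Hz band-pass followed by a 40-60 Hz band-stop using SciPy. Only the pass and stop bands come from the text. The sampling rate, the Butterworth design, the filter order, and the zero-phase (forward-backward) application are assumptions and may differ from what was actually used.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 500.0  # Hz; hypothetical sampling rate, not stated in the text

def preprocess(eeg, fs=FS, order=4):
    """Apply a 1-100 Hz band-pass, then a 40-60 Hz band-stop.

    eeg: array with time along the last axis.
    Filter family and order are assumptions for illustration.
    """
    bandpass = butter(order, [1.0, 100.0], btype="bandpass", fs=fs, output="sos")
    bandstop = butter(order, [40.0, 60.0], btype="bandstop", fs=fs, output="sos")
    # Forward-backward filtering gives zero phase shift, so peak latencies are not delayed.
    filtered = sosfiltfilt(bandpass, eeg, axis=-1)
    return sosfiltfilt(bandstop, filtered, axis=-1)
```

Note that a 40-60 Hz stop band removes line noise at 50 Hz but also any neural signal in that range, which is one reason the exact band matters for the methods description.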

      Reviewer #1 (Public review):

      In this paper, the authors wished to determine human visuomotor mismatch responses in EEG in a VR setting. Participants were required to walk around a virtual corridor, where a mismatch was created by halting the display for 0.5s. This occurred every 10-15 seconds. They observe an occipital mismatch signal at 180 ms. They determine the specificity of this signal to visuomotor mismatch by subsequently playing back the same recording passively. They also show qualitatively that the mismatch response is larger than one generated in a standard auditory oddball paradigm. They conclude that humans therefore exhibit visuomotor mismatch responses like mice, and that this may provide an especially powerful paradigm for studying prediction error more generally.

      Asking about the role of visuomotor prediction in sensory processing is of fundamental importance to understanding perception and action control, but I wasn't entirely sure what to conclude from the present paradigm or findings. Visuomotor prediction did not appear to have been functionally isolated. I hope the comments below are helpful.

      (1) First, isolating visuomotor prediction by contrasting against a condition where the same video stream is played back subsequently does not seem to isolate visuomotor prediction. This condition always comes second, and therefore, predictability (rather than specifically visuomotor predictability) differs. Participants can learn to expect these screen freezes every 10-15 s, even precisely where they are in the session, and this will reduce the prediction error across time. Therefore, the smaller response in the passive condition may be partly explained by such learning. It's impossible to fully remove this confound, because the authors currently play back the visual specifics from the visuomotor condition, but given that the visuomotor correspondences are otherwise pretty stable, they could have an additional control condition where someone else's visual trace is played back instead of their own, and order counterbalanced. Learning that the freezes occur every 10-15 s, or even precisely where they occur, therefore, could not explain condition differences. At a minimum, it would be nice to see the traces for the first and second half of each session to see the extent to which the mismatch response gets smaller. This won't control for learning about the specific separations of the freezes, but it's a step up from the current information.

      In theory, it is correct that the open loop (playback) session is predictable. However, this is relatively unrealistic. The open loop session is a 5-minute sequence that participants have only experienced once before, when they were generating it in the closed loop session a couple of minutes earlier. It is unlikely that participants would remember the entire sequence to a precision of less than a second, which is what they would need to predict the mismatch event. However, the reviewer is correct that it is possible that the mismatch events lose salience with time, for example as a consequence of participants losing interest in the task with time, or by undergoing some form of adaptation. To address this, we repeated the experiments with the sequence of closed and open loop sessions reversed (Figures S6A-S6C), and we analyzed the responses as a function of time within the session (Figures S6D and S6E), as suggested.

      The reversed-order design consisted of (1) open loop session: a playback, in which participants viewed the recorded closed loop session of a previous participant. This was followed by (2) a closed loop session, in which participants actively walked through the tunnel and experienced visuomotor mismatch events. Using this design, we again found that responses in the closed loop session were significantly larger than in the open loop session (Figures S6A-S6C).

      In addition, we analyzed both new and previously collected data as a function of time in the session. We computed moving average responses across 10 mismatch or playback halt trials at different percentages of progress through the paradigm (Figures S6D and S6E). This analysis revealed no consistent experience-dependent changes that could account for the observed differences between closed and open loop sessions. While there was indeed some form of experience-dependent attenuation of visuomotor mismatch responses (see new Figure 4), the difference at the transition from mismatch to playback halt (and vice versa) far exceeded these adaptation effects (Figures S6D and S6E). This analysis was performed only on data from participants for whom we had both closed and open loop sessions and who met our inclusion criteria.
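
The moving-average analysis described above can be sketched as follows, assuming each trial has already been reduced to a single response amplitude (for example, the mean voltage in a response window). That reduction, the function name, and the way progress is expressed are assumptions for illustration. Only the window of 10 trials comes from the text.

```python
import numpy as np

def moving_average_response(trial_amplitudes, window=10):
    """Average response amplitude over a sliding window of consecutive trials.

    Returns (progress_percent, mean_amplitude), one entry per window position.
    Hypothetical helper; per-trial amplitudes are assumed to be precomputed.
    """
    amps = np.asarray(trial_amplitudes, dtype=float)
    if amps.size < window:
        raise ValueError("need at least `window` trials")
    kernel = np.ones(window) / window
    means = np.convolve(amps, kernel, mode="valid")
    # Centre of each window, expressed as percent progress through the session.
    centres = np.arange(means.size) + (window - 1) / 2.0
    progress = 100.0 * centres / (amps.size - 1)
    return progress, means
```

Plotting `means` against `progress` separately for the closed and open loop sessions would show whether any gradual drift within a session is comparable in size to the step between sessions.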

      We used a similar analysis to test whether early and late responses within a session systematically differed (new Figure 4). Here, to maximize the chance of finding a difference, we compared early (first five) and late (last five) trials. Behaviorally, participants reduced their walking speed following mismatch events, with a significantly larger reduction during early trials (14.3%) than during late trials (5.7%) (Figure 4A). Neural responses mirrored this pattern primarily on frontal electrodes: frontal activity showed a clear attenuation from early to late trials (Figure 4B), consistent with the reduction in behavioral responses. In contrast, changes on occipital electrodes were much smaller between early and late trials (Figure 4C-4D). Thus, experience-related modulation is substantially stronger in frontal compared to occipital regions.

      In sum, we do not believe that the difference between visuomotor mismatch responses and playback halt responses can be explained by differences in the predictability of mismatch and playback halt events.

      (2) Second, the authors admirably modified their visual-only condition to remove nausea from 6 df of movement (3D position, pitch, yaw, and roll). However, despite the fact it's far from ideal to have nauseous participants, it would appear from the figures that these modifications may have changed the responses (despite some pairwise lack of significance with small N). Specifically, the trace in S3 (6DOF) and 2E look similar - i.e., comparing the visuomotor condition to the visual condition that matches. Mismatch at 4/5 microvolts in both. Do these significantly differ from each other?

      Yes, the 6DOF playback halt response shown in the previous Figure S3 and the mismatch response shown in previous Figure 2E are significantly different (Author response image 1).

      Author response image 1.

      Comparison of visuomotor mismatch response (A) and 6DOF playback halt response (B) from the original submission with statistics of the comparison (C).

      Nevertheless, to strengthen this conclusion, we collected additional data in the 6DOF condition. We show the comparison for participants for whom both closed loop (active) and open loop sessions (6DOF) were recorded within the same recording session (14 participants) in Figure S4. Consistent with our previous findings, visuomotor mismatch responses were significantly larger than 6DOF playback halt responses (Figures S4A-S4C). And we found no evidence of a difference between 6DOF and 4DOF playback halt responses (Figures S4D and S4E).

      (3) It generally seems that if the authors wish to suggest that this paradigm can be used to study prediction error responses, they need to have controlled for the actions performed and the visual events. This logic is outlined in Press, Thomas, and Yon (2023), Neurosci Biobehav Rev, and Press, Kok, and Yon (2020) Trends Cogn Sci ('learning to perceive and perceiving to learn'). For example, always requiring Ps to walk and always concurrently playing similar visual events, but modifying the extent to which the visual events can be anticipated based on action. Otherwise, it seems more accurately described as a paradigm to study the influence of action on perception, which will be generated by a number of intertwined underlying mechanisms.

      We are not entirely sure we understand the point here correctly. If the reviewer is suggesting that visuomotor coupling is not describable by the ideas of predictive processing, we disagree. However, given that the papers the reviewer is pointing to are premised on what seems to be a somewhat unorthodox interpretation of predictive processing when it comes to cortical circuits, we suspect this is contributing to the misunderstanding here. Let us briefly explain. In the two papers, Press and colleagues argue that most experiments cannot distinguish between “predictive cancellation” and “gated suppression”. This is indeed relatively tricky, even when one has single neuron data. The question is, does movement simply suppress sensory feedback (as is likely the case e.g. in the famous example of the cricket), or does movement result in a precise removal of only the self-generated sensory reafference? The first good evidence of the latter happening in any system is quite recent (Keller and Hahnloser, 2009). The premise the authors build their argument on is that the theory posits that “the brain predictively ‘cancels’ expected action outcomes from perception” (from the abstract of one of the papers). This is incomplete. The minimum circuit for predictive processing is composed of 3 neuron types: positive prediction error neurons, negative prediction error neurons, and internal representation neurons. Only the positive prediction error neurons have the predictive cancellation property the authors discuss. This is not the case for either negative prediction error neurons, or for the internal representation neurons. Negative prediction error neurons are excited by predictions and suppressed by sensory input (i.e. if anything, they are “predictively amplified”). This circuit is relatively well characterized in mouse cortex – for a brief summary see (Keller and Mrsic-Flogel, 2018). 
Note, this is not our idea of course – the original formulation of predictive processing (Rao and Ballard, 1999) was built to explain end-stopping. These are responses to the absence of an expected line that were stronger than would be expected from classical theories (i.e. negative prediction error responses). In mouse visual cortex, we know that a sudden break in the coupling between locomotion and visual flow selectively activates layer 2/3 negative prediction error neurons. Thus, if human cortex also implements a predictive processing like circuit with positive and negative prediction error neurons, we would expect a break in visuomotor coupling to drive a measurable response in visual cortex (by exciting the population of negative prediction error neurons – this is also why we are quite excited by the phase reversal of visual and mismatch responses as this could indicate that mismatch activates negative prediction error neurons first and positive prediction error neurons later, and vice versa for visual stimulation – negative prediction error neurons are more superficial in cortex (O’Toole et al., 2023)). We do indeed find a response over occipital cortex consistent with the negative prediction error response we observe in mouse cortex. The difficulty in distinguishing “predictive cancellation” and “movement driven suppression” comes only when looking at positive prediction error type responses (that are suppressed by predictive inputs) but does not apply to negative prediction error responses. The predictive processing circuit we are testing is the one described by (Keller and Mrsic-Flogel, 2018; Rao and Ballard, 1999), and here the break in visuomotor coupling is a stimulus that drives negative prediction error responses. Note, other authors who have thought about cortical implementations of predictive processing (e.g. (Bastos et al., 2012)) have glossed over the problem that individual neurons cannot trivially encode both positive and negative errors. 
Prediction errors are a signed quantity. If neurons signal prediction errors in firing rates and are close to zero firing rate at baseline (as is the case in layer 2/3 of cortex), they cannot (short of rather exotic ideas) encode a signed prediction error. Hence such proposals are not very useful for thinking about prediction error responses in cortex. For these reasons, we see no problem with referring to the response as a prediction error response. This is in line with a large body of mouse research (using a nearly identical paradigm) on the topic.

      One could of course argue that gated suppression could also mean that movement relieves suppression. Thus, one could assume that some neurons are suppressed by movement while others are enhanced. If one allows for enough neuron and stimulus specificity in the precision of the movement related suppression and enhancement of responses, the two models (predictive processing and gated suppression) become equivalent, and the discussion becomes semantic. See (Vasilevskaya et al., 2023) for an extended discussion on this point, and the reasons why we think predictive processing is a more useful model than gated suppression (keep in mind, gated suppression only explains the data if we allow for stimulus/neuron specific gain factors of the suppression, in which case the two models are equivalent).

      More minor points:

      (1) I was also wondering whether the authors may consider the findings in frontal electrodes more closely. Within the statistical tests of the frontal electrodes against 0, as displayed in Figure 3c, the insignificance of the effect of Fp2 seems attributable to the small included sample size of just 13 participants for this electrode, as listed in Table S1, in combination with a single outlier skewing the result. The small sample size stands out especially in comparison to the sample size at occipital electrodes, which is double and therefore enjoys far more statistical power. It looks like the selected time window is not perfectly aligned for determining a frontal effect, and also the distribution in 3B looks like responses are absent in more central electrodes but present in occipital and frontal ones. I realise the focus of analysis is on visual processing, but there are likely to be researchers who find the frontal effect just as interesting.

      That is correct; our data from frontal electrodes were likely underpowered. The reason we have fewer data from frontal electrodes is that eye-blink artifacts are particularly strong in frontal channels, resulting in a larger proportion of trials failing to meet our data inclusion criteria. We have now added more data from frontal and occipital electrodes by including additional experimental sessions. In addition, we applied less stringent trial-exclusion criteria, requiring that no artifacts occur within the time window −0.5 to 1 s relative to the event trigger (instead of −0.5 to 2 s). This adjustment allowed us to retain a larger number of trials. As anticipated by the reviewer, this increase in data was sufficient to confirm a significant response to the visuomotor mismatch event at both frontal electrodes (Figure 3C). The expanded dataset also revealed a significant difference in response onset times between occipital and frontal electrodes (Figure 3E), an effect that was not significant previously. In addition, we have included an analysis comparing early and late mismatch responses in frontal and occipital electrodes (Figure 4).
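
To make the effect of the relaxed exclusion window concrete, here is a minimal sketch of the trial-selection rule as described above. It assumes artifacts have already been detected and are available as a list of timestamps in seconds. That representation and both function names are hypothetical. Only the two windows (−0.5 to 1 s and −0.5 to 2 s) come from the text.

```python
def keep_trial(event_time, artifact_times, window=(-0.5, 1.0)):
    """True if no artifact falls inside `window` (seconds) around the event."""
    start, stop = event_time + window[0], event_time + window[1]
    return not any(start <= t <= stop for t in artifact_times)

def select_trials(event_times, artifact_times, window=(-0.5, 1.0)):
    """Return the event times whose surrounding window is artifact-free."""
    return [e for e in event_times if keep_trial(e, artifact_times, window)]

# Toy example: an artifact 1.5 s after the first event only matters for the longer window.
events = [10.0, 25.0, 40.0]
artifacts = [11.5, 24.8]
print(select_trials(events, artifacts, window=(-0.5, 2.0)))  # [40.0]
print(select_trials(events, artifacts, window=(-0.5, 1.0)))  # [10.0, 40.0]
```

In this toy case, shortening the window recovers the first trial, because its only artifact occurs after the 1 s cut-off.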

      (2) It is claimed throughout the manuscript that the 'strongest predictor (of sensory input) - by consistency of coupling - is self-generated movement'. This claim is going to be hard to validate, and I wonder whether it might be received better by the community to be framed as an especially strong predictor rather than necessarily the strongest. If I hear an ambulance siren, this is an especially strong predictor of subsequent visual events. If I see a traffic light turn red, then yellow, I can be pretty certain what will happen next. Etc.

      This is a statistical argument. Every movement – throughout life – is directly and immediately coupled to sensory feedback and has been throughout evolutionary history. The vast majority of visual input you receive (we estimate, well above 99%) is the consequence of your own movements (e.g. every few 100 ms your eye movements cause a full field change in your visual input). The same is likely true of proprioceptive and somatosensory input – the vast majority is the direct consequence of your own movements (not other people poking you). This is likely different in the auditory system where a much larger fraction of the input is externally driven (depending a bit on how much one likes to talk). But even here the best predictor is self-motion (most non-self-generated sounds one experiences in life are very difficult to predict with millisecond precision). The example the reviewer gives is a good illustration of this. Take the siren that hails the appearance of an ambulance. The siren tells us that an ambulance will appear, but not how it will look, not when exactly it will appear, and with only very low resolution as to where it will appear. Incidentally, if you ask people to draw an ambulance they tend to draw a WWII style white square vehicle with a red cross on the side – a style of ambulance they likely have not ever seen in life. Their visual predictions of what they are about to see are very low resolution. We catastrophically fail at making pixel perfect predictions from learned stimulus associations of this nature. The traffic light example is difficult to compare to visual feedback control of movement as it is a much simpler prediction of a single bit in the form of a change in color of an existing object.

      In addition, consider how often (in life) you have seen an ambulance after hearing it. 100 times maybe? Maybe less. How often have you seen traffic lights change: 10 000 times? 100 000 times? Now consider how often you have experienced the visual consequences of moving your head or eyes to the left (keep in mind this includes microsaccades): at a conservative estimate of once per second, that is somewhere on the order of 1 000 000 000 times. This is not even in the same ballpark. Our brains can certainly learn to make the ambulance and traffic light type predictions, to some extent, but by far the best predictor of sensory feedback (simply by virtue of the physics of how our body interacts with the world) is self-motion.
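
The order-of-magnitude estimate above can be checked with a few lines of arithmetic. The age and the number of waking hours per day are illustrative assumptions and do not appear in the text. The event rate and the two comparison counts are the figures quoted in the paragraph.

```python
# Illustrative assumptions only: a 30-year-old who is awake 16 hours a day.
YEARS = 30
WAKING_HOURS_PER_DAY = 16
EVENTS_PER_SECOND = 1  # the "once per second" rate from the text

self_motion_events = YEARS * 365 * WAKING_HOURS_PER_DAY * 3600 * EVENTS_PER_SECOND
traffic_light_changes = 100_000  # upper figure quoted in the text
ambulance_sightings = 100        # figure quoted in the text

print(f"{self_motion_events:.1e}")                  # 6.3e+08
print(self_motion_events // traffic_light_changes)  # 6307
print(self_motion_events // ambulance_sightings)    # 6307200
```

Under these assumptions the count is within a factor of two of the 1 000 000 000 quoted, and exceeds the traffic light and ambulance figures by roughly four and seven orders of magnitude respectively.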

      We think this is an argument we can make based on first principles, and one that is frequently overlooked in the field, as experiments often focus on training people or animals to learn novel associations that, especially in the case of mice, we often have no idea whether cortical circuits can even learn. We should focus experiments on the predictive systems our brains have evolved since long before the evolutionary appearance of ambulances and traffic lights. We understand that the reviewer may disagree with this, but unless the reviewer has a concrete example of an even stronger predictor (as measured by frequency of experience, consistency in coupling, and precision in timing – we can’t think of one), it is a point we will make.

      (3) The checkerboard inversion response at 48 ms is incredibly rapid. Can the authors comment more on what may drive this exceptionally fast response? It was my understanding that responses in this time window can only be isolated with human EEG by presenting spatially polarized events (cf. c1, e.g., Alilovic, Timmermans, Reteig, van Gaal, Slagter, 2019, Cerebral Cortex).

      We don’t know, but it is not inconsistent with previous reports. For example, compare the “standing” and “fast walking” target ERP responses in Figure 5 of (Gramann et al., 2010). Both here and in our data, the fast response peak is only really apparent in the direct comparison of visual responses recorded while participants were walking to those when they were stationary.

      While we have taken great care to calibrate the timing of the visual display with the EEG recording, one could be worried that the alignment is off by as much as tens of milliseconds. However, even if this were so, one could use P1 as a reference and determine that the fast peak precedes P1 by about 40 ms, which again would result in a latency of about 50 ms for the fast walking peak (assuming P1 peaks at about 90 ms). In sum, we have added a reference to the previous work (that we found thanks to the reviewer’s comment) but fear we have nothing intelligent to say beyond that.

      Reviewer #2 (Public review):

      Summary:

      This study investigates whether visuomotor mismatch responses can be detected in humans. By adapting paradigms from rodent studies, the authors report EEG evidence of mismatch responses during visuomotor conditions and compare them to visual-only stimulation and mismatch responses in other modalities.

      Strengths:

      (1) The authors use a creative experimental design to elicit visuomotor mismatch responses in humans.

      (2) The study provides an initial dataset and analytical framework that could support future research on human visuomotor prediction errors.

      Weaknesses:

      (1) Methodological issues (e.g., volume conduction, channel selection, lack of control for eye movements) make it difficult to confidently attribute the observed mismatch responses to activity in visual cortical regions.

      (2) A very large portion of the data was excluded due to motion artefacts, raising concerns about statistical power and representativeness. The criteria for trial inclusion and the number of accepted trials per participant appear arbitrary and not justified with reference to EEG reliability standards.

      (3) The comparison across sensory modalities (e.g., auditory vs. visual mismatch responses) is conceptually interesting, but due to the choice of analyzing auditory mismatch responses over occipital channels, it has limited interpretability.

      We have responded to these points in the more detailed itemization below.

      The authors successfully demonstrate that visuomotor mismatch paradigms can, in principle, be applied in human EEG. However, due to the issues outlined above, the current findings are relatively preliminary. If validated with improved methodology, this approach could significantly advance our understanding of predictive processing in the human visual system and provide a translational bridge between rodent and human work.

      Reviewer #2 (Recommendations for the authors):

      Overall, the study addresses an interesting and underexplored question (translation of the visuomotor mismatch responses observed in rodents to humans). Below, please find a list of specific suggestions for improvement.

      Introduction:

      (1) "updating internal representations and internal models" - what is the difference between the two, and why is it relevant to this study?

      In a nutshell, an internal model is the synaptic weight matrix that transforms between coding spaces. An internal representation is the activity pattern coding for the current representation. See (Aizenbud et al., 2025; Keller and Mrsic-Flogel, 2018) for more lengthy elaborations. The fact that the mechanism used for representation update can also be used to update internal models (i.e. solve the credit assignment problem) is likely the prime advantage of predictive processing (see work from the Bogacz lab). The relevance to the current study is justifying why predictive processing is a reasonable hypothesis for the function of cortex.

      (2) "Certain stimuli can be predicted from the preceding sensory input" vs. "Predictions can also be based on memory" - how are these two different? Do you mean specific (e.g., long-term associative or episodic) memory types in the latter?

      Correct, this is an arbitrary distinction that primarily makes sense in the light of experimental approaches. In this particular case, we were talking about spatial memory. We made this explicit to increase clarity.

      (3) "the strongest predictor - by consistency of coupling - is self-generated movement"

      (a) Externally induced movement, while not self-generated and therefore not predicted, will also generate sensory coupling, so is it really only about consistency?

      Externally induced movement (as in somebody else moving one’s arm; we are not sure this is what the reviewer means) will induce sensory-sensory coupling but not sensorimotor coupling. We might be misunderstanding the point. In case the reviewer means stimuli that trigger movement (as in us asking participants to walk, or a sudden startle stimulus that makes them jump), in all such cases there are of course sensorimotor predictions. Sensorimotor predictions are driven by efference copies of the motor command; thus all movements, whether ‘voluntarily’ executed or triggered by an external stimulus, will drive sensorimotor predictions. (All of this of course assumes that the predictive processing theory is correct.)

      (b) Do you mean temporal consistency (minimal lags), statistical contingencies (same movements linked to the same sensory inputs), or both? How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?

      Both. We have rephrased the sentence to try to make this clearer. See also response to reviewer 1 minor point 2 above.

      How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?

      Most cross-modal associations are much less consistent (the exact sound of a glass shattering is always slightly different and impossible for us to predict), and orders of magnitude less frequently experienced, than sensorimotor associations. Again, see also response to reviewer 1 minor point 2 above.

      (4) "Every movement is directly coupled to sensory feedback throughout life"

      This may be the case for proprioceptive and/or somatosensory feedback, but not necessarily for visual feedback (e.g., a mouse moving its tail), which is the topic of the study.

      Correct, there are movements that can be disconnected from visual feedback. Most of the time, however, most movements are not, and we are studying one of the more prominent ones that is clearly not decoupled: locomotion. The contrast we aim to highlight here is that there is still a vague idea in the field that you can take a participant, or a mouse, expose them to a few tens or hundreds of trials of some sensory stimulus contingency, and then probe for prediction error responses to a pattern learned only recently, if at all. Given the life-long experience of subjects and mice, is it really surprising that oddball responses are less strong than a sensorimotor mismatch?

      (5) "However, the overall level of this motor-related activity is much higher than one would expect simply from predictions of visual feedback that are compared against visual input."

      Could you please clarify what one would expect in this case, and/or back it up with citations?

      This is in reference to the fact that there are very strong movement-related signals in the mouse visual cortex that persist even when the mouse is in complete darkness. In darkness, movements should not trigger any change in visual feedback; hence the activity is difficult to explain as a movement-related prediction of visual flow. We have rephrased this section of the introduction to make this clearer.

      (6) "The more precise the prediction and comparison, the less motor-related activity should be detectable in visual cortex."

      I think this conflates two issues. A good match between prediction and input would indeed result in sensory attenuation. However, sensory precision, at least in active inference, can upregulate prediction error responses. Since predictions cannot be assumed to be perfect (due to external or internal noise), increased precision may therefore augment activity. See e.g. https://doi.org/10.1007/s10339-013-0571-3

      We agree with the reviewer – the phrasing here was misleading. We do not mean precision in the predictive processing sense, but the precision of sensorimotor control necessary for the behavior. We have rephrased the corresponding section of the manuscript.

      (7) Neither the introduction nor the discussion refers to previous human EEG studies on sensorimotor mismatch responses, where sensory feedback doesn't match motor actions (e.g. https://doi.org/10.3758/s13423-021-01992-z ; https://www.sciencedirect.com/science/article/pii/S0028393214003777 ; https://www.sciencedirect.com/science/article/pii/S0028393219301265).

      The studies cited by the reviewer primarily test how discrete violations of learned action–outcome associations are represented in the brain, whereas our visuomotor mismatch paradigm probes violations of continuous sensorimotor coupling during ongoing action. The paradigms are conceptually different both in how strong the coupling is (lifelong vs. learned in the experiment), and in how prediction errors are likely used (visuomotor control vs. stimulus detection). We have added a brief part to our introduction discussing this.

      Results:

      (1) A very large proportion of the dataset was excluded due to movement artefacts. This is rather problematic as

      (a) the rationale behind finding mismatch responses is that motion-related (neural) signals should affect visual cortical activity, so it's essential to disentangle these neural signals from artefacts;

      Correct, we excluded 21.7% of the total data for the visuomotor mismatch paradigm. Note that this percentage is comparable to that in other similar studies of EEG recordings during movement (Oliveira et al., 2016). By “problematic”, we assume the reviewer means the fact that we have artefacts, not that we exclude trials with artefacts. The movement artefacts are typically caused by the acceleration during stepping in participants with a heavy gait. None of these movement artefacts are time-locked to any of the responses we investigate. Thus, they should just appear as increased levels of noise if not excluded. We don’t understand why the reviewer thinks this is particularly problematic for our analysis/conclusions (beyond the trivial consequence of increasing noise levels that would only cause us to underestimate the strength of the mismatch signals we report).

      (b) the criterion for the number of trials of 15 triggers (per condition?) is arbitrary and lower than widely used in the literature, so authors should demonstrate that this is a sufficient number to observe a measurable ERP even for those participants with 15 triggers;

      We have between 16 and 25 visuomotor mismatch events per participant. Author response image 2 is a selection of single participant examples with different numbers of trials. The number of mismatch events is limited by the fact that we introduce them approximately every 10–15 s and have a total duration of the closed loop session of 5 minutes. Thus, on average, we expect to have 24 mismatch events. But we are not sure we understand the logic of the comment: if we set the exclusion criterion too low, we just risk losing a response in the noise. And we clearly have stronger mismatch responses, with a higher signal-to-noise ratio, with an average of 20 trials compared to visual responses during movement with an average of 40 trials or MMN responses with an average of 28 trials.
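
      The expected count of 24 events follows from simple arithmetic. As a rough check, assuming the interval between mismatch events is spread evenly over the stated 10–15 s range (so that its mean is 12.5 s; this spacing distribution is an assumption made here for illustration, not stated in the text) and ignoring the duration of the events themselves:

```latex
\[
N_{\text{expected}} \approx \frac{T_{\text{session}}}{\overline{\Delta t}}
  = \frac{300\ \text{s}}{12.5\ \text{s}} = 24,
\qquad
\frac{300\ \text{s}}{15\ \text{s}} = 20 \;\le\; N \;\le\; \frac{300\ \text{s}}{10\ \text{s}} = 30 .
\]
```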

      Author response image 2.

      Reliable ERPs can be observed with as few as 16 trials across EEG channels. (A) Histograms showing the distribution of the number of valid mismatch trials per participant for each electrode pair (Fp1–2, C3–4, P3–4, O1–2). (B) Representative EEG responses to visuomotor mismatch events from a single participant, recorded at electrode pairs Fp1–2, C3–4, P3–4, and O1–2. Waveforms were computed using the indicated number of trials (shown above each trace). Dashed vertical red lines are onset and offset of the visuomotor mismatch.

      (c) it seems that the seemingly static "visual" condition resulted in a larger proportion of data rejected due to movement (or, as later mentioned, nausea) than the "visuomotor" condition, which is counterintuitive and needs further explanation;

      This is a misunderstanding. The ‘visual paradigm’ the reviewer is referring to comprises the experiments shown in Figure 1. Here we record visual responses in both sitting and walking participants. In this experiment, as in others, exclusion was primarily driven by the part of the paradigm in which the subjects were moving. To make this clearer, we have added Table S2 to the manuscript, which provides an overview of trials excluded by paradigm and session.

      (d) authors mention eye movements as a potential issue, which should be possible to detect from frontal channels. Additionally, it's not entirely clear how many datasets were discarded (the results section mentions 19/48 in the visual condition, then 4+11 in the playback condition - isn't this the same condition?)

      The visual paradigm corresponds to the data shown in Figure 1, in which participants viewed a flipping checkerboard in both a walking and a stationary session. The open loop session is part of the visuomotor paradigm shown in Figure 2, where participants were exposed to a replay of the visual flow that had been self-generated during the preceding closed loop session, including the visual flow halts that constituted visuomotor mismatches in the closed loop session. Please note, to avoid such confusion, we have attempted to standardize the usage of paradigm (visual vs. visuomotor) and session (sitting vs. walking, and closed loop vs. open loop) throughout. In addition, we have added a table to summarize the number of excluded trials by paradigm and session as Table S2 to the manuscript.

      In comments 1 and 2 of the public review, the reviewer also points out that we did not control for eye movements and, we presume relatedly, claims that we did not use common EEG reliability standards. Regarding the first point, we performed additional experiments in an independent cohort of participants to test whether eye movements could account for the visuomotor mismatch responses. We recorded eye movements during closed loop sessions and found that changes in eye speed (Figure S7A) or blink rate (Figure S7B) following the mismatch stimulus had a longer latency than visuomotor mismatch responses in EEG. This suggests that the visuomotor mismatch response cannot be explained by eye blinks or changes in eye movement speed. Regarding the second point, we are not sure we understand. Trial exclusion based on a fixed voltage threshold of 100 µV is relatively common, and our rejection rates are on par with, and on occipital electrodes in particular even lower than, those of other work on EEG recordings during locomotion or movement (see e.g. (Oliveira et al., 2016)).

      Nevertheless, we did attempt to apply independent component analysis (ICA) based filtering to the EEG data (Delorme and Makeig, 2004). However, these methods were designed for high channel density recordings. With only 8 channels, ICA is unable to reliably isolate eye movement or motion artefact components of the EEG. To illustrate this, we tested two artifact-rejection strategies. In the first approach, components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if at least 90% of the component’s variance was assigned to a single artifact class (Author response image 3A). In the second, more permissive approach aimed specifically at reducing eye movement artifacts, components were removed if artifact-related activity exceeded 90% for non-eye artifacts, while the threshold for eye-related components was lowered to 60% (Author response image 3C). We lowered the threshold for excluding eye-related components to ensure that EEG signals influenced by eye movements were effectively removed. In both cases - whether the eye-component threshold was set to 90% or 60% - the averaged responses to visuomotor mismatch trials remained largely similar to the previously reported data, despite higher noise in some traces. Interestingly, when we then followed the ICA filtering by our voltage threshold based exclusion with a threshold of 100 µV, the resulting traces closely resembled the patterns described in the paper (Author response image 3B and 3D). Thus, we conclude that the non-ICA-filtered responses are easier to interpret, free of any potential ICA filtering artifacts, and far less dependent on parameter choices (of the ICA filtering).
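
      The two component-removal rules described above can be written as a single decision function with separate thresholds for eye-related and other artifact classes. The following is only a minimal sketch of that rule, not the analysis code used here: the function name `should_remove`, the class labels, and the example shares are hypothetical, and it assumes each component comes with a per-class share between 0 and 1 from some component classifier.

```python
def should_remove(class_shares, eye_threshold=0.9, other_threshold=0.9):
    """Decide whether an ICA component is removed.

    class_shares maps a class label to the share (0..1) of the component
    attributed to that class. Labels are hypothetical placeholders.
    """
    for label, share in class_shares.items():
        if label == "brain":
            continue  # the neural class never triggers removal
        threshold = eye_threshold if label == "eye" else other_threshold
        if share >= threshold:
            return True
    return False


# Hypothetical component dominated by eye activity, but below 90%.
component = {"brain": 0.25, "eye": 0.70, "muscle": 0.05}

strict = should_remove(component)                          # 90% for every class
permissive = should_remove(component, eye_threshold=0.6)   # 60% for eye only
```

      Under the strict rule this example component is kept, whereas lowering only the eye threshold to 60% removes it, which is the difference between the two strategies.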

      Author response image 3.

      Removal of artifacts identified with ICA does not change the visuomotor mismatch responses. (A) Visuomotor mismatch responses recorded from occipital electrodes after artifact correction. Components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if ≥90% of the component’s variance was attributed to a single artifact class. Solid black line represents the mean, and shading indicates the SEM across participants. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but excluding trials with amplitudes exceeding 100 µV. (C) As in A, but components were removed if artifact-related activity exceeded 90% for non-ocular artifacts, while the threshold for eye-related components was lowered to 60%. (D) As in C, but excluding trials with amplitudes exceeding 100 µV.
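
      The voltage-threshold exclusion used in panels B and D can be illustrated with a short sketch. This is a hedged example rather than the actual pipeline: the helper `reject_trials` is hypothetical, and it assumes the 100 µV criterion is applied to the absolute amplitude of any sample within a trial (the text does not specify whether absolute or peak-to-peak amplitude was used).

```python
def reject_trials(trials_uv, threshold_uv=100.0):
    """Drop trials in which any sample exceeds the threshold in absolute value.

    trials_uv: list of trials, each a list of voltage samples in microvolts.
    Returns the kept trials and the fraction of trials rejected.
    """
    kept = [t for t in trials_uv if max(abs(v) for v in t) <= threshold_uv]
    rejected_fraction = 1.0 - len(kept) / len(trials_uv) if trials_uv else 0.0
    return kept, rejected_fraction


# Hypothetical toy data (three samples per trial).
trials = [
    [5.0, -12.0, 30.0],    # clean trial
    [8.0, 150.0, -20.0],   # artefact: exceeds +100 µV
    [-3.0, -101.0, 4.0],   # artefact: exceeds -100 µV
    [10.0, 99.0, -99.0],   # within threshold
]
kept, rejected_fraction = reject_trials(trials)
```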

      (2) The finding that mismatch responses are observed at all channels, with differences in amplitudes but not latencies, indicates that volume conduction may affect the results. I would strongly suggest accounting for this using a method appropriate for the very small number of channels, e.g., phase lag index.

      We are not sure we understand. The phase lag index is a method to estimate functional connectivity in a way that corrects for volume conduction (using phase lag). We make no claims about functional connectivity; thus, we are not sure what the reviewer is suggesting we do. The fact that the visual and visuomotor mismatch responses were measurable on all electrodes could indeed be in part explained by volume conduction, but we see no way to estimate the volume conduction contribution. From mouse calcium imaging data, we know that both visual and visuomotor mismatch responses spread across large parts of dorsal cortex (including frontal regions like the ACC).

      With the addition of new data, the latency difference between occipital and frontal electrodes - previously observed only as a trend - is now statistically significant (Figure 3E). Occipital responses emerge earlier than frontal responses, suggesting that mismatch-related activity likely originates in sensory visual regions and subsequently propagates to more frontal areas, similar to what has been reported in mouse cortex (Heindorf and Keller, 2024).
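
      For readers unfamiliar with the phase lag index mentioned in this exchange, a minimal sketch of its usual definition (the absolute mean sign of the phase difference between two channels) shows why it is a connectivity measure that discounts zero-lag coupling. The function name `phase_lag_index` is hypothetical, and the sketch assumes the instantaneous phases of both channels have already been extracted.

```python
import math


def phase_lag_index(phases_a, phases_b):
    """Absolute mean sign of the phase difference between two channels."""
    signs = []
    for pa, pb in zip(phases_a, phases_b):
        s = math.sin(pa - pb)
        signs.append((s > 0) - (s < 0))  # -1, 0, or +1
    return abs(sum(signs)) / len(signs)


phases = [0.1 * k for k in range(100)]

# Identical phases (as expected from pure volume conduction): index is 0.
zero_lag = phase_lag_index(phases, phases)

# Constant quarter-pi lag between channels: index is 1.
lagged = phase_lag_index(phases, [p - math.pi / 4 for p in phases])
```

      A zero-lag relationship therefore contributes nothing to the index, which is why it says something about coupling between channels rather than about the amplitude of an evoked response on each channel.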

      (3) The authors compare different types of mismatch responses (including auditory oddballs) in the same set of (occipital) channels, but doesn't this undermine the spatial specificity of the results? Classical auditory mismatch negativity is typically observed over central channels, so weaker amplitudes of auditory mismatch responses in occipital channels are likely trivially explained by modality differences. As such, I'm not convinced that this comparison is informative even in a qualitative manner.

      To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. The amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (new Figures S8 and S9).

      (4) On a similar note, is the polarity reversal found for visual vs. mismatch responses specific to occipital channels?

      Thank you for this interesting question. In fact, polarity reversal was consistently observed across all recorded channels; this has now been added as a main figure to the manuscript (Figure 5).

      (5) Figure S4C seems to cut off one outlier, and I don't see this outlier included in the boxplot.

      Correct, that is why we describe the boxplots in the figure legend as: “Boxes mark median, quartiles, and range of data not considered outliers.” The axes have now been adjusted to include all data points.

      Discussion:

      "A central tenet of the cortical circuit for predictive processing is the split into separate populations of neurons that compute positive and negative prediction errors (Keller and Mrsic-Flogel, 2018; Rao and Ballard, 1999)" - this may be the case for visuomotor mismatch signals or reward prediction errors, but signed PEs do not play a central role in other proposed microcircuits for predictive processing in the perceptual domain (e.g. Bastos)

      Signed prediction errors do not play a central role in proposed cortical microcircuits for predictive processing that do not burden themselves with making a concrete proposal for the implementation of the prediction error computation. The (Bastos et al., 2012) work is a good example of this. The equation for the error term provided in that paper is clearly signed (nothing stops the error from going negative), but no proposal is made for how layer 2/3 excitatory neurons are supposed to signal this quantity. With baseline activity levels close to zero in layer 2/3, there really is only one way to do this, and that is separate populations of negative and positive prediction error neurons. With non-zero baseline firing rate, one could do this bidirectionally around a mean firing rate (as is typically thought of dopaminergic RPE neurons). There are more abstract Bayesian implementations that assume logarithmic transformations that could also implement a prediction error-like system without negative firing rates. But given the absence of any physiological evidence, we will refrain from discussing these. However, most importantly, there is now considerable evidence for the existence of both negative and positive prediction error neurons in layer 2/3 of mouse visual cortex. Thus, by “cortical circuit for predictive processing” we here mean those that make biologically plausible proposals for prediction error computations. Also note, the (Rao and Ballard, 1999) model is probably the prime example for what the reviewer calls a proposed microcircuit for predictive processing in the “perceptual domain”.

      Reviewer #3 (Public review):

      Summary:

      Solyga, Zelechowski, and Keller present a concise report of an innovative study demonstrating clear visuomotor mismatch responses in ambulating humans, using a mobile EEG setup and virtual reality. Human subjects walked around a virtual corridor while EEGs were recorded. Occasionally, motion and visual flow were uncoupled, and this evoked a mismatch response that was strongest in occipitally placed electrodes and had a considerable signal-to-noise ratio. It was robust across participants and could not be explained by the visual stimulus alone.

      Strengths:

      This is an important extension of their prior work in mice, and represents an elegant translation of those previous findings to humans, where future work can inform theories of e.g., psychiatric diseases that are believed to involve disordered predictive processing. For the most part, the authors are appropriately circumspect in their interpretations and discussions of the implications. I found the discussion of the polarity differences they found in light of separate positive and negative prediction errors, intriguing.

      Weaknesses:

      The primary weaknesses rest in how the results are sold and interpreted.

      Most notably, the interpretation of the results of the comparison of visuomotor mismatches to the passive auditory oddball induced mismatch responses is inappropriate, given suboptimal electrode choices, unclear matching of trial numbers, and other factors. To clarify, regarding the auditory oddball portion in Figure 5, the data quality is a concern for the auditory ERPs, and the choice of occipital electrodes is a likely culprit. Typically, auditory evoked responses are maximal at Cz or FCz, although these contacts don't seem to be available with this setup. In general, caution is warranted in comparing ERP peaks between two different sensory modalities - especially if attention is directed elsewhere (to a silent movie) during one recording and not during the other. The authors discuss this as a purely "qualitative" comparison in the text, which is appreciated, and do acknowledge the limitations within the results section, but the figure title and, importantly, the abstract set a different tone. At least, for comparisons between auditory mismatch and visuomotor mismatch, trial numbers need to be equated, as ERP magnitude can be augmented by noise (which reduces with increased numbers of trials in the average).

      To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. Nevertheless, the amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (these results are now shown in the new Figures S8 and S9), and the response power was significantly greater for the visuomotor mismatch than for mismatch negativity. Independent of the electrode tested, the visuomotor mismatch response has a power 5 to 10 times higher than that of the MMN response. In addition, the number of trials per participant that met quality criteria was comparable between the visuomotor mismatch paradigm (mean = 23 trials) and the auditory mismatch paradigm (mean = 28 trials) (Author response image 4).

      Author response image 4.

      Number of trials included for analysis is comparable between the visuomotor and oddball paradigms. (A) Histogram showing the distribution of the number of valid trials per participant for the O1–2 electrode pair in the visuomotor mismatch paradigm. (B) Same as in A but for deviant stimulus presentations in the oddball paradigm.
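
      To put the two mean trial counts into perspective, the residual noise in a trial-averaged ERP can be estimated under the textbook assumption that noise is zero-mean, independent across trials, and of equal variance, so that it shrinks with the square root of the number of trials. The sketch below uses the mean trial counts quoted above (23 and 28); the single-trial noise level and the function name `residual_noise_sd` are hypothetical.

```python
import math


def residual_noise_sd(single_trial_sd_uv, n_trials):
    """Noise SD remaining in an average of n_trials independent trials."""
    return single_trial_sd_uv / math.sqrt(n_trials)


single_trial_sd_uv = 20.0  # hypothetical single-trial noise level in µV

noise_visuomotor = residual_noise_sd(single_trial_sd_uv, 23)
noise_oddball = residual_noise_sd(single_trial_sd_uv, 28)

# Ratio is independent of the assumed single-trial noise level.
noise_ratio = noise_visuomotor / noise_oddball
```

      Under these assumptions the average over 23 trials carries roughly 10% more residual noise than the average over 28 trials, a small difference relative to the 5- to 10-fold difference in response power reported above.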

      And more generally, the size of the mismatch event at the scalp does not scale one-to-one with the size at the level of the neural tissue. One can imagine a number of variables that impact scalp level magnitudes, which are orthogonal to actual cortex-level activation: the size, spread, and polarity variance of the activated source (which all would diminish amplitude at the scalp due to polyphasic summation/cancelation); the variance of phase to a stimulus across trials (cross-trial phase locking) vs. the magnitude of underlying power (the former, in theory, relates to bottom-up activity and the latter can reflect feedback, which has more variability in time across trials); and the distance of the scalp electrode from the activated tissue (which, for the auditory system, would be larger (FCz to superior temporal gyrus) than for the visual system (O1 to V1/2)). None of this precludes the inclusion of the auditory mismatch, which is a strength of the study, but interpretations about this supporting a supremacy of sensory-motor mismatch - regardless of validity - are not warranted. I would recommend changing the way this is presented in the abstract.

      We agree with the point that the EEG response does not need to reflect the total cortical activation. However, the discussion in the abstract (and elsewhere) is in the context of clinical experiments where the underlying cortical activity pattern is irrelevant if it does not trigger a clinically measurable (by EEG in this case) response. The abstract only makes a comparison to MMN implicitly in this sentence “Second, a paradigm that can trigger strong prediction error responses and consequently requires shorter recording times could simplify experiments in a clinical setting.” We are not sure how to phrase this even more carefully – the statement at face value is a truism. The reviewer, we assume, takes exception to the unstated implication that visuomotor prediction errors trigger stronger responses than MMN. Given the data we have, we assume most authors would not consider it an overstatement to make that claim outright.

      Otherwise, the data are of adequate quality to derive most of their conclusions.

      The authors claim that the mismatch responses emanate from within the occipital cortex, but I would require denser scalp coverage or a demonstration of consistent impedances across electrodes and across subjects to make conclusions about the underlying cortical sources (especially given the latencies of their peaks). In EEG, the distribution of voltage on the scalp is, of course, related to but not directly reflective of the distribution of the underlying sources. The authors are mostly careful in their discussion of this, but I would strongly recommend changing the word choice of "in occipital cortex" to "over occipital cortex" or even "posteriorly distributed". Even with very dense electrode coverage and co-registration to MRIs for the generation of forward models that constrain solutions, source localization of EEG signals is very challenging and not a simple problem. Given the convoluted and interior nature of human V1, the ability to reliably detect early evoked responses (which show the mismatch in mouse models) at the scalp in ERP peaks is challenging - especially if one is collapsing ERPs across subjects. And - given the latency of the mismatch responses, I'd imagine that many distributed cortical regions contribute to the responses seen at the scalp.

      This is an excellent point. We have rephrased throughout to “over occipital cortex” instead of “in occipital cortex”.

      I think that Figure 3C, but as a difference of visual mismatch vs halting flow alone (in the open loop) might be additionally informative, as it clarifies exactly where the pure "mismatch" or prediction error is represented.

      We performed the analysis as suggested (Author response image 5). Visuomotor mismatch responses are stronger on all electrodes compared to playback halt responses. This difference is also larger in data recorded on occipital electrodes.

      Author response image 5.

      Comparison of the difference between visuomotor mismatch and playback halt on all electrodes. Average response strength was calculated within a 100 ms window centered on the peak of the average visuomotor mismatch response across all electrodes. Boxes mark median, quartiles, and range of data not considered outliers. Each circle represents data from one participant. **: p<0.01, *: p<0.05, Fp1-2: 20 participants, C3-4: 31 participants, P3-4: 35 participants, O1-2: 32 participants.

      As a suggestion, the authors are encouraged to analyse time-frequency power and phase locking for these mismatch responses, as is common in much of the literature (see Roach et al 2008, Schizophrenia Bulletin). This is not to say that doing so will yield insights into oscillations per se, but converting the data to the time-frequency domain provides another perspective that has some advantages. It fosters translations to rodent models, as ERP peaks do not map well between species, but e.g., delta-theta power does (see Lee et al 2018, Neuropsychopharmacology; Javitt et al 2018, Schizophrenia research; Gallimore et al 2023, Cereb Ctx). Further, ERP peaks can be influenced by the actual neuroanatomy of an individual (especially for quantifying V1 responses). Time frequency analyses may aid in interpreting the "early negative deflection with a peak latency of 48 ms " finding as well.

      We have performed time–frequency power and phase-locking analyses for both visual responses (Author response image 6 and Author response image 7) and visuomotor mismatch and playback halt responses (Author response image 8 and Author response image 9), as suggested. We have included the results of these analyses here, as they are not yet fully developed. We may add them to a future publication, for which we would want to properly quantify the stability of these effects.

      In brief, time–frequency representations of power did identify potentially interesting differences between walking and sitting sessions in the visual paradigm. Inter-trial phase coherence (ITPC) revealed an early increase in alpha-band synchronization suggesting that phase alignment of alpha oscillations may contribute to the early differences in visual responses between walking and sitting. The same analyses were applied to visuomotor mismatch and playback halt responses. Time–frequency power analysis revealed an increase in delta-band power during visuomotor mismatch, consistent with previous reports linking delta activity to prediction error processing, including reward prediction errors (Cavanagh, 2015), unexpected final words (Webb and Sohoglu, 2025), and visual deviance detection (West et al., 2024). Notably, it appears as if the increase in delta power emerged first over occipital electrodes and appeared later over more frontal electrodes, forming a spatiotemporal gradient of onset across the scalp.

      Delta power changes were markedly reduced in the playback halt responses at the time of visual flow cessation. While some power changes were observed, they occurred primarily at visual flow onset rather than at flow offset. Inter-trial phase coherence analysis further revealed delta-band synchronization over occipital electrodes following visuomotor mismatch, whereas the playback halt response showed strong phase synchronization in both delta and theta bands following visual flow onset.

      Author response image 6.

      Time–frequency representations of EEG power changes during the visual paradigm. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line indicates the time of the checkerboard reversal (0 s). (B) As in A, but recorded while participants were walking.

      Author response image 7.

      Inter-trial phase coherence (ITPC) for visual trials during sitting and walking. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line marks the time of the checkerboard reversal (0 s). (B) As in A, but recorded during walking.

      Author response image 8.

      Time–frequency representations of EEG power changes during visuomotor mismatch and playback halt responses. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.

      Author response image 9.

      Inter-trial phase coherence (ITPC) for the visuomotor mismatch and playback halt responses. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.
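
      Inter-trial phase coherence, as shown in Author response images 7 and 9, is commonly defined as the length of the mean unit phase vector across trials at a given time and frequency. The following is a minimal sketch of that standard definition, not the analysis code behind these figures; it assumes the per-trial phases have already been extracted (for example by a wavelet transform), and the function name `itpc` is hypothetical.

```python
import cmath
import math


def itpc(phases):
    """Length of the mean unit phase vector across trials (0 to 1)."""
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Every trial has the same phase at this time-frequency point: ITPC near 1.
aligned = itpc([0.3] * 20)

# Phases spread evenly around the circle: ITPC near 0.
scattered = itpc([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
```

      Because each trial contributes a unit vector, the measure reflects the consistency of phase across trials and is insensitive to the amplitude of the single-trial response.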

      Finally, the sentence in the abstract that this paradigm "can trigger strong prediction error responses and consequently requires shorter recording times would simplify experiments in a clinical setting" is a nice setup to the paper, but the very fact that one third of recordings had to be removed due to movement artifact, and that hairstyle modulates the recording SNR, is reason to think that this paradigm, using the reported equipment, may have limited clinical utility in its current form. Further, auditory oddball paradigms are of great clinical utility because they do not require explicit attention and can be recorded very quickly with no behavioral involvement of a hospitalized patient. This should be discussed, although it does not detract from the overall scientific importance of the study. The authors should reconsider putting this statement in the abstract.

      We have added a paragraph to the discussion to address these points. Note, we get robust and strong responses with very few trials (Author response image 2). The fact that we need to discard up to 21.7% of trials due to movement/eye blink artefacts does little to change the fact that we need far fewer trials and have larger and more robust responses compared to other EEG paradigms. Finally, we understand that sometimes not needing participants to pay attention to the task is useful. However, having a paradigm that is engaging and fun for participants and takes 5 minutes of recording time is probably an advantage equally often.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) In the Introduction, I'm not sure that the logic comes through as to what the authors aim to illustrate by comparing mice to humans, in terms of precision and "movement modulation". In some cases, the precision of the comparison is referred to, and in others, the precision of the prediction (I think?). I'm not sure if they mean for this to be different or not. Similarly, on line 81, "If indeed the precision of visuomotor coupling determines the amount of motor modulation of visual responses" - here I'm a little confused by "amount of motor modulation": to me, the term "modulation" refers to a conditional modifier (if moving, then suppress visual movement responses; if not moving, then amplify visual movement responses) rather than movement driven activity. The way I'm reading it, the authors mean the latter, but I could be misunderstanding.

      We have rephrased this section of the introduction.

      (2) I think it could be helpful, in the sentence starting on line 65, to reiterate that this observation of higher-than-expected motor activity in V1 is in mice (if I'm understanding it correctly). I also found myself tangled up in the difference between motor-related activity in V1 and motor-modulation in V1 in this paragraph.

      We have rephrased this section of the introduction.

      (3) For signal power, was the amplitude squared on individual trials prior to averaging, or after averaging? If prior, it would help with separating amplitude modulations from phase variance.

      In our previous analysis, power was computed by squaring the amplitude after trial averaging (Author response image 10A). We repeated the analysis using the alternative approach in which power was calculated for individual trials and then averaged (Author response image 10B). Although this method yields substantially higher absolute power values, the overall pattern of results remains unchanged: visuomotor mismatch responses continue to show significantly higher power than visual responses. To examine phase variance, we additionally analyzed inter-trial phase coherence (Author response image 7 and Author response image 9).

      Author response image 10.

      Visuomotor mismatch responses have more power compared to visual responses. (A) Comparison of power between visuomotor mismatch and visual responses, calculated within a 0 - 0.5 s time window following stimulus onset. Power was computed by squaring the amplitude after trial averaging. Boxes indicate the median and interquartile range, with whiskers showing the range excluding outliers; circles represent data from individual participants. ***p < 0.001. (B) Same comparison as in (A), but with power calculated by squaring the amplitude of individual trials prior to averaging.
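
      The two ways of computing power contrasted in panels (A) and (B) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the function names, the sampling grid, and the simulated signal and noise levels are all assumptions made for the example.

```python
import numpy as np

def evoked_power(trials):
    """Average across trials first, then square (as in panel A)."""
    erp = trials.mean(axis=0)            # trial-averaged response, shape (n_samples,)
    return float(np.mean(erp ** 2))

def total_power(trials):
    """Square each trial first, then average (as in panel B)."""
    return float(np.mean(trials ** 2))

# Hypothetical data: 50 trials, 250 samples spanning a 0 - 0.5 s window.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.5, 250)
signal = np.sin(2 * np.pi * 4 * t)                  # phase-locked component
noise = rng.normal(0.0, 2.0, size=(50, t.size))     # trial-specific activity
trials = signal + noise

print("power after averaging: ", evoked_power(trials))
print("power before averaging:", total_power(trials))
```

      Because the mean of squares can never be smaller than the square of the mean, the per-trial estimate is always at least as large as the trial-averaged one, which is consistent with the higher absolute values reported for panel (B). Activity that is not phase-locked to stimulus onset cancels in the first estimate but not in the second.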

      (4) The "the world suddenly flew forward!" response from the participant, I understand, and I believe that it is useful to illustrate a point. I do not understand the "Are you printing this? - Hi Mom! " part of the participant response, and I'm not sure it adds to the paper, beyond amusement, which seems inappropriate.

      One of the authors (the one who did none of the experiments) finds this endlessly hilarious and as the reviewer notes, it might add amusement more generally. “Inappropriate” might be a bit harsh – according to our favorite AI chatbot: “Amusement provides significant mental, physical, and social value by offering a necessary escape from routine, reducing stress, and fostering a connection. It enhances well-being through endorphin-releasing experiences and encourages social bonding, learning, and joy.” Nevertheless, we have censored the offending passage.

      Aizenbud, I., Audette, N., Auksztulewicz, R., Basiński, K., Bastos, A.M., Berry, M., Canales-Johnson, A., Choi, H., Clopath, C., Cohen, U., Costa, R.P., Filippo, R.D., Doronin, R., Errington, S.P., Gavornik, J.P., Gillon, C.J., Granier, A., Hamm, J.P., Hertäg, L., Kennedy, H., Kumar, S., Ladd, A., Ladret, H., Lecoq, J.A., Maier, A., McCarthy, P., Mei, J., Mejias, J., Mikulasch, F., Mudrik, N., Najafi, F., Nejad, K., Nejat, H., Oweiss, K., Petrovici, M.A., Priesemann, V., Rudelt, L., Ruediger, S., Russo, S., Salatiello, A., Senn, W., Sennesh, E., Sima, S., Uran, C., Vasilevskaya, A., Vezoli, J., Vinck, M., Westerberg, J.A., Wilmes, K., Xiong, Y.S., 2025. Neural mechanisms of predictive processing: a collaborative community experiment through the OpenScope program. https://doi.org/10.48550/arXiv.2504.09614

      Bastos, A.M., Usrey, W.M., Adams, R.A., Mangun, G.R., Fries, P., Friston, K.J., 2012. Canonical microcircuits for predictive coding. Neuron 76, 695–711. https://doi.org/10.1016/j.neuron.2012.10.038

      Cavanagh, J.F., 2015. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216. https://doi.org/10.1016/j.neuroimage.2015.02.007

      Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. https://doi.org/10.1016/j.jneumeth.2003.10.009

      Gramann, K., Gwin, J.T., Bigdely-Shamlo, N., Ferris, D.P., Makeig, S., 2010. Visual evoked responses during standing and walking. Front. Hum. Neurosci. 4, 202. https://doi.org/10.3389/fnhum.2010.00202

      Heindorf, M., Keller, G.B., 2024. Antipsychotic drugs selectively decorrelate long-range interactions in deep cortical layers. eLife 12, RP86805. https://doi.org/10.7554/eLife.86805

      Keller, G.B., Hahnloser, R.H.R., 2009. Neural processing of auditory feedback during vocal practice in a songbird. Nature 457, 187–90. https://doi.org/10.1038/nature07467

      Keller, G.B., Mrsic-Flogel, T.D., 2018. Predictive Processing: A Canonical Cortical Computation. Neuron 100, 424–435. https://doi.org/10.1016/j.neuron.2018.10.003

      Oliveira, A.S., Schlink, B.R., Hairston, W.D., König, P., Ferris, D.P., 2016. Proposing Metrics for Benchmarking Novel EEG Technologies Towards Real-World Measurements. Front. Hum. Neurosci. 10, 188. https://doi.org/10.3389/fnhum.2016.00188

      O’Toole, S.M., Oyibo, H.K., Keller, G.B., 2023. Molecularly targetable cell types in mouse visual cortex have distinguishable prediction error responses. Neuron 111, 2918-2928.e8. https://doi.org/10.1016/j.neuron.2023.08.015

      Rao, R.P.N., Ballard, D.H., 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87. https://doi.org/10.1038/4580

      Vasilevskaya, A., Widmer, F.C., Keller, G.B., Jordan, R., 2023. Locomotion-induced gain of visual responses cannot explain visuomotor mismatch responses in layer 2/3 of primary visual cortex. Cell Rep. 42, 112096. https://doi.org/10.1016/j.celrep.2023.112096

      Webb, J.M., Sohoglu, E., 2025. Cortical tracking of prediction error during perception of connected speech. https://doi.org/10.1101/2025.07.18.665498

      West, C.L., Bastos, G., Duran, A., Nadeem, S., Ricci, D., Groves, A.M.R., Wargo, J.A., Peterka, D.S., Leeuwen, N.V., Hamm, J.P., 2024. A lasting impact of serotonergic psychedelics on visual processing and behavior. https://doi.org/10.1101/2024.07.03.601959

    1. eLife Assessment

      This important study convincingly shows that Vibrio bacteria act as predators of ecologically significant algae that contribute to harmful blooms in the lab and in their natural habitat, and that predation is induced by starvation. The authors suggest a working model that can be the basis for future work on this system. The study will be very impactful to those interested in the diversity of microbial predator-prey interactions and controlling toxic algal blooms.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. We appreciate the revisions and the authors addressed all of the remaining minor concerns listed by the reviewers. We have no further suggestions for revision.]

      Summary:

      Rolland and colleagues investigated the interaction between Vibrio bacteria and Alexandrium algae. The authors found a correlation between the abundance of the two in the Thau Lagoon and observed in the laboratory that Vibrio grows to higher numbers in the presence of the algae than in monoculture. Timelapse imaging of Alexandrium in coculture with Vibrio enabled the authors to observe Vibrio bacteria in proximity to the algae and subsequent algae death. The authors further determine the mechanism of the interaction between the two and point out similarities between the observed phenotypes and predator prey behaviours across organisms.

      Strengths:

      The study combines field work with mechanistic studies in the laboratory and uses a wide array of techniques ranging from co-cultivation experiments to genetic engineering, microscopy and proteomics. Further, the authors test multiple Vibrio and Alexandrium species and claim a wide spread of the observed phenotypes.

      Comments on revisions:

      I thank the authors for their additional work on the manuscript. My comments were addressed to my satisfaction.

    3. Reviewer #2 (Public review):

      Goal summary:

      The authors sought to (i) demonstrate correlations between the dynamics of the dinoflagellate Alexandrium pacificum and the bacterium Vibrio atlanticus in natural populations, (ii) demonstrate the occurrence of predation in laboratory experiments, (iii) demonstrate that predation is induced by predator starvation, and (iv) test for effects of quorum sensing and iron-uptake genes on the predation process.

      Strengths include:

      - Data indicating correlated dynamics in a natural environment that increase the motivation for study of in vitro interactions

      - Experimental design allowing clear inference of predation based on population counts of both prey and predators in addition to microscopy-based evidence

      - Supplementation of population-level data with molecular approaches to test hypotheses regarding possible involvement of quorum sensing and iron uptake in predation

      Weaknesses include:

      - A quantitative analysis of effects of manipulating V. atlanticus density on rates of predation would have been valuable

      Appraisal:

      The authors convincingly demonstrate that V. atlanticus can prey on A. pacificum, provide strongly suggestive evidence that such predation is induced by starvation and clearly demonstrate that both iron availability and correspondingly the presence of genes involved in iron uptake strongly influence the efficacy of predation.

      Discussion of impact:

      This paper will interest those interested in the diversity of forms of microbial predation and how microbial predatory behavior responds to environmental fluctuations. It will also interest those investigating bacteria-algae interactions and potential ecological controls of algal blooms. It may also interest researchers of microbial cooperation in light of the suggestion of communication between predator cells.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Rolland and colleagues investigated the interaction between Vibrio bacteria and Alexandrium algae. The authors found a correlation between the abundance of the two in the Thau Lagoon and observed in the laboratory that Vibrio grows to higher numbers in the presence of the algae than in monoculture. Timelapse imaging of Alexandrium in coculture with Vibrio enabled the authors to observe Vibrio bacteria in proximity to the algae and subsequent algae death. The authors further determine the mechanism of the interaction between the two and point out similarities between the observed phenotypes and predator prey behaviours across organisms.

      Strengths:

      The study combines field work with mechanistic studies in the laboratory and uses a wide array of techniques ranging from co-cultivation experiments to genetic engineering, microscopy and proteomics. Further, the authors test multiple Vibrio and Alexandrium species and claim a wide spread of the observed phenotypes.

      Comments on revisions:

      I thank the authors for their additional work on the manuscript. My comments were addressed to my satisfaction.

      Dear Reviewer #1, we thank you for your careful evaluation of our manuscript and for the time and effort you dedicated to this review. We are pleased that the revised version has addressed your concerns to your satisfaction.

      Reviewer #2 (Public review):

      Goal summary

      The authors sought to (i) demonstrate correlations between the dynamics of the dinoflagellate Alexandrium pacificum and the bacterium Vibrio atlanticus in natural populations, (ii) demonstrate the occurrence of predation in laboratory experiments, (iii) demonstrate that predation is induced by predator starvation, and (iv) test for effects of quorum sensing and iron-uptake genes on the predation process.

      Strengths include

      - Data indicating correlated dynamics in a natural environment that increase the motivation for study of in vitro interactions

      - Experimental design allowing clear inference of predation based on population counts of both prey and predators in addition to microscopy-based evidence

      - Supplementation of population-level data with molecular approaches to test hypotheses regarding possible involvement of quorum sensing and iron uptake in predation

      Weaknesses include

      - A quantitative analysis of effects of manipulating V. atlanticus density on rates of predation would have been valuable

      - Lack of clarity in some of the methodological descriptions

      Appraisal

      The authors convincingly demonstrate that V. atlanticus can prey on A. pacificum, provide strongly suggestive evidence that such predation is induced by starvation and clearly demonstrate that both iron availability and correspondingly the presence of genes involved in iron uptake strongly influence the efficacy of predation.

      Discussion of impact

      This paper will interest those interested in the diversity of forms of microbial predation and how microbial predatory behavior responds to environmental fluctuations. It will also interest those investigating bacteria-algae interactions and potential ecological controls of algal blooms. It may also interest researchers of microbial cooperation in light of the suggestion of communication between predator cells.

      Dear Reviewer #2, we sincerely thank you for the time you devoted to this second review of our manuscript. We greatly appreciate your thoughtful comments, which helped us further improve the clarity and precision of the manuscript. All your additional recommendations have been carefully considered and addressed in the revised version and in our responses below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (2) The authors' reference to Fig. 4a did not address our concern about density potentially affecting the outcomes shown in Fig. 3. Fig. 4a does not provide any quantitative effects of manipulating Vibrio density. But the new density numbers the authors added in response to point (33) do seem to address our concern, because Vibrio densities become lower in the older cultures, excluding the possibility that the increased predation in older cultures might have been due to higher Vibrio densities. We think this should be stated explicitly.

      (33) See point (2) above. We think the authors should explicitly state in the text that the increased predation in older cultures was not due to higher Vibrio densities in those older cultures, referring to their data.

      As recommended by Reviewer#2, we added the sentence “Importantly, Vibrio densities decreased with culture age, ruling out the possibility that the stronger predation observed in older cultures was driven by higher bacterial densities” in the results section “Attack of A. pacificum ACT03 is activated by V. atlanticus LGP32 starvation.”

      (45) Is it known that bacterial predators collectively feed more on other bacteria than on microbial eukaryotes in natural habitats? While this certainly seems most likely, it's stated as fact, so the statement should either be supported with relevant citations or phrased as a likely hypothesis.

      As suggested, we rephrased this sentence “Predatory bacteria are found in a wide variety of environments and are commonly described as feeding on other bacteria, although some cases of predation on microbial eukaryotes have also been hypothesized” in the discussion section.

      (46) Perhaps "Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggest that Vibrios engage in a novel form of predation in which they kill and feed on algae."

      The reference to 'developing' a predator behavior is not clear. What is meant by 'develop'? It seems unnecessary.

      The use of italics when writing Vibrio is inconsistent.

      We agree that the reference to “developing” a predatory behavior was unclear and unnecessary. We therefore revised the sentence as follows: “Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggests that Vibrio engages in a novel form of predation in which it kills and feeds on algae.” We also corrected the inconsistent use of italics for Vibrio throughout the manuscript.

      (48) The authors might wish to revise this sentence, as although M. xanthus does have a contact-dependent killing mechanism, it is our understanding that both Lysobacter and myxobacteria can kill some prey at a distance with diffusible secretions.

      The sentence “These bacteria must be in close proximity to their prey in order to cause lysis and utilize their biomass, regardless of the prey's species” was replaced by “These bacteria may require close proximity to their prey to cause lysis and utilize their biomass, although some can also kill prey at a distance through diffusible secretions”.

      (50) Why not directly say 'predatory behavior'?

      We totally agree and have reworded the sentence.

      Line by line feedback:

      28 '...the phycosphere, an interface ...'

      We agree and have revised the wording.

      24 'In the attack stage, Vibrios...'

      This sentence has been rephrased as recommended.

      35 surrounds -> surround

      The correction has been done.

      36 The lysis is induced by the cells not by the 'stage'. We would rephrase to 'in which the lysis and consumption of the dinoflagellates occurs'

      This sentence has been rephrased as recommended.

      41 'a new mechanism that could to be involved' -> 'a new mechanism that could be involved ...'

      The correction has been done.

      61 forms

      The correction has been done.

      98 'the role...in'

      The suggested correction has been performed.

      103 'Qpcr' -> 'qPCR'

      Thank you for spotting this typo. “Qpcr” was corrected to “qPCR” in the manuscript.

      125 Misplaced punctuation

      The punctuation was corrected.

      152 The use of '.' vs 'x' to indicate multiplication when writing numbers is inconsistent. In some cases both are missing.

      Numbers have been corrected throughout the manuscript.

      231 I would rephrase 'poor nutrient stress' to 'little nutrient stress' or 'no nutrient stress'

      The rephrasing was carried out as suggested.

      310 R and used packages are not cited

      We added the citation (R Core Team, 2024). Linear models, QQ plots (which are part of linear models), tests, and AICs are included in R by default and are credited to the R Core Team.

      The sentence “Statistical analyses were performed using R 3.6.3 software” was replaced by “Statistical analyses were performed using R 3.6.3 software (R Core Team, 2024) using Rstudio”.

      358 'are capable of simultaneously attacking'

      The expression “are capable of simultaneously attacking” was revised in the manuscript to improve clarity and readability.

      366 'exponential growth phase'

      We have corrected the wording to “exponential growth phase” in the revised manuscript.

      430 The large difference in incubation time between the sea-water vs nutrient-rich treatments and use of different media are unfortunate. These additional variables compromise the ability to directly ascribe observed differences to starvation.

      We agree, the sentence “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins modulated by nutrient stress (Fig. S2)” was replaced by “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins that were differentially abundant under these two contrasting conditions (Fig. S2)”

      443 Somewhat unclear sentence. I would rephrase this to "Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03."

      To clarify this point, the sentence “Remarkably, among the 10 proteins identified by proteomic analysis only V. atlanticus LGP32 mutant lacking pvuB failed to attack A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001)” was replaced by “Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001).”

      445 'attack simultaneously' -> 'simultaneously attack'

      The suggested modification has been done.

      450 H3BO4 is written as Boron later, it would be good to call it boron here as well so that it is easier to make the connection for the reader.

      We agree, we modified the manuscript and called it boron.

      459 'no linked' -> 'no link'

      The text was modified accordingly.

      483 'which induces' -> 'which induce'

      The correction has been made.

      519 The use of Vibrio atlanticus and V. atlanticus is inconsistent within the text.

      We have checked and modified the manuscript in accordance with the recommendations.

      807-808 The use of the phrase 'Akaike information criterion (AICc) models' is confusing. Aren't these models just generalized linear models? It should be rephrased to make clear that the AICc is just a test that is used to select which model to use.

      We clarified this point by revising the Figure 1 legend. The sentences “(C) Result of Akaike information criterion (AICc) models tested to explain the mean value of degraded Alexandrium cells (dead cells) in spring. (D) Wald test of the AICc model attributing the mean value of degraded cells of Alexandrium in spring to free Vibrio” were replaced by “(C) Results of the Akaike Information Criterion (AICc) test conducted to select a model for explaining the mean value of dead Alexandrium (degraded cells) in spring. (D) Wald test of the AICc model explaining the mean value of dead Alexandrium in spring by free Vibrio.”
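
      The distinction raised by the reviewer, that AICc is a criterion for ranking candidate models rather than a model in itself, can be illustrated with a minimal sketch. It assumes ordinary least-squares fits with Gaussian errors and uses synthetic data; the predictor named `temperature` and all numerical values are hypothetical and are not taken from the study.

```python
import numpy as np

def aicc_least_squares(y, y_hat, n_coefs):
    """Small-sample corrected AIC for a least-squares fit (up to an additive constant)."""
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    k = n_coefs + 1                         # regression coefficients plus residual variance
    aic = n * np.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def fit_and_score(X, y):
    """Fit a linear model with intercept and return its AICc."""
    ones = np.ones((len(y), 1))
    design = ones if X is None else np.column_stack([ones, X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return aicc_least_squares(y, design @ coefs, design.shape[1])

# Synthetic example (hypothetical): dead cells depend on free Vibrio only.
rng = np.random.default_rng(0)
n = 40
free_vibrio = rng.normal(size=n)
temperature = rng.normal(size=n)            # hypothetical, irrelevant predictor
dead_cells = 2.0 + 3.0 * free_vibrio + rng.normal(scale=1.0, size=n)

candidates = {
    "intercept only": None,
    "free Vibrio": free_vibrio[:, None],
    "temperature": temperature[:, None],
    "free Vibrio + temperature": np.column_stack([free_vibrio, temperature]),
}
scores = {name: fit_and_score(X, dead_cells) for name, X in candidates.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:28s} AICc = {score:.1f}")
```

      The candidate with the lowest AICc is the one retained; a test such as the Wald test in panel (D) would then be applied to the coefficients of that selected model.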

      827 The chronological sequence of snapshots is not very clear. Perhaps it would be clearer if pictures over a shorter timeframe were used to clearly show the gathering of the V. atlanticus cells near the algal cells.

      To address this point, we removed the first and the last 14 seconds of the snapshots to clearly show the gathering of the V. atlanticus cells near the algal cells, and we added an arrow on Fig. 2D to indicate the chronological order.

    1. eLife Assessment

      This important study describes a novel Bayesian psychophysical approach that efficiently measures how well humans can discriminate between colors across the entire isoluminant plane. The evidence was considered compelling, as it included successful model validation against hold-out data and published datasets. This approach could prove to be of use to color vision scientists, as well as to those who employ computational psychophysics and attempt to model perceptual stimulus fields with smooth variations over coordinate spaces.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents an ambitious and technically impressive attempt to map how well humans can discriminate between colours across the entire isoluminant plane. The authors introduce a novel Wishart Process Psychophysical Model (WPPM) - a Bayesian method that estimates how visual noise varies across colour space. Using an adaptive sampling procedure, they then obtain a dense set of discrimination thresholds from relatively few trials, producing a smooth, continuous map of perceptual sensitivity. They validate their procedure by comparing actual and predicted thresholds at an independent set of sample points. The work is a valuable contribution to computational psychophysics and offers a promising framework for modelling other perceptual stimulus fields more generally.

      Strengths:

      The approach is elegant and well-described, and the data are of high quality. The writing throughout is clear and the figures are clean (elegant in fact) and do a good job of explaining how the analysis was performed. The whole paper is tremendously thorough and the technical appendices and attention to detail are impressive (for example, a huge amount of data about calibration, variability of the stim system over time etc). This should be a touchstone for other papers that use calibrated colour stimuli.

      Comments on revised version:

      The authors have addressed all the issues I raised to my satisfaction.

    3. Reviewer #3 (Public review):

      Summary:

      This study presents a powerful and rigorous approach for characterizing stimulus discriminability throughout a sensory manifold, and is applied to the specific context of predicting color discrimination thresholds across the chromatic plane.

      Strengths:

      Color discrimination has played a fundamental role in studies of human color vision and for color applications, but as the authors note, remains poorly characterized. The study leverages the assumption that thresholds should vary smoothly and systematically within the space, and validates this with their own tests and comparisons with previous studies.

      Comments on revised version:

      My comments have been addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and the reviewers for the thorough and insightful comments and suggestions. Addressing them has strengthened our manuscript. We have carefully addressed all reviewer comments, as described in detail below, as well as additional comments we received from others. In addition, we made two substantive updates to the manuscript:

      (1) We improved the estimation of uncertainty in the model predictions by computing 95% confidence intervals using 120 bootstrapped datasets (instead of the 100% of 10 bootstrapped datasets in the original submission) to match the number of bootstraps used for the validation dataset.

      (2) We selected a slightly different hyperparameter value based on follow-up analyses suggested by Reviewer 1, which provided very useful information.

      Importantly, none of these changes alter the main results or conclusions of the paper.

      Beyond these changes and those outlined below, we also worked to improve the clarity of the prose throughout as well as added various additional citations to the literature.
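
      The bootstrap confidence intervals mentioned in update (1) can be sketched as a generic percentile bootstrap. This is an illustration under stated assumptions, not the authors' pipeline: in the paper each bootstrapped dataset would be refit with the model, whereas here a simple statistic stands in for that refit, and the function name, the synthetic thresholds, and the use of the percentile method are all assumptions.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=120, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic(data)`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample observations with replacement
        estimates[b] = statistic(data[idx])  # stand-in for refitting the model
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [alpha, 1.0 - alpha])
    return float(lower), float(upper)

# Hypothetical thresholds from 60 trials' worth of estimates.
rng = np.random.default_rng(1)
thresholds = rng.normal(loc=0.03, scale=0.005, size=60)

low, high = bootstrap_ci(thresholds, np.median)
print(f"95% CI for the median threshold: [{low:.4f}, {high:.4f}]")
```

      With only 10 resamples the 2.5th and 97.5th percentiles are poorly defined, which is one reason a larger number such as 120 gives a more stable 95% interval.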

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents an ambitious and technically impressive attempt to map how well humans can discriminate between colours across the entire isoluminant plane. The authors introduce a novel Wishart Process Psychophysical Model (WPPM) - a Bayesian method that estimates how visual noise varies across colour space. Using an adaptive sampling procedure, they then obtain a dense set of discrimination thresholds from relatively few trials, producing a smooth, continuous map of perceptual sensitivity. They validate their procedure by comparing actual and predicted thresholds at an independent set of sample points. The work is a valuable contribution to computational psychophysics and offers a promising framework for modelling other perceptual stimulus fields more generally.

      Strengths:

      The approach is elegant and well-described (I learned a lot!), and the data are of high quality. The writing throughout is clear, and the figures are clean (elegant in fact) and do a good job of explaining how the analysis was performed. The whole paper is tremendously thorough, and the technical appendices and attention to detail are impressive (for example, a huge amount of data about calibration, variability of the stim system over time, etc). This should be a touchstone for other papers that use calibrated colour stimuli.

      Weaknesses:

      Overall, the paper works as a general validation of the WPPM approach. Importantly, the authors validate the model for the particular stimuli that they use by testing model predictions against novel sample locations that were not part of the fitting procedure (Figure 2). The agreement is pretty good, and there is no overall bias (perhaps local bias?), but they do note a statistically-significant deviation in the shape of the threshold ellipses. The data also deviate significantly from historical measurements, and I think the paper would be considerably stronger with additional analyses to test the generality of its conclusions and to make clearer how they connect with classical colour vision research. In particular, three points could use some extra work:

      (1) Smoothness prior.

      The WPPM assumes that perceptual noise changes smoothly across colour space, but the degree of smoothness (the eta parameter) must affect the results. I did not see an analysis of its effects - it seems to be fixed at 0.5 (line 650). The authors claim that because the confidence intervals of the MOCS and the model thresholds overlap (line 223), the smoothing is not a problem, but this might just be because the thresholds are noisy. A systematic analysis varying this parameter (or at least testing a few other values), and reporting both predictive accuracy and anisotropy magnitude, would clarify whether the model's smoothness assumption is permitting or suppressing genuine structure in the data. Is the gamma parameter also similarly important? In particular, does changing the underlying smoothness constraint alter the systematic deviation between the model and the MOCS thresholds? The authors have thought about this (of course! - line 224), but also note a discrepancy (line 238). I also wonder if it would be possible to do some analysis on the posterior, which might also show if there are some regions of color space where this matters more than others? The reason for doing this is, in part, motivated by the third point below - it's not clear how well the fits here agree with historical data.

      Thank you for raising this important point. We have now added analyses of the effects of the two smoothness-related hyperparameters, ε and γ (see Appendix 10).

      First, we swept a range of values for each hyperparameter (ε: 0.1 – 1; γ: 0.000001 – 0.003) and evaluated model performance using 5-fold cross-validation of the dataset used to fit the WPPM, quantifying predictive accuracy on held-out test data. We used the mean negative log likelihood averaged across the held-out data in the cross validation as our measure of predictive accuracy (Figs. S27-31).

      The two hyperparameters affect cross-validation accuracy in a similar manner. With γ fixed at 0.0003, predictive accuracy is highest for ε in the range of approximately 0.3–0.5 and drops quite rapidly for ε < 0.3. We attribute this drop to oversmoothing. Cross-validation accuracy also decreases, albeit more gradually, for ε > 0.5. We attribute this to increased variance due to undersmoothing relative to the power of our datasets. Similarly, with ε fixed at 0.4, predictive accuracy is highest for γ values between approximately 0.0001 and 0.001, declines rapidly for smaller γ (oversmoothing), and more slowly for larger γ (undersmoothing).
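
      The sweep described above, 5-fold cross-validation scored by mean negative log likelihood on held-out trials, can be sketched with a deliberately simple stand-in model. This is not the WPPM: a Gaussian-kernel smoother of binary responses over a one-dimensional stimulus axis replaces it, and its `bandwidth` plays the role of a smoothness hyperparameter. Note that in this stand-in a large bandwidth means more smoothing, whereas in the response above small ε or γ is described as oversmoothing. All names and values are assumptions.

```python
import numpy as np

def predict_prob(x_train, y_train, x_test, bandwidth, eps=1e-12):
    """Kernel-weighted average of binary outcomes (stand-in for a fitted model)."""
    d = x_test[:, None] - x_train[None, :]
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    p = (w @ y_train + 0.5 * eps) / (w.sum(axis=1) + eps)
    return np.clip(p, 1e-6, 1 - 1e-6)

def mean_nll(y, p):
    """Mean Bernoulli negative log likelihood of responses y given predictions p."""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def cross_validate(x, y, bandwidth, n_folds=5, seed=0):
    """Average held-out negative log likelihood across folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        p = predict_prob(x[train_idx], y[train_idx], x[test_idx], bandwidth)
        scores.append(mean_nll(y[test_idx], p))
    return float(np.mean(scores))

# Synthetic binary responses whose success probability varies smoothly with x.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 500)
true_p = 0.5 + 0.4 * np.sin(2 * np.pi * x)
y = (rng.uniform(size=x.size) < true_p).astype(float)

bandwidths = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
cv_scores = {h: cross_validate(x, y, h) for h in bandwidths}
best_bandwidth = min(cv_scores, key=cv_scores.get)
for h, score in cv_scores.items():
    print(f"bandwidth = {h:<5} held-out NLL = {score:.3f}")
print("selected bandwidth:", best_bandwidth)
```

      Held-out likelihood typically worsens at both ends of such a sweep, for the reasons given above: too much smoothing misses real structure, and too little lets the fit follow noise.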

      Second, we examined how the hyperparameter ε affected the agreement between the WPPM fit and the MOCS validation data. Specifically, at each ε, for each participant, we computed the linear regression between WPPM thresholds and validation thresholds at 25 reference locations. Then, we examined the slope and correlation coefficient of all participants as a function of ε. We found a classic bias–variance tradeoff. Excessive smoothness introduces bias by failing to capture structure in the data, whereas insufficient smoothness increases variance in model predictions. These results further support a choice of ε = 0.4 as lying near the optimal balance between bias and variance (Fig. S32).
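
      The per-participant agreement measure described above can be sketched as follows. The direction of the regression (validation thresholds regressed on model thresholds) is an assumption, since the response does not specify it, and the threshold values are synthetic.

```python
import numpy as np

def slope_and_correlation(model_thresholds, validation_thresholds):
    """Least-squares slope and Pearson correlation between two threshold sets."""
    slope, _intercept = np.polyfit(model_thresholds, validation_thresholds, deg=1)
    r = np.corrcoef(model_thresholds, validation_thresholds)[0, 1]
    return float(slope), float(r)

# Hypothetical thresholds at 25 reference locations for one participant.
rng = np.random.default_rng(0)
model = rng.uniform(0.01, 0.05, size=25)
validation = model + rng.normal(0.0, 0.003, size=25)   # noisy but unbiased

slope, r = slope_and_correlation(model, validation)
print(f"slope = {slope:.2f}, correlation = {r:.2f}")
```

      A slope near 1 together with a high correlation indicates that model thresholds track the validation thresholds without systematic compression or expansion; repeating the computation at each candidate ε gives the curves used to assess the bias-variance tradeoff.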

      Based on these analyses, we selected for the final analysis ε = 0.4, slightly smaller than the preregistered value used in the original submission (0.5), while retaining the original value of γ (0.0003).

      We now discuss these reasons for changing this value in the revision, as well as provide a more general discussion of the importance and practicalities of hyperparameter choice in Bayesian approaches to analyzing data (Discussion / Prior specification).

      (2) Comparison with simpler models. It would help to see whether the full WPPM is genuinely required. Clearly, the data (both here and from historical papers) require some sort of anisotropy in the fitting - the sensitivities decrease as the stimuli move away from the adaptation point. But it's >not< clear how much the fits benefit from the full parameterisation used here. Perhaps fits for a small hierarchy of simpler models - starting with isotropic Gaussian noise (as a sort of 'null baseline') and progressing to a few low-dimensional variants - would reveal how much predictive power is gained by adding spatially varying anisotropy. This would demonstrate that the model's complexity is justified by the data.

      In the 5-fold cross-validation analysis described above (and now presented in Appendix 10), we found that when ε or γ is small, the stronger smoothness constraint leads to threshold ellipses that are nearly identical to each other across color space. Under these conditions, model predictions show poor accuracy on held-out test data and lead to poor predictions of the validation data. This observation addresses the underlying point raised by the reviewer, albeit in a different way than suggested: it shows that a degree of spatially varying anisotropy is necessary to capture the structure of the data. We now make this point in the paper (Discussion / Prior specification).

      More broadly, we employed the WPPM as a prior that imposed smoothness but not much other obvious structure, and used this to learn about the psychometric field. We are currently working to understand how we can best use our current data to improve the prior we would apply to future measurements. There are a number of approaches to this. One would be to seek a parametric mechanistic model that can describe the current data and, to the extent this is possible, to formulate prior distributions over its parameters. The results reported here thus provide a foundation for deriving and evaluating more structured priors that would leverage future datasets even more efficiently, with the caveat that they impose more structure. We have added this perspective to the paper (Discussion / Extensions of the WPPM framework).

      (3) Quantitative comparison to historical data. The paper currently compares its results to MacAdam, Krauskopf & Karl, and Danilova & Mollon only by visual inspection. It is hard to extract and scale actual data from historical papers, but from the quality of the plotting here, it looks like the authors have achieved this, and so quantitative comparisons are possible. The MacAdam data comparisons are pretty interesting - in particular, the orientations of the long axes of the threshold ellipses do not really seem to line up between the two datasets - and I thought that the orientation of those ellipses was a critical feature of the MacAdam data. Quantitative comparisons (perhaps overall correlations, which should be immune to scaling issues, axis-ratio, orientation, or RMS differences) would give concrete measures of the quality of the model. I know the authors spend a lot of time comparing to the CIE data, and this is great.... But re-expressing the fitted thresholds in CIE or DKL coordinates, and comparing them directly with classical datasets, would make the paper's claims of "agreement" much more convincing.

      Although we are sympathetic to this request, we have chosen not to implement the sort of quantitative comparison requested by the reviewer. The reason is that an important feature of color thresholds is that they depend on the spatial (e.g. Kelly, 1974; Poirson & Wandell, 1996; Danilova & Mollon, 2025) and temporal (e.g. Kelly, 1974) properties of the stimuli, and on the observer’s state of adaptation (e.g. Loomis & Berger, 1979; Krauskopf & Gegenfurtner, 1992). Because (as the reviewer notes below) the spatial and temporal properties of our stimuli were not matched to those of the comparison datasets, our purpose in making these comparisons was to examine qualitative agreement, as well as to situate our results in the literature and to demonstrate that our approach allows us to read out thresholds around the references and in the color spaces used in other studies. We would not expect detailed quantitative agreement with the current dataset because of differences in stimuli.

      As a consequence of this, we think we would be overreaching to quantify the differences between our data and classic datasets. This consideration is particularly important for the MacAdam measurements, where because of the matching adjustment procedure used, the observer’s state of adaptation is likely to have varied (by amounts that are difficult to estimate) from one reference to the next (e.g. Danilova & Mollon, 2025). We have clarified the manuscript with respect to these points (Results / Comparison with previous measurements).

      An important and interesting future direction that emerges from our work is to develop efficient methods to characterize the dependence of the full discrimination field on ancillary variables, such as those that describe spatial and temporal properties and/or the state of adaptation. Although enabling such comparisons is not the primary motivation, these methods would allow data to be compared with a wider range of studies. We now mention this direction in the paper (Discussion / Implications for the mechanisms of color perception).

      We do agree that the comparisons to CIELAB predictions work better when we express them in CIELAB, and have now done so (Fig. 3D; Fig. S24-S26).

      Kelly, D. H. (1974). "Spatio-temporal frequency characteristics of color-vision mechanisms." Journal of the Optical Society of America 64(7): 983–990.

      Poirson, A. B. and B. A. Wandell (1996). "Pattern-color separable pathways predict sensitivity to simple colored patterns." Vision Research 36(4): 515–526.

      Danilova, M. V. and J. D. Mollon (2025). "Effect of stimulus size on chromatic discrimination." Journal of the Optical Society of America A 42(5).

      Loomis, J. M. and T. Berger (1979). "Effects of chromatic adaptation on color discrimination and color appearance." Vision Research 19(8): 891–901.

      Krauskopf, J. and K. Gegenfurtner (1992). "Color discrimination and adaptation." Vision Research 32(11): 2165–2175.

      Overall, this is a creative and technically sophisticated paper that will be of broad interest to vision scientists. It is probably already a definitive method paper showing how we can sample sensitivity accurately across colour space (and other visual stimulus spaces). But I think that until the comparison with historical datasets is made clear (and, for example, how the optimal smoothness parameters are estimated), it has slightly less to tell us about human colour vision. This might actually be fine - perhaps we just need the methods?

      Related to this, I'd also note that the authors chose a very non-standard stimulus to perform these measurements with (a rendered 3D 'Greebley' blob). This does have the advantage of some sort of ecological validity. But it has the significant disadvantage that it is unlike all the other (much simpler) stimuli that have been used in the past - and this is likely to be one of the reasons why the current (fitted) data do not seem to sit in very good agreement with historical measurements.

      As the reviewer notes, our stimuli head in the direction of ecological validity (see also Hedjar et al., 2025) and indeed this was a consideration when we chose them, at the cost of limiting the degree of comparison we can make with prior studies (as discussed above). Another reason we chose our stimuli is that they enable the current data to be used as a basis of comparison with stimuli where we add specularity, change object shape, and vary object pose in the future. These manipulations are not possible with flat matte patches. Such experiments are of interest to us, as they will tell us about how effectively color may be used to differentiate stimuli in cases where other ecologically important variables co-vary. We now mention this motivation in the paper (Results / Task and Stimuli).

      Hedjar, L., M. Toscani and K. R. Gegenfurtner (2025). "Importance of hue: color discrimination of three-dimensional objects and two-dimensional discs." Journal of the Optical Society of America A 42(5).

      Reviewer #2 (Public review):

      Summary:

      Hong et al. present a new method that uses a Wishart process to dramatically increase the efficiency of measuring visual sensitivity as a function of stimulus parameters for stimuli that vary in a multidimensional space. Importantly, they have validated their model against their own hold-out data and against 3 published datasets, as well as against colour spaces aimed at 'perceptual uniformity' by equating JNDs. Their model achieves high predictive success and could be usefully applied in colour vision science and psychophysics more generally, and to tackle analogous problems in neuroscience featuring smooth variation over coordinate spaces.

      Strengths:

      (1) This research makes a substantial contribution by providing a new method to very significantly increase the efficiency with which inferences about visual sensitivity can be drawn, so much so that it will open up new research avenues that were previously not feasible. Secondly, the methods are well thought out and unusually robust. The authors made a lot of effort to validate their model, but also to put their results in the context of existing results on colour discrimination, transforming their results to present them in the same colour spaces as used by previous authors to allow direct comparisons. Hold-out validation is a great way to test the model, and this has been done for an unusually large number of observers (by the standards of colour discrimination research). Thirdly, they make their code and materials freely available with the intention of supporting progress and innovation. These tools are likely to be widely used in vision science, and could of course be used to address analogous problems for other sensory modalities and beyond.

      Weaknesses:

      It would be nice to better understand what constraints the choice of basis functions puts on the space of possible solutions. More generally, could there be particular features of colour discrimination (e.g., rapid changes near the white point) that the model captures less well.

      This comment bears conceptual similarity to Reviewer 1’s question about the hyperparameters of our prior, as it is basically asking whether we might be oversmoothing through the choice of form and number of basis functions. The hyperparameter sweeps we now present suggest that within the choice of basis functions we used, we are operating at a reasonable point on the bias-variance tradeoff curve - we can see bias emerging with a smoother prior, and variance increasing with a less smooth prior. Our expectation is that varying the smoothness of the prior in other ways, such as by varying the form and number of the basis functions, would lead to similar tradeoffs.

      We did perform one additional check that shows, within our current framework, that adding more basis functions is unlikely to change things much. This was to plot the fit weights as a function of Chebyshev basis order (Figure S4 in Appendix 2). These decline to near zero at the highest order we used, suggesting that adding more would not alter the inferred psychometric field, given our hyperparameter choices. Although we could explore this question further by explicitly fitting the data using more basis functions along with different hyperparameter choices, or different functional forms for the basis functions, we decided not to pursue this in favor of performing the other additional analyses we now present.
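
      The logic of this check can be illustrated with a one-dimensional sketch. It is not the authors' analysis: the fitted quantity here is an arbitrary smooth function (exp(x)) rather than the weights of the Wishart process, and the maximum order of 5 is a hypothetical choice. The point is only that when the weights on the highest-order Chebyshev terms are near zero, the truncated basis already captures the function well.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Smooth target function sampled on the model space [-1, 1] (an assumption
# standing in for a smoothly varying quantity such as a covariance entry).
x = np.linspace(-1.0, 1.0, 201)
y = np.exp(x)

# Least-squares fit of Chebyshev weights up to a hypothetical maximum order.
max_order = 5
weights = C.chebfit(x, y, deg=max_order)

# Weight magnitude by basis order, relative to the largest weight.
relative_size = np.abs(weights) / np.abs(weights).max()
for order, size in enumerate(relative_size):
    print(f"order {order}: relative |weight| = {size:.2e}")

# Residual of the truncated fit; small when high-order weights are negligible.
y_hat = C.chebval(x, weights)
max_abs_error = float(np.max(np.abs(y_hat - y)))
print("max abs error:", max_abs_error)
```

      For a function with sharp local structure, the weights would decay more slowly, which is the situation in which adding basis functions could change the fit.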

      We share the reviewer’s concern that assuming smoothness, both by assuming that isoperformance contours are elliptical and by assuming that these vary smoothly with reference, might cause us to miss features of the true underlying field in cases where that field varies rapidly or the isoperformance contours are asymmetric or non-elliptical. Our approach to this was to measure the validation thresholds and demonstrate that any bias in our WPPM-inferred field is small for these measurements. Because we shared the reviewer’s intuition that the adapting point is a candidate location where there might be less smooth variation, we measured a validation threshold at this reference for every subject. Nonetheless, we only measured in one direction around the adapting reference for each subject. We considered validation approaches where we measured full ellipses at a set of validation references, but we were worried about effects of uncertainty reduction and perceptual learning, which might distort thresholds at highly sampled locations.

      It is the case that if one wanted to study the discrimination field in more detail around a particular reference, one could concentrate trials in a smaller model space around that reference, and for the same number of trials use a prior with less smoothness relative to the underlying stimulus space. Indeed, simply halving the size of the stimulus space that maps onto the [-1,1] model space and keeping the same prior over the model space effectively halves the degree of smoothness expressed with respect to the stimulus space. Thus our methods could prove useful in studying more rapid variations in the discrimination field if one hypothesized that they might occur around particular reference choices, but this would still rest upon the elliptical assumption. To relax that assumption, one could use the threshold field estimation methods implemented in AEPsych, which incorporate a smoothness assumption but do not assume elliptical isoperformance contours. Weakening the prior in this way would, however, increase trial demand to obtain similar measurement precision.
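
      The scaling argument above can be made concrete with a short sketch. The functions and the choice of basis order are hypothetical; the sketch assumes only an affine map between a one-dimensional stimulus interval and the [-1, 1] model space, and uses the spacing between zeros of a Chebyshev basis function as a proxy for the spatial scale the basis can express.

```python
import numpy as np

def to_model_space(stim, center, half_width):
    """Affine map from stimulus coordinates to the [-1, 1] model space."""
    return (stim - center) / half_width

def to_stimulus_space(model, center, half_width):
    """Inverse map from model space back to stimulus coordinates."""
    return center + half_width * model

def chebyshev_zero_spacing(order, center, half_width):
    """Spacing, in stimulus units, between zeros of the order-k Chebyshev basis."""
    j = np.arange(order)
    zeros_model = np.sort(np.cos(np.pi * (2 * j + 1) / (2 * order)))
    zeros_stim = to_stimulus_space(zeros_model, center, half_width)
    return np.diff(zeros_stim)

order = 4  # hypothetical basis order
full = chebyshev_zero_spacing(order, center=0.0, half_width=1.0)
half = chebyshev_zero_spacing(order, center=0.0, half_width=0.5)
print("full-range spacing:", full)
print("half-range spacing:", half)
```

      With the same basis and prior over the model space, every feature of the basis is half as wide in stimulus units when the mapped stimulus range is halved, so the field can vary twice as rapidly with respect to the stimulus.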

      As a general matter, we don’t think it is possible to leverage smoothness for trial efficiency on the one hand and at the same time be completely sure that there isn’t some aspect to the underlying ground truth that has been smoothed over. Carefully choosing the degree of prior smoothness together with the number of experimental trials in the context of a particular content problem is an important part of bringing the WPPM and related methods to bear, and one where simulation and held-out data both play an important role.

      We now bring these points out more fully in the paper (Discussion / Extensions of the WPPM framework; Discussion / Prior specification).

      Chen, C.-C., J. M. Foley and D. H. Brainard (2000). "Detection of chromoluminance patterns on chromoluminance pedestals I: threshold measurements." Vision Research 40(7): 773–788.

      The substantial individual differences evident in Figure S20 (comparison with Krauskopf and Gegenfurtner, 1992) are interesting in this context. Some observers show radial biases for the discrimination ellipses away from the white point, some show biases along the negative diagonal (with major axes oriented parallel to the blue-yellow axis), and others show a mixture of the two biases. Are these genuine individual differences, or could the model be performing less accurately in this desaturated region of colour space?

      We agree that these differences are interesting. We have now added more complete bootstrapped confidence regions in these (Appendix 8) and the other comparison figures (Appendix 6, 7, 9), so that an estimate of measurement precision is directly available in these figures. These confidence regions suggest that the individual differences in this region of color space are real. A longer-term goal is to develop more mechanistic models that can account for individual subject data through parameter choice. This might lead to insight into what differs in the visual system across individuals.

      Reviewer #3 (Public review):

      Summary:

      This study presents a powerful and rigorous approach for characterizing stimulus discriminability throughout a sensory manifold, and is applied to the specific context of predicting color discrimination thresholds across the chromatic plane.

      Strengths:

      Color discrimination has played a fundamental role in studies of human color vision and for color applications, but as the authors note, it remains poorly characterized. The study leverages the assumption that thresholds should vary smoothly and systematically within the space, and validates this with their own tests and comparisons with previous studies.

      Weaknesses:

      The paper assumes that threshold variations are due to changes in the level of intrinsic noise at different stimulus levels. However, it's not clear to me why they could not also be explained by nonlinearities in the responses, with fixed noise. Indeed, most accounts of contrast coding (which the study is at least in part measuring because the presentation kept the adapt point close to the gray background chromaticity, and thus measured increment thresholds), assume a nonlinear contrast response function, which can at least as easily explain why the thresholds were higher for colors farther from the gray point. It would be very helpful if a section could be added that explains why noise differences rather than signal differences are assumed and how these could be distinguished. If they cannot, then it would be better to allow for both and refer to the variation in terms of S/N rather than N alone.

      We agree with the reviewer. We are measuring SNR and attributing it to noise, but cannot identify from the data whether changes in SNR across color spaces are due to changes in noise, to a nonlinear relationship between stimulus space and the observer’s response space with noise in the response space held fixed, or both. We now make this point where we introduce the Results / Wishart Process Psychophysical Model and reiterate it in the Discussion / Extensions of the WPPM framework.
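
      The non-identifiability can be seen in a simplified local sketch. The notation is introduced here for illustration and is not taken from the paper: $s$ is the reference, $\Delta s$ a small stimulus difference, $\Sigma(s)$ a stimulus-dependent noise covariance, $f$ a hypothetical nonlinear response function with Jacobian $J(s)$, and $\sigma_0$ a fixed noise level in response space. Discriminability is summarized by a Mahalanobis-type $d'$, which is a simplification of the decision rule used in the WPPM.

```latex
% Account A: linear signal, stimulus-dependent noise
d'^2_A(s, \Delta s) = \Delta s^\top \, \Sigma(s)^{-1} \, \Delta s

% Account B: nonlinear response r = f(s), fixed isotropic noise \sigma_0^2 I,
% with f linearized for small \Delta s: f(s + \Delta s) \approx f(s) + J(s)\,\Delta s
d'^2_B(s, \Delta s) \approx \frac{\lVert J(s)\,\Delta s \rVert^2}{\sigma_0^2}
                   = \Delta s^\top \, \frac{J(s)^\top J(s)}{\sigma_0^2} \, \Delta s

% The two accounts predict the same thresholds whenever
\Sigma(s)^{-1} = \frac{J(s)^\top J(s)}{\sigma_0^2}
```

      Under these assumptions, any fitted $\Sigma(s)$ can be reinterpreted as fixed noise seen through a nonlinearity whose Jacobian satisfies the last relation, so threshold data alone constrain only the ratio of signal to noise.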

      Related to this point, the authors note that the thresholds should depend on a number of additional factors, including the spatial and temporal properties and the state of adaptation. However, many of these again seem to be more likely to affect the signal than the noise.

      We don’t disagree. Indeed, as we noted in our response to a comment by Reviewer 1 and above in the context of individual differences, we are very interested in developing a mechanistically plausible model that accounts for the data. If we or others are able to do so, that would provide a basis for parsing performance into separate signal and noise effects. And if such a model has natural ways in which additional variables affect its predictions, measuring the effects of these variables would be a way to provide evidence in favor of the model (Discussion / Implications for the mechanisms of color perception; Discussion / Extensions of the WPPM framework).

      An advantage of the approach is that it makes no assumptions about the underlying mechanisms. However, the choice to sample only within the equiluminant plane is itself a mechanistic assumption, and these could potentially be leveraged for deciding how to sample to improve the characterization and efficiency. For example, given what we know about early color coding, would it be more (or less) efficient to select samples based on a DKL space, etc?

      The more we are willing to assume about the structure of the psychometric field, the more efficiently we can measure it. As the reviewer correctly notes, this principle applies to trial placement as well. We are currently using an adaptive method (AEPsych) that starts with a fairly weak smoothness prior and attempts to place trials using heuristics that aim to minimize the expected uncertainty in the posterior. As we learn more about the discrimination field, we should be able to leverage stronger priors to increase trial efficiency. This point is closely related to one we made above about developing stronger priors that capture what we have learned in this study. Such priors could also help improve trial placement. For a prior with a relatively small number of parameters, such as a mechanistic prior, methods such as QUEST+ (Watson, 2017) may be used for trial placement.

      Watson, A. B. (2017). "QUEST+: A general multidimensional Bayesian adaptive psychometric method." J Vis 17(3): 10.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not think that the authors need to perform additional experiments. However, I would like to see some additional analyses regarding the assumptions made in the fitting procedure and how they affect the final maps.

      I also think some more quantitative comparisons with historical data would be valuable - at the moment, a lot of the comparisons are simply 'by eye'.

      It would have been nice to have the code and data available during the review procedure - I'm sure these will be released with excellent documentation?

      We addressed the first two points in the public review section. The code and data are now available online, and links to both are provided in the paper (Methods and Materials / Data and code availability).

      Reviewer #2 (Recommendations for the authors):

      Minor points

      I have a few suggestions for additions and small changes.

      (1) Several examples of covariance matrix fields are shown in Figure 1, 4, but these are for simulated examples. It would be nice to see the fields actually fit the data! I would be interested in seeing this for all participants in an Appendix, and maybe for participant CH in the main paper?

      We have made the changes (see Figure 4 and Figure S3).

      (2) I have not worked through all the math in the appendices line by line, but it seems to be complete, and the model validation results speak for themselves. I think the authors have done a pretty good job of explaining the model conceptually (not easy), but I struggled with the 'weighted sum' step in Figure 4 and the main text. I would appreciate a bit more hand-holding here, e.g, why is an 'overcomplete' representation needed as an intermediate, and providing an intuition of why there are 12 matrices in the overcomplete representation and what each matrix in this representation represents.

      We have now added more explanations in the figure legend and text (Fig. 4 and Methods and Materials / The Wishart Process Psychometric Model).

      (3) Individual differences: There is a section on this in the manuscript, and it's concluded that there are only "modest" individual differences. However, in Figure S20, the individual differences, I think, are huge and place observers almost in qualitatively different categories! Some observers show a radial bias in discrimination ellipses, others seem to show basically a bias along the negative diagonal, and others a mixture of both biases. These ellipses are at a desaturated part of colour space - is it possible that there are some rapid changes in the underlying noise in this region that the Wishart fit has not captured due to relatively sparse sampling or the fact that the basis functions are all fairly low spatial frequency? I wondered whether the results are constrained by the choice of Cartesian rather than polar basis functions, e.g, polar basis functions may have better allowed fine-grained changes near the white point but slower changes at higher saturations away from the white point.

      We agree that the individual differences are meaningful and, in some cases, quite pronounced. Our intent in describing the differences as “modest” was to emphasize that the overall structure of the psychometric fields remains broadly consistent across observers. We have revised the Results to note and more fully describe these differences.

      Regarding the possibility that sharp changes in the underlying noise near the achromatic point might not be fully captured by the current model, we agree that this is an important consideration. The current implementation uses relatively low-order Chebyshev basis functions that primarily capture smooth global variations in the psychometric field. While validation analyses indicate that these basis functions capture the dominant structure in the data, they may be less sensitive to sharp local variations such as those that could occur near the white point. Future work could address this by mapping the model space to a smaller region around the achromatic reference or by exploring alternative basis sets (e.g., polar or Zernike functions) that may better capture such localized structure. This is discussed above in this response and now addressed in Discussion / Extensions of the WPPM framework.

      On sampling, I wondered if the results might have been biased by the strongly biased ellipse that occurs at the grey point. If not, and the model is accurate in this region of colour space, I think this figure does show some large individual differences, and it would be good to comment on these in the individual differences section of the manuscript.

      Based on our analysis of trial placement (Fig. S1), the adaptive algorithm does not appear to have disproportionately concentrated trials near the gray point. In fact, more trials were allocated to the edges of the stimulus space than to the center. This suggests that the WPPM estimates are unlikely to be driven primarily by performance in the gray region. In addition, we examined the threshold ellipses around the gray reference in DKL space and found that they are broadly consistent across participants (Figs. S22–S23). Together, these analyses suggest that the anisotropy observed near the gray point reflects a genuine property of the psychometric field rather than an artifact of the sampling procedure.

      As noted just above, we have added additional text about individual differences in the Results and referenced it in the Discussion.

      (4) The manuscript seems unusually free of typographical errors, but I noticed that in many places "Krauskopf and Karl 1992" is cited! Also, I think something has gone wrong with the legend to Figure 2 - perhaps the order of panels was swapped around, but the legend was not fully updated. There is a repeated reference to the "summary of regression slopes" which seems to be in 2 positions, after C and G. It would make more sense to label panel G as D and progress from there, or switch the order of the panels so that G is on the bottom row.

      Thank you for catching those errors. They are now fixed.

      Reviewer #3 (Recommendations for the authors):

      A minor point (or perhaps major if your last name is Gegenfurtner) is that the reference to Krauskopf and Karl is incorrect.

      This is now fixed.

    1. eLife Assessment

      This study addresses the mechanism of action of benzoylurea insecticides and explores the metabolic consequences of inhibiting glycogen breakdown in insects. Both reviewers identify major flaws with the premise of the work. The strength of the provided evidence is inadequate, as the data fail to support, or only weakly support, several central claims. The significance of the findings is considered marginal.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate whether glycogen phosphorylase is a potential molecular target of benzoylphenylurea insecticides and examine the physiological consequences of inhibiting glycogen breakdown in the diamondback moth Plutella xylostella. The authors express and characterize recombinant glycogen phosphorylase, test its inhibition by a mammalian glycogen phosphorylase inhibitor and by the insecticide diflubenzuron, and assess the physiological effects of glycogen phosphorylase inhibition through chemical exposure and RNA interference. Based on these experiments, the authors conclude that benzoylphenylurea insecticides do not target glycogen phosphorylase and propose that insects compensate for glycogen phosphorylase inhibition through activation of gluconeogenesis, allowing them to maintain glucose homeostasis and complete development despite strong suppression of the enzyme.

      Strengths:

      The study addresses an interesting and long-standing question in insect toxicology regarding the mechanism of action of benzoylphenylurea insecticides. The authors combine several complementary approaches, including recombinant enzyme characterization, inhibitor assays, RNA interference, gene expression analyses, and metabolite measurements. The biochemical characterization of the recombinant glycogen phosphorylase and the demonstration that the tested glycogen phosphorylase inhibitor can strongly inhibit enzyme activity represent important technical strengths. In addition, the study integrates biochemical and physiological observations to explore how insects might compensate for disruptions in central carbohydrate metabolism.

      Weaknesses:

      Several aspects of the central conclusions rely on indirect evidence and would benefit from additional validation. The proposed compensatory mechanism (gluconeogenesis supported by amino acid mobilization) is inferred primarily from transcriptional changes in gluconeogenic genes, reduced protein levels, and changes in metabolite concentrations. While these observations are consistent with increased gluconeogenic activity, they do not directly demonstrate metabolic flux through this pathway. Direct measurements of gluconeogenic flux would be required to confirm that carbon derived from non-carbohydrate substrates contributes to glucose production.

      Some interpretations are also speculative. For example, the lack of glycogen accumulation following glycogen phosphorylase knockdown is attributed to alternative glycogen degradation pathways, such as α-amylase or glycogen debranching enzymes, but these possibilities are not experimentally examined. Measuring the expression or activity of these enzymes would help evaluate whether such pathways contribute to the observed metabolic response.

      The physiological consequences of the proposed metabolic compensation are also not fully explored. If proteins are mobilized to support gluconeogenesis, this shift might be expected to affect organismal traits such as adult body size, flight capacity, or reproductive performance. Assessing these traits could provide valuable insight into whether the proposed compensatory metabolism carries fitness costs.

      Finally, some conclusions extend beyond the direct evidence presented. The study shows that diflubenzuron does not inhibit glycogen phosphorylase in vitro, but broader conclusions regarding the mechanism of action of benzoylphenylurea insecticides as a class may require additional evidence. In addition, some biochemical and cell-based observations would benefit from confirmation in whole insects, given that metabolic regulation can differ substantially between isolated enzyme or cell-based systems and intact larvae, where hormonal signaling, tissue interactions, and nutrient availability influence metabolic responses.

    3. Reviewer #2 (Public review):

      (1) Significance of the findings and strength of the evidence

      This manuscript evaluates the hypothesis that benzoylphenylurea (BPU) insecticides exert their effects through inhibition of glycogen phosphorylase rather than chitin synthase (CHS). The central premise, that structural similarity among acylurea compounds implies shared molecular targets, is not supported by existing evidence.

      Extensive genetic and biochemical studies, including Reference 5, demonstrate that chitin synthase is the primary insecticidal target of BPUs. In particular, amino acid substitutions at a single site in CHS confer high levels of resistance to diflubenzuron and related compounds, with causality established through CRISPR/Cas9 editing in Drosophila melanogaster. This body of evidence substantially weakens the rationale for proposing glycogen phosphorylase as an alternative primary target.

      The manuscript reports that an acylurea compound previously identified as an inhibitor of mammalian glycogen phosphorylase also inhibits glycogen phosphorylase from Plutella xylostella, while diflubenzuron does not. This observation is consistent with prior work showing that glycogen phosphorylase inhibition among acylureas depends on specific side chain substitutions rather than the shared acylurea core. Consequently, the finding does not support the broader inference that acylurea structure predicts common biological function.

      The manuscript further argues that inhibition of glycogen phosphorylase is not insecticidal and attributes this to metabolic compensation through alternative glucose-producing pathways. While it is well established that eukaryotic cells possess multiple mechanisms for maintaining glucose availability, the evidence provided here does not fully support the broader claim that this mechanism explains the lack of insecticidal activity. In particular, the conclusion that the study "resolves" the primary hypothesis is not justified by the data presented.

      Overall, while some experimental observations are sound in isolation, the overarching conclusions are not supported by the strength of the evidence. The significance of the findings is therefore limited.

      (2) Interpretation in the context of existing literature

      The introduction states that the molecular target of BPU insecticides remains a major unresolved controversy. However, multiple prior studies, including References 1, 4, and 5, provide strong genetic evidence that CHS is the primary and essential target of BPUs. These results demonstrate causality rather than simple correlation, particularly through targeted gene editing approaches.

      The manuscript further claims that biochemical studies have failed to demonstrate CHS inhibition by BPUs in cell-free assays. However, the cited references (6-9) did not express CHS in such assays and therefore do not directly address this question. As a result, the suggested discrepancy between genetic and enzymatic evidence is not well founded.

      Structural analysis of acylurea compounds indicates that biological activity depends on side chain composition rather than the conserved acylurea core. Prior screening studies (Reference 11) show substantial variability in glycogen phosphorylase inhibition among acylureas despite a shared core structure. This undermines the proposal that the acylurea moiety itself constitutes a meaningful clue to a shared molecular mechanism.

      Regarding implications for pesticide design, targeting chitin synthesis remains an attractive strategy because chitin is essential for arthropods and absent in mammals, providing both efficacy and specificity. By contrast, metabolic enzymes such as glycogen phosphorylase are widely conserved, making them less suitable targets from a toxicological and safety perspective.

      (3) Specific technical comments

      The manuscript uses the term "dataology," which is neither defined nor contextualized within the text. As currently used, the term appears unrelated to the subject matter and may be confusing to readers. Clarification or removal would improve clarity.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The proposed compensatory mechanism is inferred primarily from transcriptional changes and metabolite levels; direct measurements of gluconeogenic flux are lacking.

      We agree that isotopic tracer experiments would provide the most direct evidence for gluconeogenic flux. While such experiments are beyond the scope of the current revision, we will explicitly acknowledge this as a key limitation and clearly state it as an important direction for future research. We note, however, that the convergent evidence from multiple independent lines (transcriptional upregulation of PEPCK and G-6-Pase, declining protein levels, altered amino acid profiles, and maintained trehalose levels) collectively supports gluconeogenic activation, even though each individual line is indirect. In the revised manuscript, we will present this evidence more cautiously, framing it as “consistent with gluconeogenic compensation” rather than as definitively establishing metabolic flux.

      (2) Alternative glycogen degradation pathways (α-amylase, glycogen debranching enzymes) are proposed but not experimentally examined.

      We have now directly addressed this by measuring, via RT-qPCR, the expression of glycogen branching enzyme (GBE) and α-amylase following PxGP knockdown. Our preliminary results reveal a striking and informative pattern:

      GBE was significantly upregulated at 24 h (+29.24%), 48 h (+16.78%), and 96 h (+44.46%) post-injection, indicating transcriptional activation of an alternative glycogen-metabolizing enzyme in response to GP suppression.

      α-Amylase showed no significant change at any time point, suggesting that the compensatory response is pathway-specific rather than a generalized upregulation of all starch/glycogen-degrading enzymes.

      This differential response, GBE up while α-amylase unchanged, provides the first direct evidence that P. xylostella selectively activates specific alternative glycogen catabolic pathways when GP function is compromised. These data will be incorporated into the revised manuscript as a new figure panel.

      (3) Physiological consequences of the proposed metabolic compensation (fitness costs) are not explored.

      We have now assessed fitness consequences of PxGP knockdown by measuring feeding rate, larval body weight, and pupal weight. The results reveal a transient but significant fitness cost:

      Feeding rate: no significant difference between dsGP and dsGFP groups across all time points (24–120 h), indicating that the observed metabolic changes are not attributable to reduced food intake.

      Larval weight: significantly reduced at 24 h (−29.10%) and 48 h (−25.38%) in the dsGP group, demonstrating that metabolic compensation carries a measurable short-term cost.

      Pupal weight: no significant difference, indicating that larvae recover from the transient weight deficit before pupation.

      This pattern, transient larval weight loss with full pupal recovery, is consistent with our proposed model: GP suppression triggers protein catabolism to fuel gluconeogenesis (explaining the weight loss), but the compensatory mechanism is sufficiently effective to restore metabolic homeostasis before the pupal transition. Adult wing area and female fecundity measurements are currently in progress and will be included in the revised manuscript.

      (4) Enzyme activity is not measured in RNAi-treated insects; only transcript-level knockdown is reported.

      We have now measured GP enzyme activity (GPa) in crude extracts from RNAi-treated larvae using the coupled-enzyme spectrophotometric assay. The results provide important new insights:

      Per-larva GP activity was significantly reduced at 24 h (−27.57%) and 48 h (−29.28%), confirming that RNAi-mediated transcript suppression translates to reduced enzyme function in vivo.

      Per-protein GP activity showed a significant reduction only at 48 h (−10.35%). This apparent discrepancy is explained by a substantial decrease in total protein concentration at 24 h (−44.48%), which then gradually recovered. When enzyme activity is normalized to a declining protein pool, the per-protein reduction appears smaller.

      Importantly, the 44.48% decline in total protein at 24 h provides independent biochemical confirmation of our proposed protein catabolism: it is consistent with the mobilization of protein stores to supply amino acids for gluconeogenesis, directly supporting the compensatory mechanism described in our manuscript.

      These enzyme activity data will be presented alongside the existing transcript-level data in the revised manuscript, providing a complete picture from gene expression through enzyme function.
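
      The normalization effect described above can be illustrated with a short calculation. The sketch below is an illustration only, not the authors' analysis: it assumes that per-protein activity equals per-larva activity divided by total protein per larva, and that both quoted 24 h percentages are relative to the same dsGFP control. The function name `relative_per_protein_activity` is hypothetical.

```python
def relative_per_protein_activity(activity_change_pct, protein_change_pct):
    """Return per-protein activity relative to control (1.0 = unchanged).

    Assumes per-protein activity = per-larva activity / total protein per larva,
    with both percentage changes measured against the same control group.
    """
    activity_ratio = 1 + activity_change_pct / 100  # fraction of control activity
    protein_ratio = 1 + protein_change_pct / 100    # fraction of control protein
    return activity_ratio / protein_ratio


# 24 h values quoted in the response: per-larva GP activity and total protein.
ratio_24h = relative_per_protein_activity(-27.57, -44.48)
print(f"Per-protein activity relative to control at 24 h: {ratio_24h:.2f}")
```

      Under these assumptions, a decrease measured per larva can be offset when the protein denominator falls by a larger fraction, which is why the two normalizations need not agree.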

      (5) Conclusions regarding BPU class may require testing additional compounds beyond diflubenzuron.

      We agree and will explicitly limit our conclusion to diflubenzuron in the revised manuscript. The relevant text will be revised to state that “DFB does not inhibit PxGP” rather than making broader claims about the BPU class as a whole.

      (6) Structural evidence that GPI can bind PxGP in a comparable manner to its mammalian target is lacking.

      We have performed molecular docking and binding free energy analysis to address this concern directly. The PxGP homodimer structure was modeled using SWISS-MODEL with the rabbit muscle GP–acyl urea co-crystal structure (PDB: 2ATI; Klabunde et al., 2005) as the template. Molecular docking and MM/GBSA calculations were performed using Cresset Flare V11.

      Key findings:

      GPI exhibited substantially stronger binding to PxGP (ΔG = −34.63 kcal/mol) compared to DFB (ΔG = −29.29 kcal/mol), with a ΔΔG of −5.34 kcal/mol.

      Energy decomposition revealed that van der Waals interactions are the primary driver of selectivity (ΔG<sub>VDW</sub> = −11.49 kcal/mol), reflecting superior shape complementarity of GPI within the binding pocket.

      GPI was predicted to bind at the allosteric site at the dimer interface, engaging seven residues across both subunits (Asn44 and Val45 from chain A; Trp67, Gln71, Tyr75, Arg193, and Asp227 from chain B), a binding mode consistent with the experimentally determined site of acyl urea inhibitors in mammalian GP.

      DFB contacted only six residues, primarily from a single subunit, and its difluorobenzoyl moiety remained entirely solvent-exposed without productive protein contacts, explaining its inability to achieve effective target engagement.

      These structural data, together with the biochemical inhibition data (IC<sub>50</sub> = 2.96 nM for GPI; no inhibition by DFB), provide a comprehensive molecular explanation for the observed selectivity. The results will be presented as a new figure and table in the revised manuscript.
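
      The relative binding free energy quoted in the first finding follows directly from the two MM/GBSA values. As a minimal arithmetic check (the helper name `delta_delta_g` is hypothetical and is not taken from Cresset Flare or any other package):

```python
# Binding free energies (kcal/mol) as quoted in the response above.
DG_GPI = -34.63
DG_DFB = -29.29


def delta_delta_g(dg_ligand, dg_reference):
    """Relative binding free energy; a negative value favors the ligand."""
    return round(dg_ligand - dg_reference, 2)


ddg = delta_delta_g(DG_GPI, DG_DFB)
print(f"ddG (GPI relative to DFB) = {ddg:.2f} kcal/mol")
```

      The sign convention here is ligand minus reference, so a negative result means GPI is predicted to bind more favorably than DFB.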

      (7) Dietary carbohydrates could mask the metabolic effects of GP inhibition.

      Our new data showing no difference in feeding rate between dsGP and dsGFP groups address this concern from one angle: the metabolic changes we observe are not attributable to altered food intake. We will also add a discussion of the potential contribution of dietary carbohydrates to glucose homeostasis and acknowledge this as a caveat in interpreting the metabolite data.

      Minor points: All terminology errors (“gluconeogenolysis” → “gluconeogenesis”), typographical errors (“over over four decades”), and formatting inconsistencies will be corrected. We will clarify the metabolite normalization approach and improve figure labeling and pathway schematics.

      Reviewer #2 (Public review):

      (1) The central premise — that structural similarity among acylurea compounds implies shared molecular targets — is not supported by existing evidence.

      We agree that the original manuscript overstated the significance of the shared acylurea core as a predictor of common biological activity. In the revised manuscript, we will substantially restructure the Introduction to:

      (1) Explicitly acknowledge the compelling genetic evidence from CRISPR/Cas9 experiments (Reference 5) establishing CHS as the primary site conferring BPU resistance.

      (2) Reframe the study’s objective: rather than proposing to “resolve” the BPU target controversy, the revised manuscript will focus on the systematic evaluation of GP as an independent insecticidal target and the discovery of a gluconeogenic compensation mechanism, questions that have scientific value independent of the BPU mechanism debate.

      (3) Remove the claim that the study “resolves the primary hypothesis.” The conclusion will instead state that our biochemical data demonstrate DFB does not inhibit PxGP, adding enzyme-level evidence to the existing genetic framework.

      (2) Target selectivity among acylurea compounds is determined by side-chain composition, not the shared core.

      We fully agree, and our new structural data now provide a molecular explanation for this principle at the atomic level. Molecular docking reveals that both GPI and DFB anchor to PxGP through their common acylurea carbonyl groups (forming hydrogen bonds with Arg193), but diverge dramatically in their side-chain engagement: GPI’s methoxyphenyl-methylurea moiety engages five additional residues across the dimer interface, while DFB’s difluorobenzoyl group remains entirely solvent-exposed. The van der Waals energy difference (ΔΔG<sub>VDW</sub> = −11.49 kcal/mol) quantitatively reflects this differential shape complementarity. These data directly support Reviewer 2’s point and will be presented as new evidence in the revised manuscript.

      (3) References 6–9 did not express CHS in cell-free assays.

      We will revise the relevant passage for greater precision. Our revised text will distinguish between (a) the absence of direct biochemical evidence for BPU-mediated CHS inhibition in cell-free systems and (b) the technical challenge of expressing and purifying functional CHS for such assays. This distinction will be stated more carefully to avoid any mischaracterization of the cited literature.

      (4) The term “dataology” is non-standard.

      This term has been removed and replaced with “data.” In accordance with eLife’s policy on the use of AI tools and technology, we will include a statement in the Materials and Methods section declaring that AI-based language editing tools were used for English grammar and style refinement. All scientific content was generated entirely by the authors.

      Author response table 1.

      We are confident that the substantial new experimental data and restructured narrative will meaningfully strengthen the manuscript.

    1. eLife Assessment

      The study provides valuable findings suggesting that modifying the donor's diet improves the effectiveness of fecal transplant therapies for liver disease. Although the reported results are of value, the evidence supporting the overall conclusions is incomplete. In particular, causal inferences regarding the effects of microbiota composition, as well as caproic acid signaling on the phenotypes studied, need further confirmation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to determine whether dietary conditioning of fecal microbiota donors can influence the therapeutic efficacy of fecal microbiota transplantation (FMT) in alcohol-associated liver disease (ALD). Specifically, they tested whether donor diets enriched in vegetable or egg-derived proteins alter microbiota composition and function in ways that enhance recovery from alcohol-induced liver injury. Using a murine ALD model, the study integrates microbiome profiling, metabolomics, proteomics, and functional assays to identify mechanisms underlying improved outcomes. The authors propose that vegetable protein-conditioned microbiota promote beneficial microbial remodeling and increased production of caproic acid, which in turn activates hepatic PPARα signaling and enhances fatty acid β-oxidation, thereby reducing steatosis and inflammation.

      Strengths:

      The study is ambitious and methodologically comprehensive. The central idea, that donor diet can modulate FMT efficacy in ALD, is compelling and potentially impactful. It combines in vivo disease models, microbiome analysis (16S rRNA sequencing), metabolomics and proteomics, pharmacological inhibition experiments, and in vitro validation in hepatocytes. This multi-layered approach is a clear strength and allows the authors to explore the gut-liver axis. The comparison between different protein sources (vegetable vs egg) is very interesting, and the PPARα inhibition experiments provide relatively strong functional support for the involvement of host metabolic signaling pathways in mediating the observed effects.

      Weaknesses:

      Despite the comprehensive scope of the manuscript, several aspects of the study limit the strength of its mechanistic conclusions. The causal attribution to caproic acid remains incomplete. While caproic acid is identified and functionally tested, there is no direct demonstration that it is necessary for the Veg-FMT phenotype in vivo. The metabolomics data suggest multiple candidate metabolites, but these are not systematically explored. The study identifies specific bacterial taxa and, separately, key metabolites, but does not establish a direct connection between microbial composition and metabolite production. The use of GW6471 supports involvement of PPARα but does not fully establish specificity, as off-target effects cannot be excluded. Finally, it is not fully clear whether effects are exclusively microbiota-driven or could partially reflect the transfer of diet-derived metabolites.

      The authors successfully demonstrate that donor dietary conditioning influences the therapeutic efficacy of FMT in a murine model of ALD. The data convincingly show that vegetable protein-conditioned microbiota is associated with improved liver injury, reduced inflammation, and enhanced intestinal barrier integrity compared with controls or an egg protein-enriched diet. While the proteomic and gene expression data suggest activation of pathways related to fatty acid β-oxidation, these measurements do not directly demonstrate increased metabolic flux. The use of the PPARα antagonist GW6471 provides important functional support for the involvement of this pathway, as inhibition attenuates the protective effects of Veg-FMT. However, this approach primarily establishes pathway dependency rather than directly confirming enhanced β-oxidation activity. The authors may therefore wish to moderate their interpretation or clarify this distinction, particularly given the relatively modest fold changes observed in several targets. The role of caproic acid as a central mediator is plausible but not definitively established. Finally, the link between microbiota composition, metabolic function, and host signaling remains partly correlative. Overall, the study achieves its primary aim at a phenotypic level, but some of the mechanistic claims would benefit from more cautious interpretation or additional validation.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      The work addresses an important and underexplored question: how donor characteristics influence FMT efficacy. By introducing donor diet as a modifiable variable, the study has potential implications for optimizing microbiota-based therapies. The datasets (microbiome, metabolomics, and proteomics) may also be valuable to the community, as they provide a resource for exploring gut-liver metabolic interactions. The translational impact will, however, depend on validation in human systems and a clearer identification of causal mechanisms.

    3. Reviewer #2 (Public review):

      The manuscript explores a valuable strategy for optimizing Fecal Microbiota Transplantation (FMT) efficacy in alcoholic liver disease through donor dietary intervention. I have identified several critical logical gaps, missing links in the evidence chain, and methodological ambiguities that require detailed explanation and supplementation.

      (1) While the Methods section states that each recipient mouse group consisted of 16 animals, microbiome sequencing was performed on only 4 samples per group. This sample size is insufficient, and the high inter-individual variability observed reduces the statistical power and representativeness of the data. I recommend increasing the sequencing sample size or, at a minimum, explicitly acknowledging the risk of false positives due to the small sample size in the Discussion.

      (2) The layout of Figure 4 should be adjusted. Panel A should be enlarged for better visibility, while Panel B should be reduced in size to balance the figure composition.

      (3) A rationale should be provided for the selection of egg white protein as the animal protein control. Does this adequately represent animal proteins in general? Could the results differ if casein or whey protein were used? The current choice limits the generalizability of the conclusions, and this limitation should be addressed.

      (4) The ALD model was established over 12 weeks, yet the FMT intervention consisted of only 3 administrations with a 1-week observation period. In the context of such a severe liver injury model, a 1-week recovery period appears insufficient to observe genuine fibrosis reversal, which typically requires a longer timeframe. The authors should discuss whether short-term FMT can truly induce structural remodeling or if the observed effects are transient.

      (5) The results rely heavily on PICRUSt2 for functional prediction. As prediction does not equate to factual validation, the authors should exercise caution in their wording within the Discussion. Alternatively, I recommend supplementing the study with shotgun metagenomic sequencing to verify the existence of these pathways rather than relying solely on predictive algorithms.

      (6) Although Egg-FMT was less effective than Veg-FMT, it performed better than the standard FMT or abstinence groups. Why is the effect of egg white protein intermediate? Is this due to rapid digestion resulting in insufficient substrate, or differences in metabolite production? A deeper comparative analysis of the Egg-FMT group is required, rather than treating it merely as a negative control.

      (7) Relying solely on the "inhibitor blocking effect" proves only that Caproic acid's function is dependent on the PPARα pathway, not that it directly acts on PPARα. To claim direct activation, the authors must demonstrate direct binding between Caproic acid and the PPARα protein (e.g., via SPR or MST assays). Alternatively, a luciferase reporter assay driven specifically by PPARα response elements (PPRE) should be conducted. If Caproic acid induces luminescence, it would confirm transcriptional activation of PPARα rather than mere downstream activation.

    4. Author response:

      We thank the Reviewing Editor, Senior Editor, and both reviewers for their constructive evaluation of our manuscript. We are encouraged that the reviewers found the central question, whether donor dietary conditioning modulates FMT efficacy in ALD, compelling and the multi-omics framework a strength. Their critiques converge on a shared theme: the manuscript's mechanistic claims around caproic acid and PPARα signaling currently rest on associative and pathway-level evidence, and would benefit from more direct causal testing and more guarded language. We agree, and we outline below the revisions we plan to undertake.

      Public Reviews:

      Reviewer #1 (Public review):

      While the proteomic and gene expression data suggest activation of pathways related to fatty acid β-oxidation, these measurements do not directly demonstrate increased metabolic flux. The use of the PPARα antagonist GW6471 provides important functional support for the involvement of this pathway; however, this approach primarily establishes pathway dependency rather than directly confirming enhanced β-oxidation activity. The role of caproic acid as a central mediator is plausible but not definitively established. Finally, the link between microbiota composition, metabolic function, and host signaling remains partly correlative.

      We thank the reviewer for this thoughtful assessment. We agree that the GW6471 inhibition experiments primarily support pathway dependency rather than direct activation of PPARα by caproic acid, and we will revise the manuscript accordingly to avoid overstating mechanistic conclusions. We would like to clarify, however, that the objective of the current study was not to directly quantify metabolic flux. We agree that the term "metabolic flux" should not be used here, and we will revise the text to make clear that we measured mitochondrial β-oxidation as a response to caproic acid.

      To functionally assess alterations in fatty acid β-oxidation capacity, we performed Seahorse Mito Fuel Flex assays, which demonstrated altered dependency and utilization of fatty acid oxidation pathways in response to caproic acid treatment. We will further clarify this distinction in the revised manuscript.

      In addition, we agree that the role of caproic acid as a central mediator and the relationship between microbiota composition, metabolite production, and host signaling remain partly correlative. Therefore, we will moderate the interpretation throughout the manuscript and incorporate additional correlation analyses between microbial taxa, caproic acid levels, and disease-associated metabolic parameters to strengthen the microbiota-metabolite-host association while acknowledging the associative nature of these findings.

      Reviewer #2 (Public review):

      (1) While the Methods section states that each recipient mouse group consisted of 16 animals, microbiome sequencing was performed on only 4 samples per group. This sample size is insufficient, and the high inter-individual variability observed reduces the statistical power and representativeness of the data. I recommend increasing the sequencing sample size or, at a minimum, explicitly acknowledging the risk of false positives due to the small sample size in the Discussion.

      We thank the reviewer for this important comment. We would like to clarify that microbiome sequencing was performed on 6 samples per group, not 4, and we will revise the Methods section to state the number of biological replicates analyzed more clearly. The 4 samples per group were used only for whole-proteome analysis.

      In addition, several previously published murine microbiome studies investigating gut microbial alterations in liver disease and FMT interventions have used comparable sample sizes (typically 5-8 animals per group) for 16S rRNA sequencing analyses [1–3]. Nevertheless, we agree that inter-individual variability may influence microbiome analyses, and we will therefore explicitly acknowledge this limitation and the possibility of reduced statistical power in the revised Discussion section. We will also ensure that interpretations derived from microbiome compositional analyses are presented more cautiously.

      (2) The layout of Figure 4 should be adjusted. Panel A should be enlarged for better visibility, while Panel B should be reduced in size to balance the figure composition.

      We thank the reviewer for this suggestion. We will revise the layout of Figure 4 accordingly by enlarging Panel A for improved visibility and reducing the size of Panel B to achieve a more balanced figure composition.

      (3) A rationale should be provided for the selection of egg white protein as the animal protein control. Does this adequately represent animal proteins in general? Could the results differ if casein or whey protein were used? The current choice limits the generalizability of the conclusions, and this limitation should be addressed.

      We thank the reviewer for this important suggestion. In the revised manuscript, we will provide additional rationale for selecting egg albumin as the animal-derived protein source. Egg albumin was chosen because it is a well-characterized protein with high biological value, rapid digestibility, and standardized composition, and because it has been used in our previous ALD-related dietary intervention studies, which maintains experimental consistency [4].

      We agree that egg albumin does not represent all animal protein sources. Due to its rapid digestion and absorption, relatively less substrate may reach the distal gut for microbial fermentation compared with more complex proteins. In contrast, proteins such as casein or whey may generate distinct microbial and metabolite profiles and potentially different host responses.

      Accordingly, we will explicitly acknowledge this limitation in the revised manuscript and clarify that our findings should not be generalized to all animal-derived proteins.

      (4) The ALD model was established over 12 weeks, yet the FMT intervention consisted of only 3 administrations with a 1-week observation period. In the context of such a severe liver injury model, a 1-week recovery period appears insufficient to observe genuine fibrosis reversal, which typically requires a longer timeframe. The authors should discuss whether short-term FMT can truly induce structural remodeling or if the observed effects are transient.

      We thank the reviewer for this important and thoughtful observation. We agree that a one-week post-FMT observation period appears insufficient to establish complete structural remodeling or durable fibrosis reversal in a chronic 12-week ALD model. It should be noted, however, that the results achieved with the one-week intervention suggest otherwise in this animal model of ALD. As can be seen from the immunohistochemistry of the abstinence and treatment groups, which was further quantified for steatosis and fibrosis, there is a __% and __% reduction, respectively, in the treatment group. We therefore conclude that, in this animal model, three doses of FMT given on alternate days can reverse steatosis and fibrosis.

      In the revised manuscript, we will explicitly clarify this distinction.

      (5) The results rely heavily on PICRUSt2 for functional prediction. As prediction does not equate to factual validation, the authors should exercise caution in their wording within the Discussion. Alternatively, I recommend supplementing the study with shotgun metagenomic sequencing to verify the existence of these pathways rather than relying solely on predictive algorithms.

      We thank the reviewer for this important suggestion and agree that PICRUSt2-based analyses represent predictive functional inference rather than direct validation of microbial metabolic activity. We will explicitly acknowledge in the Results and Discussion that PICRUSt2 outputs are inferences rather than measurements, and we will integrate our metabolomics data to show where predicted microbial pathways (fatty acid salvage, β-oxidation related pathways) coincide with measurable metabolite shifts, providing observational support for the predictions.

      We would prefer not to perform metagenomic analysis to substantiate the PICRUSt2 findings, primarily because metagenomic analysis would provide information on the set of genes each species carries, not on the functional state of the resulting pathways. To read out the pathways, we would be left with the same two options: PICRUSt2 or metabolome analysis. Transcriptome analysis could indicate which pathways are active, but that readout is likely to be similar to the one we obtain from the end result of these pathways, the metabolome.

      (6) Although Egg-FMT was less effective than Veg-FMT, it performed better than the standard FMT or abstinence groups. Why is the effect of egg white protein intermediate? Is this due to rapid digestion resulting in insufficient substrate, or differences in metabolite production? A deeper comparative analysis of the Egg-FMT group is required, rather than treating it merely as a negative control.

      We thank the reviewer for this insightful observation. We agree that the Egg-FMT group demonstrated an intermediate phenotype and should not be interpreted merely as a negative control. In the revised manuscript, we will report the outcomes with egg protein wherever they are missing, modify the language accordingly, and expand the Discussion.

      (7) “Relying solely on the ‘inhibitor blocking effect’ proves only that Caproic acid's function is dependent on the PPARα pathway, not that it directly acts on PPARα. To claim direct activation, the authors must demonstrate direct binding between Caproic acid and the PPARα protein (e.g., via SPR or MST assays). Alternatively, a luciferase reporter assay driven specifically by PPARα response elements (PPRE) should be conducted. If Caproic acid induces luminescence, it would confirm transcriptional activation of PPARα rather than mere downstream activation.”

      We thank the reviewer for this important and insightful suggestion. We agree that the current inhibitor-based experiments primarily support the involvement of the PPARα pathway and do not definitively establish direct interaction or transcriptional activation of PPARα by caproic acid. Accordingly, in the revised manuscript, we will moderate our interpretation and avoid statements implying direct activation based solely on the current data.

      We also agree that direct validation experiments such as SPR/MST-based binding assays or PPRE-driven luciferase reporter assays would substantially strengthen the mechanistic conclusions. We are currently planning additional experiments to further evaluate the direct action of caproic acid on PPARα and will incorporate these analyses in future revisions and follow-up studies.

      Given the pending experiments, we kindly request that the Editors allow us about 2 months to submit the revised manuscript.

      References:

      (1) Mitsinikos, F. T., Chac, D., Schillingford, N. & DePaolo, R. W. Modifying macronutrients is superior to microbiome transplantation in treating nonalcoholic fatty liver disease. Gut Microbes 12, 1792256 (2020).

      (2) Ferrere, G. et al. Fecal microbiota manipulation prevents dysbiosis and alcohol-induced liver injury in mice. J. Hepatol. 66, 806–815 (2017).

      (3) Zhang, Y., Li, P., Chen, B. & Zheng, R. Therapeutic effects of fecal microbial transplantation on alcoholic liver injury in rat models. Clin. Res. Hepatol. Gastroenterol. 48, 102478 (2024).

      (4) Mittal, A. et al. Protein supplementation differentially alters gut microbiota and associated liver injury recovery in mouse model of alcohol-related liver disease. Clin. Nutr. 46, 96–106 (2025).

    1. eLife Assessment

      This Review Article provides a compendium of advice for MD-PhD students to consider when deciding which, if any, clinical field they will select for residency training. It is grounded in published data and effectively considers factors including the potential for clinical disciplines to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The review comments were minor and constructive, and the authors have been very responsive.]

      Summary:

      This brief piece by Swartz and colleagues outlines the complexities surrounding the choice of clinical specialty for physician-scientists. It is, in general, clear and well-written, and it will be useful to research-oriented medical students choosing a path and to the mentors who are guiding them.

      Strengths:

      The writing is clear. The points made are not profound, but they are important and will be of use to the intended audience.

    3. Reviewer #2 (Public review):

      Summary:

      This article is a useful compendium of advice for MD/PhD students (and research-focused MD students) to consider when it is time to decide on a clinical field for residency training. The authors are a distinguished group of physician-scientists and program directors who are drawing on published data and their own experience as mentors to provide advice and resources to students about to make what can be a career-defining choice. It makes an effective argument for considering important differences between clinical fields in their ability to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

      Strengths:

      (1) A lot has been written about physician-scientists as an endangered species. Given the important role that physician-scientists can play if they engage in research that is informed by experience in patient care, not nearly enough has been written about the choices that students make during training that can keep them on track or throw them off.

      (2) The article provides not only general advice, but specific information in the 2 tables that can help trainees to weigh their priorities and consider their options.

      (3) Among the best advice is to weigh clinical demands, maintenance of procedural skills, recognition of the impact of research time on salary, and the impact of high salaries on the tension between research effort and clinical effort in clinical departments, which is where most physician-scientists in academia are employed.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This Review Article provides a compendium of advice for MD-PhD students to consider when deciding which, if any, clinical field they will select for residency training. It is grounded in published data and effectively considers factors including the potential for clinical disciplines to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

      We thank the editors for this positive assessment. We have revised the manuscript to sharpen the decision-making framework and make the advice more actionable, as detailed below.

      Public reviews:

      Reviewer #1 (Public review):

      This brief piece by Swartz and colleagues outlines the complexities surrounding the choice of clinical specialty for physician-scientists. It is, in general, clear and well-written, and it will be useful to research-oriented medical students choosing a path and to the mentors who are guiding them.

      We thank Reviewer #1 for these supportive comments.

      Strengths:

      The writing is clear. The points made are not profound, but they are important and will be of use to the intended audience.

      We appreciate this assessment and agree that the value of this piece lies in consolidating practical, experience-based guidance in one resource for trainees and mentors.

      Weaknesses:

      I have only minor suggestions for improvement. There are some areas of redundancy where the article could be tightened up by consolidating.

      We agree and have made substantial revisions to reduce redundancy throughout the manuscript. Specifically, we have streamlined the Introduction by removing a lengthy paragraph that previewed the article’s contents in a way that overlapped with later sections. The revised Introduction now concisely introduces five core decision-making factors (alignment between clinical and research interests, the structure of clinical work, availability of mentorship and research pathways, institutional culture, and financial sustainability) and directs readers to the new Table 1 and Figure 1 as organizing frameworks.

      We have also consolidated overlapping discussions of research alignment, protected time, and clinical demands. The sections on clinical workload and protected research time have been tightened to minimize repeated points about specialty-specific demands, and we now cross-reference Table 1 rather than re-stating the same considerations in multiple places. Prose has been revised throughout for concision and clarity.

      Reviewer #2 (Public review):

      This article is a useful compendium of advice for MD/PhD students (and research-focused MD students) to consider when it is time to decide on a clinical field for residency training. The authors are a distinguished group of physician-scientists and program directors who are drawing on published data and their own experience as mentors to provide advice and resources to students about to make what can be a career-defining choice.

      We thank Reviewer #2 for this generous and thoughtful evaluation.

      Strengths:

      (1) A lot has been written about physician-scientists as an endangered species. Given the important role that physician-scientists can play if they engage in research that is informed by experience in patient care, not nearly enough has been written about the choices that students make during training that can keep them on track or throw them off.

      We share this perspective and appreciate the reviewer’s recognition of this gap in the literature. Our goal was precisely to address the decision-making process itself, which is often under-discussed in formal publications despite being a frequent topic in mentoring conversations.

      (2) The article provides not only general advice, but specific information in the 2 tables that can help trainees to weigh their priorities and consider their options.

      Thank you. We have further strengthened the tabular content in this revision by adding a new Table 1 (described below) and renumbering the original tables accordingly.

      (3) Among the best advice is to weigh clinical demands, maintenance of procedural skills, recognition of the impact of research time on salary, and the impact of high salaries on the tension between research effort and clinical effort in clinical departments, which is where most physician-scientists in academia are employed.

      We appreciate this feedback and have made this advice more prominent by incorporating these factors explicitly into the new Table 1 framework and by adding a more direct statement in the text about how specialty-specific structural differences affect the ease of sustaining a research career.

      Area for Improvement

      (1) Some of the most useful pieces of advice are scattered through the text when they might be more impactful if focused. For example, what are the 4 or 5 most essential factors that someone in an MD/PhD or an MD program should weigh when they are deciding between clinical disciplines? There are also published data on the experience of past graduates in achieving a research-focused career in each clinical discipline. How should that data be applied by trainees? What are the factors that should be weighed in deciding where to work as a research-focused physician once training has been completed?

      We agree that the most critical decision-making factors were insufficiently distilled. To address this, we have made two major changes.

      First, we have added a new Table 1: “Key Decision Factors for Physician-Scientists Choosing a Clinical Specialty.” This table identifies five essential factors—(i) Alignment of Clinical Specialty with Research Focus, (ii) Structure of Clinical Work and Its Impact on Research Time, (iii) Availability of Structured Research Pathways and Mentorship, (iv) Institutional Environment and Culture, and (v) Financial Model and Long-Term Sustainability—and for each provides columns describing Why It Matters, What to Look For, and Potential Red Flags. This table is designed to be directly actionable for trainees comparing specialties and programs.

      Second, the Introduction now explicitly names these five factors as the organizing framework for the article and directs readers to Table 1 as a synthesis tool. The prior introductory paragraph, which previewed the article’s structure in a general way, has been replaced with a more focused synthesis.

      Regarding the published outcomes data: we have retained the specialty-specific outcomes data in what is now Table 2 (previously Table 1) and have added context in the text about how trainees should interpret these data—specifically, that published graduation and career outcome data provide a useful baseline but should be weighed alongside institutional context, since the same specialty can look very different at different institutions.

      Regarding factors for evaluating post-training positions: we have added a new paragraph in the section on Protected Research Time that addresses how trainees can evaluate the institutional environment at the faculty level, including specific metrics trainees can examine (see response to Points #4 and #5 below).

      (2) Some clinical fields at academic institutions have proved to be much more hospitable to careers as research-focused physicians than others. Published data highlight the challenges. I believe the authors have tried very hard to present a balanced perspective, but in the process, they have, I believe, missed an opportunity to guide trainees and make them aware of what they should look for to avoid making a decision that may prove incompatible with their long-term goals.

      We appreciate this candid observation and agree that our prior draft was overly cautious in this regard. In the revision, we have added a more explicit statement acknowledging that while successful physician-scientists exist across all specialties, the structural ease of sustaining a research-intensive career varies substantially by field. Specifically, we have added the following language to the section on Balancing Clinical and Research Responsibilities:

      “In practice, specialties with high procedural demands and unpredictable clinical schedules are often more challenging environments for sustaining research-intensive careers unless strong institutional protections are in place. While successful physician-scientists exist across all specialties, the structural ease of sustaining a research-intensive career varies substantially by field, and trainees should approach certain specialties with a clear understanding of the additional negotiation and institutional support required.”

      Additionally, the new Table 1 includes a “Potential Red Flags” column that gives trainees concrete warning signs to watch for when evaluating specialties and programs (e.g., departments primarily driven by clinical revenue with limited research infrastructure; absence of physician-scientists in leadership roles; inability to reduce clinical effort).

      (3) Where will be the jobs for physician-scientists who have an MD ± PhD and want to do research and discovery? How many openings will there be for physician-scientists in academia 5–10 years from now? In industry? How are recent events in Washington affecting the continuation of those jobs?

      After careful consideration, we believe that a detailed treatment of labor market projections, industry trends, and the effects of federal funding policy on the physician-scientist workforce falls outside the scope of this article, which is focused on the decision-making process for specialty selection. We note that the workforce question has been the subject of several recent analyses and commentaries (e.g., Milewicz et al., ASCI/AAP/APSA workforce reports) and feel that a thorough treatment would warrant a dedicated manuscript. We have not added this content but acknowledge the reviewer’s point in our thinking about future work.

      (4) Should one of the “smart choices” in the article’s title be where you do the residency, and not just which residency you do? How important is it to be at a successful, research-intensive medical center/university, both during and after residency and fellowship training? If being in an institution where there are numerous very successful physician-scientists and scientists improves the likelihood of being able to sustain a physician-scientist career, how should graduating students improve their chances of being at one of those institutions?

      This is an excellent point, and we agree that institutional environment is at least as important as specialty choice itself. We have made several changes to address this.

      In the Introduction, we have added the statement: “Importantly, the ability to sustain a physician-scientist career is often determined as much by the institutional environment and training program as by the specialty itself.” This signals early in the manuscript that “where” is as critical as “which.”

      In the new Table 1, we have included a row on “Institutional Environment and Culture” as one of the five key decision factors, with the explicit note that institutional commitment is often more determinative than specialty alone in enabling long-term success as a physician-scientist.

      We have also added a dedicated paragraph advising trainees to assess the broader institutional environment by examining: (i) the number of R01-funded investigators within the department, (ii) the presence of institutional training grants (e.g., T32 programs), and (iii) the track record of trainees transitioning from mentored (K) awards to independent (R) funding. We direct trainees to publicly available resources such as NIH RePORTER and the Blue Ridge Institute for Medical Research rankings.

      Finally, we have added a concluding sentence to the protected time section: “Taken together, these factors reinforce that institutional environment and departmental culture are often as determinative as specialty choice itself in shaping a sustainable physician-scientist career.”

      (5) In every clinical discipline, there are departments that value physician-scientists more than other departments and invest accordingly. What advice would the authors give to help graduating students identify those departments?

      This point is closely related to Point #4, and we have addressed it through the same set of revisions. The new paragraph on evaluating institutional environments provides concrete, actionable guidance for trainees on how to assess departmental commitment to physician-scientists, including specific metrics (R01 density, T32 presence, K-to-R transition rates) and publicly accessible tools (NIH RePORTER, Blue Ridge Institute rankings).

      The new Table 1 “Potential Red Flags” column highlights warning signs that a department may not be supportive of physician-scientist careers, including: departments primarily driven by clinical revenue (RVUs) with limited research infrastructure; lack of protected time enforcement; minimal NIH funding; and absence of physician-scientists in leadership roles.

      We have also expanded the existing discussion in the section on mentorship and residency selection, where we already noted the value of identifying departments with T32 grants and active physician-scientist mentors. The revised text now more explicitly connects these markers to the departmental evaluation process.

      We believe these revisions substantially strengthen the manuscript and are grateful for the reviewers’ constructive feedback.

    1. eLife Assessment

      This valuable study presents a tool that uses brain anatomy to predict the layout and size of early visual maps, and it is strengthened by the use of a large and diverse collection of scans to examine differences across people and groups. The evidence is solid for the general usefulness of the approach, but incomplete for some of the broader claims about prediction accuracy and use across data sets, particularly for estimates of map size and for showing that the model improves on repeated functional measurements. This paper is likely to be of significant interest to visual perception researchers, especially those who use fMRI.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes a deep learning toolbox that can be used to automatically estimate functional topographic maps directly from human brain anatomy. Building on the first author's earlier work, which demonstrated the feasibility of using deep learning for this purpose, the new version of the toolbox now requires only a single anatomical MRI scan to generate predictions, eliminating the need for a myelin scan. This represents a significant practical improvement.

      Strengths:

      Having such a toolbox is very useful, since manual annotation and delineation of functional visual field maps is a laborious process that also requires deep expertise. The toolbox can save researchers substantial amounts of time and money, and also allows less experienced researchers to now perform this type of analysis. Notably, for certain participants and patients, the time they are able to reside in the scanner might be limited. Being able to focus on the primary research question, rather than the essential yet basic topographic information, could boost data quality and evaluation and might limit the number of participants that need to be included.

      Weaknesses:

      In the paper, the authors compare the performance of their new version to two previous approaches. Figure 2b shows that the new toolbox performs similarly to the previous deep-learning-based toolbox, but requires only an anatomical scan, which is a significant improvement. They also compare it to an older method that uses an atlas without requiring deep learning. For eccentricity and pRF size predictions, both deep-learning methods perform better than the older approach. For polar angle, a critical parameter for delineating visual field maps, the gain is substantially less. Moreover, the comparison to the atlas method (Benson2014) is not entirely fair, as, to our knowledge, there is also a more advanced atlas version that uses Bayesian fitting methods and already performs better than the old method. To better understand the gain of using deep learning, it would be beneficial if the authors also made the comparison to this more recent atlas-based approach. Moreover, it would be useful to know the correlations for the representative participant. Some examples of relatively "bad" maps would also be useful to have (and could be provided as supplementary information).

      Figure 2b shows that the toolbox is quite good at estimating eccentricity and polar angle parameters, but less good at estimating the population receptive field (pRF) size. I will return to this latter point.

      An interesting feature is that while the toolbox is trained on a specific data set (HCP), it can, "out-of-the-box", be applied to different existing data sets, without the need to retrain the model. This is quite important for the general utility of the method. The results for this are shown in Figure 3. Again, in panel b, it can be seen that the toolbox does a good job at estimating eccentricity and polar angle values, but performs rather poorly for pRF size: the deepRetinotopy toolbox has a strong tendency to only estimate very small pRFs, particularly when applying it across different datasets. For this reason, at the moment, these estimates appear hardly useful. It would be very helpful for readers if the authors could clarify or elaborate on this point, particularly regarding the limitations of pRF size predictions. They explain that this could be due to the use of different types of stimuli, but even within the same (HCP) dataset, the predictions primarily suggest tiny pRFs, even though the training dataset also contains larger ones (which can be better seen in supplementary Figure 4). Showing the predictions for higher-order brain areas, which have larger pRFs on average, could serve a similar evaluation purpose. Presumably, the underlying reasons are complex and could relate to the use of different stimuli, different analysis toolboxes, and how the deep learning model is currently being trained. Possibly, the abundance of small pRFs at lower eccentricity in the training set (which is usually the case in any empirical analysis) has given the model a very strong bias toward predicting small pRFs.

      There would be various ways to verify which of these components is critical. For example, the model could be trained only on the bar stimuli of the HCP dataset, or the pRFs for all stimuli and datasets could be estimated using the same software tool. The latter seems important. For example, Supplementary Figure 4 indicates a high correlation between the Stanford and NYU cohorts that have used the same stimulus and analysis package, despite having different resolutions and scanners. Further investigation into the underlying reasons for these discrepancies would strengthen the paper. It would also provide valuable guidance for users of the toolbox on which toolbox predictions to trust and which not, as well as how well the model generalizes to other stimulus types, scanners, and image resolutions.

      An aspect that is not directly apparent from the title, abstract, and introduction is that the deepRetinotopy toolbox does not by itself produce estimates of visual area labels or boundaries. It predicts only polar angle and eccentricity values. To predict labels and boundaries, the authors combine the toolbox with an atlas (the aforementioned Bayesian atlas). For visual areas V1 - V3, it does a very good job, in that the predictions are as good as the empirical ones. Notably, the authors indicate that the predictions for V2 and, in particular, V3 are worse than for V1, but Figure 4 clearly shows that predictions are as good as the empirical ones. More cannot be expected from a model that is trained on such empirical data.

      Irrespective of the limitations with respect to predicting pRF size, the toolbox opens up functionally oriented analyses of very large cohorts of healthy participants, of which only anatomical data is available. The authors present an example of this by confirming the existence of differences in horizontal and vertical asymmetries in the field maps of the visual cortex of children and adults. While Figure 5 confirms the existence of differences, the analysis could be expanded to provide deeper insights, such as normalized developmental trajectories for both asymmetries, given the size of the dataset. This would better highlight the true power of their approach.

      While the authors address limitations with respect to studying experience-dependent atypical functional organization, they do not address how the deepRetinotopy toolbox would handle (acquired) brain lesions. Addressing this, even if only speculatively, would be welcome. Another welcome addition would be to see the predictions for additional brain areas, even if those would (presumably) be worse at present. Such information would nevertheless be essential for users considering applying this toolbox. Moreover, this could be a valuable resource serving as a benchmark for future iterations of either deepRetinotopy or other approaches.

    3. Reviewer #2 (Public review):

      Summary:

      The authors introduce the deepRetinotopy toolbox, a deep learning-based software package that allows for user-friendly automatic delineation of visual areas based on anatomical (T1-weighted) MRI scans. This is an important evolution over a prior published version of the software, which required myelin maps additionally. The new version will hence allow many more users to obtain high-fidelity field-map delineations based on existing data or using standard protocols, providing a huge advance to the field. The authors exploited this strength and mapped visual field maps (for areas V1-V3) in 11060 human MRI scans covering different age classes to quantify changes of retinotopic organization across age groups, showing that previously functionally identified imbalances of early visual cortex field maps can now be identified on the basis of anatomical scans alone.

      Strengths:

      Overall, this is a tremendously important methodological contribution, primarily of high practical and applied value. It allows functional imaging labs to delineate human cortical visual field maps with confirmed high fidelity using anatomical T1-weighted scans only. This spares labs the expensive functional imaging and time-consuming analyses that were previously required to achieve nearly the same result, and it yields far better results than prior model-based approaches offered.

      Also, the quantification of the accumulated very large dataset is meticulous and provides impressively detailed results of the field map changes for areas V1-V3 as a function of age.

      Weaknesses:

      (1) The weak point of the contribution is the choice to limit anatomical quality assessments and error quantifications to just three early regions, V1-V3, even though the deepRetinotopy toolbox can delineate over 20 regions (including parietal, ventral, and lateral regions, such as IPS0-5, hV4, VO1-2, V3A, PHC1-2, LO1-2, and TO1-2).

      (2) The limit is fine for their large-scale application of the toolbox to age groups, as here, a clear hypothesis on early cortex variability was tested.

      (3) However, the introduction of the toolbox itself warrants quality assessments and comparisons to prior models and ground truth beyond V1-V3, just like the authors did in their prior publication of the predecessor model.

      (4) This is important as the vast majority of applications of this toolbox will likely go beyond V1-V3 to delineate dorsal, ventral, and lateral regions.

      (5) For the present paper, this will require only 1 or 2 additional figures, or extending their present figures 2 and 4 along the lines of their previous figure 7 (Ribeiro et al 2021), which included error measures for high-level regions. Ideally, the authors would provide sub-graphs separately for early visual, dorsal, ventral, and lateral regions.

      (6) Going beyond V1-V3 is important for several reasons. First, future studies applying the software beyond V3 will need quantification for reassurance and justification. Second, reporting these results serves transparency, even if they are noisy or on par with prior models. Third, the results would provide a benchmark or reference point for future approaches.

    4. Reviewer #3 (Public review):

      Summary:

      This valuable study presents a tool that uses brain anatomy to predict the layout and size of early visual maps, and it is strengthened by testing across a large and diverse collection of scans. The work will be useful for researchers who want to estimate likely visual map layout from standard anatomical scans and to relate anatomical differences to differences in visual organization across groups. The evidence is solid for the general usefulness of the approach, but incomplete for broader claims about prediction accuracy and use across datasets, particularly for estimates of map size and for showing that the model improves on repeated functional measurements.

      Strengths:

      The paper addresses a useful and important problem: estimating early visual map organization from anatomical measurements alone. Tools that predict these types of functional data from anatomical measurements were introduced more than a decade ago by Benson and colleagues, and the present authors have significantly extended that work. That is a real strength of the manuscript, because there is genuine value in having a practical tool that can estimate likely visual organization from standard anatomical scans.

      Another major strength is the rigorous cross-dataset benchmarking and the accumulation of multiple datasets. The authors assembled a large and diverse set of scans and assessed model performance across different scanners, field strengths, and visual stimuli, which gives the reader a much better sense of how broadly the approach may apply. The retrospective analysis of more than 11,000 scans is especially notable and creates an unusual opportunity to ask how anatomical variation may relate to population differences in visual organization.

      I also think the paper does a good job of showing why such a tool could matter in practice. A complete tool could be used in several ways. First, it could help users identify the locations of activations measured in other experiments with respect to the typical V1-V3 maps. Second, maps measured from an individual subject or patient could be compared with the predictions from the tool to ask whether they differ meaningfully from a standard anatomy-based map. Third, the tool can be used, as the authors have done here, to examine differences in anatomy across populations and interpret these differences with respect to retinotopic maps. Of these uses, the first already seems well supported by the current presentation.

      Weaknesses:

      (1) Quantification of the Analysis

      My main concern is that the analysis relies heavily on global summary measures such as correlation and Dice score. Those measures are useful, but the paper would be more informative if it also quantified boundary differences in millimeters, especially for comparisons such as the V1/V2 boundary in Figure 2. That kind of analysis would help readers understand how large the errors are in physically meaningful terms.
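
      As an illustration of the kind of millimetre-scale measure requested above, the following minimal sketch computes a symmetric mean nearest-neighbour distance between two sets of boundary coordinates. It is not the authors' method. The function name `mean_boundary_distance_mm` and the toy coordinates are hypothetical, and straight-line (Euclidean) distance is assumed for simplicity, whereas a real analysis on a cortical surface would more likely use geodesic distance along the mesh.

```python
import math

def mean_boundary_distance_mm(pred_pts, emp_pts):
    """Symmetric mean nearest-neighbour distance between two boundaries.

    Both arguments are sequences of (x, y, z) coordinates in millimetres.
    Euclidean distance is an assumption made for this sketch.
    """
    def one_way(src, dst):
        # Average, over points in src, of the distance to the closest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (one_way(pred_pts, emp_pts) + one_way(emp_pts, pred_pts))

# Toy data (hypothetical): an "empirical" boundary along the x-axis and a
# "predicted" boundary displaced by 2 mm along y.
emp = [(float(i), 0.0, 0.0) for i in range(10)]
pred = [(x, y + 2.0, z) for x, y, z in emp]

shift_mm = mean_boundary_distance_mm(pred, emp)
print(f"mean boundary displacement: {shift_mm:.2f} mm")
```

      Reporting such a value per boundary (for example V1/V2) and per participant would complement correlation and Dice scores with an error in physical units.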

      (2) Model fitting methods

      I also think the discussion of prediction failures for pRF size should be more explicit. The mismatch is likely influenced by the fact that the training data and several evaluation datasets were fit with different models and different analysis software. In particular, the network was trained on non-linear size estimates from the HCP data, while the comparison datasets were derived using other packages and, in some cases, different model assumptions. That likely contributes to the spread in Figure 3b and should be discussed more directly. It is important to discuss that the pRF parameters were derived using different software tools.

      - HCP dataset (training data): analyzePRF (Compressive Spatial Summation model)

      - NYU dataset: vistasoft

      - Stanford dataset: vistasoft

      - New Zealand dataset: SamSrf

      - CHN dataset: Custom MATLAB software

      (3) Clarifying Model Accuracy

      If deepRetinotopy generates a true "noise-removed" representation of functional mapping based on anatomy, then fitting it to one fMRI measurement should predict a second, independent fMRI run better than the noisy data from the first run does.

      The authors possess the exact data for this test. For the HCP dataset, the empirical fMRI data were explicitly separated into two halves: "fit 2" (the first half of the fMRI runs) and "fit 3" (the second half). They correlated these two halves to establish a "noise ceiling," the maximum possible reliability of the data. Looking at their results in Figure 2b, the correlation of the deepRetinotopy predictions falls below this noise ceiling. This means that the noisy functional Half 1 actually predicts functional Half 2 better than the anatomical model does.

      The authors should state this explicitly. A side-by-side plot of Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2 would show that the anatomical model regularizes map location well, but misses reliable subject-specific variation that anatomy alone cannot capture.
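
      The logic of this comparison can be sketched with simulated data. The example below is a toy simulation, not an analysis of the HCP data. All variable names and variance values are assumptions chosen for illustration. It treats each vertex value as the sum of an anatomy-predictable component, a reliable subject-specific component, and measurement noise, and it uses Pearson correlation on a linear quantity (a circular measure would be needed for polar angle).

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_vertices = 5000        # hypothetical number of vertices

# Component that anatomy can predict (assumed unit variance).
shared = [rng.gauss(0, 1) for _ in range(n_vertices)]
# Reliable subject-specific component that anatomy cannot predict (assumed).
specific = [rng.gauss(0, 1) for _ in range(n_vertices)]
true_map = [s + u for s, u in zip(shared, specific)]

# Two independent noisy measurements of the same underlying map.
half1 = [t + rng.gauss(0, 0.5) for t in true_map]
half2 = [t + rng.gauss(0, 0.5) for t in true_map]

# An idealised anatomical model recovers only the shared component.
model_pred = shared

r_split_half = pearson(half1, half2)   # "Half 1 predicting Half 2"
r_model = pearson(model_pred, half2)   # "model predicting Half 2"
print(f"split-half r = {r_split_half:.2f}, model r = {r_model:.2f}")
```

      Under these assumptions, the noise-free model still correlates less with Half 2 than the noisy Half 1 does, because it omits the reliable subject-specific component. This is the pattern a side-by-side plot would make visible.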

      (4) The Hemodynamic Response Function

      The assumptions used to generate the original empirical maps are permanently baked into the deep learning model. However, the authors explicitly mention the hemodynamic response function (HRF) only once, noting in the Methods that the modeled time series was "convolved with a canonical hemodynamic response function."

      Beyond this single mention, there is no direct discussion of how the assumption of a single canonical HRF across all 161 HCP training subjects might have systematically impacted or biased the network's predictions. The authors address cross-dataset differences broadly under the umbrella of "experimental design" and "fMRI preprocessing pipeline" biases, but the HRF is a core biological property that mediates the connection between the anatomy and the data. The authors should explicitly discuss how this canonical assumption limits or biases the resulting deepRetinotopy network.

      (5) Scoping the Input Data and Normative Use

      The authors use FreeSurfer to generate a mean curvature map for the entire midthickness cortical surface. This full-hemisphere curvature map is resampled to a standard template surface space (32k_fs_LR), acting as the data frame that feeds input features into the neural network. However, while the network receives the full geometric structure of the hemisphere, it is explicitly trained to predict retinotopic parameters only within a restricted posterior ROI, based on the Wang et al. atlas and containing roughly 3,200 vertices per hemisphere.

      A useful experiment to try, and perhaps the authors have already considered this, would be to restrict the input features exclusively to the posterior vertices. Including all anterior vertices may make it harder for the network to fit the localized visual data. A brief commentary on why the full hemisphere was retained as input could be highly informative for researchers adapting this geometric deep learning pipeline.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In the paper, the authors compare the performance of their new version to two previous approaches. Figure 2b shows that the new toolbox performs similarly to the previous deep-learning-based toolbox, but requires only an anatomical scan, which is a significant improvement. They also compare it to an older method that uses an atlas without requiring deep learning. For eccentricity and pRF size predictions, both deep-learning methods perform better than the older approach. For polar angle, a critical parameter for delineating visual field maps, the gain is substantially less. Moreover, the comparison to the atlas method (Benson2014) is not entirely fair, as, to our knowledge, there is also a more advanced atlas version that uses Bayesian fitting methods and already performs better than the old method. To better understand the gain of using deep learning, it would be beneficial if the authors also made the comparison to this more recent atlas-based approach. Moreover, it would be useful to know the correlations for the representative participant. Some examples of relatively "bad" maps would also be useful to have (and could be provided as supplementary information).

      We thank the reviewer for their constructive feedback. We plan to expand our benchmarking section to include the Bayesian model comparison. Note, however, that the additional accuracy gain afforded by the Bayesian model of retinotopy (Benson and Winawer, 2018) results from combining anatomical data with retinotopic maps estimated with a few minutes of functional data. The Bayesian model of retinotopy without such functional data is equivalent to Benson14. We plan to report the correlations (between predicted and empirical maps) for the representative participant shown in Figure 2 and include an additional supplementary figure showing retinotopic map predictions for a participant whose predictions deviate the most from empirical maps, as suggested by the reviewer.

      Figure 2b shows that the toolbox is quite good at estimating eccentricity and polar angle parameters, but less good at estimating the population receptive field (pRF) size. I will return to this latter point.

      An interesting feature is that while the toolbox is trained on a specific data set (HCP), it can, "out-of-the-box", be applied to different existing data sets, without the need to retrain the model. This is quite important for the general utility of the method. The results for this are shown in Figure 3. Again, in panel b, it can be seen that the toolbox does a good job at estimating eccentricity and polar angle values, but performs rather poorly for pRF size: the deepRetinotopy toolbox has a strong tendency to only estimate very small pRFs, particularly when applying it across different datasets. For this reason, at the moment, these estimates appear hardly useful. It would be very helpful for readers if the authors could clarify or elaborate on this point, particularly regarding the limitations of pRF size predictions. They explain that this could be due to the use of different types of stimuli, but even within the same (HCP) dataset, the predictions primarily suggest tiny pRFs, even though the training dataset also contains larger ones (which can be better seen in supplementary Figure 4). Showing the predictions for higher-order brain areas, which have larger pRFs on average, could serve a similar evaluation purpose. Presumably, the underlying reasons are complex and could relate to the use of different stimuli, different analysis toolboxes, and how the deep learning model is currently being trained. Possibly, the abundance of small pRFs at lower eccentricity in the training set (which is usually the case in any empirical analysis) has given the model a very strong bias toward predicting small pRFs.

      There would be various ways to verify which of these components is critical. For example, the model could be trained only on the bar stimuli of the HCP dataset, or the pRFs for all stimuli and datasets could be estimated using the same software tool. The latter seems important. For example, Supplementary Figure 4 indicates a high correlation between the Stanford and NYU cohorts that have used the same stimulus and analysis package, despite having different resolutions and scanners. Further investigation into the underlying reasons for these discrepancies would strengthen the paper. It would also provide valuable guidance for users of the toolbox on which toolbox predictions to trust and which not, as well as how well the model generalizes to other stimulus types, scanners, and image resolutions.

      We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, analysis toolboxes used to estimate pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. As the reviewer pointed out, the underlying reasons are complex, and it is difficult to isolate all the potential contributing factors. However, in addition to our expanded discussion, we also intend to present results from additional experiments that assess the impact of different loss functions on the range of predicted pRF sizes (to explain how training may partly account for the differences observed in the HCP dataset). We will also perform pRF fitting on at least one dataset using the same software/encoding model as in the HCP dataset (the training data) to illustrate that the lower performance in pRF size prediction in out-of-distribution datasets is also partly explained by differences in how the empirical maps were obtained.

      An aspect that is not directly apparent from the title, abstract, and introduction is that the deepRetinotopy toolbox does not by itself produce estimates of visual area labels or boundaries. It predicts only polar angle and eccentricity values. To predict labels and boundaries, the authors combine the toolbox with an atlas (the aforementioned Bayesian atlas). For visual areas V1 - V3, it does a very good job, in that the predictions are as good as the empirical ones. Notably, the authors indicate that the predictions for V2 and, in particular, V3 are worse than for V1, but Figure 4 clearly shows that predictions are as good as the empirical ones. More cannot be expected from a model that is trained on such empirical data.

      We will edit the introduction and abstract to make it clearer that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own.

      Irrespective of the limitations with respect to predicting pRF size, the toolbox opens up functionally oriented analyses of very large cohorts of healthy participants, of which only anatomical data is available. The authors present an example of this by confirming the existence of differences in horizontal and vertical asymmetries in the field maps of the visual cortex of children and adults. While Figure 5 confirms the existence of differences, the analysis could be expanded to provide deeper insights, such as normalized developmental trajectories for both asymmetries, given the size of the dataset. This would better highlight the true power of their approach.

      Although providing insights into developmental trajectories for horizontal and vertical asymmetries is beyond the scope of the current work, as it would require aggregating datasets such that individuals’ ages span a larger range (the ABCD dataset contains only individuals between 9 and 11 years old, and the HCP Young Adult dataset only individuals between 22 and 36 years old), we plan to provide some complementary analyses (differences across ages and sex within the ABCD dataset).

      While the authors address limitations with respect to studying experience-dependent atypical functional organization, they do not address how the deepRetinotopy toolbox would handle (acquired) brain lesions. Addressing this, even if only speculative, would be welcome. Another welcome addition would be to see the predictions for additional brain areas, even if those would (presumably) be worse at present. Such information would nevertheless be essential for users considering applying this toolbox. Moreover, this could be a valuable resource serving as a benchmark for future iterations of either deepRetinotopy or other approaches.

      We plan to expand and report performance evaluation across other visual areas (using Wang atlas’ parcels) to serve as a benchmarking resource. Moreover, we will expand our discussion on how deepRetinotopy would handle brain lesions.

      Reviewer #2 (Public review):

      (1) The weak point of the contribution is the choice to limit anatomical quality assessments and error quantifications to just three early regions, V1-V3, even though the deepRetinotopy toolbox can delineate over 20 regions (including parietal, ventral, and lateral regions, such as IPS0-5, hV4, VO1-2, V3A, PHC1-2, LO1-2, and TO1-2).

      (2) The limit is fine for their large-scale application of the toolbox to age groups, as here, a clear hypothesis on early cortex variability was tested.

      (3) However, the introduction of the toolbox itself warrants quality assessments and comparisons to prior models and ground truth beyond V1-V3, just like the authors did in their prior publication of the predecessor model.

      (4) This is important as the vast majority of applications of this toolbox will likely go beyond V1-V3 to delineate dorsal, ventral, and lateral regions.

      (5) For the present paper, this will require only 1 or 2 additional figures, or extending their present figures 2 and 4 along the lines of their previous figure 7 (Ribeiro et al 2021), which included error measures for high-level regions. Ideally, you provide sub-graphs separately for early visual, dorsal, ventral, and lateral regions.

      (6) Going beyond V1-V3 is important for several reasons: first, future studies applying the software beyond V3 will need quantification for reassurance and justification. Second, for the sake of transparency, even if results are noisy or on par with prior models. Third, as a benchmark or reference point for future approaches.

      We thank the reviewer for their constructive feedback, and we agree that expanding our performance assessment beyond V1-3 would be a valuable benchmarking resource. Thus, we plan to evaluate retinotopic map prediction accuracy across visual areas defined by the Wang atlas’ parcels, expanding on the results reported in Figure 2, and provide it as a supplementary figure. However, performance estimation ultimately depends on the quality of the dataset used for evaluation. The empirical maps, although treated as ground truth, may themselves misrepresent the underlying retinotopic organization. As a matter of fact, the quality of the empirical data (HCP dataset and others) is indeed lowest in some of the higher-order visual areas.

      It may be unclear from the text that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own. Accordingly, we illustrate how the deepRetinotopy toolbox’s predictions can be combined with another tool [the Bayesian model of retinotopy from Benson and Winawer (2018)] to obtain visual area boundaries automatically. We will edit the introduction and abstract to make this clearer. Given the availability of empirical labels (currently only for V1-3) and the segmentation tool (which was only assessed for V1-3), we cannot expand Figure 4 to other visual areas as suggested.

      Reviewer #3 (Public review):

      Quantification of the Analysis: My main concern is that the analysis relies heavily on global summary measures such as correlation and Dice score. Those measures are useful, but the paper would be more informative if it also quantified boundary differences in millimeters, especially for comparisons such as the V1/V2 boundary in Figure 2. That kind of analysis would help readers understand how large the errors are in physically meaningful terms.

      We thank the reviewer for their constructive feedback. Following the reviewer’s suggestion, we plan to expand our segmentation evaluation to quantify the extent to which boundary predictions from deepRetinotopy’s maps deviate from those from empirical maps, in millimetres.

      Model fitting methods: I also think the discussion of prediction failures for pRF size should be more explicit. The mismatch is likely influenced by the fact that the training data and several evaluation datasets were fit with different models and different analysis software. In particular, the network was trained on non-linear size estimates from the HCP data, while the comparison datasets were derived using other packages and, in some cases, different model assumptions. That likely contributes to the spread in Figure 3b and should be discussed more directly. It is important to discuss that the pRF parameters were derived using different software tools.

      We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, different encoding models for estimating pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. In addition to our expanded discussion, we intend to also present results from additional experiments that assess the impact of those factors on pRF size prediction performance.

      Clarifying Model Accuracy: If deepRetinotopy generates a true "noise-removed" representation of functional mapping based on anatomy, then fitting it to one fMRI measurement should predict a second, independent fMRI run better than the noisy data from the first run does.

      The authors possess the exact data for this test. For the HCP dataset, the empirical fMRI data were explicitly separated into two halves: "fit 2" (the first half of the fMRI runs) and "fit 3" (the second half). They correlated these two halves to establish a "noise ceiling," the maximum possible reliability of the data. Looking at their results in Figure 2b, the correlation of the deepRetinotopy predictions falls below this noise ceiling. This means that the noisy functional Half 1 actually predicts functional Half 2 better than the anatomical model does.

      The authors should state this explicitly. A side-by-side plot of Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2 would show that the anatomical model regularizes map location well, but misses reliable subject-specific variation that anatomy alone cannot capture.

      We will expand our benchmarking section to make these comparisons (“Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2”) more explicit. It is important to highlight that there is subject-specific variation that our model does not currently capture; this comparison can also serve as a benchmarking resource for future model versions and newer approaches.

      The Hemodynamic Response Function: The assumptions used to generate the original empirical maps are permanently baked into the deep learning model. However, the authors explicitly mention the hemodynamic response function (HRF) only once, noting in the Methods that the modeled time series was "convolved with a canonical hemodynamic response function."

      Beyond this single mention, there is no direct discussion of how the assumption of a single canonical HRF across all 161 HCP training subjects might have systematically impacted or biased the network's predictions. The authors address cross-dataset differences broadly under the umbrella of "experimental design" and "fMRI preprocessing pipeline" biases, but the HRF is a core biological property that mediates the connection between the anatomy and the data. The authors should explicitly discuss how this canonical assumption limits or biases the resulting deepRetinotopy network.

      As Reviewers 3 and 1 have noted, the observed limitations in pRF size prediction stem from multiple underlying factors. One of those factors is indeed the HRF assumed in the encoding models. We will expand our discussion about factors that may introduce biases into deepRetinotopy predictions, including the HRF.

      Scoping the Input Data and Normative Use: The authors use FreeSurfer to generate a mean curvature map for the entire midthickness cortical surface. This full-hemisphere curvature map is resampled to a standard template surface space (32k_fs_LR), acting as the data frame that feeds input features into the neural network. However, while the network receives the full geometric structure of the hemisphere, it is explicitly trained to predict retinotopic parameters only within a restricted posterior ROI, based on the Wang et al. atlas and containing roughly 3,200 vertices per hemisphere.

      A useful experiment to try, and perhaps the authors have already considered this, would be to restrict the input features exclusively to the posterior vertices. Including all anterior vertices may make it harder for the network to fit the localized visual data. A brief commentary on why the full hemisphere was retained as input could be highly informative for researchers adapting this geometric deep learning pipeline.

      Thanks for this suggestion. We have not performed a systematic evaluation of using ROIs that span a larger portion of the cortex (including the full hemisphere). It is a great idea to do so and report it in our manuscript to inform other researchers interested in adapting our pipeline. We intend to also update our toolbox by retraining our models to take all posterior vertices as suggested, which would improve the coverage of current predictions.

    1. eLife Assessment

      This is an important and rigorous study that addresses the question of what determines the spatial organization of endocytic zones at synapses. The authors use compelling approaches, in both Drosophila and rodent model systems, to define the role of activity and active zone structure on the organization of the peri-active zone. While the findings are primarily negative, they are carefully executed and contribute to the field by refining existing models of presynaptic organization.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to evoked activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α.

      Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and the data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by distinct upstream organizers compared to active zone proteins. It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.

      Strengths:

      The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.

      Weaknesses:

      One limitation, acknowledged by the authors, is the persistence of spontaneous activity at these synapses, which could still impact the organization of these regions.

      Comments on revisions:

      The authors have addressed all of my previous comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using genetic and pharmacological perturbations in both Drosophila and mouse neurons, the authors show that key endocytic proteins remain localized to periactive zones even when evoked release or active zone architecture is disrupted. While the findings are largely negative, the study is methodologically solid and provides useful constraints for current models of synaptic vesicle recycling.

      Strengths:

      The experimental design is careful and systematic, spanning both fly and mammalian systems. The use of advanced genetic models, including Liprin-α quadruple knockout mice, is a notable strength. High-resolution imaging approaches (STED, Airyscan) are appropriately applied to assess nanoscale organization. The study clarifies that strict activity dependence of endocytic recruitment may not be a general principle.

      Weaknesses (largely addressed in revision):

      Several initial concerns have been satisfactorily addressed in the revised manuscript. In particular, the inclusion of EndoA/Dap160 experiments and the expanded discussion improve the work. Some limitations remain, including the reliance on Tetanus toxin at the Drosophila NMJ, which does not fully abolish presynaptic fusion, and the still limited insight into the mechanistic basis of periactive zone organization. The biological interpretation of small changes in protein levels upon silencing also remains somewhat unclear.

      Comments on revisions:

      I thank the authors for the careful revision of the manuscript. The additional experiments, in particular the inclusion of EndoA and Dap160 at the Drosophila NMJ, as well as the extended discussion of limitations, are appreciated and address important points raised in the first round.

      While the principal conclusions of the study remain unchanged, and the manuscript is still largely based on negative results, I find that the authors now present these data in a more balanced and transparent manner. The discussion of activity-dependence is improved and more nuanced, especially with regard to possible contributions of spontaneous release and homeostatic effects.

      In my opinion, despite the mostly negative nature of the findings, the work provides a valuable and relevant contribution, as it defines important constraints on current models of periactive zone organization. The study is technically strong, carefully executed, and systematically performed across different model systems.

      Overall, the revised manuscript is clearly improved and represents a solid and well-executed piece of work that will be of interest to the field.

    4. Reviewer #3 (Public review):

      Summary:

      This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.

      Strengths:

      The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.

      Weaknesses:

      The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.

      This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.

      Comments on revisions:

      The authors responded to the initial review with care. They both revised the manuscript and conducted new experiments to address each reviewer's concern. The responses to the review were effective, and I think that the revised manuscript provides significant new insights. In my view, it does not require additional revisions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful consideration of our work and constructive comments. We are glad that reviewers appreciated the rigor and value of our work. In response to the reviewer comments we have made the following changes:

      (1) Addition of new experiments on EndoA localization at the Drosophila NMJ (Fig. 2).

      (2) Addition of new experiments on Dap160 localization at the Drosophila NMJ (Fig. 2).

      (3) Addition of new experiments to validate Dynamin, Dap160 and EndoA antibodies (Fig. 2 – figure supplement 1).

      (4) Assessment of the activity-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 3).

      (5) Assessment of the liprin-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 8).

      (6) Addition of a limitations section to the discussion to directly address that spontaneous release was not fully ablated in our studies and might contribute to recruitment.

      (7) Addition of an outlook to the same section on what experimental avenues could address the limitations in the future.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to transient activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α. Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by distinct upstream organizers compared to active zone proteins. It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.

      We thank the reviewer for the positive assessment of our study.

      Strengths:

      The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.

      We thank the reviewer for highlighting the technical strength of our work.

      Weaknesses:

      One notable limitation, however, is the absence of interrogation of endocytic proteins previously suggested to be recruited in an activity-dependent manner, in particular, endophilin.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for our findings compared to previous work on Endophilin, which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord versus Drosophila NMJ terminals.” Further, that study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together with our work, these data suggest that Endophilin constitutively, but not completely, localizes to the periactive zone.

      Reviewer #2 (Public review):

      Summary:

      This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using a combination of genetic and pharmacological perturbations in Drosophila and mouse neurons, the authors show that proteins such as Dynamin, Amphiphysin, AP-180, and others are still recruited to periactive zones even when evoked release or active zone architecture is disrupted. While the results are mostly negative, the study is methodologically solid and contributes to a more nuanced understanding of synaptic vesicle recycling machinery.

      We thank the reviewer for deeming our work solid and for highlighting its importance for the field.

      Strengths:

      (1) The experimental design is careful and systematic, covering both fly and mammalian systems.

      (2) The use of advanced genetic models (e.g., Liprin-α quadruple knockout mice) is a notable strength.

      (3) High-resolution imaging (STED, Airyscan) is well used to assess spatial localization.

      (4) The findings clarify that certain core assumptions - such as strict activity dependence of endocytic recruitment - may not hold universally.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      (1) The study would benefit from a clearer positive control to demonstrate activity-dependent recruitment (e.g., Endophilin).

      We have added experiments to measure the localization of Endophilin, a protein previously reported to localize to the synaptic vesicle cloud [1], in Drosophila NMJs (Figs. 2 and 3). We observed that EndoA localized both to the synaptic vesicle cloud and to the periactive zone area. While stimulation did not enhance levels in either compartment, this outcome is not inconsistent with shuttling of protein between compartments during activity. Nevertheless, our data support a model in which EndoA, like the other tested endocytic proteins, is present at the periactive zone at rest.

      (2) The reliance on Tetanus toxin in the Drosophila NMJ experiments in my eyes is a limitation, as it does not block all presynaptic fusion events; this should be discussed more directly.

      We agree with the point of the reviewer. To more directly discuss it, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” (519-523).

      (3) The potential role of Dynamin in organizing other periactive zone proteins is not addressed and could be an important next step.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Some small changes in protein levels upon silencing are reported; their biological meaning (e.g., compensation vs. variability) is not fully clarified.

      These changes might include homeostatic adaptations. In the revised version of the manuscript, this is addressed on lines 135-137 and 405-407. We think it is overall difficult to assign biological meaning to small-magnitude changes, and chose to highlight the main point that there are no large-magnitude changes.

      (5) While alternative organizing mechanisms (actin, lipids, adhesion molecules) are mentioned, a more forward-looking discussion of how to test these models would be helpful.

      Following the reviewer’s suggestion, we have added an outlook section to the discussion where we provide suggestions for future studies (lines 510-543).

      (6) The authors should consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We have included new experiments on EndoA at the fly neuromuscular junction (Fig. 2, Fig. 3, Fig. 8, Fig. 3 – figure supplement 1) and have added appropriate discussion of these findings as outlined above.

      Reviewer #3 (Public review):

      Summary:

      This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.

      We thank the reviewer for reviewing our work.

      Strengths:

      The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.

      We are confident that our methods are sensitive enough to detect changes within synaptic compartments. First, for mouse neurons assessed with STED microscopy, we have demonstrated that we can distinguish between the N- and the C-termini of the presynaptic protein Bassoon, which are positioned only a few tens of nanometers apart [4]. We have subsequently been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart and have established that genetic manipulations of active zone proteins induce detectable disruptions as assessed by STED microscopy [4-12]. Given that the periactive zone is larger than the distances that we can resolve, we are confident that we can detect changes in this area with enough sensitivity. Second, for Drosophila NMJs, we use a carefully validated workflow that allows assessing the distribution of periactive zone proteins and can detect subtle changes [13]. Unfortunately, there are no known manipulations that lead to periactive zone disassembly that could serve as a positive control, which reflects the little knowledge available in this field. We acknowledge that there may be subtle changes in protein localization that escape the resolution of our microscopy methods or experimental design, but this would not undermine the conclusion that the periactive zone remains assembled across the manipulations that we have tested. Overall, none of the manipulations we test induces a detectable disruption of the periactive zone. Naturally, we cannot exclude milder effects and have added a limitations section to discuss this possibility and some of the subtle changes we observe.

      This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.

      We thank the reviewer for the support of the conclusion of our study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      This is a rigorous study that, while presenting largely negative data, delimitates the processes that control peri-active zone organization. In addition to the interpretive and technical comments below, we encourage the authors to consider extending this study in two areas. First, examining the activity-dependence of Endophilin, and perhaps other factors, being recruited to the PAZ, where previous research has indicated a positive role for activity. Second, further characterization of the role of miniature release events in potentially contributing to PAZ organization. Overall, this was a rigorous and well-executed study.

      We thank the reviewing editor for this positive assessment of our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The rationale for comparing chronic inhibition to acute depolarization could be more clearly articulated. While this approach may be grounded in prior studies, the physiological consequences of chronic silencing differ markedly from those of transient activity, and these distinctions should be more explicitly addressed in the interpretation of results. For example, might lower intensity, chronic stimulation be a better comparison? Since fixation takes place immediately after stimulation, the time window to capture changes in protein recruitment may be curtailed.

      We thank the reviewer for this comment. The introduction of the manuscript now includes a rationale on lines 110-112. By inhibiting evoked synaptic vesicle fusion throughout the lifespan of neurons, we assessed whether this process is necessary for periactive zone assembly and concluded that it is not a requirement. By acutely depolarizing neurons with 50 mM KCl or with a 40 Hz train of action potentials, we were able to test whether synaptic vesicle fusion triggers the rapid recruitment of endocytic proteins to the periactive zone and concluded that this is not the case for most of the endocytic proteins that we studied. While these results indicate that a constitutive pathway must exist to assemble the periactive zone, we remain agnostic as to whether stimulation paradigms not tested in our study can enhance the deployment of endocytic proteins, especially over long periods of time. This may be the case for low, chronic stimulation, as suggested by the reviewer. We clarify these limitations in a “limitations and outlook” section of the discussion (lines 510-543).

      (2) Amphiphysin stood out as the only protein showing a notable change in opposite directions under either active zone protein knockout/blockers and Liprin-α knockout. Given the predominance of negative results, it would be valuable to devote more discussion to why Amphiphysin behaves differently. What functional role might it play in this context that sets it apart from other endocytic components?

      As suggested by the reviewer, we have extended the discussion on Amphiphysin. One possibility why Amphiphysin may respond differently to different genetic manipulations or changes in stimulation is that different endocytic proteins might belong to different endocytic submachineries. This is addressed on lines 421-424. On lines 444-449, we further discuss the subtle decrease in the levels of Amphiphysin and AP-180 in Liprin-α mutants. We suggest that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus, and that this link may be partially disrupted in Liprin-α mutants. Overall, we note that Amphiphysin is still localized to the periactive zone at rest, and hence that it fits with the overall model of constitutive deployment that we propose.

      (3) The claim of activity-independence may need to be nuanced. Although the data suggest no recruitment in response to acute stimulation, the subtle changes following chronic inhibition complicate this interpretation, especially when considering redundancy. If activity-dependence is considered bidirectional, these findings might reflect a more complex regulatory mechanism. The interpretation in lines 188-190 more accurately captures this complexity than earlier generalizations.

      We agree with the reviewer that the dependence on activity should be discussed in a nuanced fashion. We have scrutinized the manuscript on this point and state throughout that recruitment is independent of evoked activity and not necessarily of any kind of activity. We believe that this interpretation is accurate because evoked release of neurotransmitter was ablated by the pharmacological and genetic manipulations that we used. Furthermore, we have included a “Limitations of the study” section in the discussion where we openly address that spontaneous fusion of synaptic vesicles cannot be ruled out as a potential mechanism to sustain periactive zone assembly (lines 514-523). Finally, we have expanded on the complexity of periactive zone assembly relative to activity. In particular, homeostasis may contribute to increased levels of endocytic proteins upon chronic blockade of evoked transmission (lines 404-406).

      (4) Given published work on endophilin's role in activity-dependent endocytic recruitment, adding endophilin (at least in the Drosophila NMJ experiments) would be highly informative.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for these findings compared to previous work on Endophilin [3], which we discuss on lines 407-410:

      “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, this study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are compatible with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (5) Line 57 might have a typo in the citation.

      We thank the reviewer for pointing this out. The citations now include: Bai et al., 2010; Jiang et al., 2024; Koh et al., 2007; Winther et al., 2013; and Winther et al., 2015. Please note that the last two citations are grouped as Winther et al., 2013, 2015 following our formatting style.

      (6) Line 208 might be missing a citation that justifies parameters.

      In the revision, this information is discussed on lines 222-224, where we cite our prior work describing these data: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023)”.

      Reviewer #2 (Recommendations for the authors):

      (1) Please consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for our findings compared to previous work on Endophilin [3], which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, this study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are consistent with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (2) Expand the discussion of TeNT's limitations - specifically that it does not block spontaneous fusion or alternative fusion pathways - and consider referencing more stringent tools (e.g., Botulinum toxins or SNARE mutants), even if they weren't used here.

      Following the reviewer’s suggestion, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017)” (520-523).

      (3) We encourage the authors to briefly discuss whether Dynamin might contribute to periactive zone structure beyond its role in membrane fission. Loss-of-function data could be particularly informative in future work.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Clarify the interpretation of increased endocytic protein levels upon chronic silencing - are these interpreted as homeostatic responses or experimental variability?

      We suggest that these changes might include homeostatic adaptations. We note that this increase is of the same magnitude as the increase in active zone proteins following a similar pharmacological manipulation on lines 405-406, where we state that “a mechanism for this effect might be a homeostatic response (Wen and Turrigiano, 2024) similar in magnitude to the increase in active zone protein levels following activity blockade (Held et al., 2020).”

      (5) The Discussion could be strengthened by sketching out more concrete experimental approaches to test candidate mechanisms (e.g., roles for actin, lipids, adhesion molecules) in organizing periactive zones.

      The potential roles of the cell adhesion molecules (lines 430-440), cytoskeleton and lipids (442-452) are addressed in the discussion. Furthermore, following the reviewer’s suggestion, we have added the following statement (lines 541-543): “This work builds a foundation to assess alternative mechanisms and models of periactive zone assembly, including roles of the cytoskeleton, lipids, adhesion molecules, and intrinsic endocytic protein interactions”. We hope that the reviewer agrees that the discussion of our paper is not the right format to provide a concrete experimental plan for future work. In our view, the discussion should put the findings of our experiments in the context of the field.

      Reviewer #3 (Recommendations for the authors):

      (1) At a spine synapse, the endocytic zone is estimated to be between 100-200 nm from the active zone. The focus of the authors' analysis is largely outside of this region (0-150 nm), raising the question of whether the area studied may be outside of the area affected by the manipulations made. While STED systems claim ~80 nm resolution, this is rarely achieved in practice, and the authors do not report the effective resolution of their system. Reporting the resolution achieved would address this issue. In addition, super-resolution imaging does not appear to have been used at the Drosophila NMJ. The authors should clarify whether resolution limitations influenced the choice of analysis region and whether their imaging approach is sufficient to detect changes in the endocytic zone.

      We believe that it is unlikely that the relevant signals were missed. First, in mouse synapses, most signal corresponding to endocytic proteins was detected inside the selected region of interest. Our rationale to select the area was based on the fact that expanding the region analyzed would have reduced the sensitivity of our approach, as averaging over a larger area would dilute the signal. The resolution of our microscopy should not be a limitation either. In our previous work, we demonstrated that STED microscopy allows discriminating between the N- and the C-termini of the presynaptic scaffold Bassoon, which are positioned only a few tens of nanometers apart [4]. This establishes that we can resolve differences at tens of nanometers in a biological context, which is more relevant than the resolution measured with fluorescent beads (which we have repeatedly assessed to be ~80 nm laterally). Subsequently, we have also been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart [4-12]. Given that the periactive zone spans a larger area than the distances that we can resolve experimentally in the examples above, we are confident that our measurements are sensitive enough to detect changes in this area.

      Second, for Drosophila NMJs, the choice of the region of interest and the overall analysis followed a workflow validated in our previous work [13]. This method analyzes both immediately adjacent and more distant regions from the active zone, and does not exclude any region based on distance from the active zone as described on lines 222-224: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023).” In our previous study, we analyzed the distribution of periactive zone proteins at rest with STED microscopy and with Airyscan confocal microscopy. The resolution provided by Airyscan is reported to be ~175 nm in XY and ~400 nm in Z, which is sufficient to assess localization to the periactive zone compartment and is not inferior to imaging methods previously used to report changes in the distribution of endocytic proteins; for examples, see [1,2]. In the revised manuscript, we have added new data measuring the levels and distribution of EndoA and Dap160 using STED microscopy (Figure 3 – figure supplement 1). The results acquired with STED microscopy and with Airyscan confocal microscopy are consistent with one another.

      Overall, the accuracy of the imaging methods and analyses used in this study is sufficient to assess periactive zone structure given its size and organization.

      (2) Interestingly, in a number of cases, the authors observe significant differences in endocytic markers (Figure 1q, 4k, 6k, 6r). However, little is made of these differences. The authors should provide more discussion of these changes and how they make sense of them alongside their claims of a lack of effect from their manipulations.

      The reviewer raises a good point. We interpret these changes in two different ways. First, we suggest that changes observed in response to block of action potentials or disassembly of the active zone might be homeostatic. This is addressed on lines 135-137. Second, we discuss that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus. Several active zone proteins interact with the actin cytoskeleton. One of them is Liprin-α. This interaction may explain the decrease in the level of Amphiphysin and AP-180 at the periactive zone in Liprin-α null neurons. This is addressed on lines 444-449. We hope that the reviewer agrees that overall, we should focus on the main conclusion that deployment of endocytic proteins persists over a number of manipulations and synapse types.

      (3) The graphs in Figure 1c and 1g, 3g, 4c, 4e, 6c, and 6g do not appear to be identical. If the solid line represents the mean and the lighter color represents the distribution of these data, these data appear to be different from one another. It is surprising that these differences are not significant. What statistical tests were used to determine whether the differences in these graphs are not significant? Is the issue that a relatively low number of synapses were examined (30-60)? Did the authors conduct a power analysis?

      We apologize if the display of our data and analyses was not clear. We do not perform statistical analyses on the line profiles themselves. Instead, we perform them on two values that are extracted from each line profile. These values are (1) the distance between the peak intensity positions of the protein of interest and the marker and (2) the peak intensity values. For example, in Figure 1, distances are quantified and statistically analyzed in panel j, and the peak levels are quantified and statistically analyzed in panel k. We have clarified this in the legend of current Figures 1, 4, 5, and 7.
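
      As an illustration only, the two values described above could be extracted from a pair of line profiles roughly as sketched below. This is not the authors' analysis code: the function names, the use of an unsigned distance, the sample positions (in nm), and the intensity values are all assumptions made for the example.

```python
def peak(profile, positions):
    """Return (position, intensity) at the maximum of one line profile."""
    idx = max(range(len(profile)), key=lambda i: profile[i])
    return positions[idx], profile[idx]


def profile_metrics(protein, marker, positions):
    """Reduce two line profiles to the two values used for statistics.

    Hypothetical helper: assumes both profiles are sampled at the same
    positions and reports the distance between peaks as an absolute value.
    """
    protein_pos, protein_peak = peak(protein, positions)
    marker_pos, _ = peak(marker, positions)
    return {
        "peak_distance": abs(protein_pos - marker_pos),
        "peak_intensity": protein_peak,
    }


# Made-up example profiles for a single synapse (positions in nm).
positions = [-100, -50, 0, 50, 100, 150]
marker = [0.1, 0.4, 1.0, 0.5, 0.2, 0.1]
protein = [0.1, 0.2, 0.5, 0.9, 0.6, 0.2]

metrics = profile_metrics(protein, marker, positions)
print(metrics)
```

      In such a scheme, each synapse contributes one distance and one peak value, and the statistical comparison between conditions is made on those per-synapse values rather than on the averaged curves.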

      (4) The authors clearly state that their experiments address the role of evoked activity in endocytic zone positioning, but they do not examine whether spontaneous vesicle fusion might play a role. Given the availability of Drosophila mutants that decrease (Doc2, Dunc-13) or increase (syt1) spontaneous release, this is a notable omission. Ideally, these mutants should be examined. And at a minimum, the authors should discuss whether spontaneous release could contribute to endocytic zone organization.

      We agree with the reviewer that spontaneous fusion of synaptic vesicles may contribute to periactive zone organization. Many of the genetic manipulations that we used in mouse neurons result in a significant decrease in spontaneous release. This includes Ca<sub>V</sub>2 triple knockouts with a ~60% decrease in spontaneous fusion [10], RIM+ELKS quadruple knockouts with a ~70% decrease in spontaneous fusion [9] and Liprin-α quadruple knockouts with a ~50% decrease in spontaneous fusion [7]. We cannot rule out that the spontaneous release that is left is sufficient to mediate assembly functions. The conclusive way to address this possibility is using a manipulation that ablates spontaneous release without altering other pathways. However, to our knowledge, this is not available. The manipulations suggested by the reviewer might suffer from similar limitations, as they would change the frequency of spontaneous release without fully ablating it, and they would also affect evoked release. We have included a limitations section in the discussion where we address this (lines 514-523), specifically stating “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited. While many of the manipulations used here, including Ca<sub>V</sub>2 knockout (Held et al., 2020), RIM+ELKS knockout (Tan et al., 2022; Wang et al., 2016) and Liprin-α knockout (Emperador-Melero et al., 2024) in hippocampal neurons, and TeNT expression in fly NMJs (Sweeney et al., 1995), result in 50% to 70% decreased spontaneous release rates, it is possible that the remaining spontaneous release supports periactive zone assembly.
Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” We hope that the reviewer agrees that assessing these mutants should be a topic of future studies, given that we already test many mutants in the paper.

      (5) In Figures 1 and 6, the authors assess presynaptic protein localization in cultured neurons, but it is unclear whether these are synaptic sites. Many presynaptic proteins traffic together and can accumulate at sites lacking postsynaptic specializations. The authors should validate that the observed spatial organization occurs at bona fide synapses, ideally by co-labeling with postsynaptic markers as done in Figure 4. If methods like these were used, providing more details on how synapses were identified and selected would be useful to the reader.

      While we understand the reviewer’s point, we are confident that the structures analyzed are bona fide synapses for three reasons, as we have established before across many papers [4-8,10-12,17].

      The diameter of the structures detected using the synaptic vesicle marker Synaptophysin aligns much more closely with the size of the large vesicle clusters found at presynaptic terminals than with that of a few transport vesicles.

      In side-view synapses, the bar-like distribution of the active zone marker (Bassoon or Munc13-1) at one edge of the vesicle cloud indicates that active zone proteins are organized at one edge of the vesicle cluster—consistent with the architecture of synapses.

      Synaptophysin is one of our key markers for detecting synapses. In our cultures, most of the Synaptophysin signal colocalizes with postsynaptic markers (either PSD-95 or Gephyrin), as we have established across many studies [4,7-12]. This indicates that the markers used here are sufficient to select synapses. Furthermore, the frequency at which synapses were identified using an active zone marker as the second marker was similar to that observed when using a postsynaptic marker, suggesting that we were not randomly including unrelated structures.

      (6) Many of the images, particularly of the Drosophila NMJ, are of low quality and are shown in very small images. In addition, the quality of the images throughout the paper makes it difficult to assess the authors' analysis and results. The authors should provide larger, higher-quality images that show examples of the means for each of the examples shown. This is an issue for most of the figures, but is particularly prominent in the dNMJ. A minor additional point is that the authors should be clear whether the dNMJ images are collected at super-resolution or using a conventional microscope.

      We believe that the quality of our images is sufficient for the assessments made for the following reasons:

      These images were acquired with enough spatial resolution to assess levels at the PAZ as discussed in response to this reviewer’s first comment. In our previous work, we used images acquired at the same resolution and presented in the same manner for both mouse hippocampal synapses [6,7] and Drosophila NMJs [13,18]. In those previous studies, we drew conclusions at a similar level of detail as in the current study.

      In our view, our representative images are not inferior in quality to other papers in the field addressing similar questions [1,2,19,20].

      We have selected sample images based on the quantified mean values per condition. Hence, we strived to select panels that are objectively representative regarding the quantified parameters.

      We have specified microscopy methods in the figure legends. Specifically, for Drosophila NMJs, we used Airyscan confocal microscopy and STED microscopy. For each experiment, it is now stated which microscopy method was used in the corresponding legend.

      References:

      (1) Winther, Å. M. E. et al. An Endocytic Scaffolding Protein together with Synapsin Regulates Synaptic Vesicle Clustering in the Drosophila Neuromuscular Junction. J Neurosci 35, 14756–14770 (2015).

      (2) Winther, Å. M. E. et al. The dynamin-binding domains of Dap160/intersectin affect bulk membrane retrieval in synapses. J Cell Sci 126, 1021–1031 (2013).

      (3) Bai, J., Hu, Z., Dittman, J. S., Pym, E. C. G. & Kaplan, J. M. Endophilin functions as a membrane-bending molecule and is delivered to endocytic zones by exocytosis. Cell 143, 430–441 (2010).

      (4) Wong, M. Y. et al. Liprin-alpha3 controls vesicle docking and exocytosis at the active zone of hippocampal synapses. Proc Natl Acad Sci U S A 115, 2234–2239 (2018).

      (5) Emperador-Melero, J., de Nola, G. & Kaeser, P. S. Intact synapse structure and function after combined knockout of PTPδ, PTPσ, and LAR. Elife 10, (2021).

      (6) Emperador-Melero, J. et al. PKC-phosphorylation of Liprin-α3 triggers phase separation and controls presynaptic active zone structure. Nat Commun 12, 3057 (2021).

      (7) Emperador-Melero, J. et al. Distinct active zone protein machineries mediate Ca2+ channel clustering and vesicle priming at hippocampal synapses. Nature Neuroscience 2024 1–15 (2024) doi:10.1038/s41593-024-01720-5.

      (8) Tan, C., Wang, S. S. H., de Nola, G. & Kaeser, P. S. Rebuilding essential active zone functions within a synapse. Neuron 110, 1498-1515.e8 (2022).

      (9) Wang, S. S. H. et al. Fusion Competent Synaptic Vesicles Persist upon Active Zone Disruption and Loss of Vesicle Docking. Neuron 91, 777–791 (2016).

      (10) Held, R. G. et al. Synapse and Active Zone Assembly in the Absence of Presynaptic Ca(2+) Channels and Ca(2+) Entry. Neuron 107, 667-683.e9 (2020).

      (11) Chin, M. & Kaeser, P. S. The intracellular C-terminus confers compartment-specific targeting of voltage-gated calcium channels. Cell Rep 43, 114428 (2024).

      (12) Nyitrai, H., Wang, S. S. H. & Kaeser, P. S. ELKS1 Captures Rab6-Marked Vesicular Cargo in Presynaptic Nerve Terminals. Cell Rep 31, 107712 (2020).

      (13) Del Signore, S. J., Mitzner, M. G., Silveira, A. M., Fai, T. G. & Rodal, A. A. An approach for quantitative mapping of synaptic periactive zone architecture and organization. Mol Biol Cell 34, (2023).

      (14) Sweeney, S. T., Broadie, K., Keane, J., Niemann, H. & O’Kane, C. J. Targeted expression of tetanus toxin light chain in Drosophila specifically eliminates synaptic transmission and causes behavioral defects. Neuron 14, 341–351 (1995).

      (15) Kaeser, P. S. & Regehr, W. G. Molecular mechanisms for synchronous, asynchronous, and spontaneous neurotransmitter release. Annu Rev Physiol 76, 333–363 (2014).

      (16) Santos, T. C., Wierda, K., Broeke, J. H., Toonen, R. F. & Verhage, M. Early Golgi Abnormalities and Neurodegeneration upon Loss of Presynaptic Proteins Munc18-1, Syntaxin-1, or SNAP-25. Journal of Neuroscience 37, 4525–4539 (2017).

      (17) de Jong, A. P. H. et al. RIM C2B Domains Target Presynaptic Active Zone Functions to PIP2-Containing Membranes. Neuron 98, 335-349.e7 (2018).

      (18) Del Signore, S. J. et al. An autoinhibitory clamp of actin assembly constrains and directs synaptic endocytosis. Elife 10, (2021).

      (19) Imoto, Y. et al. Dynamin 1xA interacts with Endophilin A1 via its spliced long C-terminus for ultrafast endocytosis. EMBO Journal https://doi.org/10.1038/S44318-024-00145-X

      (20) Imoto, Y. et al. Dynamin is primed at endocytic sites for ultrafast endocytosis. Neuron 110, 2815-2835.e13 (2022).

    1. eLife Assessment

      This potentially useful manuscript addresses the 3D chromatin architecture in monocytes from a few patients with alcohol-associated hepatitis and its relationship to enhanced transcription of innate immune genes. While the concept and methodological approach are interesting in principle, the evidence is incomplete as a result of insufficient sample sizes as well as other substantive analytical concerns.

    2. Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy and those from patients with Alcohol-associated Hepatitis.

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell-type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both the healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles, for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture then they should carefully FACsort the equivalent monocyte populations from the healthy and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs) and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

      Comments on revisions:

      In the revision, the authors did not respond to my concerns, which I believe remain valid and compromise the authors' conclusions of AH-specific effects on genome architecture.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the relationship between 3D chromatin architecture and innate immune gene regulation in monocytes from patients with alcohol-associated hepatitis (AH). Using Hi-C technology, they attempt to identify structural changes in the genome that correlate with altered gene expression. Their central claim is that genome restructuring contributes to the hyper-inflammatory phenotype associated with AH.

      Strengths:

      (1) The manuscript employs Hi-C technology, which, in principle, is a powerful approach for studying genome organization.

      (2) The focus on disease-relevant genes, particularly innate immune loci, provides a contextually important angle for understanding AH.

      Weaknesses:

      (1) Sample Size: The study relies on an exceptionally small cohort (4 AH patients and 4 healthy controls), rendering the results statistically underpowered and highly susceptible to variability.

      (2) Hi-C Resolution unpaired to RNA seq: The data are presented at a resolution of 100kb, which is insufficient to uncover meaningful chromatin interactions at the level of individual genes. This data is unpaired.

      (3) Functional Validation: The manuscript lacks experiments to directly link changes in chromatin architecture with gene expression or monocyte function, leaving the claims speculative.

      (4) Data Integration: The lack of Hi-C with ATAC and RNA-seq data handicaps the analysis and really makes it superficial. In short, it does not convincingly demonstrate a functional relationship.

      (5) Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      Appraisal of the Aims and Results:

      The manuscript sets out to establish a connection between chromatin architecture and AH pathology. However, the study fails to achieve its stated aims due to inadequate methods and insufficient data. The conclusions drawn from the Hi-C analyses alone are poorly supported, and the lack of functional validation undermines the credibility of the proposed mechanisms. Overall, the results do not provide compelling evidence to substantiate the authors' claims.

      Impact on the Field and Utility to the Community:

      The work, in its current form, is unlikely to have a meaningful impact on the field. The limited scope, methodological shortcomings, and lack of robust data significantly diminish its potential utility. Without addressing these critical gaps, the study does not offer new insights into the role of genome architecture in AH or provide useful methodologies or datasets for the community.

      Additional Context:

      The manuscript would benefit from a more comprehensive analysis of potential mechanisms underlying the observed changes, including the interplay between chromatin architecture and epigenetic modifications. Furthermore, longitudinal studies or therapeutic interventions could provide insights into the dynamic aspects of genome restructuring in AH. These considerations are entirely absent from the current study.

      Conclusion:

      The manuscript does not achieve its stated goals and does not present sufficient evidence to support its conclusions. The limitations in sample size, resolution, and experimental rigor severely hinder its contribution to the field. Addressing these fundamental flaws will be essential for the work to be considered a meaningful addition to the literature.

      Reviewer #2 (Public review):

      Summary:

      Dr. Adam Kim and collaborators study the changes in chromatin structure in monocytes obtained from patients with alcohol-associated hepatitis (AH) when compared to healthy controls (HC). Through the usage of high throughput chromatin conformation capture technology (Hi-C), they collected data on contact frequencies between both contiguous and distal DNA windows (100 kb each), mainly within the same chromosome. From the analyses of those data in the two cohorts under analysis, the authors describe frequent pairs of regions subject to significant changes in contact frequency across cohorts. Their accumulation onto specific regions of the genome, referred to as hotspots, motivated the authors to narrow down their analyses to these disease-associated regions, in many of which, the authors claim, a number of key innate immune genes can be found. Ultimately, the authors try to draw a link between the changes observed in chromatin architecture in some of these hotspots and the differential co-expression of the genes lying within those regions, as ascertained in previous single-cell transcriptomic analyses.

      Strengths:

      The main strength of this paper lies in the generation of Hi-C data from patients, a valuable asset that, as the authors emphasize, offers critical insights into the role of chromatin architecture dysregulation in the pathogenesis of alcohol-associated hepatitis (AH). If confirmed, the reported findings have the potential to highlight an important, yet overlooked, aspect of cellular dysregulation (chromatin conformation changes), not only in AH but potentially in other immune-related conditions with a component of pathological inflammation.

      Weaknesses:

      The two weaknesses of the work that I regard as most important are more methodological than conceptual. The first of these issues concerns the perhaps insufficient level of description provided on the definition of some key types of genomic regions, such as topologically associated domains, DNA hotspots, or even DNA loci showing significant changes in contact frequency between AH and HC. In spite of the importance of these concepts in the paper, no operational, explicit description of how they are defined, from a statistical point of view, is provided in the current version of the manuscript.

      Without these definitions, some of the claims that authors make in their work become hard to sustain. Some examples are the claim that randomizing samples does not lead to significant differences between cohorts; the claim that most of the changes in contact frequency happen locally; or the claim that most changes do not alter the structure of TADs, but appear either within, or between TADs. In my viewpoint, specific descriptions and implementation of proper tests to check these hypotheses and back up the mentioned specific claims, along with the inclusion of explicit results on these matters, would contribute very significantly to strengthening the overall message of the paper.

      The second notable weakness of the study pertains to the characterization of the changes observed around immune genes in relation to genome-wide expectations. Although the authors suggest that certain hotspots contain a high number of immune-related genes, no enrichment analysis is provided to verify whether these regions indeed harbor a higher concentration of such genes compared to other genomic areas. It would be important for readers to be promptly informed if no such enrichment is observed, for in that case, the presence of some immune genes within these hotspots would carry more limited implications.

      Additionally, the criteria used to define a hotspot are not clearly outlined, making it difficult to assess whether the changes in contact frequencies around the immune genes highlighted in figures 5-8 are truly more pronounced than what would be expected genome-wide.

      Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy and those from patients with Alcohol-associated Hepatitis.

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture then they should carefully FACsort the equivalent monocyte populations from the healthy and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs), and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

      We thank the reviewers for their careful and thorough examination of our manuscript. We agree with all of their comments regarding the limitations of the study. Many of the criticisms focus on the small sample size of our study (n=4 for healthy controls and disease patients) in both Hi-C and single-cell RNA-seq experiments, and that these experiments are unpaired, or in other words, PBMCs came from different patients for each experiment.

      Unfortunately, these experiments are fairly complicated to perform, requiring patient cells and very expensive deep sequencing. We are not currently in a position to easily or cost-effectively increase the sample size. In the case of Hi-C, we still believe our study to be of value, as Hi-C is not a commonly used technique to study disease effects on chromatin, and very few studies have employed a large enough sample size to perform statistical comparisons. Additionally, analyzing the data at a higher resolution would require deeper sequencing, and unfortunately we do not have the resources to sequence these libraries more deeply. Regarding the single-cell RNA-seq data, this dataset was generated for an earlier study [1] focusing on gene expression responses to LPS, and we were unable to get PBMCs from exactly the same patients to perform the Hi-C study.

      We disagree that our study has limited scientific value. Our study is the first to use Hi-C to show that the 3D genome architecture of primary monocytes is changed in a disease context. The only other study to follow a similar approach performed Hi-C in monocytes from 2 healthy individuals and 2 systemic lupus erythematosus (SLE) patients, and in that study the data from both patients were combined prior to comparison. No statistics were performed, and the authors concluded that there were no differences in genome architecture due to disease. They did find differences between primary monocytes and the THP1 monocytic cell line, but this comparison lacked statistical analysis. Their conclusion was that inflammatory disease may not lead to genome-wide changes in architecture. Our study, though it addresses a very different disease than SLE, shows statistically significant differences between AH and healthy controls. We believe our study lays the groundwork for how Hi-C can be used to study genome architecture in human disease, and the possible downstream effects.

      Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      This is an interesting suggestion. This dataset only contains 4 AH patients, for whom we have included basic clinical data in Supplemental Table 1, including Age, HCA1c, Bilirubin, AST, ALT, Creatinine, Albumin, and MELD score. Three of the four patients have severe AH, while one (AH2) has moderate AH. Despite one patient being moderate, all four AH patients had similar correlations with each other, suggesting that the disease-specific differences we observed are not indicative of severity. More patient samples are needed to determine if genome architecture changes throughout disease progression. We have added this important discussion to the manuscript (page 12, lines 5-14).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The criteria used to determine which pairs of regions exhibit significant differences in contact frequency between alcohol-associated hepatitis (AH) and healthy controls (HC) are not disclosed. It would be beneficial for the authors to provide this information, including details such as the number of pairs tested, the nature of the statistical tests conducted, the method of multiple testing correction applied, as well as the significance thresholds used, and the number of loci-pairs below these thresholds for each chromosome. This information would greatly enhance the reader's understanding of the relevance of the reported findings.

      Thank you for this comment, though we are not sure we totally understand. All of our statistics were performed using multiHiCcompare [2], where we input all 8 datasets (.hic files from Juicer), then measured statistical differences between defined groups (HC vs AH). For our randomization studies, we randomized the group comparisons, so each group contained a mix of HC and AH.

      Second, a formal statistical definition of what constitutes a hotspot would be valuable for clarity.

      Thank you for this suggestion. Initially, hotspots were defined simply as regions of the genome with a high frequency of highly significant differential contacts. We have now adopted a more formal definition of “hotspot” based on similar criteria: a hotspot is defined by both adjusted p value and frequency of locations. First, we filtered all pair-wise chromosomal interactions by a very stringent padj < 0.0000001 to focus on only the most changed coordinates (Supplemental Table 4). Then we looked for regions of the genome with a high frequency of these differential locations. Borders for each hotspot were determined more liberally by looking at the full list of differential spots (padj < 0.05). Finally, we used code to list the genes within each interacting region. We have added these important details to the Methods (page 14, lines 11-14).
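
      One way to make the hotspot definition above operational is sketched below. Only the two padj thresholds come from the response; the minimum hit count (`min_hits`), the representation of regions as integer bin indices, and the rule that borders extend across contiguous bins with any padj < 0.05 contact are assumptions made for illustration and may differ from the authors' actual procedure.

```python
from collections import Counter
from typing import Iterable, List, Tuple

STRINGENT_PADJ = 1e-7  # seed threshold quoted in the response
LIBERAL_PADJ = 0.05    # border threshold quoted in the response


def find_hotspots(
    pairs: Iterable[Tuple[int, int, float]],
    min_hits: int = 3,  # hypothetical cutoff for "high frequency"
) -> List[Tuple[int, int]]:
    """Return (first_bin, last_bin) hotspots for one chromosome.

    pairs holds (bin_a, bin_b, padj) for each tested pair of genomic bins.
    """
    strict_hits: Counter = Counter()
    liberal_bins = set()
    for bin_a, bin_b, padj in pairs:
        if padj < LIBERAL_PADJ:
            liberal_bins.update((bin_a, bin_b))
        if padj < STRINGENT_PADJ:
            strict_hits.update((bin_a, bin_b))

    # Seeds are bins that take part in many highly significant contacts.
    seeds = sorted(b for b, n in strict_hits.items() if n >= min_hits)

    hotspots: List[Tuple[int, int]] = []
    for seed in seeds:
        if hotspots and hotspots[-1][0] <= seed <= hotspots[-1][1]:
            continue  # seed already lies inside the previous hotspot
        start = end = seed
        # Assumed border rule: grow while neighbouring bins are differential at padj < 0.05.
        while start - 1 in liberal_bins:
            start -= 1
        while end + 1 in liberal_bins:
            end += 1
        hotspots.append((start, end))
    return hotspots


# Hypothetical contacts on one chromosome (bin indices at 100-kb resolution).
example_pairs = [
    (10, 11, 1e-9), (10, 12, 1e-8), (10, 15, 1e-10),
    (9, 10, 0.01), (11, 12, 0.03), (20, 21, 0.04), (30, 31, 1e-9),
]
example_hotspots = find_hotspots(example_pairs)
```

      In this toy input only bin 10 reaches the assumed hit count, and the liberal contacts extend its borders to bins 9 through 12. Reporting the chosen cutoffs alongside the hotspot list would let readers judge how sensitive the hotspots are to them.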

      Third, a clear definition of the criteria used to identify different topologically associated domains (if these were indeed defined in the data and/or utilized in the analyses) would also be a helpful addition.

      Thank you for this suggestion. We did not identify TADs or use them in any of these analyses.

      Likewise, several statements throughout the paper lack support from specific analyses, although it should be feasible to implement such analyses (or at least present them if they have already been conducted) to substantiate these claims:

      If randomizing samples does not result in significant differences between (randomized) cohorts, it would be beneficial to provide insights into the number of loci pairs that exhibit differences in frequency when using both the actual and randomized cohorts.

      Thank you for asking this question, as this is an important point. Using multiHiCcompare, if we compare HC (n=4) to AH (n=4), we get the results in the figures and supplementary data, but if we randomize Group 1 (HC, HC, AH, AH) vs Group 2 (HC, HC, AH, AH), we get almost 0 significant changes in contact frequency. To show this more robustly, we performed 5 randomized comparisons and found far fewer changes in contact frequency between groups. This shows that the changes in contact frequency we report are not random, but rather reflect a real difference in AH. This point has been added to the Results (page 6, lines 15-17) and Methods (page 14, lines 16-21).
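
      The randomization design can be made explicit by enumerating the possible balanced splits. The sketch below is illustrative only: the sample labels are hypothetical, and it covers just the label shuffling, not the multiHiCcompare run that would follow for each split.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def balanced_splits(
    controls: Sequence[str],
    patients: Sequence[str],
    per_group: int = 2,
) -> Iterator[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """Yield (group1, group2) splits in which group1 holds per_group HC and per_group AH samples.

    Mirror-image splits (group1 and group2 swapped) are reported once.
    """
    everyone = frozenset(controls) | frozenset(patients)
    seen = set()
    for hc in combinations(controls, per_group):
        for ah in combinations(patients, per_group):
            group1 = frozenset(hc + ah)
            group2 = everyone - group1
            key = frozenset((group1, group2))
            if key in seen:
                continue
            seen.add(key)
            yield tuple(sorted(group1)), tuple(sorted(group2))


# Hypothetical sample labels for the 4 HC and 4 AH libraries.
hc_samples = ["HC1", "HC2", "HC3", "HC4"]
ah_samples = ["AH1", "AH2", "AH3", "AH4"]
splits = list(balanced_splits(hc_samples, ah_samples))
```

      With four samples per cohort there are 18 distinct balanced splits, so the five randomized comparisons reported above sample a subset of them. Listing the number of significant pairs for each split next to the number for the true HC vs AH comparison would address the reviewer's request directly.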

      If most changes in contact frequency occur locally, it would be useful to visualize the relationship between effect sizes and/or significance levels for the observed differences in frequency in relation to the distance between the involved loci. Additionally, comparing these results to the average baseline contact intensities as a function of distance would be informative. This comparison could help determine whether the distance decay in effect size/significance for the differences between AH and HC is faster or slower than the decay rates for baseline contact frequencies.

      This is a good suggestion. In our initial analysis, we made a number of figures relating chromosome positions, distance between loci, and statistics regarding the differential contact frequency. In the initial submission, we only showed Figure 3, which shows the logFC (log fold change) for the differential contact frequency by chromosomal position on both sides. To address this question, we have added a supplemental figure showing logFC as a function of the distance between two loci (new Supplemental Figure 3).
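
      A summary of the kind the reviewer asks for can be sketched as follows: group differential contacts by the genomic distance between the two loci and average the magnitude of logFC in each distance bin. The record layout, the 100-kb bin size, and the use of the mean absolute logFC are assumptions for illustration, not a description of how Supplemental Figure 3 was produced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def logfc_by_distance(
    records: Iterable[Tuple[int, int, float]],
    bin_size: int = 100_000,  # assumed to match the 100-kb Hi-C resolution
) -> Dict[int, float]:
    """Map each distance bin (in bp) to the mean absolute logFC of contacts in it.

    records holds (start_a, start_b, logfc) for intrachromosomal locus pairs.
    """
    buckets = defaultdict(list)
    for start_a, start_b, logfc in records:
        distance = abs(start_b - start_a) // bin_size * bin_size
        buckets[distance].append(abs(logfc))
    return {distance: mean(values) for distance, values in sorted(buckets.items())}


# Hypothetical differential contacts on one chromosome.
example_records = [
    (0, 100_000, 1.0),
    (500_000, 600_000, -3.0),
    (0, 1_000_000, 0.5),
]
decay = logfc_by_distance(example_records)
```

      Plotting such a table next to the baseline contact frequency as a function of distance would show whether effect sizes decay faster or slower than contact frequency itself.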

      Similarly, the assertion that most changes do not affect the structure of topologically associated domains (TADs) but occur either within or between TADs should be supported by specific testing, or else removed.

      Thank you. We have adjusted the language in the Discussion.

      Furthermore, the authors should clarify whether differences in chromatin conformation are more pronounced around immune genes compared to genome-wide expectations. If this is not the case, it would be helpful to quantify the intensity of these differences around the highlighted genes in relation to the rest of the genome. To achieve this, I would suggest the following:

      Conduct enrichment analyses on the genes located within the most prominent hotspots to determine whether they are significantly enriched in immune genes (and, or, alternatively, in any other functional category).

      Estimate the average absolute fold change in contact frequency within all topologically associated domains (TADs) identified in the study. This would allow for the identification of immune gene-containing TADs highlighted in Figures 5-8, providing readers with a quantitative understanding of how anomalously different these genomic regions are with regards to the magnitude of its alterations in AH, compared to the rest of the genome.

      While some of the selected gene clusters appear to co-localize well with topologically associated domains (e.g., Figures 5A, 8A), others seemingly encompass either multiple TADs (Figure 6) or only portions of them (Figure 7). This should be clarified.

      Thank you, this is a great suggestion. In order to be as unbiased as possible, we took all genes present in the regions with the most significant changes in the genome (Supplemental Table 4) that we used to identify the hotspots. You are correct: we do in fact see enrichment of genes involved in innate immune signaling. This has been added to Results (page 7, lines 19-25) and Figure 4.
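      An enrichment test of this kind is commonly framed as an upper-tail hypergeometric test. The sketch below is illustrative only and is not the authors' pipeline: the function name and all counts are hypothetical, and the choice of test is an assumption.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_category, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_background: all genes considered (hypothetical background set)
    n_category:   background genes annotated to the category of interest
    n_selected:   genes located in the hotspot regions
    n_overlap:    hotspot genes that are also in the category
    """
    upper = min(n_category, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts, chosen only to show the call; not taken from the study.
p_value = hypergeom_enrichment_p(
    n_background=20000, n_category=1000, n_selected=200, n_overlap=25
)
print(f"enrichment p-value: {p_value:.3g}")
```

      In practice, such a p-value would be computed per functional category and then corrected for multiple testing.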

      Finally, there are several minor issues concerning the figures that could be easily addressed to substantially enhance their readability:

      Font sizes in most figures should be increased, particularly for some axis labels and tick marks. This issue affects most figures; for instance, in Figure 4, it hinders the reader's ability to interpret the ranges of the data presented.

      Thank you, the figures have been adjusted.

      Figures 5 to 8 (panels A and B) would benefit significantly from a more consistent format. Specifically, the gene cluster boxes should also be included in the right panels, and the gene locations should be displayed on the left in a uniform format across all figures (e.g., formatting Figures 7 and 8 to match the style of Figures 5 and 6).

      Figures 5 and 6 have a similar structure to each other because we were focusing on all of the genes in that chromosomal region. Figures 7 and 8 are different because we are focusing on how the region around a certain hotspot of interest changes.

      It is also important to note that the genes plotted in Figures 8C and 8D are not the same. Concerning these two panels, it would be valuable to clarify whether the data presented pertains exclusively to monocytes. If so, information regarding the number of cells analyzed and the number of donors from which they were drawn would also be beneficial.

      These figures are generated using scRNA-seq data. They represent all of the genes expressed in that region of the genome, in their chromosomal position. If a gene is not expressed in the scRNA-seq data, then it is not shown. I have debated with myself a lot on how to show gene expression in a region of the genome, but I think this is the clearest way to show this; including the genes that have no expression would make it more confusing. But yes, if you compare HC and AH, you see some differences in the list of genes. We have added more clarity to the figure legend for this figure.

      References

      (1) Kim, A., Bellar, A., McMullen, M. R., Li, X. & Nagy, L. E. Functionally Diverse Inflammatory Responses in Peripheral and Liver Monocytes in Alcohol-Associated Hepatitis. Hepatol Commun 4, 1459-1476 (2020). https://doi.org/10.1002/hep4.1563

      (2) Stansfield, J. C., Cresswell, K. G. & Dozmorov, M. G. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 35, 2916-2923 (2019). https://doi.org/10.1093/bioinformatics/btz048

    1. eLife Assessment

      This manuscript presents a valuable antiviral approach using an engineered ACE2-Fc fusion protein that demonstrates broad-spectrum neutralization capacity against SARS-CoV-2 variants and achieves significant prophylactic protection in animal models through a novel Fc-mediated phagocytosis mechanism. The study provides convincing evidence for protective efficacy through rigorous in vivo validation in mice, mechanistic characterization via transcriptomic analysis and biodistribution studies, and demonstration of antibody-dependent cellular phagocytosis as the primary clearance mechanism mediated by the decoy. The work will be of interest to researchers working in vaccine development and associated immune responses.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. describes the development of an optimized soluble ACE2-Fc fusion protein, B5-D3, for intranasal prophylaxis against SARS-CoV-2. As shown, B5-D3 conferred protection not only by acting as a neutralizing decoy, but also by redirecting virus-decoy complexes to phagocytic cells for lysosomal degradation. The authors showed complete in vivo protection in K18-hACE2 mice and investigated the underlying mechanism by a combination of Fc-mutant controls, transcriptomics, biodistribution studies, and in vitro assays.

      Strengths:

      The major strength of this work is the identification of a novel antiviral approach with broad-spectrum activity that goes beyond simple neutralization. Mutant ACE2 enables broad and potent binding activity with the S proteins of SARS-CoV-2 variants, while the fused Fc part mediates phagocytosis to clear the viral particles. The conceptual advance of this ACE2-Fc combination is convincingly validated by in vivo protection data and by the completely abrogated protection of the Fc LALA mutant.

      Additionally:

      The authors include a discussion (in the Discussion section) of a previously reported ACE2 decamer (DOI: 10.1080/22221751.2023.2275598) and compare it with the ACE2-Fc fusion protein developed in this study. The authors also tested the off-target activity and showed no evidence of toxicity in vivo.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. engineered an ACE2 mutant by introducing two mutations (T92Q and H374N), and fused this ACE2 mutant to human IgG1-Fc (B5-D3). Experimental results suggest that B5-D3 exhibits broad-spectrum neutralization capacity and confers effective protection upon intranasal administration in SARS-CoV-2-infected K18-hACE2 mice. Transcriptomic analysis suggests that B5-D3 induces early immune activation in lung tissues of infected mice. Fluorescence-based bio-distribution assay further indicates rapid accumulation of B5-D3 in the respiratory tract, particularly in airway macrophages. Further investigation shows that B5-D3 promotes viral phagocytic clearance by macrophages via an Fc-mediated effector function, namely antibody-dependent cellular phagocytosis (ADCP), while simultaneously blocking ACE2-mediated viral infection in epithelial cells. These results provide some insights into improving decoy treatments against SARS-CoV-2 and other potential respiratory viruses.

      Strengths:

      The protective effect of this ACE2-Fc fusion protein against SARS-CoV-2 infection has been evaluated in a reasonable way.

      Weaknesses:

      (1) Some of the mouse experiments suffer from insufficient sample sizes, which affects the statistical power and reliability of the results. The authors acknowledged this weakness, noting that the supply of aged mice was limited, while arguing that, although the sample size is small, the data from these mice are consistent.

      (2) Compared to 6 hours, intranasal administration of B5-D3 at 24 hours before viral infection results in reduced protective efficacy. However, only survival and body weight data are provided, with no supporting evidence from virological assays such as viral titer measurement. The authors acknowledged that such data would be more comprehensive and attributed the limitation to constraints in animal services.

      (3) The efficacy of the B5-D3-LALA group was not as good as that of the B5-D3 group. The authors suggested that there might be a certain degree of viral variation, and viral infection in the lungs may be uneven in the B5-D3-LALA group.

    4. Reviewer #3 (Public review):

      Strengths:

      The core strength of this study lies in its innovative demonstration that an engineered sACE2-Fc fusion redirects virus-decoy complexes to Fc-mediated phagocytosis and lysosomal clearance in macrophages, revealing a distinct antiviral mechanism beyond traditional neutralization. Its complete prophylactic protection in animal models and precise targeting of airway phagocytes establish a novel therapeutic paradigm against SARS-CoV-2 variants and future respiratory viruses.

      Weaknesses:

      The study attributes the complete antiviral protection to Fc-mediated phagocytic clearance, a central claim that requires more rigorous experimental validation. The observation that abrogating Fc functions compromises protection could be confounded by potential alterations in the protein's stability, half-life, or overall structure. To firmly establish this mechanism, it is crucial to include a control molecule with a mutated Fc region that lacks FcγR binding while preserving the Fc structure itself. Without this critical control, the conclusion that phagocytic clearance is the primary mechanism remains inadequately supported.

      The strategy of deliberately targeting virus-decoy complexes to phagocytes via Fc receptors inherently raises the question of Antibody-Dependent Enhancement (ADE) of disease. While the authors demonstrate a lack of productive infection in macrophages, this only addresses one facet of ADE. The risk of Fc-mediated exacerbation of inflammation (ADE) remains a critical concern. The manuscript would be significantly strengthened by a direct discussion of this risk and by including data, such as cytokine profiling from treated macrophages, to more comprehensively address the safety profile of this approach.

      The exclusive use of the K18-hACE2 mouse model, which exhibits severe disease, limits the generalizability of the findings. The "complete protection" observed may not translate to models with more robust and naturalistic immune responses or to human physiology. Furthermore, data against circulating SARS-CoV-2 variants of concern are lacking.

      The concept of sACE2-Fc fusion proteins as decoy receptors is not novel, and numerous similar constructs have been previously reported. The manuscript would benefit from a clearer demonstration of how the optimized B5-D3 mutant represents a significant advance over existing sACE2-Fc designs. A direct comparative analysis with previously published benchmarks, particularly in terms of neutralizing potency, Fc effector function strength, and in vivo efficacy, is necessary to establish the incremental value and novelty of this specific agent.

      Comments on revised version:

      The authors have successfully addressed the issues raised.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. describes the development of an optimized soluble ACE2-Fc fusion protein, B5-D3, for intranasal prophylaxis against SARS-CoV-2. As shown, B5-D3 conferred protection not only by acting as a neutralizing decoy, but also by redirecting virus-decoy complexes to phagocytic cells for lysosomal degradation. The authors showed complete in vivo protection in K18-hACE2 mice and investigated the underlying mechanism by a combination of Fc-mutant controls, transcriptomics, biodistribution studies, and in vitro assays.

      Strengths:

      The major strength of this work is the identification of a novel antiviral approach with broad-spectrum activity that goes beyond simple neutralization. Mutant ACE2 enables broad and potent binding activity with the S proteins of SARS-CoV-2 variants, while the fused Fc part mediates phagocytosis to clear the viral particles. The conceptual advance of this ACE2-Fc combination is convincingly validated by in vivo protection data and by the completely abrogated protection of the Fc LALA mutant.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      Some aspects could be further modified.

      (1) A previously reported ACE2 decamer (DOI: 10.1080/22221751.2023.2275598) needs to be mentioned and compared in the Discussion part.

      We thank the reviewer for pointing out this weakness.

      Indeed, previous studies reported that the ACE2-IgM decamer, taking advantage of the decameric structure of IgM, exhibited higher avidity to spikes and greater potency for viral neutralization [1-3]. In particular, the study by Guo et al. has demonstrated a broad-spectrum neutralization ability of the ACE2-IgM decamer against multiple SARS-CoV-2 variants and reported the efficacy of intranasal prophylaxis in preventing lethal SARS-CoV-2 challenge in K18-hACE2 mice.

      We agree with the reviewer that our B5-D3 design might benefit from switching to the IgM isotype. However, the distinct biological features imposed by IgM Fc, including short serum half-life and restricted tissue penetration [4], may complicate the study design and divert our focus.

      In our current study, we focus on the IgG1 Fc-based decoy design, while inactivating the enzyme activity of ACE2 to avoid disturbing the renin-angiotensin system. This design allowed us to compare diverse administration routes and regimens and to gain useful insights into the potential of sACE2-Fc decoy in combating SARS-CoV-2 in vivo.

      We appreciate the reviewer’s insightful suggestion. In the revised manuscript, we have included additional discussion regarding the ACE2-IgM decamer, addressing the relevant concern on page 17 lines 409–414.

      (2) Limitations of this study, such as off-target binding and potential immunogenicity, should also be discussed.

      We thank the reviewer for his insightful comments and agree that off-target activity is a major concern for designing the ACE2 decoy.

      (1) In our study, the representative sACE2-Fc decoy candidate B5-D3 contains the H374N mutation (D3), which is designed to inactivate ACE2 enzyme activity by causing dyscoordination of Zn2+. Our in vitro enzymatic activity assay has demonstrated that the H374N mutation (D3), as well as three other single mutations (D1, D4 and D5), in either WT sACE2-Fc or the B5 mutant, could effectively abolish the hACE2 enzyme activity (Supplementary Fig. 2e, h).

      (2) To further address the concern about off-target activity, we performed AAV-based overexpression experiments in K18-hACE2 mice and examined serum levels of RAS hormones, using ELISA methods that specifically detect serum renin, Angiotensin II (Ang II), and Ang (1-7). Our data from WT sACE2-Fc overexpression revealed significantly elevated serum renin and Ang II, indicating a disruption of the RAS (Supplementary Fig. 4d, e). In contrast, the results from the double mutants examined, including B5-D3, showed negligible change in any of these metabolite levels, demonstrating no off-target effect and minimal disturbance to RAS activity in K18-hACE2 mice (Supplementary Fig. 4d–f).

      (3) Moreover, in this experiment, after the prolonged overexpression of all these molecules in K18-hACE2 mice, histological examination of multiple organs showed no evidence of immune cell infiltration or tissue damage, and no difference was observed between the mice receiving WT sACE2-Fc or B5-D3 (Supplementary Fig. 4g).

      In the revised manuscript, we have included the results from the AAV-delivered in vivo overexpression of WT sACE2-Fc and the three most promising double mutants (B5-D3, B5-D4 and B5-D5) on page 5 lines 118–122 and on page 6 lines 123–135 in the main text. The relevant data are presented in the new Supplementary Fig. 4.

      Reviewer #2 (Public review):

      Summary:

      Wang et al. engineered an optimized ACE2 mutant by introducing two mutations (T92Q and H374N) and fused this ACE2 mutant to human IgG1-Fc (B5-D3). Experimental results suggest that B5-D3 exhibits broad-spectrum neutralization capacity and confers effective protection upon intranasal administration in SARS-CoV-2-infected K18-hACE2 mice. Transcriptomic analysis suggests that B5-D3 induces early immune activation in lung tissues of infected mice. Fluorescence-based biodistribution assay further indicates rapid accumulation of B5-D3 in the respiratory tract, particularly in airway macrophages. Further investigation shows that B5-D3 promotes viral phagocytic clearance by macrophages via an Fc-mediated effector function, namely antibody-dependent cellular phagocytosis (ADCP), while simultaneously blocking ACE2-mediated viral infection in epithelial cells. These results provide insights into improving decoy treatments against SARS-CoV-2 and other potential respiratory viruses.

      Strengths:

      The protective effect of this ACE2-Fc fusion protein against SARS-CoV-2 infection has been evaluated in a quite comprehensive way.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      (1) The paper lacks an explanation regarding the reason for the combination of mutations listed in Supplementary Figure 2b. For example, for the mutations that enhance spike protein binding, B2-B6 does not fully align with the mutations listed in Table S1 of Reference 4, yet no specific criteria are provided.

      We thank the reviewer for pointing out this oversight.

      We constructed the B2-B6 mutants based on the study by Chan et al. [5] (Reference 4 in the previous version), mainly referring to their Fig. 1A rather than to their Table S1. In Chan’s study, each of the proposed mutations was discovered as a single mutation in monomeric sACE2 molecules based on the enrichment in target cell-binding. T92 was a notable hot spot for enriched mutations in their Fig. 1A.

      Since monomeric and dimeric forms of sACE2 showed dramatically different kinetics for ACE2-RBD interaction, we selected five proposed mutations and further examined their affinity and activity in dimeric sACE2-Fc in our study. We chose not only the combinations of mutations, such as B3, B4, and B6 proposed in their Table S1, but also explored less-complicated mutation(s) like B2 (T27Y/L79T) and B5 (T92Q) in their Fig. 1A, which were predicted in silico to enhance ACE2-RBD binding but were not tested in sACE2-Fc in Chan’s study.

      Interestingly, although our results confirmed enhanced viral neutralization by all these mutations, the activity increase compared to WT ACE2-Fc was rather limited. Hence, we chose not to explore other mutations but to focus on B2–B6 to construct an enhanced ACE2-Fc decoy as a representative, to investigate the potential of ACE2-Fc decoys in combating SARS-CoV-2 infections.

      In the revised manuscript, we have further amended the writing on page 4 lines 84–87 to enhance the readability. For conciseness, however, we did not describe in detail how we selected the mutations to be tested.

      Second, for the mutations that abolished enzymatic activity, while D1 and D2, D3, D4, and D5 are cited from References 12, 11, and 33, respectively, the reason for combining D3 and D4 into A2, and D1 and D2 into A3 remains unexplained. It is also unclear whether some of these other possible combinations have been tested. Furthermore, for the B5-derived mutations, only double-mutant combinations with D1-D5 are tested, with no attempt made to evaluate triple mutations involving A2 or A3.

      We thank the reviewer for pointing out this oversight.

      A2 and A3 mutations were originally proposed as double mutations [6,7]. A2 (H374N/H378N) was first reported by Guy et al. [6] (Reference 11 in the previous version), while A3 (R273G/T445G) was originally proposed in Payandeh et al.’s study [7] (Reference 33 in the previous version).

      In this study, we further split the two mutations in A2 and A3 to generate the single enzyme-deactivating mutations: D1 and D2 from A3, and D3 and D4 from A2. Among these single mutations, D2 failed to inactivate ACE2 enzymatic activity (Supplementary Fig. 2e), and it was excluded from subsequent analyses.

      D5 (H345L) was a single mutation directly adopted from the report by Glasgow et al. [8] (Reference 12 in the previous version).

      After combining B5 with the enzyme-deactivating mutations (A2, A3, D1, D3, D4, D5), our neutralization assay results showed that the simpler compound mutants with only two mutations, like B5-D1, B5-D3, B5-D4 and B5-D5, exhibited stronger neutralization capacity than B5-A2 and B5-A3 with triple mutations. Moreover, since fewer mutations reduce the risk of altering protein structure and evoking host immunity, we then focused on the sACE2-Fc double mutants B5-D3, B5-D4 and B5-D5 in the subsequent neutralization and overexpression assays (Supplementary Fig. 3 and 4), and examined B5-D3 as a representative candidate in the in vivo infection tests and follow-up analysis (Figure 2–6, and Supplementary Figures 5–18).

      We agree that the lack of explanation for splitting A2 and A3 into D1 to D4 single mutations made the rationale unclear. In the revised manuscript, we have included our previous test results on B5-A2 and B5-A3, cited Lei et al.’s study using A2 in ACE2 decoy [9], and explained the rationale for splitting A2 and A3 into D1 to D4 mutations. Relevant revision was made on page 4 lines 94–97 in the main text, while the design and data for B5-A2 and B5-A3 were included in the revised Figure 1b and Supplementary Figure 2b, f–h.

      (2) Figures 1b, 1d, and 1e lack statistical analyses, making it difficult to determine whether B5 and D3 exhibit significant advantages. For Wuhan-Hu-1 strain, B2 and B5 are similar, and for D614G strain, B2, B3, B4, B5, and B6 display comparable results. However, only the glycosylation-related single mutant B5 is chosen for further combinatorial constructs. Moreover, for VOC/VOI strains, B5 is superior to B5-D3; for the Alpha strain, B5-D4 and B5-D5 are superior to B5-D3; and for the Delta and Lambda strains, B5-D5 is superior to B5-D3. These observations further highlight the need for a clearer explanation of the selection strategy.

      We agree with the reviewer’s insightful observations.

      Indeed, although our results confirmed enhanced viral neutralization by these reported mutations, the activity increases compared to WT ACE2-Fc were generally limited. Importantly, these observations were largely consistent with other reports (including the study by Chan et al. [5]), suggesting limited potential of mutagenesis in enhancing the ACE2-RBD/Spike interaction. Therefore, we chose to selectively examine B2-B6 to construct an enhanced ACE2-Fc decoy with reasonable performance, as a representative candidate to study the application potential of ACE2-Fc decoy.

      The IC<sub>50</sub> values in Figures 1b, 1d, and 1e were calculated from neutralization curves measuring infection reduction at multiple concentrations in duplicate, and are therefore presented with statistical support. Across the multiple neutralization assays, B5-D3 consistently performed well among the top performers (Figure 1, Supplementary Fig. 2f,g, and Supplementary Fig. 3).
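      To illustrate how an IC<sub>50</sub> can be read off a neutralization curve, the sketch below averages replicate wells and interpolates on log concentration between the two points bracketing 50% neutralization. This is a simplified stand-in, not the authors' method (which is not specified here; a four-parameter logistic fit is a common choice), and the function name and well values are hypothetical.

```python
import math
from collections import defaultdict

def ic50_by_interpolation(wells):
    """Estimate IC50 from (concentration, percent_neutralization) wells.

    Replicate wells at the same concentration are averaged, then the IC50 is
    interpolated linearly in log10(concentration) between the two adjacent
    concentrations whose mean neutralization brackets 50%.
    Returns None if the curve never crosses 50%.
    """
    by_conc = defaultdict(list)
    for conc, neut in wells:
        by_conc[conc].append(neut)
    points = sorted((c, sum(v) / len(v)) for c, v in by_conc.items())
    for (c_lo, n_lo), (c_hi, n_hi) in zip(points, points[1:]):
        if n_lo != n_hi and (n_lo - 50.0) * (n_hi - 50.0) <= 0:
            frac = (50.0 - n_lo) / (n_hi - n_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical duplicate wells (concentration in arbitrary units, % neutralization).
wells = [(0.01, 8), (0.01, 12), (0.1, 28), (0.1, 32),
         (1.0, 68), (1.0, 72), (10.0, 94), (10.0, 96)]
ic50 = ic50_by_interpolation(wells)
print(f"estimated IC50: {ic50:.3f}")
```

      A full curve fit would additionally yield confidence intervals on the IC<sub>50</sub>, which interpolation alone does not provide.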

      We agree that B2 and B5 performed comparably well in neutralization assays, but B2 contains two mutations (T27Y/T92Q) while B5 carries a single mutation (T92Q). Hence, we decided to focus on B5 because of its lower mutational burden and lower potential risk.

      We agree that for VOC/VOI strains, B5 was superior to B5-D3 in pseudovirus-neutralization assays. However, B5-D3 was enzymatically inactive, which is essential for generating a safe ACE2 decoy and therefore justifies our use of B5-D3 over B5.

      We agree with the reviewer that, altogether, B5-D3 did not show significant advantages over other top performers like B5-D4 and B5-D5. Here, B5-D3 was selected as a representative that performed equally well, rather than as the most outstanding candidate, for subsequent examination of efficacy, safety, and mechanistic insights.

      We thank the reviewer for his valuable feedback. In the revised manuscript, we have further amended our description of B5-D3, as a “representative” candidate, to improve the readability. Relevant changes can be found on page 4 line 84, page 5 line 109, page 14 line 333 and page 15 line 360.

      (3) Figure 1e does not specify the construct form of the control hIgG1, namely whether it is an hIgG1 Fc fragment or a full-length hIgG1 protein. If the full-length form is used, the design of its Fab region should be clarified to ensure the accuracy and comparability of the experimental control.

      We thank the reviewer for pointing out this oversight.

      In this study, we used the in vivo grade recombinant human IgG1 isotype control antibody in its full length (Syd Labs, #PA007125) as the negative control. It is the 4F17 clone, which is widely used and shows low or no specific binding to any human samples [10] (Human IgG1 Isotype Control Antibody | Recombinant, in vivo Grade - Syd Labs). We have added the relevant information in the MATERIALS AND METHODS on page 23 lines 548–549.

      (4) In Figure 2a, all three PBS control mice died, whereas in Figure 2f, three out of five PBS control mice died, with the remaining showing gradual weight recovery. This discrepancy may reflect individual immune variations within the control groups, and it is necessary to clarify whether potential autoimmune factors could have affected the comparability of the results. Also, the mouse experiments suffer from insufficient sample sizes, which affects the statistical power and reliability of the results. In Figure 2a, each group contains only 4 replicates, one of which was used for lung tissue sampling. As a result, body weight monitoring data is derived from only 3 mice per group (the figure legend indicating n=4 should be corrected to n=3). Such a small sample size limits the robustness of the conclusions. Similarly, in Figure 2f, although each group has 5 replicates, body weight data are presented for only 4 mice, with no explanation provided for the exclusion of the fifth mouse. Furthermore, the lung tissue experiments in Figure 3a include only 3 replicates, which is also inadequate.

      We thank the reviewer for his valuable feedback.

      Figure 2a was the first in vivo infection experiment of this study, and we performed the test in aged female K18-hACE2 mice at 10–12 months old. For the subsequent experiments in Figure 2f and Figure 3, we changed to young female K18-hACE2 mice at 2–3 months old because of the limited supply of old mice. In Figure 2a, four aged mice (not three) in the PBS control group all died within 7 dpi, whereas the results of Figure 2f and Figure 3 consistently showed heterogeneous responses among young mice in the PBS control groups. Since increased susceptibility to SARS-CoV-2 infection has been broadly observed among aged human populations and is also supported by a mouse study [11], we attribute the observed discrepancy to the age difference between the two cohorts in Figure 2a and 2f. In the revised manuscript, we have further elucidated this observation in results (on page 7 lines 163–167) and included a new reference for better clarification (page 7 line 167).

      Furthermore, the PBS control mice in both Figure 2a and 2f died within 7 dpi, which was too soon for autoimmune factors to take effect. Moreover, we have performed AAV-based prolonged overexpression experiments in K18-hACE2 mice (new Supplementary Fig. 4), which showed no tissue damage in either WT sACE2-Fc or B5-D3 treated mice, suggesting low immunogenicity. Collectively, autoimmune factors are unlikely to be the reason for the different survival between PBS controls in Figure 2a and 2f.

      We thank the reviewer for pointing out the weakness regarding small sample sizes in our study.

      (1) In Figure 2a–c, the experiment was performed in an aged cohort at 10–12 months old, starting with 5 mice in each virus-inoculated group and 4 mice in the mock control group. At 4 dpi, we sacrificed one mouse from each group for tissue analysis. Therefore, in the survival analysis, there were 4 mice in each virus-inoculated group and 3 mice in the mock control group, whose survival and body weight changes were presented in Figure 2b, c.

      Despite the relatively small sample sizes in Figure 2b, c, all 4 PBS control mice died, while all 4 mice in 6-hour B5-D3 IN prophylaxis group survived, demonstrating 100% survival and no sign of body weight loss. The survival and body weight data were highly consistent, strongly supporting that B5-D3 intranasal prophylaxis could protect the mice from lethal SARS-CoV-2 infection.

      To enhance clarity, in the revised manuscript, we have added the sample size information in chart legends in Figure 2a–c.
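      As a rough illustration of how much evidence a 4-versus-4 survival split carries, the sketch below computes a one-sided Fisher's exact p-value for the counts stated above (4 of 4 survivors with 6-hour B5-D3 IN prophylaxis versus 0 of 4 with PBS). This is not an analysis reported by the authors; the function name and the choice of test are assumptions made for illustration.

```python
from math import comb

def fisher_exact_one_sided(treated_survived, treated_died,
                           control_survived, control_died):
    """One-sided Fisher's exact test on a 2x2 survival table.

    Returns the probability, with all margins fixed, of observing at least
    `treated_survived` survivors in the treated group by chance alone.
    """
    n_treated = treated_survived + treated_died
    n_survived = treated_survived + control_survived
    n_total = n_treated + control_survived + control_died
    upper = min(n_treated, n_survived)
    total = comb(n_total, n_survived)
    tail = sum(
        comb(n_treated, k) * comb(n_total - n_treated, n_survived - k)
        for k in range(treated_survived, upper + 1)
    )
    return tail / total

# Counts as stated in the response for Figure 2b: 4/4 treated vs 0/4 PBS survived.
p_one_sided = fisher_exact_one_sided(4, 0, 0, 4)
print(f"one-sided p = {p_one_sided:.4f}")  # 1/70, about 0.0143
```

      Even this most extreme possible outcome for two groups of four gives a p-value of 1/70, which shows why group size bounds the achievable statistical support.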

      (2) In Figure 2f–h, the experiment was performed in a young cohort at 2–3 months old and the body weight and survival data were presented for 5 mice in each group (not for 4 mice). Notably, although 2 out of 5 young mice in the PBS control group eventually survived the viral infection, they had suffered significant weight loss during 4–7 dpi, similar to those that died. In contrast, all 5 mice in the –6 h B5-D3 IN prophylaxis group showed no sign of weight loss. Hence, these data were highly consistent with Figure 2b, c, supporting the efficacy of B5-D3 IN prophylaxis in protection against SARS-CoV-2 infection.

      We noticed that some data points in Figure 2g, h were very close to each other, making it difficult to distinguish the data line for individual mice. To enhance clarity, in the revised manuscript, we have added sample-size information in chart legends in Figure 2g and 2h.

      (3) In Figure 3a, we aimed to examine the lung tissues at early time points. For each treatment, we sacrificed 3 mice at each selected time point. Hence, a total of 9 mice were examined in the PBS control group and in the B5-D3 IN group, yielding results at 1 dpi, 2 dpi and 4 dpi that consistently supported each other. Moreover, the viral titers and the S and N protein expression analyses all showed significant differences among groups. Therefore, the differences between treatment groups in our experiments are sufficient to support the conclusion.

      (5) Compared to 6 hours, intranasal administration of B5-D3 at 24 hours before viral infection results in reduced protective efficacy. However, only survival and body weight data are provided, with no supporting evidence from virological assays such as viral titer measurement. Therefore, the long-term effectiveness lacks sufficient experimental validation.

      In Figure 2f–h, we aimed to compare the efficacies of IN administration of B5-D3 at different time points, mainly focusing on the body weight change and survival data over the course of infection and recovery. As indicated by early data in Figure 2d, viruses were largely cleared by 4 dpi in mice treated with B5-D3 prophylaxis. Therefore, in this test, we did not examine virus titers in the recovered animals at the end of observation at 14 dpi. Instead, we examined plasma levels of virus-neutralizing antibodies in the survivors at the endpoint, which indeed supported that the 6-hour and 24-hour IN B5-D3 prophylaxis provided effective protection against the SARS-CoV-2 infection and resulted in minimal levels of neutralizing antibodies in plasma, as shown in Figure 2i.

      Collectively, the body weight, survival, and antibody data all supported that 6-hour IN B5-D3 prophylaxis achieved the best efficacy. Hence, we performed comprehensive viral titer and profiling analysis at early time points (1 dpi, 2 dpi, and 4 dpi), focusing only on the 6-hour IN B5-D3 prophylaxis. This work also included the B5-D3-LALA control to examine viral titers, host immune responses, and underlying mechanisms (Figures 3 and 4).

      We agree with the reviewer that it would be more comprehensive if our experiments could include in-depth analysis of the 24-hour IN B5-D3 prophylaxis group. However, due to the limited capacity of the animal service, we chose to focus on the best-performing group as a representative treatment to study the underlying mechanisms.

      (6) In Figures 3b and 3c, viral spike (S) and nucleocapsid (N) RNA relative expression levels are quantified by qPCR. The results show significant individual variation within the B5-D3-LALA treatment group: one mouse exhibits high S and N expression, while the other two show low expression. Viral load levels are also inconsistent: two mice have high viral loads, and one has a low viral load. Due to this variability, the available data are insufficient to robustly support the conclusion.

      We understand the reviewer’s concern about the variability within the B5-D3-LALA group. However, we have some reservations about the importance of further increasing the sample sizes in this test.

      First, since viral gene transcription and viral particle levels represent different phases of the viral life cycle, they may follow different kinetics during infection progression, which can lead to variability. Second, we used different parts of the lung tissue from each mouse for RNA extraction and for tissue homogenates, which were then used to detect S/N expression and viral load, respectively. Uneven viral infection across the lung might also contribute to the variability. Furthermore, in this test, both our qPCR and viral load data consistently demonstrated that B5-D3-LALA was less effective than B5-D3, indicating that Fc function played an important role in supporting full protection by B5-D3 against lethal SARS-CoV-2 infection. This observation is also supported by other studies [12].

      We appreciate the valuable feedback from the reviewer. In the revised manuscript, we have further clarified these observations on page 8, lines 192–194, and included alveolar thickening data on page 9, lines 202–204.

      (7) Figure 3e: "H&E staining indicated alveolar thickening in all groups," including the Mock group. Since the Mock group did not receive virus or active drug treatment, this observed change may result from local tissue reaction induced by the intranasal inoculation procedure itself, rather than specific immune activation. A control group (no manipulation) should be set to rule out potential confounding effects of the experimental procedure on tissue morphology, thereby allowing a more accurate assessment of the drug's effects.

      We thank the reviewer for his insightful comments and suggestions.

      We have further examined our H&E staining and quantified alveolar thickening in the different treatment groups. Indeed, the data suggested a transient alveolar thickening in the mock group at 1 dpi, which improved at 2 dpi. This observation supports that the intranasal procedure itself caused a transient alveolar thickening, which was evident at 1 dpi but disappeared at 2 dpi.

      Notably, moderate alveolar thickening persisted in the B5-D3-treated mice until the endpoint at 4 dpi. In contrast, the PBS groups with intensive SARS-CoV-2 infection progressively developed severe structural damage and showed much stronger alveolar thickening than the B5-D3 or mock groups at 4 dpi. Consistent with the partial protection by B5-D3-LALA, histological analysis of lung samples in this group revealed more severe yet heterogeneous alveolar thickening. These observations suggested that -6h IN B5-D3 treatment prevented the tissue damage caused by infection, with minimal yet efficient immune activation.

      In the revised manuscript, we have included the quantitation results of alveolar thickening on page 9, lines 200–204 and presented the data in new Supplementary Fig. 7.

      (8) In Supplementary Figure 11b, a considerable number of alveolar macrophages (AMs) are observed in both the PBS and B5-D3 groups. This makes it difficult to determine whether the observed accumulation is specifically induced by B5-D3.

      We thank the reviewer for pointing out this issue.

      In this experiment, the cell populations examined in the previous Supplementary Fig. 11b and in Fig. 5h are different, although the graphs appear similar.

      Supplementary Fig. 11b (new Supplementary Fig. 12b) showed the analysis among CD45+ immune cells, regardless of B5-D3-AF750 signal. The dominance of AMs among immune cell populations is a normal physiological feature of BALF cells. To make this clear, we have added new data of BALF cells from untreated mice in the revised manuscript and new Supplementary Fig. 12b.

      Fig. 5h displayed the cell type analysis among the CD45+ B5-D3-AF750+ cells, that is, only the CD45+ immune cells that took up the AF750-labeled B5-D3.

      To enhance clarity, in the revised manuscript, we have amended the labels as CD45+ B5-D3-AF750+ in Figure 5h (and similarly in revised Supplementary Fig. 13), to differentiate the data from that in CD45+ cells shown in the revised Supplementary Fig. 12b.

      (9) In the flow cytometry experiment shown in Figure 5, the PBS control group is not labeled with AF750, which necessarily results in a value of zero for "B5-D3+ cells" on the y-axis. An appropriate control (e.g., hIgG1-Fc labeled with AF750) should be included.

      We thank the reviewer for his valuable question.

      In this experiment, we intended to analyze all immune cells with positive AF750 signals, to identify the major immune cell types that took up AF750-B5-D3 as the candidate cells responsible for the observed activation of innate immunity. Hence, here we deliberately set PBS vehicle treatment without AF750 signal as the control group for gating.

      This analysis aimed to provide an overall picture of the immune cell types that actively take up the ACE2 decoy, likely via Fc receptor-mediated binding. Control IgG1 labeled with AF750, which has an Fc region, may show a similar profile and biodistribution among BALF immune cells and was therefore not examined as a control for gating.

      Instead, in the revised manuscript, we have added new analysis results comparing the efficiencies of B5-D3 and IgG1 in mediating pseudovirus uptake in THP-1-derived macrophages. The IgG1 isotype control was examined to address the ACE2-specific effect. Indeed, we observed no pseudovirus uptake, based on p24 signal, in the IgG1-treated samples, indicating that the presence of B5-D3 is crucial for efficient pseudovirus uptake in macrophages due to the sACE2-spike affinity. These results have been added on page 13 lines 310–316 in the main text, and the relevant data are presented in new Supplementary Fig. 17.

      (10) The Methods section: a more detailed description of the experimental procedures involving HIV p24 and SARS-CoV-2 should be included.

      We thank the reviewer for pointing out this weakness.

      In the revised manuscript, we have provided further details of the relevant experimental procedures in the Materials and Methods part, on page 21, lines 507–517.

      Reviewer #3 (Public review):

      Strengths:

      The core strength of this study lies in its innovative demonstration that an engineered sACE2-Fc fusion redirects virus-decoy complexes to Fc-mediated phagocytosis and lysosomal clearance in macrophages, revealing a distinct antiviral mechanism beyond traditional neutralization. Its complete prophylactic protection in animal models and precise targeting of airway phagocytes establish a novel therapeutic paradigm against SARS-CoV-2 variants and future respiratory viruses.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      The study attributes complete antiviral protection to Fc-mediated phagocytic clearance, a central claim that requires more rigorous experimental validation. The observation that abrogating Fc functions compromises protection could be confounded by potential alterations in the protein's stability, half-life, or overall structure. To firmly establish this mechanism, it is crucial to include a control molecule with a mutated Fc region that lacks FcγR binding while preserving the Fc structure itself. Without this critical control, the conclusion that phagocytic clearance is the primary mechanism remains inadequately supported.

      We thank the reviewer for his insightful comments and suggestions.

      The L234A/L235A mutations in the human IgG1 Fc region are the most widely used to abolish FcγR binding and Fc effector functions [13]. In this study, we used B5-D3-LALA in the in vivo infection experiments in K18-hACE2 mice as the control molecule that lacks FcγR binding while preserving the Fc structure (Figures 3 and 4).

      To address the reviewer’s concern, we further performed a new analysis comparing the efficiencies of different versions of B5-D3 in mediating pseudovirus uptake in THP-1-derived macrophages. In this test, B5-D3-LALA and B5-D3 were examined side by side to address the role of Fc effector functions in the phagocytosis process. Meanwhile, the IgG1 isotype control was examined to address the ACE2-specific effect. Indeed, we detected a significant reduction of pseudovirus uptake, based on p24 signal, in the B5-D3-LALA-treated samples compared to those receiving B5-D3. This decreased pseudoviral uptake correlated with the loss of Fc-mediated effector functions in B5-D3-LALA, indicating the involvement of Fc functions in efficient macrophage uptake of the B5-D3-virus complex.

      In the revised manuscript, we have included these results on page 13 lines 310–316 in the main text and presented relevant data in Supplementary Fig. 17.

      The strategy of deliberately targeting virus-decoy complexes to phagocytes via Fc receptors inherently raises the question of Antibody-Dependent Enhancement (ADE) of disease. While the authors demonstrate a lack of productive infection in macrophages, this only addresses one facet of ADE. The risk of Fc-mediated exacerbation of inflammation (ADE) remains a critical concern. The manuscript would be significantly strengthened by a direct discussion of this risk and by including data, such as cytokine profiling from treated macrophages, to more comprehensively address the safety profile of this approach.

      (1) We thank the reviewer for his insightful comments and suggestions regarding the ADE issue.

      Indeed, Antibody-Dependent Enhancement (ADE) of viral infection is a critical concern when developing the ACE2 decoy strategy. In this study, we have carefully examined the relevant risk based on our data from various in vitro and in vivo assays.

      In our in vivo infection experiments, all B5-D3 prophylaxis and treatment groups, regardless of the administration times and routes, showed improved outcomes, such as less body-weight loss and better survival, compared to the PBS control groups (Figure 2). None of these treatment groups demonstrated worsened infections, indicating that ADE was not occurring or did not play a major role during the B5-D3 treatments. Instead, moderate immune activation was observed in the lungs of B5-D3-treated mice, which occurred much earlier but was milder than that in the PBS groups, and may reflect responses that lead to the efficient early clearance of viruses without observable symptoms (Figures 3 and 4).

      In our in vitro assays shown in Figure 6, B5-D3 treatments in epithelial or non-immune cell models (hACE2-Calu-3 and hACE2-293T) significantly blocked the entry of pseudovirus into cells and yielded much reduced luciferase signals (Figure 6d–g). In contrast, in the THP-1-derived macrophages, although the presence of B5-D3 largely enhanced the entry of SARS-CoV-2 pseudovirus into cells (Figure 6a,b), it did not result in active infection and produced no luciferase signal (Figure 6g). These results were robustly reproducible, indicating that the pseudoviruses did not successfully release their genomic RNA and viral proteins (such as RTase and integrase) after entering macrophages. Instead, colocalization analysis of p24 (pseudoviruses), sACE2-Fc (B5-D3), and LAMP1 (lysosome) signals suggested that the pseudoviruses were likely degraded in endosomes/lysosomes after cell entry (Figure 6a,c). Consistently, examination of the macrophages that had taken up pseudovirus showed that the Spike (S) proteins from the pseudovirus particles were not cleaved to release the S2’ fragment at a distinct smaller size (Figure 6h). As cleavage of the S protein in host cells is critical for effective membrane fusion, it is regarded as a hallmark of successful viral entry and escape from the endosome. Collectively, these data consistently indicated that the SARS-CoV-2 pseudoviruses were degraded directly in lysosomes after entering macrophages, showing no sign of ADE.

      (2) We thank the reviewer for his valuable suggestion and have performed RNA-seq analysis to profile immune responses in the treated macrophages.

      We performed RNA-seq analysis to investigate major transcriptional changes in THP-1-derived macrophages after pseudovirus infection, with or without B5-D3 treatment. Although no individual genes met the cutoff threshold for significant up- or down-regulation, we observed antiviral responses in the pseudovirus-B5-D3-treated samples by GSEA (new Supplementary Fig. 18). This observation indicated that the B5-D3 treatment and subsequent entry of pseudovirus-B5-D3 complexes into macrophages induced immune activation at moderate levels, but did not evoke strong immune responses that could be harmful to the host.

      In the revised manuscript, we have included the new RNA-seq analysis results on macrophage infection tests on page 13 lines 317–322 and page 14 lines 323–325 in the main text and presented the relevant data in the new Supplementary Fig. 18. Furthermore, we agree that ADE is a critical issue and have further enriched our discussion on page 17 lines 415–417, to emphasize that the risk for ADE should be thoroughly evaluated to further develop the decoy strategy for human use.

      The exclusive use of the K18-hACE2 mouse model, which exhibits severe disease, limits the generalizability of the findings. The "complete protection" observed may not translate to models with more robust and naturalistic immune responses or to human physiology.

      We thank the reviewer for pointing out the limitation of the mouse model used.

      (1) Given that wild type mice are not susceptible to SARS and SARS-CoV-2 infection, transgenic mice have been generated to express hACE2, through various designs and strategies, serving as models for viral infection and drug development. However, many of these hACE2 transgenic mouse models exhibit mild infections due to moderate hACE2 levels, failing to develop the severity observed in SARS and COVID patients [14].

      (2) The K18-hACE2 transgenic mouse line (B6.Cg-Tg(K18-ACE2)2Prlmn/J, Jackson Laboratory) used in our study carries multiple copies of the K18-hACE2 transgene cassette [15]. Compared to other hACE2 transgenic mouse models, this K18-hACE2 line shows higher expression of hACE2 in airway and other epithelia and supports more severe infections by both SARS and SARS-CoV-2 viruses, successfully causing lethality [16]. Hence, K18-hACE2 mice are a widely used model for studying SARS and SARS-CoV-2 infections and for drug development.

      (3) We agree that the K18-hACE2 line is a relatively weak transgenic line with poor productivity. However, it demonstrates the best susceptibility to SARS-CoV-2 infection among established mouse models. In this study, we observed robust responses to SARS-CoV-2 infection in both aged and young cohorts, with all infected mice consistently showing significant body weight loss from 4 dpi to 7 dpi (the PBS groups in Figure 2b, g).

      We agree with the reviewer that it would be more convincing to assess the efficacy of B5-D3 using additional animal models. However, we have some reservations about the importance of these additional tests. First, the generality of the ACE2-Fc decoy concept and its efficacy have been reported in other studies using various models [17,18]. Moreover, different transgenic mice or animal models exhibit distinct kinetics in pathogenesis and immune responses to SARS-CoV-2 infection, which differ from those in human patients in various respects. Hence, given the limited capacity of the animal facility, we chose to focus on the K18-hACE2 mice, which have demonstrated the most robust and convincing infection data, to investigate the potential of B5-D3 administered through various strategies as well as the underlying mechanisms for the full protection observed in IN prophylaxis.

      In the revised manuscript, we have further enriched our discussion regarding this limitation, on page 17 lines 417–422.

      Furthermore, the lack of data on circulating SARS-CoV-2 variants is a concern.

      We thank the reviewer for his valuable comment.

      In this study, we have demonstrated the viral neutralization capacity of B5-D3, as a representative of the enhanced sACE2 decoy, using multiple pseudoviruses and authentic SARS-CoV-2, which collectively covered eleven variants (up to Omicron strains). Our results from both in vitro neutralization and PRNT experiments confirmed the robust resilience of B5-D3 against viral evolution (Figure 1c–g). This observation aligns well with other studies and is broadly supported by various investigations, as was pointed out below by the reviewer.

      Furthermore, studies on viral evolution have observed a robust trend that later-emerging SARS-CoV-2 variants exhibit a higher affinity for the ACE2 receptor, enhancing their infectivity and transmissibility [19]. Therefore, it is unlikely for a newly emerged SARS-CoV-2 variant to escape from B5-D3-mediated neutralization.

      Collectively, all evidence consistently supports the principle of decoy design: B5-D3 (or any other effective ACE2 decoy) possesses the intrinsic ability to neutralize newly circulating SARS-CoV-2 variants, as long as the variants rely on the ACE2 receptor for cell entry. Hence, although further tests on circulating viral variants would add strength to our study, the significance of these additional data may be limited.

      In the revised manuscript, we have further addressed this concern in the discussion, on page 16 lines 394–397.

      The concept of sACE2-Fc fusion proteins as decoy receptors is not novel, and numerous similar constructs have been previously reported. The manuscript would benefit from a clearer demonstration of how the optimized B5-D3 mutant represents a significant advance over existing sACE2-Fc designs.

      We thank the reviewer for his valuable comments.

      Indeed, previous research has reported multiple ACE2 mutations to enhance its binding to spike proteins and neutralization against SARS-CoV-2. However, combining ACE2 mutations based on in silico predictions to both enhance spike binding and eliminate the ACE2 enzymatic activity resulted in accumulated burdens. For instance, ACE2 decoy candidates with up to five mutations like K31F/N33D/H34S/E35Q/H345L [8] and L79F/M82Y/Q325Y/H374A/H378A [12] have demonstrated excellent potency to neutralize SARS-CoV-2 in both in vitro and in vivo assays. However, the extensive mutations could be associated with structural instability and reduced production efficiency [8,12]. Furthermore, the high mutation loads increase risks for immunogenicity, which is a critical issue in future clinical applications. Corroboratively, Urano et al. detected in vitro T cell stimulation elicited by the L79F mutation, whereas the T92Q mutation (included in our decoy design) showed much lower immunogenicity and enhanced spike binding affinity [20].

      In our ACE2 decoy design, we incorporated only two mutations (such as T92Q and H374N in B5-D3) to enhance neutralization potency while eliminating enzymatic activity, yielding the simplest ACE2 mutants suitable for engineering an enhanced decoy. B5-D3, as one representative, exhibited not only minimal mutation-related risks (Supplementary Fig. 2i) but also top-level neutralization potency among all candidate mutants tested (Figure 1, Supplementary Fig. 2f,g and Supplementary Fig. 3). To further address the safety of B5-D3 for in vivo use, we performed prolonged in vivo overexpression of the B5-D3 ACE2 decoy through AAV delivery in immune-competent K18-hACE2 mice, which indeed showed no sign of RAS disturbance or immune infiltration causing tissue damage. (In the revised manuscript, we have included these new results on page 5 lines 118–122 and page 6 lines 123–135 in the main text and presented the data in new Supplementary Fig. 4.)

      Therefore, instead of demonstrating an advantage over existing sACE2-Fc designs, our study used the optimized B5-D3 as a representative of top-performing ACE2 decoys to systematically examine various administration strategies as well as the underlying mechanisms for the full protection observed in IN prophylaxis. Aligned with this effort, our study identified 6-hour IN prophylaxis as the most effective regimen to confer complete protection against SARS-CoV-2 infection in K18-hACE2 mice. Further investigation through transcriptomics, biodistribution, and phagocytosis analyses revealed that IN-delivered B5-D3 not only neutralized viruses but also engaged airway phagocytes to promote early viral clearance and host immune activation, uncovering a distinct antiviral mechanism for the universal “decoy strategy” to combat unknown airborne respiratory viruses in the future.

      In the revised manuscript, we have further clarified our focus on using B5-D3 as a “representative” of ACE2 decoy on page 4 line 84, page 5 line 109, page 14 line 333, and page 15 line 360.

      A direct comparative analysis with previously published benchmarks, particularly in terms of neutralizing potency, Fc effector function strength, and in vivo efficacy, is necessary to establish the incremental value and novelty of this specific agent.

      We thank the reviewer for his valuable comments.

      Indeed, our study has aimed to address this concern and made partial progress through in vitro neutralization assays (Figure 1b and Supplementary Fig. 2c,d,f,g). Our results from the limited yet meaningful comparisons with sACE2 lacking the Fc domain and with selected sACE2-Fc mutants published or proposed previously clearly demonstrated “substantial enhancement through Fc-fusion” (Supplementary Fig. 1d) and “modest improvement from protein mutagenesis at ACE2-Spike interaction interface” (Figure 1b and Supplementary Fig. 2c,d,f,g).

      Based on the results from our various neutralization assays, we chose B5-D3 as a representative enhanced decoy for the in vivo infection experiments, which identified 6-hour IN prophylaxis as conferring complete protection against infection and demonstrated a significant impact of administration strategy on the in vivo efficacy of B5-D3 (Figure 2). Subsequent analysis further uncovered intriguing phenomena regarding the cellular distribution of IN-administered B5-D3 and the early immune activation triggered in the lung, which underlie the full protection by IN prophylaxis and represent an important novelty of this study.

      We agree with the reviewer that further analysis with additional benchmark versions would enhance the value of this study, but we have reservations regarding its importance. To enhance clarity, in the revised manuscript, we have further emphasized our focus on using B5-D3 as a representative ACE2 decoy throughout the text and enriched the discussion on page 15 lines 348–365.

      References

      (1) Ku Z, Xie X, Hinton PR, Liu X, Ye X, Muruato AE, Ng DC, Biswas S, Zou J, Liu Y, Pandya D, Menachery VD, Rahman S, Cao Y-A, Deng H, Xiong W, Carlin KB, Liu J, Su H, Haanes EJ, Keyt BA, Zhang N, Carroll SF, Shi P-Y & An Z. Nasal delivery of an IgM offers broad protection from SARS-CoV-2 variants. Nature 595, 718-723 (2021).

      (2) Liu J, Mao F, Chen J, Lu S, Qi Y, Sun Y, Fang L, Yeung ML, Liu C, Yu G, Li G, Liu X, Yao Y, Huang P, Hao D, Liu Z, Ding Y, Liu H, Yang F, Chen P, Sa R, Sheng Y, Tian X, Peng R, Li X, Luo J, Cheng Y, Zheng Y, Lin Y, Song R, Jin R, Huang B, Choe H, Farzan M, Yuen KY, Tan W, Peng X, Sui J & Li W. An IgM-like inhalable ACE2 fusion protein broadly neutralizes SARS-CoV-2 variants. Nat Commun 14, 5191 (2023).

      (3) Guo H, Cho B, Hinton PR, He S, Yu Y, Ramesh AK, Sivaccumar JP, Ku Z, Campo K, Holland S, Sachdeva S, Mensch C, Dawod M, Whitaker A, Eisenhauer P, Falcone A, Honce R, Botten JW, Carroll SF, Keyt BA, Womack AW, Strohl WR, Xu K, Zhang N, An Z, Ha S, Shiver JW & Fu T-M. An ACE2 decamer viral trap as a durable intervention solution for current and future SARS-CoV. Emerging Microbes & Infections 12, 2275598 (2023).

      (4) Keyt BA, Baliga R, Sinclair AM, Carroll SF & Peterson MS. Structure, Function, and Therapeutic Use of IgM Antibodies. Antibodies 9, 53 (2020).

      (5) Chan KK, Dorosky D, Sharma P, Abbasi SA, Dye JM, Kranz DM, Herbert AS & Procko E. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science 369, 1261-1265 (2020).

      (6) Guy JL, Jackson RM, Jensen HA, Hooper NM & Turner AJ. Identification of critical active-site residues in angiotensin-converting enzyme-2 (ACE2) by site-directed mutagenesis. The FEBS Journal 272, 3512-3520 (2005).

      (7) Payandeh Z, Rahbar MR, Jahangiri A, Hashemi ZS, Zakeri A, Jafarisani M, Rasaee MJ & Khalili S. Design of an engineered ACE2 as a novel therapeutics against COVID-19. Journal of Theoretical Biology 505, 110425 (2020).

      (8) Glasgow A, Glasgow J, Limonta D, Solomon P, Lui I, Zhang Y, Nix MA, Rettko NJ, Zha S, Yamin R, Kao K, Rosenberg OS, Ravetch JV, Wiita AP, Leung KK, Lim SA, Zhou XX, Hobman TC, Kortemme T & Wells JA. Engineered ACE2 receptor traps potently neutralize SARS-CoV-2. Proceedings of the National Academy of Sciences 117, 28046-28055 (2020).

      (9) Lei C, Qian K, Li T, Zhang S, Fu W, Ding M & Hu S. Neutralization of SARS-CoV-2 spike pseudotyped virus by recombinant ACE2-Ig. Nature Communications 11, 2070 (2020).

      (10) Maciuba S, Bowden GD, Stratton HJ, Wisniewski K, Schteingart CD, Almagro JC, Valadon P, Lowitz J, Glaser SM, Lee G, Dolatyari M, Navratilova E, Porreca F & Riviere PJM. Discovery and characterization of prolactin neutralizing monoclonal antibodies for the treatment of female-prevalent pain disorders. MAbs 15, 2254676 (2023).

      (11) Dwivedi V, Shivanna V, Gautam S, Delgado J, Hicks A, Argonza M, Meredith R, Turner J, Martinez-Sobrido L, Torrelles JB & Kulkarni V. Age associated susceptibility to SARS-CoV-2 infection in the K18-hACE2 transgenic mouse model. Geroscience 46, 2901-2913 (2024).

      (12) Chen Y, Sun L, Ullah I, Beaudoin-Bussières G, Anand SP, Hederman AP, Tolbert WD, Sherburn R, Nguyen DN, Marchitto L, Ding S, Wu D, Luo Y, Gottumukkala S, Moran S, Kumar P, Piszczek G, Mothes W, Ackerman ME, Finzi A, Uchil PD, Gonzalez FJ & Pazgier M. Engineered ACE2-Fc counters murine lethal SARS-CoV-2 infection through direct neutralization and Fc-effector activities. Science Advances 8, eabn4188 (2022).

      (13) Lund J, Winter G, Jones PT, Pound JD, Tanaka T, Walker MR, Artymiuk PJ, Arata Y, Burton DR, Jefferis R & Woof JM. Human Fc gamma RI and Fc gamma RII interact with distinct but overlapping sites on human IgG. The Journal of Immunology 147, 2657-2662 (1991).

      (14) Lutz C, Maher L, Lee C & Kang W. COVID-19 preclinical models: human angiotensin-converting enzyme 2 transgenic mice. Hum Genomics 14, 20 (2020).

      (15) McCray PB, Pewe L, Wohlford-Lenane C, Hickey M, Manzel L, Shi L, Netland J, Jia HP, Halabi C, Sigmund CD, Meyerholz DK, Kirby P, Look DC & Perlman S. Lethal Infection of K18-hACE2 Mice Infected with Severe Acute Respiratory Syndrome Coronavirus. Journal of Virology 81, 813-821 (2007).

      (16) Oladunni FS, Park JG, Pino PA, Gonzalez O, Akhter A, Allue-Guardia A, Olmo-Fontanez A, Gautam S, Garcia-Vilanova A, Ye C, Chiem K, Headley C, Dwivedi V, Parodi LM, Alfson KJ, Staples HM, Schami A, Garcia JI, Whigham A, Platt RN, 2nd, Gazi M, Martinez J, Chuba C, Earley S, Rodriguez OH, Mdaki SD, Kavelish KN, Escalona R, Hallam CRA, Christie C, Patterson JL, Anderson TJC, Carrion R, Jr., Dick EJ, Jr., Hall-Ursone S, Schlesinger LS, Alvarez X, Kaushal D, Giavedoni LD, Turner J, Martinez-Sobrido L & Torrelles JB. Lethality of SARS-CoV-2 infection in K18 human angiotensin-converting enzyme 2 transgenic mice. Nat Commun 11, 6122 (2020).

      (17) Urano E, Itoh Y, Suzuki T, Sasaki T, Kishikawa JI, Akamatsu K, Higuchi Y, Sakai Y, Okamura T, Mitoma S, Sugihara F, Takada A, Kimura M, Nakao S, Hirose M, Sasaki T, Koketsu R, Tsuji S, Yanagida S, Shioda T, Hara E, Matoba S, Matsuura Y, Kanda Y, Arase H, Okada M, Takagi J, Kato T, Hoshino A, Yasutomi Y, Saito A & Okamoto T. An inhaled ACE2 decoy confers protection against SARS-CoV-2 infection in preclinical models. Sci Transl Med 15, eadi2623 (2023).

      (18) Higuchi Y, Suzuki T, Arimori T, Ikemura N, Mihara E, Kirita Y, Ohgitani E, Mazda O, Motooka D, Nakamura S, Sakai Y, Itoh Y, Sugihara F, Matsuura Y, Matoba S, Okamoto T, Takagi J & Hoshino A. Engineered ACE2 receptor therapy overcomes mutational escape of SARS-CoV-2. Nature Communications 12, 3802 (2021).

      (19) Cho MJ, Been NR & Son H. From Alpha to Omicron: Structural Insights into SARS-CoV-2 RBD Evolution and ACE2 Binding. European Journal of Public Health 35 (2025).

      (20) Urano E, Itoh Y, Suzuki T, Sasaki T, Kishikawa J-i, Akamatsu K, Higuchi Y, Sakai Y, Okamura T, Mitoma S, Sugihara F, Takada A, Kimura M, Nakao S, Hirose M, Sasaki T, Koketsu R, Tsuji S, Yanagida S, Shioda T, Hara E, Matoba S, Matsuura Y, Kanda Y, Arase H, Okada M, Takagi J, Kato T, Hoshino A, Yasutomi Y, Saito A & Okamoto T. An inhaled ACE2 decoy confers protection against SARS-CoV-2 infection in preclinical models. Science Translational Medicine 15, eadi2623 (2023).

    1. eLife Assessment

      This study integrates large-scale behavioral, genetic, and molecular analyses in animal models to investigate morphine response. Utilizing high-quality, time-series Quantitative Trait Loci (QTL) mapping, the work provides compelling evidential support for novel, time-dependent genetic interactions (epistasis). A fundamental result of this rigorous analysis is the discovery of a novel Oprm1-Fgf12-MAPK signaling pathway, which offers new insights into the mechanisms of opioid sensitivity.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have appropriately addressed the comments raised in the previous round of review.]

      Summary:

      The study by Lemen et al. represents a comprehensive and unique analysis of gene networks in rat models of opioid use disorder, using multiple strains and both sexes. It provides a time-series analysis of Quantitative Trait Loci (QTLs) in response to morphine exposure.

      Strengths:

      A key finding is the identification of a previously unknown morphine-sensitive pathway involving Oprm1 and Fgf12, which activates a cascade through MAPK kinases in D1 medium spiny neurons (MSNs). Strengths include the large-scale, multi-strain, sex-inclusive design; the time-series QTL mapping, which provides dynamic insights; and the discovery of an Oprm1-Fgf12-MAPK signaling pathway in D1 MSNs, which is novel and relevant.

    3. Reviewer #2 (Public review):

      Summary:

      This highly novel and significant manuscript re-analyzes behavioral QTL data derived from morphine locomotor activity in the BXD recombinant inbred panel. The combination of interacting behavioral-pharmacology (morphine and naltrexone) time course data, high-resolution mouse genetic analyses, genetic analysis of gene expression (eQTLs), cross-species analysis with human gene expression and genetic data, and molecular modeling approaches with Bayesian network analysis produces new information on loci modulating morphine locomotor activity.

      Furthermore, the identification of time-wise epistatic interactions between the Oprm1 and Fgf12 loci is highly novel and points to methodological approaches for identifying other epistatic interactions using animal model genetic studies.

      Strengths:

      (1) Use of state-of-the-art genetic tools for mapping behavioral phenotypes in mouse models.

      (2) Adequately powered analysis incorporating both sexes and time course analyses.

      (3) Detection of time and sex-dependent interactions of two QTL loci modulating morphine locomotor activity.

      (4) Identification of putative candidate genes by combined expression and behavioral genetic analyses.

      (5) Use of Bayesian analysis to model causal interactions between multiple genes and behavioral time points.

      Appraisal:

      The authors largely succeeded in reaching goals with novel findings and methodology.

      Significance of Findings:

      This study will likely spur future direct experimental studies to test hypotheses generated by this complex analysis. Additionally, the broad methodological approach incorporating time course genetic analyses may encourage other studies to identify epistatic interactions in mouse genetic studies.

    4. Reviewer #3 (Public review):

      Summary:

      This is a clearly written paper that describes the reanalysis of data from a BXD study of the locomotor response to morphine and naloxone. The authors detect significant loci and an epistatic interaction between two of those loci. Single-cell data from outbred rats is used to investigate the interaction. The authors also use network methods and incorporate human data into their analysis.

      Strengths:

      One major strength of this work is the use of granular time-series data, enabling the identification of time-point-specific QTL. This allowed for the identification of an additional, distinct QTL (the Fgf12 locus) in this work compared to previously published analysis of these data, as well as the identification of an epistatic effect between Oprm1 (driving early stages of locomotor activation) and Fgf12 (driving later stages).

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Lemen et al. represents a comprehensive and unique analysis of gene networks in rat models of opioid use disorder, using multiple strains and both sexes. It provides a time-series analysis of Quantitative Trait Loci (QTLs) in response to morphine exposure.

      Strengths:

      A key finding is the identification of a previously unknown morphine-sensitive pathway involving Oprm1 and Fgf12, which activates a cascade through MAPK kinases in D1 medium spiny neurons (MSNs). Strengths include the large-scale, multi-strain, sex-inclusive design; the time-series QTL mapping, which provides dynamic insights; and the discovery of an Oprm1-Fgf12-MAPK signaling pathway in D1 MSNs, which is novel and relevant.

      Weaknesses:

      (1) The proposed involvement of Nav1.2 (SCN2A) as a downstream target of the Oprm1-Fgf12 pathway requires further analysis/evidence. Is Nav1.2 (SCN2A) expressed in D1 neurons?

      The authors mentioned that SCN8A (Nav1.6) was tested as a candidate mediator of Oprm1-Fgf12 loci and variation in locomotor activity. However, the proposed model supports SCN2A as a target rather than SCN8A. This is somewhat unexpected since SCN8A is highly abundant in MSN.

      Can the authors provide expression data for SCN2A, Oprm1, and Fgf12 in D1 vs. D2 MSNs?

      Author response image 1.

      We generated Author response image 1 to show that both Scn2a and Scn8a are ubiquitously expressed in MSNs and GABAergic neurons.

      (2) The authors should consider adding a reference to FGF12 in Schizophrenia (PMC8027596) in the Introduction.

      This is a relevant reference. We have cited it in the Discussion rather than the Introduction because we felt it is more relevant there.

      (3) There is recent evidence supporting the druggability of other intracellular FGFs, such as FGF14 (PMC11696184) and FGF13 (PMC12259270), through their interactions with Nav channels. What are the implications of these findings for drug discovery in the context of the present study? Could FGF12 be considered a potential druggable therapeutic target for opioid use disorder (OUD)?

      The recent success in targeting FGF14 and FGF13 protein-protein interactions with sodium channels suggests that FGF12 could indeed be a druggable target for OUD. We have added a section to the Discussion exploring the potential for developing small-molecule modulators of the FGF12-Nav interface as a novel therapeutic strategy.

      Reviewer #2 (Public review):

      Summary:

      This highly novel and significant manuscript re-analyzes behavioral QTL data derived from morphine locomotor activity in the BXD recombinant inbred panel. The combination of interacting behavioral-pharmacology (morphine and naltrexone) time course data, high-resolution mouse genetic analyses, genetic analysis of gene expression (eQTLs), cross-species analysis with human gene expression and genetic data, and molecular modeling approaches with Bayesian network analysis produces new information on loci modulating morphine locomotor activity.

      Furthermore, the identification of time-wise epistatic interactions between the Oprm1 and Fgf12 loci is highly novel and points to methodological approaches for identifying other epistatic interactions using animal model genetic studies.

      Strengths:

      (1) Use of state-of-the-art genetic tools for mapping behavioral phenotypes in mouse models.

      (2) Adequately powered analysis incorporating both sexes and time course analyses.

      (3) Detection of time and sex-dependent interactions of two QTL loci modulating morphine locomotor activity.

      (4) Identification of putative candidate genes by combined expression and behavioral genetic analyses.

      (5) Use of Bayesian analysis to model causal interactions between multiple genes and behavioral time points.

      Weaknesses:

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors.

      We have performed a thorough review of the manuscript and corrected typographical errors, including "ddactivates" and other compositional issues.

      (2) There are multiple examples of overstating the possible significance of results that should be corrected or at least directly pointed out as weaknesses in the Discussion. These include:

      (a) Assumption that the Oprm1 gene is the causal candidate gene for the major morphine locomotor Chr10 QTL at the early time epochs. Oprm1 is 400,000 bp away from the support interval of the Mor10a QTL locus, and there is no mention as to whether the Oprm1 mRNA eQTL overlaps with Mor10a.

      We have clarified this in the text. While Oprm1 is located proximal to the peak, its massive size and the presence of a strong mRNA cis-eQTL in the NAc and hippocampus that precisely overlaps with the Mor10a QTL support interval provide robust evidence for its candidacy. We have added this detail to the Results section.

      (b) Although the Bayesian analysis of possible complex interactions between Oprm1, Fgf12, other interacting genes, and behaviors is very innovative and produces testable hypotheses, a more straightforward mediation analysis of causal relationships between genotype, gene expression, and phenotype would have added strength to the arguments for the causal role of these individual genes.

      We agree that mediation analysis would be a valuable addition. We revised the Results section to acknowledge that while the Bayesian network provides a comprehensive causal hypothesis, future studies employing formal mediation analysis could further strengthen these individual gene-to-behavior links.

      (c) The GWAS data analysis for Oprm1 and Fgf12 is incomplete in not mentioning actual significance levels for Oprm1 and perhaps overstating the nominal significance findings for Fgf12.

      We have updated the manuscript to include the specific significance levels for the human GWAS findings related to Oprm1 and Fgf12. We have clarified that the OPRM1 variant rs1799971 reached genome-wide significance (OR = 1.046, p = 4.92 × 10⁻⁹). Furthermore, we have ensured that the findings for FGF12 are described as nominally significant to avoid any overstatement of the results. For example, we now specify that the top FGF12 SNP rs1553460 achieved nominal significance (OR = 1.015, p = 0.021). The Results and Discussion sections have been revised to reflect these precise statistical values.
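
      For readers less familiar with the two tiers of evidence mentioned here, the distinction can be sketched as follows. The 5 × 10⁻⁸ cutoff is the conventional genome-wide threshold in human GWAS and 0.05 is the usual nominal level; the response does not state which thresholds the authors applied, so both constants and the function name are assumptions made for illustration.

```python
# Assumed thresholds (conventional values, not taken from the manuscript).
GENOME_WIDE_P = 5e-8   # conventional genome-wide threshold in human GWAS
NOMINAL_P = 0.05       # uncorrected, single-test threshold


def significance_tier(p_value):
    """Classify a p-value as genome-wide, nominal, or not significant."""
    if p_value < GENOME_WIDE_P:
        return "genome-wide"
    if p_value < NOMINAL_P:
        return "nominal"
    return "not significant"


# The two p-values quoted in the response above.
tiers = {
    "rs1799971": significance_tier(4.92e-9),
    "rs1553460": significance_tier(0.021),
}
```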

      Appraisal:

      The authors largely succeeded in reaching goals with novel findings and methodology.

      Significance of Findings:

      This study will likely spur future direct experimental studies to test hypotheses generated by this complex analysis. Additionally, the broad methodological approach incorporating time course genetic analyses may encourage other studies to identify epistatic interactions in mouse genetic studies.

      Reviewer #3 (Public review):

      Summary:

      This is a clearly written paper that describes the reanalysis of data from a BXD study of the locomotor response to morphine and naloxone. The authors detect significant loci and an epistatic interaction between two of those loci. Single-cell data from outbred rats is used to investigate the interaction. The authors also use network methods and incorporate human data into their analysis.

      Strengths:

      One major strength of this work is the use of granular time-series data, enabling the identification of time-point-specific QTL. This allowed for the identification of an additional, distinct QTL (the Fgf12 locus) in this work compared to previously published analysis of these data, as well as the identification of an epistatic effect between Oprm1 (driving early stages of locomotor activation) and Fgf12 (driving later stages).

      Weaknesses:

      (1) What criteria were used to determine whether the epistatic interaction was significant? How many possible interactions were explored?

      By design, we only tested for epistasis between the Oprm1 and Fgf12 loci: a single test of a non-linear interaction. As such, there is no correction for multiple tests and no need for permutation. In other words, the “nominal” P value in this case is the only relevant P value. We have added this clarification in the Results and Methods.
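
      As an illustration of what a single pre-specified interaction test involves, the sketch below computes the F statistic for a two-locus interaction in a 2 × 2 design. It is not the authors' code: it assumes genotypes coded 0/1 at each locus and an ordinary fixed-effects model, whereas the actual analysis may have used a mixed model. The function name and coding are hypothetical.

```python
from statistics import mean


def interaction_f(y, g1, g2):
    """Return (F, residual df) for the g1 x g2 interaction in a 2x2 design.

    y: phenotype values; g1, g2: genotypes coded 0 or 1 at the two loci
    (an assumed coding, chosen only for this illustration).
    """
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for yi, a, b in zip(y, g1, g2):
        cells[(a, b)].append(yi)
    if any(len(v) == 0 for v in cells.values()):
        raise ValueError("every two-locus genotype class needs observations")
    means = {k: mean(v) for k, v in cells.items()}
    # Interaction contrast: does the effect of locus 1 depend on locus 2?
    contrast = (means[(1, 1)] - means[(0, 1)]) - (means[(1, 0)] - means[(0, 0)])
    # Residual sum of squares of the saturated model is the within-cell variation.
    rss = sum((yi - means[k]) ** 2 for k, v in cells.items() for yi in v)
    df_resid = len(y) - 4
    if df_resid <= 0 or rss == 0:
        raise ValueError("not enough residual variation to form an F statistic")
    mse = rss / df_resid
    var_contrast = mse * sum(1.0 / len(v) for v in cells.values())
    return contrast ** 2 / var_contrast, df_resid
```

      Because only one interaction is tested, the p-value read from the F distribution with (1, residual df) degrees of freedom would be used as is, without a multiple-testing correction.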

      (2) Results are presented for males and females separately, but the decision to examine the two sexes separately was never explained or justified. Since it is not standard to perform GWAS broken down by sex, some initial explanation of this decision is needed. Perhaps the discussion could also discuss what (if anything) was learned as a result of the sex-specific analysis. In the end, was it useful?

      We chose to analyze the sexes both separately and jointly because of significant sex differences and sex-by-strain interactions in the locomotion data. This rationale has been added to the Results section. We also discuss the sex-specific results in the revision.

      (3) The confidence intervals for the results were not well described, although I do see them in one of the tables. The authors used a 1.5 support interval, but didn't offer any justification for this decision. Is that a 95% confidence interval? If not, should more consideration have been given to genes outside that interval? For some of the QTLs that are not the focus of this paper, the confidence intervals were very large (>10 Mb). Is that typical for BXDs?

      The 1.5 LOD support interval is a standard metric for most QTL mapping studies, and does correspond approximately to a 95% confidence or support interval. Large intervals are common in BXD studies when effect sizes are moderate or recombination density is lower in specific regions. We have clarified the use of the 1.5 LOD interval in the Results section.
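
      To make the 1.5 LOD support interval concrete, here is a minimal sketch of how such an interval can be read off a LOD profile. It assumes marker positions sorted along one chromosome and keeps the contiguous markers around the peak whose LOD is within 1.5 units of the maximum; some software extends the interval to the next flanking marker, which is more conservative. The function name and inputs are hypothetical and not taken from the manuscript.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (start, peak, end) positions of a LOD-drop support interval.

    positions: marker positions (e.g. in Mb), sorted along one chromosome.
    lods: LOD score at each marker.
    drop: LOD units below the peak that still count as inside the interval.
    """
    peak_idx = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak_idx] - drop
    # Walk outwards from the peak while the LOD stays at or above the cutoff.
    left = peak_idx
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_idx
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[peak_idx], positions[right]
```

      With sparse recombination around a peak, neighbouring markers carry nearly identical LOD scores, so the walk continues further and the interval widens, which is consistent with the large intervals the reviewer noticed.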

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the vast majority of the figures, the text is too small to read.

      We have adjusted the font size in most of the figures.

      Reviewer #2 (Recommendations for the authors):

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors. Examples of these include:

      (a) Figure 2E&F lacks identification of Oprm1 as the gene for cis-eQTL studies.

      (b) Figure 2H is fairly uninterpretable given the small font sizes. It should be excluded, put as a supplemental figure, or reconfigured to highlight the most important findings in a more legible manner.

      (c) Figure 4b: columns in the table need to be identified by a header row.

      We thank the reviewer for these comments and have addressed them in the revised version.

      Oprm1 is now labeled in Figures 2E and 2F, Figures 2G and 2H have been moved to the Supplementary Material, and a header row has been added to the table in Figure 4b.

      Reviewer #3 (Recommendations for the authors):

      Abstract

      (1) For the abstract, it might be simpler to name the alleles as "the C57BL/6J allele", etc., since B allele will confuse people unfamiliar with mouse nomenclature.

      It is critical not to confound the organism known as C57BL/6J with the genotype, allele, or haplotype that a mouse happens to inherit. Diverse types of mice inherit reference alleles, but they may be only very distantly related to the C57BL/6J strain. And even the C57BL/6J strain is a moving target that accumulates mutations that are not even considered reference. For example, the mutation in Gabra2 of C57BL/6J is a de novo mutation that is not carried by many of the BXD strains, since this mutation happened in JAX foundation stock after the BXDs were first established by Dr. Ben Taylor in the 1970s.

      The convention is to refer to mouse strains by one string and RRID, the abbreviation of that strain by a common code (often B6), and the abbreviation of the allele, genotype, or haplotype by the italic letter B. This has been the recommendation of the Mouse Nomenclature Committee (on which one of the authors has been a member) for well over 50 years.

      (2) I wondered if "also associated with a high B allele" could be reworded somehow; I had to re-read that sentence several times.

      This sentence has been reworded for clarity.

      (3) Parts of the abstract are written in the present tense, but then it switches to past ("we generated" but then "a Bayesian network analysis supports...").

      We have thoroughly revised the abstract. Following standard scientific writing conventions, we now utilize the past tense to describe the specific experimental actions and results of this study. We have maintained the present tense for established biological facts and the broader significance of the findings.

      (4) While the -log(p) values are all impressive, the abstract should indicate what threshold is used for genome-wide significance and how that threshold was obtained.

      We have added the significance threshold to the Abstract.

      (5) Do the details of the MAP kinase cascade need to be explained in the abstract? It feels like a lot of detail for an abstract and represents one of the most speculative aspects of the paper. Maybe just say you identified a possible network, but save the details for the main paper.

      This is a valid suggestion. We removed the specific MAP kinase details from the abstract.

      Introduction

      (1) You could add a sentence explaining why using an LMM (GEMMA) was an improvement over the prior analysis.

      We have added a sentence explaining that GEMMA improves mapping power and better controls for population structure compared to previous methods.

      (2) When mentioning Philips 2010, you could indicate that it identified Oprm1. This might be easier than "In addition to Oprm1" which confused me at first because it had not been mentioned before, so 'in addition' was jarring.

      We have revised the text to state that Philip et al. (2010) originally identified the Oprm1 locus.

      Results

      (1) There are additional instances of the tense switching between past and present in the results section.

      We have standardized the tenses in the Results section.

      (2) "Ostn, Uts2d, Ccdc50, Gm10823, Fgf12, and Mb21d2" - before giving arguments for fgf12, can you clarify if there are coding variants or eQTLs for any of these genes?

      We have added a statement clarifying the coding variants for other genes in this interval and highlighting their eQTL status.

      (3) "a total number of 4,495 high-quality nuclei transcriptomes". Consider removing the word "number".

      Removed.

      (4) "approximately 6 males and 6 females" - could you point the reader to a supplementary table that has the exact number of individuals at the end of this sentence?

      The exact number of mice used in each BXD strain was not recorded in the original publication by Philip et al.; only the mean and maximum were given. We have clarified that 6 is the average.

      (5) "computed using a subset" - please explain how you selected this subset (I assumed LD pruning, but why not be explicit?). How many SNPs/markers were there originally, and how many are retained?

      We have specified that the subset of markers was selected via LD pruning to represent the genetic diversity of the BXDs.

      (6) A few words about how the significant threshold was obtained (permutation?) are needed.

      We have clarified that the significance threshold was obtained through 1,000 permutations.
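
      For illustration, a permutation threshold of this kind can be sketched as below. This is a simplified stand-in rather than the authors' pipeline: it uses single-marker linear regression instead of a linear mixed model, assumes genotypes coded 0/1, and takes the 95th percentile of the per-permutation genome-wide maximum LOD. All function names and defaults are hypothetical.

```python
import math
import random


def lod_score(y, g):
    """LOD for a single-marker linear regression: -(n/2) * log10(1 - r^2)."""
    n = len(y)
    my, mg = sum(y) / n, sum(g) / n
    sxy = sum((a - mg) * (b - my) for a, b in zip(g, y))
    sxx = sum((a - mg) ** 2 for a in g)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or phenotype carries no signal
    r2 = min(sxy * sxy / (sxx * syy), 1 - 1e-12)
    return -(n / 2) * math.log10(1 - r2)


def permutation_threshold(y, genotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold: the (1 - alpha) quantile of permutation maxima.

    genotypes: list of markers, each a list of 0/1 genotypes aligned with y.
    """
    rng = random.Random(seed)
    shuffled = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        maxima.append(max(lod_score(shuffled, g) for g in genotypes))
    maxima.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[idx]
```

      Taking the maximum over all markers within each permutation is what makes the resulting threshold genome-wide rather than per-marker.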

      (7) Some of the GWAS results are presented for males and females separately (as well as combined). This is not typical, and so maybe a sentence explaining why the authors thought there might be sex specific GWAS results would be warranted.

      The rationale for the sex-specific analysis is provided in the Results section (significant sex differences and sex-by-strain interaction).

      (8) The correlation between the sexes of 0.68 could be evidence that there are sex-specific genetic effects, but could it also just be due to increased noise as you reduce sample size? What is the confidence interval for that number? Does it include 1? Or 0? If you randomly split the dataset, rather than splitting on the basis of sex, would you obtain higher correlations? The idea of sex differences is interesting, but a bit more work is needed to clarify these concerns.

      The correlation of 0.68 has a 95% CI of 0.52–0.79, which excludes both 0 and 1. The drop from r ≈ 0.86 at earlier intervals suggests a biological shift rather than noise due to sample size, as n remains constant (n ≈ 6 per sex per strain) across all time points. This divergence is driven by sex-specific genetic modifiers, such as the Fgf12 locus, which is more than twice as strong in females (LOD 10.6) as in males (LOD 4.3). We have addressed this in the revision.
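
      A confidence interval for a Pearson correlation is commonly obtained with the Fisher z-transform; the sketch below shows that calculation. The response does not say how the interval was computed or how many strain means entered the correlation, so the method, the function name, and any value of n passed to it are assumptions for illustration.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transform.

    r: sample correlation; n: number of paired observations (hypothetical here);
    z_crit: standard normal critical value for a two-sided 95% interval.
    """
    if n <= 3:
        raise ValueError("the Fisher z interval needs n > 3")
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

      An interval that excludes 1 indicates that the two sexes do not rank the strains identically, but on its own it does not separate sex-specific genetic effects from measurement noise, which is the concern the reviewer raised.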

      (9) Maybe I missed it, but how did you determine the threshold for significance for the epistatic interaction? Could you also clearly indicate how many possible cases of epistasis were examined/considered, since that dictates the correction for multiple testing.

      We only tested the interaction between the Fgf12 and Oprm1 loci.

      (10) "To further examine whether Oprm1 and Fgf12 were co-expressed in the same cells of the NAc," can you first give an indication as to why you looked in NAc versus other brain areas you might have considered?

      We have added a sentence explaining that the NAc was chosen due to its central role in opioid reward and the observed strain differences in dopamine release in this region.

      (11) "...from every cell type conveyed a weak but significant positive correlation (r = 0.08, p = 1.8e-8) between the expression of Oprm1 and Fgf12 (Figure 7e). When we performed Pearson's correlation analysis within each individual cell cluster, only D1-MSN-3 had a significant positive correlation (r = 0.35, p = 6.1e-8, Figure 7f). In contrast, D1-MSN-2 had a significantly weak negative correlation (r = -0.12, p = 0.02, Figure 7g)." Can you explain why these correlations are relevant? What hypothesis are you testing?

      We have clarified that these correlations were used to test the hypothesis that Oprm1 and Fgf12 are co-expressed and potentially co-regulated within the same neuronal subtype to support their epistatic interaction.

      (12) "After the morphine locomotion tests were complete," can you give a specific timepoint? Like, was it exactly 180 minutes after the morphine injection?

      We have specified that naloxone was injected exactly 180 minutes after the morphine injection.

      (13) I appreciate the desire to relate the results of this paper to human GWAS results; however, I don't feel there is much worth discussing beyond the Oprm1 finding. Therefore, I would suggest removing this from the results section and instead just making it a discussion topic. The results presented are clearly the weakest part of this paper, and I personally think it is a shame to end the results section with something that is not very informative. But I suspect the authors may wish to retain this section, and I leave that decision to them and the editor.

      We have retained this section but moved some of the more speculative human data discussion to the Discussion section as suggested.

      Discussion

      (1) Typo "deactivates".

      Corrected to "activates".

      (2) The last sentence in the first paragraph again discusses the comparison to humans; I would remove this.

      That sentence has been condensed.

      (3) "These data indicate that Oprm1 is a strong candidate gene for the Chr 10 locus associated with morphine-induced locomotion response." I would remind them of the eQTL for Oprm1 since this is a key piece of evidence supporting this gene as a candidate.

      We have added a reminder of the overlapping mRNA cis-eQTL for Oprm1.

      (4) "It is likely that differences in morphine-induced dopamine release are involved in the highly variable locomotor responses to morphine across the BXD family." I agree this might be true, but since you have no evidence to support this claim, is it worth mentioning at all?

      We have rephrased this as a hypothesis or cited relevant literature supporting this link in parental strains.

      (5) Could you include a sentence or two about why Philip 2010 didn't find Fgf12? Lack of markers? The difference between an LM and an LMM?

      We have added an explanation that the use of a high-density WGS-based marker set and the LMM (GEMMA) allowed for the detection of this novel locus that was previously missed.

      (6) Section titled "Cell-type specific gene expression in NAc". While this is interesting, you might also want to remind the reader that epistatic interactions do not necessarily require the genes to be expressed in the same cell or for their gene products to physically interact.

      We have added this caveat to the Discussion.

      (7) I think the Bayesian network section is not very strong. For example, they did not compare the results for their two chosen genes to the results they might have obtained if they had chosen other genes from their QTL intervals. My guess is that those other genes might have also produced results that were equally convincing. I'm not asking them to do that, but it reflects the risk of false positive results when taking an approach like this. Nevertheless, I am guessing the authors would prefer to include this section.

      We appreciate the reviewer pointing out this possibility and agree with this concern. We have added a statement acknowledging the risk of false positives in Bayesian modeling in this context and noting that these findings are intended as testable hypotheses.

      Methods

      (1) How were the 2 HS rats selected? I had the impression that Dr. Telese's lab had access to snRNA-seq data from more than 2 HS rats.

      We have clarified that these rats were selected based on their addiction-like behavior phenotypes from a larger cohort.

      (2) I didn't look back, but did the main paper point out that the rats are treated with oxycodone rather than morphine?

      We have clarified this distinction in the Methods section.

    1. eLife Assessment

      This important study investigates how the nervous system adapts to changes in the mechanics of the body, which are altered through a tendon transfer surgery affecting finger extensor and flexor muscles. By measuring task performance, joint kinematics, and muscle activity for several weeks post surgery, the authors provide convincing evidence that monkeys undergo a two-phase adaptation process. First, they adopt a maladaptive strategy to overcome the functional challenges imposed by the surgery, and then revert to a strategy that uses the same patterns of muscle coactivation observed pre-tendon transfer.

    2. Reviewer #1 (Public review):

      Summary:

      Many studies have investigated adaptation to altered sensorimotor mappings or to an altered mechanical environment. This paper asks a different but also important question in motor control and neurorehabilitation: how does the brain adapt to changes in the controlled plant? The authors addressed this question by performing a tendon transfer surgery in two monkeys, during which they swapped the tendons flexing and extending the digits. They then monitored changes in task performance, muscle activation, and kinematics post-recovery over several months to assess changes in putative neural strategies.

      Strengths:

      (1) The authors performed complicated tendon transfer experiments to address their question of how the nervous system adapts to changes in the organisation of the neuromusculoskeletal system, and present very interesting data characterising neural (and in one monkey, also behavioural) changes post tendon transfer over several months.

      (2) The fact that the authors had to employ two slightly different tasks (one more artificial, the other more naturalistic) in the two monkeys and yet found qualitatively similar changes across them makes the findings more compelling. After all, these are very challenging experiments!

      (3) The paper is well written, the analyses are sound, and the authors interpret the data appropriately, acknowledging the key limitations.

      Weaknesses:

      None of note.

    3. Reviewer #3 (Public review):

      Summary:

      In this study, Philipp et al. investigate how a monkey learns to compensate for a large, chronic biomechanical perturbation--a tendon transfer surgery, swapping the actions of two muscles that flex and extend the fingers. After performing the surgery and confirming that the muscle actions are swapped, the authors follow the monkeys' performance on grasping tasks over several months. There are several main findings:

      - There is an initial stage of learning (around 60 days), where monkeys simply swap the activation timing of their flexors and extensors during the grasp task to compensate for the two swapped muscles.

      - This is (seemingly paradoxically) followed by a stage where muscle activation timing returns almost to what it was pre-surgery, suggesting that monkeys suddenly swap to a new strategy that is better than the simple swap.

      - Muscle synergies seem remarkably stable through the entire learning course, indicating that monkeys do not fractionate their muscle control to swap the activations of only the two transferred muscles.

      - Muscle synergy activation shows a similar learning course, where the flexion synergy and extension synergy activations are temporarily swapped in the first learning stage and then revert to pre-surgery timing in the second learning stage.

      - The second phase of learning seems to arise from making new, compensatory movements (supported by other muscle synergies) that get around the problem of swapped tendons.

      Strengths:

      This study is quite remarkable in scope, studying two monkeys over a period of months after a difficult tendon-transfer surgery. As the authors point out, this kind of perturbation is an excellent testbed for the kind of long-term learning that one might observe in a patient after stroke or injury, and provides unique benefits over more temporary perturbations like visuomotor transformations and over studying learning through development. Moreover, while the two-stage learning course makes sense, I found the details to be genuinely surprising--specifically the fact that: 1) muscle synergies continue to be stable for months after the surgery, despite being maladaptive; and 2) muscle activation timing reverts to pre-surgery levels by the end of the learning course. These two facts together initially make it seem like the monkey simply ignores the new biomechanics by the end of the learning course, but the authors do well to explain that this is mainly because the monkeys develop a new kind of movement to circumvent the surgical manipulation.

      I found these results fascinating, especially in comparison to some recent work in motor cortex, showing that a monkey may be able to break correlations between the activities of motor cortical neurons, but only after several of coaching and training (Oby et al. PNAS 2019). Even then, it seemed like the monkey was not fully breaking correlations but rather pushing existing correlations harder to succeed at the virtual task (a brain-computer interface with perturbed control).

      Weaknesses:

      I found the analysis to be reasonably well considered and relatively thorough. The authors have also suitably addressed my comments on the previous version. One minor weakness that remains (understandably so) is that the two animals in the study performed different tasks, and the results of the secondary synergy analysis seem to be quite different (Figure 10). That said, I don't think this weakness reduces the impact of the study, and though multiple replications of the same results would provide more convincing evidence, I don't think it's necessary to make the points that the authors are making.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) I think this is an important paper, but I’m puzzled about a tension in the results. On the one hand, it looks like the behavioural gains post-TT happen rather smoothly over time (Figure 5). On the other hand, muscle synergy activations change abruptly at specific days (around day ~65 for Monkey A and around day ~45 for Monkey B; e.g., Figure 6). How do the authors reconcile this tension? In other words, how do they think that this drastic behavioural transition can arise from what appears to be step-by-step, continuous changes in muscle coordination? Is it “just” subtle changes in movements/posture exploiting the mechanical coupling between wrist and finger movements, combined with subtle changes in synergies, and they just happen to all kick in at the same time? This feels to me to be the core of the paper and should be addressed more directly.

      We thank the reviewer for this insightful comment, as it touches upon the central finding of our study. The apparent tension between the smooth behavioral recovery and the abrupt shift in neural strategy is indeed a key feature of the adaptation process. We propose that this reflects the interaction of two distinct, parallel processes operating on different timescales:

      A slow, gradual skill-learning process, where the monkeys incrementally developed and refined a compensatory motor strategy (i.e., the tenodesis effect). This slow refinement is responsible for the smooth improvement seen in the behavioral metrics over many weeks.

      A fast, switch-like adaptive process, which governs the activation of the primary muscle synergies. The initial ‘swap’ strategy, while simple, was biomechanically conflicting and inefficient. The CNS only abandoned this flawed strategy abruptly once the slow learning process had rendered the new compensatory strategy “good enough” to be a viable alternative.

      Therefore, the abrupt neural shift does not cause the behavioral improvement but is rather enabled by the gradual, underlying development of a better motor solution. To address this important point more directly within the manuscript, we added a new subheading to the Discussion section. This section is dedicated to explicitly framing our findings within this multi-timescale learning model, ensuring the link between the gradual behavioral recovery and the abrupt neural shift is clearly articulated.

      (2) The muscle synergy analyses, which are an important part of the paper, could be improved. In particular:

      (a) When measuring the cross-correlation between the activation of synergies, the authors should include error bars and should also look at the lag between the signals.

      We thank the reviewer for these excellent suggestions to improve our analysis.

      Error Bars: We agree that showing trial-to-trial variability is important. In our revision, we have added a shaded envelope (representing the SD across trials) to the cross-correlation plots in Figures 6, 9 and 10.

      Time Lag: We have performed the cross-correlation analysis allowing for variable time lags and extracted the lag yielding the maximum correlation coefficient (max CC) for each session, in addition to the zero-lag correlation presented in the main figures. As hypothesized, allowing variable lags often resulted in high max CC values throughout the adaptation period, potentially obscuring the clear swap-and-revert pattern visible in the zero-lag analysis. This is likely because the primary adaptation involved changes in synergy timing rather than fundamental shape. However, the analysis of the lag itself proved informative. We observed significant fluctuations in the optimal lag during the early and mid-adaptation phases, particularly around the time of the ‘switch-back’, before the lag stabilized closer to zero in the late phase.

      We have added a description of this analysis to the Methods section. The results of the lag analysis are now presented in new Supplementary Figures S6 and S7, and a sentence summarizing this finding has been added to the Results section.
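
      For illustration, the difference between the zero-lag correlation and the lag-optimized maximum correlation (max CC) can be sketched as follows. This is a minimal sketch, not the authors' pipeline: the function names, the synthetic Gaussian activation profiles, and the lag window of ±10 samples are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_cc(x, y, max_lag):
    """Return {lag: r}. A positive lag means y is delayed relative to x."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        out[lag] = pearson(xs, ys)  # correlation over the overlapping samples
    return out

def max_cc(x, y, max_lag):
    """Lag with the highest correlation, and that correlation."""
    cc = lagged_cc(x, y, max_lag)
    best = max(cc, key=cc.get)
    return best, cc[best]

# Hypothetical activation profiles: same shape, second one delayed by 5 samples.
pre = [math.exp(-((i - 40) / 8.0) ** 2) for i in range(100)]
post = [math.exp(-((i - 45) / 8.0) ** 2) for i in range(100)]

zero_lag_r = lagged_cc(pre, post, 10)[0]
best_lag, best_r = max_cc(pre, post, 10)
print(zero_lag_r, best_lag, best_r)
```

      In this toy case the shape is unchanged and only the timing differs, so the max CC stays near 1 while the zero-lag value drops. This is the same reason a lag-optimized measure can hide a timing-based swap-and-revert pattern.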

      (b) Figure 7C and related figures, the authors state that the activation of muscle synergies reverts to pre-TT patterns toward the end of the experiments. However, there are noticeable differences for both monkeys (at the end of the “task range” for synergy B for monkey A, and around 50% task range for synergy B for monkey B). The authors should measure this, e.g., by quantifying the per-sample correlation between pre-TT and post-TT activation amplitudes. Same for Figures 8I, J, etc.

      We thank the reviewer for this detailed and insightful suggestion. We agree that our use of the term ‘reversion’ should be nuanced, as the recovery of the synergy activation patterns is substantial but not perfect.

      To formally quantify these remaining differences, we performed a rigorous quantitative comparison between the pre-surgery and final-day post-surgery activation profiles. We calculated the Cosine Similarity to assess the recovery of the temporal shape, and used a Permutation Test (n=10,000) to test for statistical distinctness between the pre- and post-surgery trajectories.

      Results: We found that while the temporal shapes were highly similar (Cosine Similarity > 0.90 for all synergies), the Permutation Test confirmed that the profiles remained statistically distinct (p < 0.0001) in both animals.

      We have added this quantification to the text (Results). This confirms our nuanced interpretation: while the primary temporal features of the synergies reverted, the recovered motor program represents a novel, ‘good enough’ solution that is robust and functional, rather than a mathematically perfect restoration of the original baseline.
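
      The combination of a shape measure and a distinctness test can be sketched as below. This is a hedged illustration only: the response does not state which test statistic was permuted, so the Euclidean distance between trial-averaged profiles is an assumption, as are the function names, the synthetic trials, and the reduced number of permutations (2,000 rather than 10,000, to keep the example fast).

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two activation profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_profile(trials):
    """Average a list of equal-length trials sample by sample."""
    n = len(trials)
    return [sum(col) / n for col in zip(*trials)]

def profile_distance(group_a, group_b):
    """Assumed test statistic: Euclidean distance between mean profiles."""
    ma, mb = mean_profile(group_a), mean_profile(group_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, mb)))

def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Shuffle trial labels and count how often the shuffled distance reaches the observed one."""
    rng = random.Random(seed)
    observed = profile_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if profile_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical data: same temporal shape, but post-surgery amplitude scaled by 1.3.
rng = random.Random(1)
bump = [math.exp(-((i - 20) / 5.0) ** 2) for i in range(40)]
pre_trials = [[v + rng.gauss(0, 0.05) for v in bump] for _ in range(12)]
post_trials = [[1.3 * v + rng.gauss(0, 0.05) for v in bump] for _ in range(12)]

similarity = cosine_similarity(mean_profile(pre_trials), mean_profile(post_trials))
distance, p_value = permutation_test(pre_trials, post_trials, n_perm=2000)
print(similarity, distance, p_value)
```

      Cosine similarity ignores overall scale, so two profiles can have nearly identical shape and still be statistically distinguishable in amplitude. The example shows that these two results are compatible.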

      (c) In Figures 9 and 10, the authors show the cross-correlation of the activation coefficients of different synergies; the authors should also look at the correlation between activation profiles because it provides additional information.

      We thank the reviewer for this comment and the opportunity to clarify our terminology. We agree that analyzing the correlation between the full activation profiles is the most informative approach. In our manuscript, the terms ‘activation coefficients’ and ‘activation profiles’ both refer to the complete, time-varying activation patterns of the muscle synergies. Therefore, the cross-correlation analysis presented in Figures 9 and 10 is indeed the correlation between these full activation profiles. To prevent any potential ambiguity for future readers, we have revised the manuscript to use the term ‘activation profiles’ exclusively and consistently when referring to these time-varying synergy activations.

      (d) The muscle synergy analysis for Monkey B is hindered by the fact that the authors lost the ability to record from the (very) functionally relevant FDS muscle. I’d repeat the synergy analyses without this muscle to understand to what extent the observed changes with respect to baseline are driven by the lack of this data.

      We thank the reviewer for raising this important methodological point. We agree that controlling for changes in the recorded muscle set is crucial for a valid comparison between pre- and post-surgical synergy structures. The reviewer’s concern is based on the premise that the FDS muscle was included in the pre-surgical analysis for Monkey B but absent from the post-surgical analysis.

      We would like to clarify that this is not the case. Due to the loss of the FDS signal post-surgery, we made the deliberate decision to exclude the FDS muscle from ALL synergy analyses for Monkey B, including the pre-surgical baseline period. This was done for the precise reason the reviewer identifies: to ensure a direct and unbiased “apples-to-apples” comparison and to avoid introducing the lack of this muscle as a confound. Therefore, the changes in synergy structure that we report for Monkey B can be confidently attributed to genuine physiological adaptation rather than an artifact of a changing input dataset.

      (e) Figure 11: The authors talk about a key difference in how Synergy B (the extensor finger) evolved between monkeys post-TT. However, to me this figure feels more like a difference in quantity - the time course than quality, since for both monkeys the aaEMG levels pretty much go back to close to baseline levels - even if there’s a statistically significant difference only for Monkey B. What am I missing?

      We thank the reviewer for this insightful question, as it has prompted us to refine our interpretation of this key finding. The reviewer correctly notes that the recovery trajectories of Synergy B appear different, and we agree that our original explanation can be improved.

      A more parsimonious interpretation, and one that we believe aligns better with the data, is that both monkeys likely underwent a similar ‘arms race’, but we captured different phases of this process. In Monkey A, our recordings (starting Day 29) captured the escalating phase of this neuromuscular conflict. In contrast, for Monkey B, recordings began on Day 20, by which time this rapid escalation had likely already occurred and peaked. This difference in the timing of the ‘arms race’ is consistent with our behavioral observations; Monkey A struggled for a longer period before performing the task proficiently, suggesting a more protracted overall adaptation process. Thus, the apparent difference in the figures is likely a reflection of the observational window and the individual adaptation rate of each animal, rather than a fundamental qualitative difference in their adaptive strategy. We have revised the text to present this more unified and coherent interpretation.

      (f) Lines 408-09 and above: The authors claim that “The development of a compensatory strategy, primarily involving the wrist flexor synergy (Synergy C), appears crucial for enabling the final phase of adaptation”, which feels true intuitively and also based on the analysis in Figure 8, but Figure 11 suggests this is only true for Monkey B. How can these statements be reconciled?

      We believe the reviewer may be referring to Monkey A in their comment, as the strong compensatory effect is indeed seen in this animal. The core of this issue, which we have clarified in our revision, is that both monkeys developed a compensatory tenodesis grasp but used different neural strategies to achieve it.

      For Monkey A, strong evidence for this strategy is provided by a clear temporal shift in the activation of its dedicated wrist flexor synergy (Synergy C). As we have now clarified in the manuscript, the peak of this synergy’s activation moved from occurring just after object contact to just before it, a re-timing well-suited to enable a tenodesis grasp.

      For Monkey B, the strategy was one of subtle re-timing rather than scaling. While the total aggregated activation of its primary flexor synergy (Synergy A) did not significantly increase, its temporal profile shifted. Specifically, activation prior to object contact increased, providing the necessary wrist flexion for its assistive tenodesis grasp, which was kinematically confirmed in Figure 12. This was achieved by reallocating activation from the post-contact phase, resulting in an earlier activation peak for the synergy overall. Crucially, a finer-grained analysis reveals a precise temporal sequence within this synergy’s activation: the wrist flexor component (PL) consistently peaked just before object contact to enable hand opening, while the finger flexor component (FDP) peaked just after contact to secure the grasp.

      This timing resolves the apparent biomechanical conflict. It also reveals that while both monkeys converged on the same biomechanical solution (a tenodesis grasp), the observable neural implementation appeared different. However, we must be cautious in directly comparing the computed synergy structures themselves, as the analysis for Monkey B was performed without the FDS muscle. The apparent “multi-functional synergy” in Monkey B is most likely a consequence of this missing data. What is clear and robust, however, is that both monkeys converged on a remarkably similar temporal solution: they both learned to re-time the activation of their key wrist flexor muscles to the pre-grasp phase.

      In Monkey A, this was observed in the temporal shift of its dedicated wrist flexor synergy (Synergy C). In Monkey B, this was observed in the temporal shift of the Palmaris Longus (PL) muscle itself (which, in our computed synergies, was grouped into Synergy A). This convergence on an identical temporal adaptation, regardless of the computed modular organization, is the key finding. We have revised the manuscript to articulate this more precise and defensible interpretation.

      (3) Experimental design: at least for the monkey who was trained on the “artificial task” (Monkey A), it would have been good if the authors had also tested him on naturalistic grasping, like the second monkey, to see to what extent the neural changes generalise across behaviours or are task-specific. Do the authors have some data that could be used to assess this even if less systematically?

      We thank the reviewer for raising this important point regarding the generalizability of our findings across different behaviors. We fully agree that a direct comparison of both tasks in the same animal would have been a valuable experiment. Unfortunately, we do not have systematic data on naturalistic grasping for Monkey A that would allow for such a direct comparison. We therefore view the two tasks as providing complementary evidence. Monkey A’s data shows the adaptation process during a highly stereotyped behavior, while Monkey B’s data demonstrates that a similar two-phase adaptive process occurs during a more naturalistic, unconstrained task. The convergence of these findings strengthens our overall conclusion that this multi-timescale adaptation is a robust principle of motor learning. Nonetheless, the reviewer raises a fascinating question about the task-specific tuning of motor synergies, which remains an excellent direction for future studies.

      (4) Monkey B’s behaviour pre-tendon transfer seems more variable than that of Monkey A (e.g., the larger error bars in Figure 5 compared to monkey A, the fluctuating cross-correlation between FDS pre and EDC post in Figure 6Q). This should be quantified to better ground the results since it also shows more variability post-TT.

      We thank the reviewer for this excellent suggestion to formally quantify the pre-surgery behavioral variability. We have performed the suggested analysis on the "Grip Formation Time" metric (Fig. 5A), which was the comparable metric between the two tasks. Our calculation of the Coefficient of Variation (CV) confirms the reviewer’s observation. Monkey B’s pre-surgery performance was substantially more variable (CV = 81.93%) than Monkey A’s (CV = 46.62%). Furthermore, a non-parametric test for equal variances (Ansari-Bradley test) confirmed that this difference is highly statistically significant (p < 0.0001). We have added a description of this analysis to the Methods and reported this finding in the Results section to provide a clearer context for the baseline differences between the subjects.
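
      As a brief illustration of the variability measure, the CV is the standard deviation expressed as a percentage of the mean. The sketch below uses made-up grip formation times, and the use of the sample (n - 1) standard deviation is an assumption. The Ansari-Bradley test is not reproduced here; SciPy provides it as `scipy.stats.ansari`.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical pre-surgery grip formation times in seconds (not the study's data).
steady_animal = [0.40, 0.50, 0.60]
variable_animal = [0.20, 0.50, 0.80]

cv_steady = coefficient_of_variation(steady_animal)
cv_variable = coefficient_of_variation(variable_animal)
print(round(cv_steady, 2), round(cv_variable, 2))
```

      Because the CV is normalized by the mean, it allows variability to be compared between animals or tasks whose typical movement times differ.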

      (5) Minor: Figure 12 is interesting and supports the idea that monkeys may exploit the biomechanical coupling between wrist and fingers as part of their functional recovery. It would be interesting to measure whether there is a change in such coupling (tenodesis) over time, e.g., by plotting the change in wrist angle vs change in MCP angle as a scatter plot (one dot per trial), and in the same plot show all the days, colour coded by day. Would the relationship remain largely constant or fluctuate slightly early on? I feel this analysis could also help address my point (1) above.

      We thank the reviewer for this excellent and insightful suggestion. We have performed the suggested analysis for Monkey B, plotting the trial-by-trial relationship between wrist and MCP angles for all recording days (New Figure 13).

      The results clearly show the gradual refinement of the tenodesis coupling. Pre-surgery, there was no correlation (R²=0.00). Immediately post-surgery (Day 22), the relationship was weak and variable (R²=0.16), reflecting an exploratory phase. Over the following weeks, the coupling became progressively stronger and more consistent, with the R² value peaking at 0.58 around Day 56, indicating a robust exploitation of the new strategy. The relationship then stabilized at a moderate level (R² ~0.2-0.3) in the final days. This analysis provides direct kinematic evidence for the slow, gradual skill-learning component of our two-state model. It beautifully complements our response to the reviewer’s first point by visualizing the underlying refinement process that occurred concurrently with the more abrupt neural shifts. We have added this new figure and a description of these results to the manuscript.
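
      The per-day coupling strength described above can be illustrated with an ordinary least-squares fit of MCP angle change on wrist angle change, one point per trial. This is a sketch under assumptions: the response does not specify the regression model, and the function name and toy angle values are hypothetical.

```python
def r_squared(x, y):
    """R^2 of a simple least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical trial-by-trial angle changes (degrees), grouped by recording day.
trials_by_day = {
    "pre": ([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]),                # no linear coupling
    "late": ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]),      # perfect linear coupling
}

r2_by_day = {day: r_squared(wrist, mcp) for day, (wrist, mcp) in trials_by_day.items()}
print(r2_by_day)
```

      Computing one value per day in this way gives a time course of coupling strength that can be compared against the timing of the synergy switch.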

      Reviewer #2 (Public review):

      Weaknesses:

      The most notable weakness of the study is the incompleteness of the data. [...] As a result, it is difficult to make general conclusions from the study, and it awaits further analysis or the addition of another subject.

      We thank the reviewer for this critical and accurate assessment of the study’s limitations. The reviewer is correct that the datasets for the two monkeys are incomplete in different ways and that the tasks were not identical. We fully acknowledge these limitations throughout the manuscript. Rather than viewing these differences as a weakness that prevents generalization, we propose that they offer a unique strength in the form of complementary evidence. We consider the two animals not as a direct replication, but as two distinct case studies that test the same underlying hypothesis under different conditions.

      Monkey A, with its high-quality EMG and highly stereotyped task, provides a detailed, quantitative view of the neural adaptation process, allowing us to precisely characterize phenomena like the ‘neuromuscular arms race’.

      Monkey B, with its kinematic data and more naturalistic task, provides crucial evidence that the same fundamental principles, a two-phase adaptation and the eventual development of a compensatory strategy, generalize to a less constrained, more behaviorally relevant context. We believe the key finding is the convergence of the results. Despite the differences in individual strategy, task demands, and available data, both animals demonstrated the same core "swap-and-revert" adaptive process. We propose that this convergence from heterogeneous sources lends support to the generalizability of our conclusions, suggesting that the multi-timescale adaptation we describe may be a general feature of motor learning following such perturbations. We agree that future studies with more subjects are needed to fully establish this principle. Nonetheless, we feel that the convergent evidence from these two complementary cases provides a valuable foundation for the model we present.

      A second weakness is the insufficient analysis of the movements themselves, particularly for Monkey A. [...] Since the authors have video data for both monkeys, it is surprising that it was not used to extract landmarks for kinematic analysis, or at least hand/endpoint trajectory, and how it is adjusted over time. Adding more behavior data and aligning it with the EMG data would be very helpful for characterizing motor recovery and is needed to support conclusions about underlying neural control strategies for functional improvement.

      We thank the reviewer for this important suggestion. The reviewer’s comment prompted us to re-examine our behavioral data, and we have now performed additional analyses that we agree provide a much clearer link between the neural changes and functional recovery.

      For Monkey A, we have quantified the ‘pull times’ on a day-by-day basis. This analysis reveals a clear, gradual learning curve: pull times were initially long and variable post-surgery but steadily decreased and stabilized over the recovery period. This provides a direct, quantitative measure of motor performance recovery for this animal.

      For Monkey B, we have performed a detailed analysis of the ‘grasp aperture’ prior to object contact. This kinematic analysis is particularly revealing, as it shows the development of the compensatory strategy in real-time. The grasp aperture was initially very small post-surgery, reflecting the monkey’s inability to open its hand. It then steadily increased over the next ~40 days as the monkey learned and refined the compensatory tenodesis grasp, before stabilizing at a new, functional baseline.

      We believe these new analyses directly address the reviewer’s concern by providing a more detailed picture of motor recovery. The grasp aperture data, in particular, offers a clear kinematic correlate for the slow, skill-learning process that we propose runs in parallel to the more abrupt neural reorganization. We have added these results as a new figure in the main text of our revised manuscript.

      Considering specific conclusions, the statement that the monkeys learned to use “tenodesis” over time by increasing activation of a wrist flexor muscle synergy does not seem to be fully supported by the data. [...] Given these issues, it is not clear how to align the EMG and kinematic data and interpret these findings.

      We thank the reviewer for this detailed and critical analysis. They raise an excellent point and have correctly observed that the adaptation is not a simple, uniform increase in wrist flexor synergy amplitude. Our interpretation, which we have clarified in the manuscript, is that the monkeys learned a more sophisticated strategy: a precise re-timing of the wrist flexor activation to occur earlier in the movement, specifically to pre-shape the hand for the grasp.

      For Monkey A: The reviewer correctly notes that the peak amplitude of Synergy C (the wrist flexor synergy) around the moment of grasp (0% task range) is lower in the final phase compared to baseline. However, the crucial change is temporal: the peak of this synergy’s activation shifts from occurring just after the grasp (~+1%) to occurring just before it (~-2%). This re-timing is perfectly suited to enable finger extension via the tenodesis effect immediately prior to object contact. The subsequent lower amplitude may reflect a more efficient, less forceful movement once this new skill was refined.

      For Monkey B: The reviewer is right that this monkey does not have a dedicated wrist flexor synergy and that the overall amplitude of the PL muscle does not increase dramatically. However, a closer look at its activity profile (Fig. S2-AN) reveals a clear and consistent increase in activation specifically in the pre-contact phase (~7% task range). This is the precise neural signature of the assistive tenodesis grasp that is kinematically confirmed in Figure 12. The monkey is not simply scaling up the synergy; it is strategically activating it earlier to prepare for the grasp.

      In summary, the key evidence linking the EMG to the tenodesis strategy is in the temporal domain. The learned re-timing of the wrist flexor activation to the pre-grasp phase is the crucial link that aligns the neural and kinematic data. We have revised the manuscript to make this distinction between amplitude scaling and temporal shifting clearer.
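
      The distinction between amplitude scaling and temporal shifting can be made concrete by locating the peak of an activation profile on the task-range axis, where 0% is object contact. The sketch below is illustrative only: the axis limits (-10% to +10%), the sampling, and the triangular profiles are assumptions, chosen so that the peaks fall at the +1% and -2% positions mentioned above.

```python
def peak_task_range(profile, start_pct, end_pct):
    """Task-range position (%) of the maximum of an evenly sampled profile."""
    step = (end_pct - start_pct) / (len(profile) - 1)
    idx = max(range(len(profile)), key=profile.__getitem__)
    return start_pct + idx * step

# Hypothetical mean activation profiles sampled at 1% steps from -10% to +10%.
axis = range(-10, 11)
baseline = [max(0.0, 1.0 - abs(p - 1) / 5.0) for p in axis]    # peak just after contact
late = [0.7 * max(0.0, 1.0 - abs(p + 2) / 5.0) for p in axis]  # lower, earlier peak

baseline_peak = peak_task_range(baseline, -10.0, 10.0)
late_peak = peak_task_range(late, -10.0, 10.0)
print(baseline_peak, late_peak, late_peak - baseline_peak)
```

      Here the late profile has a smaller amplitude but an earlier peak, so a comparison of amplitudes alone would miss the re-timing.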

      A more minor point regarding conclusions: statements about poor task performance and high energy expenditure being the costs that drive exploration for a new strategy are speculative and should be presented as such. Although the monkeys did take longer to complete the tasks after the surgery, they were still able to perform it successfully and in less than a second and no measurements of energy expenditure were taken.

      We thank the reviewer for this important point regarding the precision of our language. We agree that statements regarding ‘high energy expenditure’ and the specific drivers for exploring a new strategy are interpretations of the data, not direct measurements, and should be framed as such.

      Our speculation about energetic cost is based on the significant increase in muscle co-activation we observed (e.g., Fig. 11), a phenomenon widely understood to be metabolically expensive. Similarly, while the monkeys were still successful, their prolonged movement times and inefficient motor patterns represent a clear performance deficit compared to their highly optimized pre-surgical baseline, which we propose acted as a driver for further adaptation. In our full revision, we have carefully revised the manuscript to soften these claims. We have used more speculative language, such as “we hypothesize that...”, “the likely cost of...”, or “may have provided the impetus for...” to ensure that our interpretations are clearly distinguished from our direct empirical findings.

      A small concern is whether the tendon transfer effect may fail over time, either due to scar tissue formation or tendon tearing, and it would be ideal if the integrity of the intervention were re-assessed at the end of the study.

      We thank the reviewer for raising this important point regarding the long-term integrity of the tendon transfer. We agree that a terminal anatomical re-assessment would be an ideal control. While a terminal assessment was not performed as part of this study’s protocol, we were able to monitor the transfer’s integrity throughout the study. We are confident the transfer remained functionally intact for two key reasons:

      (1) Physical Monitoring: We periodically used ultrasound imaging to non-invasively visualize the tendon repair, which allowed us to confirm its continued physical integrity.

      (2) Functional Evidence: This physical confirmation was corroborated by the functional data. Both animals achieved stable, proficient task performance that was maintained for months. Furthermore, the late-phase neuromuscular control strategies became highly consistent. A significant failure, such as a tendon tear or prohibitive mechanical scarring, would be incompatible with this sustained behavioral and neural stability.

      Nevertheless, we agree that a terminal assessment is an excellent methodological suggestion that should be incorporated into the design of future long-term studies of this nature.

      Reviewer #3 (Public review):

      (1) First, I find myself wondering about the physical healing process from the tendon transfer surgery and how it might contribute to the learning. Specifically, how long does it take for the tendons to heal and bear forces? If this itself takes a few months, it would be nice to see some discussion of this.

      We thank the reviewer for this insightful question about the potential contribution of the physical healing process to the adaptation timeline. Our surgical protocol was specifically designed to ensure the tendon transfer was biomechanically robust from the outset, minimizing the role of healing as a rate-limiting factor.

      We used a Pulvertaft weave technique, which is known to achieve mechanical strength equivalent to that of a native tendon shortly after the procedure (Graham et al., 2023). The repair involved more than two weaves and utilized high-strength suture material to maximize its initial force-bearing capacity. While full fibrous integration around the suture site typically occurs within approximately six weeks, the repair itself was strong enough to bear physiological forces immediately post-surgery. Therefore, the prolonged, complex, two-phase, multi-month behavioral recovery and the neural reorganization we observed cannot be attributed to a slow physical healing process. Instead, this supports our conclusion that the observed timeline reflects the challenges and constraints of a purely neural adaptation and skill-learning process. To make this crucial point clear to all readers, we have added these details about the surgical method to the Methods section and included a brief discussion of its implications in the Discussion.

      (2) Second, I see that there are some changes in the muscle loadings for each synergy over the days, though they are relatively small. The authors mention that the cosine distances are very small for the conserved synergies compared to distances across synergies, but it would be good to get a sense for how variable this measure is within synergy. For example, what is the cosine similarity for a conserved synergy across different pre-surgery days? This might help inform whether the changes post-surgery are within a normal variation or whether they reflect important changes in how the muscles are being used over time.

      We thank the reviewer for this excellent and insightful suggestion. Establishing a baseline for normal day-to-day variability is an important control for our synergy analysis.

      We have performed this analysis in full. Specifically, to quantify baseline stability, we calculated the cosine similarity between the spatial synergy weights (W) of each individual recording day and the pre-surgery average. This provides a rigorous measure of day-to-day variability relative to the stable baseline structure. We have added these data to Figure 7 (Panel I), which plots the pre-surgery similarity (blue traces) alongside the post-surgery adaptation (red traces).

      We found that baseline stability was remarkably high, with cosine similarity consistently exceeding 0.99 (e.g., Monkey A: 0.99 ± 0.001). This quantification allows the reader to formally assess that the changes observed post-surgery (e.g., drops to ~0.80 or ~0.60 in Monkey B) are well outside the range of normal physiological fluctuation, representing subtle but genuine structural adaptation.
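
      The baseline-stability measure described above, the similarity of each day's synergy weights to the pre-surgery average, can be sketched as follows. The weight vectors, day labels, and function names are hypothetical, and a four-muscle synergy is assumed purely for brevity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two synergy weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_to_baseline(daily_weights, baseline_days):
    """Compare every day's weights with the mean weights of the baseline days."""
    rows = [daily_weights[d] for d in baseline_days]
    baseline = [sum(col) / len(rows) for col in zip(*rows)]
    return {day: cosine_similarity(w, baseline) for day, w in daily_weights.items()}

# Hypothetical weights (W) of one synergy; negative days are pre-surgery.
weights = {
    -3: [0.80, 0.50, 0.10, 0.10],
    -2: [0.79, 0.52, 0.10, 0.11],
    -1: [0.81, 0.49, 0.11, 0.10],
    30: [0.30, 0.40, 0.70, 0.50],
}

sims = similarity_to_baseline(weights, baseline_days=[-3, -2, -1])
print({day: round(s, 3) for day, s in sims.items()})
```

      The spread of the pre-surgery values defines the range of normal day-to-day variation, against which any post-surgery drop can be judged.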

      (3) Last, and maybe most difficult (and possibly out of scope for this work): I would have ideally liked to see some theoretical modeling of the biomechanics so I could more easily understand what the tendon transfer did or how specific synergies affect hand kinematics before and after the surgery. Especially given that the synergies remained consistent, such an analysis could be highly instructive for a reader or to suggest future perturbations to further probe the effects of tendon transfer on long-term learning.

      We thank the reviewer for this excellent and forward-thinking suggestion. We completely agree that a detailed biomechanical model of the tendon transfer would be a powerful tool for understanding the mechanical consequences of the surgery and for interpreting the function of the recorded muscle synergies. However, creating a subject-specific musculoskeletal model with the fidelity required to accurately simulate synergy-to-kinematic transformations is a highly complex project that we feel is well beyond the scope of the current manuscript. Such an endeavor would constitute a major research project in its own right.

      Our study’s primary focus was to provide a detailed, longitudinal characterization of the in-vivo neural adaptation following this perturbation, a dataset that is itself rare and valuable. We aimed to document the physiological learning process as it unfolded over many months. Nonetheless, the reviewer’s point is exceptionally well-taken. Currently, we are constructing a monkey musculoskeletal model and performing tendon transfer on this model to investigate what kind of characteristics in the learning process reproduce the synergy changes observed in the experiments. Although this project is still in progress, to date, we have demonstrated that the robustness of synergies themselves is necessary for changes in muscle activity at the synergy level (Nakajima N, Wang S, Ogihara N, Oya T, Seki K, Funato T, Upper Limb Musculoskeletal Model of Macaque Monkey for Approaching Adaptation Mechanism to Tendon Transfer, Society for Neuroscience 2023, Washington DC, USA, 2023).

      The rich dataset we have collected in the present research could serve as an excellent foundation for developing and validating such a model in the future. We believe that combining these two approaches is a critical and exciting next step for the field, and we have highlighted this as a key future direction in our discussion.

      Recommendations for the authors:

      Reviewing Editor Comments:

      When revising the manuscript for resubmission, please try to improve the visual presentation of the data, which is a point highlighted by all three reviewers during the discussion, including making the presentation of monkey-specific results more consistent across subjects.

      We have comprehensively revised the figures to ensure a consistent and clear visual presentation, as requested. Specifically, we standardized the layout across all main and supplementary figures (placing Monkey A consistently in the top rows or left columns and Monkey B in the bottom rows or right columns) and applied unified color schemes throughout the manuscript. Furthermore, we harmonized the presentation of the analytical results, such as the specific cross-correlation pairings in Figures 9 and 10, to ensure that the data for both subjects are presented with identical logic, facilitating direct comparison.

      Reviewer #1 (Recommendations for the authors):

      (1) Please revise the writing; some words are missing (line 90), and some sentences could be clarified slightly, even if the paper is well written (lines 317-320). The paragraph including the idea of tenodesis could also be further clarified, I think.

      Thank you for pointing these out. We have corrected the missing word (osteoarthritis) on line 90. We have also revised lines 317-320 to remove ambiguity. Furthermore, the section describing the tenodesis effect (now section "Distinct neural implementations...") has been substantially rewritten for improved clarity, incorporating a more detailed explanation of the biomechanics.

      (2) In the Introduction, the authors cite Hunter and Eckstein 2009 and Mercuri and Muntoni 2013 without describing the pathological conditions; this will not be clear for nonspecialists.

      Thank you. We have added brief descriptions ("osteoarthritis, a degenerative joint disease," and "muscular dystrophy, which involves progressive muscle weakness,") directly into the Introduction sentence where these references appear.

      (3) Data presentation: I often thought that the data could be presented more clearly:

      (a) For example, Figure 3D and 4D should show error bars around the mean to have a sense of the consistency of pre-lesion behaviour. Same for other figures like Figure 6.

      We appreciate the reviewer's suggestion to visualize data consistency. (a) Figures 3D, 4D, and 6 (EMG Profiles): For these figures, we opted to display mean traces and peak markers to clearly illustrate the temporal shifts and relationships between muscles. Overlaying multiple standard deviation envelopes in these comparative plots would significantly reduce legibility. However, to fully address the reviewer's request to see the consistency of pre-lesion behavior, we direct attention to Supplementary Figure S1, which presents the complete EMG profiles with full error tubes (Mean ± SD) for every recorded muscle. (b) Quantitative Analysis Figures: We ensured that variability is explicitly visualized in all statistical analyses. The cross-correlation time-courses in Figures 6 (G-Q), 9, and 10 are plotted with shaded error tubes to show variance. Similarly, the aggregated EMG analysis in Figure 11 utilizes bar plots with explicit error bars to quantify the statistical consistency of the changes.

      (b) The autocorrelation analysis in Figure 6 should also include measures of lag if it’s not at zero lag. If it’s the latter, please specify it in the Methods.

      We thank the reviewer for this question regarding the cross-correlation analysis presented in Figure 6 (Panels G-J, P-Q). We confirm that this analysis was performed at zero time lag. To clarify this, we have added a sentence to the Methods section (Subsection "Cross-correlation analysis") explicitly stating that the EMG cross-correlations shown in Figure 6 were calculated at zero lag. We have also added a clarifying note ("at zero time lag") to the description of these panels within the Figure 6 caption.
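
As an illustration of what a zero-lag cross-correlation measures, the following is a minimal sketch that assumes the normalized (Pearson-style) form; the exact normalization used in the manuscript is not stated in this response, and the function name and the example traces are hypothetical.

```python
import math

def zero_lag_xcorr(x, y):
    """Normalized cross-correlation of two equal-length traces at zero lag.

    Both traces are mean-centered, multiplied sample by sample with no
    time shift, and scaled by the product of their norms. This is the
    Pearson correlation coefficient (an assumed normalization).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("traces must be non-empty and of equal length")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("a constant trace has no defined correlation")
    return num / den

# Hypothetical averaged EMG profiles (arbitrary units), illustration only.
pre = [0.1, 0.4, 0.9, 0.5, 0.2]
post_same = [2.0 * v for v in pre]      # same shape, larger amplitude
post_flipped = [1.0 - v for v in pre]   # mirrored activation profile

r_same = zero_lag_xcorr(pre, post_same)
r_flipped = zero_lag_xcorr(pre, post_flipped)
```

Under this normalization a trace that keeps its shape but changes amplitude still correlates at about +1, whereas a mirrored profile correlates at about -1, so the measure is sensitive to timing and shape rather than to overall activation level.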

      (c) Seeing EMG patterns similar to those presented in Figures 3D and 4D at different times post-lesion (e.g., as a Supplementary figure) would also give readers a better intuition of the neural changes.

      We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. We have added explicit cross-references to these figures in the main text.

      (d) I couldn’t fully understand the analysis in Figure 4E; clarify.

      We thank the reviewer for noticing this oversight. The reviewer is correct that Figure 4E was not referenced in the main text. This panel was intended to show the baseline kinematic profiles (MCP and wrist angles) for Monkey B's control session, corresponding to the average EMGs shown in panel 4D. Given that our more comprehensive kinematic analyses are now presented in Figure 12 and the new Figure 13, we believe panel 4E is largely redundant. To improve the clarity and focus of Figure 4, we have removed panel 4E and its description from the revised manuscript.

      (e) Some figures showing neural changes (e.g., Figures 6G-J, 6P,Q, Figures 9 and 10, and even Figure 11 for different reasons) would become more understandable if they were accompanied by the behavioural changes (e.g., something like Figure 5A on top of them).

      We agree that visualizing the temporal link between neural reorganization and behavioral recovery is essential for interpreting the data. We have implemented this suggestion by overlaying behavioral metrics onto the right y-axes of Figures 6 (G-Q), 9, 10, and 11. However, regarding the specific behavioral metric, we opted to overlay the maladaptive behavior/aberrant reaching metric (from Figure 5B) rather than the grip formation time (Figure 5A). We found that the maladaptive behavior profile provided a clearer and more direct correlate to the neural data, as its peak coincides precisely with the ‘swapped’ synergy phase, thereby effectively illustrating the functional cost of that specific neural state.

      (f) Some figure captions could be improved by adding more detail (e.g., for Figure 6).

      We agree. We have substantially expanded and improved the captions for Figure 6 and Figure 7 to make them more self-contained and guide the reader more effectively through the key findings presented in the panels. We have also reviewed other captions for clarity.

      (g) I’d show the cosine distance between synergies across days as a main figure, e.g., as part of Figure 7, because this is an important result.

      We agree that the longitudinal stability of the synergy structures is a crucial result that deserves prominence. We have implemented this suggestion by adding new panels, Figure 7 (I, K) for primary synergies and Figure 8 (K, L) for secondary synergies, which plot the cosine similarity of the spatial synergy weights across the entire experimental timeline. These panels explicitly visualize the high stability of the pre-surgery baseline (blue traces, similarity > 0.99) and contrast it with the dynamic structural tuning observed during the post-surgery adaptation (red traces), providing a clear, day-by-day account of synergy evolution as requested.
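
For readers less familiar with the metric, the sketch below shows how cosine similarity between two spatial synergy weight vectors can be computed. The weight values are hypothetical and the function is an illustration, not the analysis code used for the figures.

```python
import math

def cosine_similarity(w1, w2):
    """Cosine of the angle between two synergy weight vectors.

    A value of 1 means the relative muscle weightings are identical
    (regardless of overall scale); a value of 0 means they are orthogonal.
    """
    if len(w1) != len(w2):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    if n1 == 0 or n2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (n1 * n2)

# Hypothetical muscle weights for one synergy on two recording days.
reference_day = [0.8, 0.5, 0.1, 0.0]
later_day = [0.7, 0.6, 0.2, 0.1]

similarity = cosine_similarity(reference_day, later_day)
```

Because the measure ignores overall scaling, a synergy whose weights are uniformly amplified still scores 1 against its reference; only changes in the relative weighting of muscles lower the value.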

      (h) In Figure 7C, D and G, H, it’d be interesting to also see in the background the EMG for the transferred muscle that belongs to each synergy, to appreciate their relationship.

      We thank the reviewer for this suggestion. To illustrate the close relationship between the primary synergies and their key constituent muscles, while avoiding visual clutter in the complex post-surgery plots, we have modified the pre-surgery panels of Figure 7 (C, D, G, H). In these panels, we have now overlaid the average pre-surgery EMG profile of the primary transferred muscle belonging to that synergy (e.g., FDS for Synergy A, EDC for Synergy B) as a thin, gray, dashed line. This visually confirms the tight correlation between the synergy profile and the muscle’s activity at baseline.

      (i) In page 10, the authors report as maladaptive behaviour the duration of the aberrant reaching component from day 29 (monkey A) and day 20 (monkey B). What was happening before those recording dates? Were the monkeys recovering?

      Thank you for this question. We have added two sentences to the start of the Results section (“Functional Recovery Follows...”) clarifying that the period between surgery and formal recordings included approximately one week of home cage recovery followed by several weeks of assisted task practice. Formal recordings began once the monkeys could perform the task consistently without assistance.

      (j) In the Methods (EMG Analysis), the authors state that they resumed their recordings post-TT “once they (the monkeys) were able to perform the task on their own”. It would be good if the authors made this more precise (e.g., based on success rate or another metric).

      We thank the reviewer for this suggestion to increase precision. We have revised the Methods section to include the specific criteria used for resuming post-surgical recordings. Recordings were restarted once the monkeys were able to perform the task independently (i.e., without assistance from the experimenter) and consistently achieved a successful trial count of at least 100 trials within a single experimental session.

      (k) Line 266- reads “Alternation of EMG activity in non-transferred muscle suggests one possibility: TT might alter the control strategy of coordinated muscle activity for hand movement by modifying the transferred muscles and their agonists as a cohesive unit”, however, some “muscles showed patterns that were incompatible with a simple swap” (Lines 255-256). Doesn’t this observation suggest that what happens is not a simple change in muscle synergies?

      We thank the reviewer for this insightful question regarding the interpretation of muscles with adaptive patterns incompatible with the primary ‘swap-and-revert’. We agree that these observations require careful consideration within the modular framework. Our interpretation is that these muscles do not represent evidence against modular control, but rather reflect the involvement of multiple modules adapting concurrently. Specifically, muscles like FCR and PL, which showed distinct patterns, are primary members of Synergy C (the wrist flexor synergy) in Monkey A. Their adaptive profile is therefore consistent with the task-specific recruitment and retiming of Synergy C as part of the compensatory tenodesis strategy, rather than being a deviation from the swap observed in Synergies A and B. Synergies represent the dominant, shared variance in muscle activity. While they capture the overall strategy, some degree of individual muscle variation or the influence of secondary synergies is expected. We have added a sentence to the Results section to clarify that these diverse patterns likely reflect the differential involvement of muscles in multiple adapting synergies. We believe the overall evidence still strongly supports the modulation of stable synergies as the primary mechanism of adaptation in this paradigm.

      (l) You may want to call synergy A and synergy B, synergy F and synergy E to make recall easier? (Same for synergy C and D, which could be F2 and E2).

      We thank the reviewer for this helpful suggestion aimed at improving clarity. We considered renaming the synergies based on function (e.g., F/E). However, given the number of figures and the complexity of a global change, and the fact that the functional roles of Synergies C and D differed between animals, we decided to retain the original A/B/C/D labels for consistency. To ensure clarity for the reader, we have carefully checked the manuscript to ensure that we consistently define the primary functional role of each synergy (e.g., "Synergy A, the primary finger flexor synergy") when it is discussed.

      (m) Lines 315-317 - “These pattens of changes in synergy 3 and 4, both contributed minimally to the EMG of transferred muscles” -> This statement puts the causality as synergies cause muscles to activate according to certain patterns, which is supported by work by several groups -including the authors- however, they could also reflect biomechanical and task constraints as others have argued; perhaps this tone would be better for the discussion?

      We thank the reviewer for this nuanced point regarding the interpretation of synergy contributions. We agree that the causal relationship between computed synergies and muscle activity is complex and can reflect both neural commands and task constraints. To address this, we have revised the sentence in question in the Results section. Instead of stating that the synergies "contributed minimally," we now state that the changes in these synergies "were associated with minimal EMG activity in the transferred muscles." This phrasing is more descriptive of the observation and less implicitly causal, while retaining the key point within the flow of the results. The subsequent sentences, which offer interpretation, are already framed speculatively ("This suggests...", "may have served...").

      (n) Line 403 How do the authors conclude from the synergy patterns in Figure 11 that the early post-TT is characterised by “an unstable and inefficient neural control strategy”? To me, this is shown clearly in the behaviour, not in these plots, unless I’m missing something?

      We thank the reviewer for this comment, which highlights the need to clearly connect our neural findings to the behavioral outcome. The reviewer is absolutely correct that the behavioral data (Fig. 5) provides the most direct evidence of instability and inefficiency during the early adaptation phase. Our intention was to argue that the neural patterns observed in Figure 11 provide a physiological correlate for this behavioral inefficiency. Specifically, the escalating aggregated EMG activity observed in the conflicted extensor synergy (Synergy B), which we term the ‘arms race’, represents significant muscle co-activation. Such co-activation is widely understood to be energetically costly and reflects a suboptimal control strategy where the CNS is essentially "fighting itself" against the altered mechanics. To make this link clearer, we have revised the concluding sentence of the relevant paragraph in the Discussion ("The early adaptation phase...") to explicitly state that this escalating co-activation is a known marker of inefficient recruitment and that it occurred concurrently with the period of poor behavioral performance shown in Figure 5.

      (o) Lines 469-471. The authors suggest that muscle synergies may be preserved post-TT because a modular approach (to motor control) may be computationally easy and metabolically cheap. To me, recent data suggest that the most parsimonious explanation is what they later say: that the nervous system may not be plastic enough to change this (e.g., see Makin and Krakauer, “Against reorganisation” also in eLife).

      We thank the reviewer for raising this important theoretical point and for referencing the relevant literature on constraints on cortical reorganization. We agree that the preservation of muscle synergies in the face of such a profound perturbation is a key finding that warrants careful interpretation. In our revised Discussion (section "The CNS Defaults to a Modular Strategy..."), we have now explicitly incorporated the perspective that synergy stability may reflect inherent constraints on neural plasticity, citing Makin and Krakauer (2023), alongside our original hypothesis regarding computational and metabolic efficiency. We present these ideas not as mutually exclusive, but as potentially complementary factors that both contribute to the CNS’s apparent preference for modulating existing modules rather than fundamentally restructuring them.

      (p) Lines 501-503. Also on interpretation. Would the metabolic cost indeed be much higher? Couldn’t the observed change in strategy be explained purely based on performance metrics?

      This is an important point. We agree that statements regarding high energy expenditure are interpretations, not direct measurements. We have carefully revised the manuscript (Abstract, Results, and Discussion) to soften these claims, using more speculative language (e.g., "likely costly," "what we propose was...") to clearly distinguish our interpretations from direct empirical findings.

      (q) Lines 538-. The authors link the initial adaptation phase to the fast process reported in adaptation studies and say that this leads to poor retention. However, it seems from their data that the behaviour is stable across (early) days, so doesn’t this rule out such an interpretation?

      We thank the reviewer for this insightful question regarding the interpretation of the early adaptive phase within the two-state model framework. The reviewer correctly notes that the early post-surgical behavior, while maladaptive, appeared relatively stable across days and did not show the rapid decay sometimes associated with the "poor retention" characteristic of the fast system. We agree that this apparent stability requires careful interpretation. In our revised Discussion (section "A Multi-Timescale Model..."), we now propose that the fast system is primarily responsible for the initial, rapid adoption of the ‘swap’ strategy in response to the large error signal. The subsequent persistence of this flawed but stable state for several weeks is likely not due to strong retention by the fast system itself, but rather reflects the time required for the parallel slow system to gradually develop a more effective compensatory strategy (i.e., the tenodesis grasp). Once this alternative strategy became viable, it enabled the abrupt "switchback," which we also attribute to the fast system recalibrating away from the highly costly swap strategy. Therefore, we believe our data is consistent with the involvement of a fast system driving rapid strategic shifts, even if the typical "poor retention" phenotype is masked by the lack of a viable alternative strategy during the early phase.

      Reviewer #2 (Recommendations for the authors):

      (1) The discussion would benefit greatly from a more careful comparison with prior work characterizing the response to experimental or clinical tendon or nerve transfer in different models.

      We thank the reviewer for suggesting these important references and for the recommendation to compare our findings more carefully with prior work. This is an excellent point, and we agree it will significantly strengthen the discussion. In our full revision, we have added a new paragraph to the Discussion section dedicated to this comparison. We discuss how our findings relate to classic work showing primate adaptive capacity beyond simple maladaptive responses (Sperry, 1947), EMG evidence for the persistence of original neural patterns alongside new ones in human patients (Illert et al., 1986), the critical role of altered peripheral biomechanics and myofascial force transmission in complicating adaptation (Maas & Huijing, 2012), and how our observation of synergy stability aligns with evidence for modular adaptation strategies (Berger et al., 2013). This comparison helps situate our unique findings of a multi-timescale process and synergy timing modulation within the broader context of motor relearning after musculoskeletal rearrangement.

      (2) Line 90 - Which disease or condition is studied in Hunter and Eckstein (2009)?

      Thank you. We have clarified this in the Introduction; the reference pertains to osteoarthritis.

      (3) Line 280 for clarity in text and as a reminder to the readers, please state which muscles are involved in each synergy grouping.

      We have updated the text (Results, 'Adaptation occurs through modulating...') to explicitly list the main contributing muscles for each synergy grouping (e.g., Synergy A: FDS and FCU for Monkey A). This provides the requested clarity regarding the functional identity of each synergy while maintaining readability. For the complete, quantitative muscle weight composition including minor contributors, we refer the reader to Figure 7 and Supplementary Table 1.

      (4) Line 180 There are differences in the time course for measurements between the behavioral metrics and EMGs. If not recorded at fixed time intervals, the differences in the time courses for the two monkeys should be explained.

      We thank the reviewer for this question regarding the time courses of our measurements. We interpret this comment in two ways, both of which we have addressed in the revised manuscript.

      First, if the reviewer is asking about the overall recording schedule, they are correct that sessions were not performed at fixed daily intervals, and the specific days sampled differed between monkeys. This non-uniform sampling was due to the practical constraints of long-term behavioral experiments (e.g., animal cooperation, scheduling, weekends) and the aim to capture data during key phases of adaptation. However, within any given session, behavioral (video) and EMG data were always collected concurrently.

      Second, if the reviewer is asking whether the set of days included differs between the behavioral plots (e.g., Fig 5) and the EMG/synergy plots (e.g., Figs 6, 9-11), this is a possibility depending on data quality criteria. Our criterion for including a session in the behavioral analysis was a minimum of 20 successful trials. However, for the more demanding synergy analysis, we required a higher minimum of 100 successful trials to ensure robust factorization. It is possible that a few sessions met the behavioral criterion but not the synergy criterion and were thus excluded from the latter analysis, leading to slight differences in the days presented across figures. To ensure full clarity, we have added text to the Methods section explicitly stating: (A) the rationale for the non-uniform daily sampling schedule, and (B) the specific minimum trial count criteria used for including data in the behavioral versus the synergy analyses, noting if this resulted in different sets of days being analyzed for different figures.
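
The effect of the two inclusion thresholds can be sketched as follows. The thresholds (20 and 100 successful trials) are taken from the response above; the session days, trial counts, and function name are hypothetical.

```python
# Minimum successful-trial counts stated in the response above.
MIN_TRIALS_BEHAVIOR = 20
MIN_TRIALS_SYNERGY = 100

def sessions_for_analysis(trial_counts, min_trials):
    """Return the sorted post-surgery days whose session meets the threshold."""
    return sorted(day for day, n in trial_counts.items() if n >= min_trials)

# Hypothetical mapping: post-surgery day -> number of successful trials.
trial_counts = {29: 150, 33: 60, 40: 15, 47: 120}

behavior_days = sessions_for_analysis(trial_counts, MIN_TRIALS_BEHAVIOR)
synergy_days = sessions_for_analysis(trial_counts, MIN_TRIALS_SYNERGY)
```

In this made-up example, day 33 would appear in the behavioral plots but not in the synergy plots, which is the kind of small discrepancy between figures described above.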

      (5) General figure comments - The figures are informative, but they could be better presented, designed, and formatted to explain the important results in the paper. The figures should be able to explain most of the key results without entirely referring to the text to find some of the details. I had a bit of trouble understanding Figure 9 & 10. I would also like to suggest that bringing raw data into some figures (e.g., EMG of different muscle groups), such as showing stability between the synergies, could improve the results and allow the story to flow with more clarity. Likewise, clearly showing the differences between baseline EMG measurements and post-surgery measurements could improve some of the result figures.

      We thank the reviewer for these important general comments on data presentation. We agree that the figures are the key to our story and are implementing several revisions based on this and other reviewer feedback to improve their clarity.

      General Presentation: We have conducted a thorough review of all figures to improve layout, consistency, and font legibility (addressing R3, 1 and the Reviewing Editor's comments). This includes adjusting the layouts of Figures 3, 4, and 6 for better alignment and clarity.

      Figures 9 & 10 (Cross-correlation): The reviewer mentioned having trouble understanding these figures. In our revision, we have substantially rewritten the captions for Figures 9 and 10 to be much more descriptive. We explicitly walk the reader through how to interpret the plots (e.g., "The ‘swap’ is evidenced by the drop in self-correlation... and a concurrent rise in antagonist-correlation...").

      Including "Raw Data" (EMG): We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. These figures directly visualize the swap-and-revert pattern in the transferred muscles and their agonists (e.g., EDC, ED23), as well as the diverse and complex adaptations in other non-transferred muscles (e.g., FCR, PL), as requested. To make this clearer, we have added explicit cross-references to Supplementary Figures S1 and S2 within the main Results section to ensure readers are directed to this detailed data.

      Showing Differences (Pre vs. Post): To "clearly show the differences between baseline... and post-surgery measurements," we implemented the point-by-point statistical comparison of pre- vs. final-day synergy profiles (as suggested in R1, 2b). This has resulted in a new Supplementary Figure visually highlighting the precise periods in the task where the final profiles still differ significantly from baseline (Fig. S9).

      We believe these additions (new figures and improved captions) will make the results much clearer and more self-explanatory, as the reviewer suggested.

      (6) Figure 1 - A table with all the acronyms would help with identifying all the muscles and their respective synergies (supplemental), especially when describing the muscles in the Results or the Discussion section.

      This is an excellent suggestion. We have created a comprehensive table (Supplementary Table 1) listing all muscle abbreviations, full names, primary functional groups, and assigned synergies for both monkeys. We have added a reference to this table in the Figure 1 caption and the Methods section.

      (7) Figure 2 - is this mainly from Monkey A? If so, it should be stated.

      We thank the reviewer for pointing out this omission. We have updated the caption for Figure 2 to clarify that the example data shown (ultrasound, trajectories, and quantitative plots) are from Monkey A.

      (8) Figure 3 & Figure 4 seems unbalanced because of the descriptive need to explain Monkey B’s tasks? The figure alignments could be better.

      We thank the reviewer for this comment on the visual presentation of Figures 3 and 4. The reviewer’s observation that the figures appeared ‘unbalanced’ was correct. This was a direct consequence of two issues: (1) the different tasks required slightly different schematics (the "descriptive need" the reviewer mentioned), and (2) the original Figure 4 contained an additional kinematic panel (formerly 4E) that was unique to Monkey B, which broke the parallel structure with Figure 3.

      To address this and significantly improve the alignment, we have now moved the unique kinematic panel (formerly 4E) to a new Supplementary Figure (Supplementary Figure S8). This change has allowed us to re-arrange the panels in Figures 3 and 4 so that they now follow the exact same order. We have also adjusted the layout to ensure that corresponding panels are of a consistent size. We agree that this creates a much better visual balance and makes the comparison between the two monkeys far more direct and clear, as the reviewer suggested.

      (9) Figure 5. It seems like the animals can still perform the task post-surgery, but with high variability. Maybe emphasize the differences in variability between baseline and post-surgery?

      We thank the reviewer for this suggestion to emphasize the changes in variability. We have now quantified this using the Coefficient of Variation (CV) for key behavioral metrics across different phases (Pre-surgery, Early, Mid, Late post-surgery). The results confirm the reviewer’s observation of high variability post-surgery, particularly in the early phase. For instance, Monkey A’s grip formation time CV spiked dramatically (Pre: 47% vs Early: 133%), while Monkey B’s remained high (Pre: 82% vs Early: 76%). Interestingly, while Monkey A’s variability returned close to baseline levels in the late phase (Late: 55%), Monkey B’s variability increased further (Late: 97%), suggesting persistent inconsistency despite functional recovery.

      We also observed metric-specific changes. Monkey A’s pull time became less variable than baseline later on (Pre: 65% vs Late: 43%), suggesting refinement of that action. Conversely, the CV of Monkey B’s grasp aperture remained consistently low throughout (Pre: 26% vs Late: 19%), indicating relatively precise kinematic control was maintained or quickly regained. We have added a summary of these findings to the Results section to provide a more complete picture of how behavioral variability evolved relative to baseline during the adaptation process.
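
For reference, the Coefficient of Variation is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (n - 1 denominator); whether the sample or population form was used is not stated here, and the trial values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean.

    Assumes the sample (n - 1) standard deviation; swap in
    statistics.pstdev for the population form.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical grip formation times (seconds) for two phases.
pre_surgery = [0.40, 0.55, 0.35, 0.60, 0.45]
early_post = [0.50, 2.10, 0.70, 3.40, 0.90]

cv_pre = coefficient_of_variation(pre_surgery)
cv_early = coefficient_of_variation(early_post)
```

Because the CV is normalized by the mean, it allows variability to be compared across phases even when the average duration itself changes after surgery.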

      (10) Figure 6 is quite a confusing figure. This figure needs to be better presented. The figure legends are hard to see for Monkey A vs Monkey B. At first, I thought Monkey B’s figure legend also represented Monkey A. I would suggest reorganizing the figures for clarity and coherence.

      We agree that the original presentation of Figure 6 was dense and potentially confusing. We have completely reorganized the figure to improve clarity and coherence.

      (1) Clear Separation: The figure is now structured with a strict separation between Monkey A (Left Panels, A-J) and Monkey B (Right Panels, K-Q), with prominent headers for each subject to prevent ambiguity.

      (2) Improved Legends: We have redesigned the legends to be larger and placed them explicitly within their respective subject’s section to ensure it is immediately clear which data they describe.

      (3) Visual Consistency: We have standardized the color schemes and axis layouts across this and all other figures to reduce cognitive load and facilitate easier comparison between subjects.

      (11) Figure 12 - This figure is incomplete without Monkey A’s results. The videos in the supplemental sections seem clear enough for some kinematic analysis. The story could be more supported with more thorough measurements of the kinematics from both animals to show how they differ over time and by highlighting the two phases. As a minor note, it would be helpful to present the kinematic data together with a schematic of when during the task the data are drawn from, using the % task range scale, since that is the standard throughout the paper.

      We thank the reviewer for their suggestions regarding the kinematic analysis. We agree that a parallel kinematic analysis for Monkey A, similar to that in Figure 12, would be ideal. We did attempt this. Unfortunately, while the supplemental videos for Monkey A are sufficient for observing the overall movement trajectory, they are not suitable for the detailed joint angle analysis the reviewer suggests. The videos for Monkey A were recorded at a frame rate too low to reliably extract the rapidly changing joint angles of the wrist and fingers during the grasping movement. For this reason, the detailed kinematic analysis was limited to Monkey B, for which we had high-speed video recorded at 240 fps, allowing for a robust analysis of these fast movements.

      We have, however, expanded our kinematic analysis for Monkey B to show the refinement of the tenodesis strategy over the full time course (New Figure 13), which does help to highlight the different adaptive phases for that animal. We have also clarified in the manuscript (e.g., in the caption for Figure 12) that the lack of Monkey A data for this specific analysis was due to the low-resolution and low-frame-rate video available.

      We agree that defining the precise timing of the kinematic snapshot relative to our normalized task range is critical for accurate interpretation. In response, we have added a new panel (Figure 12C) that explicitly maps the kinematic snapshot to our standardized task timeline. This schematic clarifies that the joint angle analysis captures the hand configuration during the pre-shaping phase, specifically at 83 ms prior to object contact (which corresponds to -0.02% of the normalized task range). This ensures the kinematic data can be directly interpreted within the same temporal context as the EMG and synergy results presented throughout the paper.

      Reviewer #3 (Recommendations for the authors):

      First and most major: I found many of the figures much too small and incredibly difficult to read. Possibly the most difficult was Figure 7, where I had to zoom in a great deal to read what muscles corresponded to which bars. I don’t have specific suggestions here other than to make sure that figures are legible.

      We thank the reviewer for highlighting this important issue. We have comprehensively revised the figures to ensure they are legible at standard publication sizes. Specific improvements include:

      (1) Figure 7: We have significantly increased the font size of the x-axis muscle labels and optimized the bar chart spacing to ensure the muscle identities are readable without excessive zooming.

      (2) Global Updates: Across all figures, we have increased font sizes for axis labels and titles, removed unnecessary whitespace to maximize the data-to-ink ratio, and exported all final figures in high-resolution vector formats to ensure clarity.

      Second and more minor: I liked the setup of the manuscript, where the authors explained the unique benefits of their experimental methods and the question they were going after (“When confronted with structural changes to the musculoskeletal system, does the CNS adapt by modulating existing synergies, or by shifting toward more fractionated control strategies?”). However, the evolution of the paper made the answer to this question seem very confusing to me as I read it. The results show that monkeys initially modulated existing synergies in phase 1, but then reverted to the original modulation. This, in addition to the way the question was set up initially, made me think the conclusion was going to be that the synergies themselves changed in the second phase, but this paradoxically was not the case--synergies were stable throughout. I was left confused for the back half of the results section, until the discussion on tenodesis and developing compensatory movement strategies. So the answer is that the monkey learns by modulating existing synergies, but using different strategies in different learning phases. I’m not entirely sure how to avoid this confusion, but I wonder if there’s a way to foreshadow this finding earlier on.

      We thank the reviewer for this valuable feedback on the manuscript’s narrative structure. We understand how the initial framing (modulation vs. fractionation) followed by the reversion of the initial modulation could lead to confusion before the compensatory strategy is fully introduced. To address this, we have made two key adjustments in the revised manuscript:

      (1) In the Introduction, after posing the central question, we have added a sentence to subtly foreshadow that the adaptive process might be complex and multi-phasic, requiring analysis over extended timescales.

      (2) In the Results section, at the transition point between describing the reversion of the primary synergy timings and introducing the compensatory tenodesis strategy, we have added a short paragraph to explicitly signal that the reversion was not the complete solution and that a distinct compensatory strategy emerged concurrently.

      We believe these changes improve the narrative flow, provide better signposting for the reader, and mitigate the potential for confusion identified by the reviewer, making it clearer that the ultimate solution involved modulating existing synergies but via different strategies across distinct learning phases. We appreciate the reviewer’s help in identifying this area for improvement.

    1. eLife Assessment

      This useful study uses a chemoinformatics pipeline to identify a list of candidate mosquito repellants that may be pleasant to smell and safe for humans. The strength of evidence and in particular the computational methodology are incomplete because it is insufficiently benchmarked against other leading models. At the high concentrations tested, there may also be off-target effects of the repellents on the mosquitoes that are not considered.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors set up a pipeline to predict insect repellents that are pleasant and safe to humans. This is done by daisy-chaining a new classification model based on predicting repellents with a published model on predicting human perception. Models use a feature-engineered selection of chemical features to make their predictions. The predicted molecules are then validated against a proxy humanoid (heated brick) and its safety is tested by molecular assays of human cells. The humanistic approach to modeling these authors have taken (which considers cosmetic/aesthetic appeal and safety) is novel and a necessary step for consumer usage. However, the importance of pleasantness over effectiveness is still up for debate (DEET is unpleasant but still used often) and the generalization of safety tests is unknown and assumed. The effectiveness of the prediction models is also still warranted. They pass the authors' own behavioral tests, but their contribution to the field is unknown as both models (new and published) have not been rigorously benchmarked to previous models. Moreover, the authors' breadth of literature in this field is sparse, ignoring directly related studies.

      Strengths:

      Humanistic approach to modeling considers pleasantness and safety. Chaining models can help limit the candidate odorants from the vastness of odor space.

      Weaknesses:

      The current models need to be benchmarked against leading models predicting similar outcomes. Similarly, many of these papers need to be addressed and discussed in the introduction. The authors might even consider their data sources for model training to increase performance and lexical categorization for interoperability. For instance, the Dravnieks data lexicon, currently used in the human perception lexicon, has been highly criticized for its overlapping and hard-to-interpret descriptive terms ("FRAGRANT", "AROMATIC").

      Human Perception:

      Khan, R. M., Luk, C. H., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., & Sobel, N. (2007). Predicting odor pleasantness from odorant structure: pleasantness as a reflection of the physical world. Journal of Neuroscience, 27(37), 10015-10023.

      Keller, A., Gerkin, R. C., Guan, Y., Dhurandhar, A., Turu, G., Szalai, B., ... & Meyer, P. (2017). Predicting human olfactory perception from chemical features of odor molecules. Science, 355(6327), 820-826.

      Gutiérrez, E. D., Dhurandhar, A., Keller, A., Meyer, P., & Cecchi, G. A. (2018). Predicting natural language descriptions of mono-molecular odorants. Nature communications, 9(1), 4979.

      Lee, B. K., Mayhew, E. J., Sanchez-Lengeling, B., Wei, J. N., Qian, W. W., Little, K. A., ... & Wiltschko, A. B. (2023). A principal odor map unifies diverse tasks in olfactory perception. Science, 381(6661), 999-1006.

      Related cleaned data: https://github.com/BioMachineLearning/openpom

      Insect Repellents:

      Wright, R. H. (1956). Physical basis of insect repellency. Nature, 178(4534), 638-638.

      Katritzky, A. R., Wang, Z., Slavov, S., Tsikolia, M., Dobchev, D., Akhmedov, N. G., ... & Linthicum, K. J. (2008). Synthesis and bioassay of improved mosquito repellents predicted from chemical structure. Proceedings of the National Academy of Sciences, 105(21), 7359-7364.

      Bernier, U. R., & Tsikolia, M. (2011). Development of Novel Repellents Using Structure− Activity Modeling of Compounds in the USDA Archival Database. In Recent Developments in Invertebrate Repellents (pp. 21-46). American Chemical Society.

      Wei, J. N., Vlot, M., Sanchez-Lengeling, B., Lee, B. K., Berning, L., Vos, M. W., ... & Dechering, K. J. (2022). A deep learning and digital archaeology approach for mosquito repellent discovery. bioRxiv, 2022-09.

      The current study assumes that insect repellents repel via their odor valence to the insect, but this is not accurate. Insect repellents also mask the body odor of humans, making them hard to locate. The authors need to consult the literature to understand the localization and landing mechanisms of insects to their hosts. Here, they will understand that heat alone is not the attractant as their behavioral assay would have you believe. I suggest the authors test other behavioral assays to show more convincing evidence of effectiveness. See the following studies:

      De Obaldia, M. E., Morita, T., Dedmon, L. C., Boehmler, D. J., Jiang, C. S., Zeledon, E. V., ... & Vosshall, L. B. (2022). Differential mosquito attraction to humans is associated with skin-derived carboxylic acid levels. Cell, 185(22), 4099-4116.

      McBride, C. S., Baier, F., Omondi, A. B., Spitzer, S. A., Lutomiah, J., Sang, R., ... & Vosshall, L. B. (2014). Evolution of mosquito preference for humans linked to an odorant receptor. Nature, 515(7526), 222-227.

      Wei, J. N., Vlot, M., Sanchez-Lengeling, B., Lee, B. K., Berning, L., Vos, M. W., ... & Dechering, K. J. (2022). A deep learning and digital archaeology approach for mosquito repellent discovery. bioRxiv, 2022-09.

      Comments on revisions:

      The revisions made to the manuscript do not fully address the concerns raised in the previous round of review. The authors are encouraged to consider the following points to strengthen the work.

      The benchmarking of the human perception models against Keller et al. (2017) and Gutiérrez et al. (2018) is insufficient, as the field has progressed considerably in the last five years with newer approaches using larger data sources. Benchmarking against more recent models would better situate the contribution of this work.

      The exclusion of human repellency data from preprint Boyle et al. (2016) is worth reconsidering. For a study that takes an explicitly human-centric modeling approach, human behavioral data on repellency, pleasantness, and usage intent would directly support the central claims of the manuscript.

      The key claims regarding repellency and consumer acceptability would be considerably strengthened by the addition of these data.

    3. Reviewer #2 (Public review):

      Summary:

      This is an interesting study that seeks to identify novel mosquito repellents that smell attractive to humans. This is the second time I have reviewed, and the authors have not done anything to address the weaknesses. Although the subject matter may provide important new information for the development of new repellents, its current breadth is limited without additional assays. Arm-in-cage assays, testing the longevity of the new repellents, other ML analyses and confusion matrices, would strengthen the manuscript and demonstrate innovation. The lack of cohesion and new experimental results weakens the manuscript.

      Strengths:

      The combination of standard machine learning methods with mosquito behavioral tests is a strength.

      Weaknesses:

      The study would be strengthened by describing how other modern ML approaches (RF, decision trees) would classify and identify other potential repellents.

      A comparison of the repellent activity between DEET and the top ten hits identified in this new study indicates little change in repellent activity (~3%), suggesting that DEET remains the gold standard. Without additional toxicity tests and longevity tests, the study is arguably incremental. The study's novelty should be better clarified.

      The Methods in the repellency tests are sparse, and more information would be useful. Testing the top repellents at low doses (<<1%) and for long periods (2-12 h) would strengthen the manuscript. Without this information, the manuscript is lacking in depth.

      Testing human subjects on their olfactory percept of the repellents would also increase the depth and utility of the manuscript. Without additional experiments, the authors' conclusions lack support and have limited impact on the state-of-the-art.

      This manuscript is a mix of different approaches, which makes it lack cohesion. There is the ML method for classifying new repellents that smell good, but no testing of the repellents on human volunteers. The repellents are not tested at realistic concentrations and durations. And the calcium mobilization test is strange, and makes little sense in the context of the other experiments and framing of the manuscript.

      Comments on revisions:

      The authors have a potentially strong manuscript. However, I would urge the authors to address the reviewer comments in a substantive manner.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Mosquito-transmitted diseases cause nearly a million deaths every year and significant worldwide morbidity. Moreover, the geographical range of mosquito vectors is rapidly expanding due to climate change and mosquito-borne disease risks are emerging in new parts of the world.

      Innovation in finding new repellents has been slow due to limitations in current research approaches and high costs for EPA registration (especially for synthetic compounds). Since DEET was discovered in the 1940s, only a handful of additional actives have been approved by the EPA for repellent products. In the 20+ years since the discovery of insect odorant receptors from genomes, not a single novel repellent compound has been identified that was registered by the EPA. Thus, there is both a strong need for new approaches to find insect repellents and a need for new active ingredients that are safe and strategically effective.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors set up a pipeline to predict insect repellents that are pleasant and safe for humans. This is done by daisy-chaining a new classification model based on predicting repellents with a published model on predicting human perception. Models use a feature-engineered selection of chemical features to make their predictions. The predicted molecules are then validated against a proxy humanoid (heated brick) and its safety is tested by molecular assays of human cells. The humanistic approach to modeling these authors have taken (which considers cosmetic/aesthetic appeal and safety) is novel and a necessary step for consumer usage. However, the importance of pleasantness over effectiveness is still up for debate (DEET is unpleasant but still used often) and the generalization of safety tests is unknown and assumed. The effectiveness of the prediction models is also still warranted. They pass the authors' own behavioral tests, but their contribution to the field is unknown as both models (new and published) have not been rigorously benchmarked to previous models. Moreover, the authors' breadth of literature in this field is sparse, ignoring directly related studies.

      Strengths:

      Humanistic approach to modeling considers pleasantness and safety. Chaining models can help limit the candidate odorants from the vastness of odor space.

      Weaknesses:

      The current models need to be benchmarked against leading models predicting similar outcomes. Similarly, many of these papers need to be addressed and discussed in the introduction. The authors might even consider their data sources for model training to increase performance and lexical categorization for interoperability. For instance, the Dravnieks data lexicon, currently used in the human perception lexicon, has been highly criticized for its overlapping and hard-to-interpret descriptive terms ("FRAGRANT", "AROMATIC").

      Human Perception:

      Khan, R. M., Luk, C. H., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., & Sobel, N. (2007). Predicting odor pleasantness from odorant structure: pleasantness as a reflection of the physical world. Journal of Neuroscience, 27(37), 10015-10023.

      Keller, A., Gerkin, R. C., Guan, Y., Dhurandhar, A., Turu, G., Szalai, B., ... & Meyer, P. (2017). Predicting human olfactory perception from chemical features of odor molecules. Science, 355(6327), 820-826.

      Gutiérrez, E. D., Dhurandhar, A., Keller, A., Meyer, P., & Cecchi, G. A. (2018). Predicting natural language descriptions of mono-molecular odorants. Nature communications, 9(1), 4979.

      Lee, B. K., Mayhew, E. J., Sanchez-Lengeling, B., Wei, J. N., Qian, W. W., Little, K. A., ... & Wiltschko, A. B. (2023). A principal odor map unifies diverse tasks in olfactory perception. Science, 381(6661), 999-1006.

      The human perception predictions were performed using models that we had reported in two earlier publications, which we have now indicated clearly in the results and methods sections of the VOR: Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021). Three of the four references pointed out by the referee were cited in these prior studies, which involved computational validation by predicting on a test set of the data that was left out of training (as typically done), and also predicting across different human studies with a high degree of success. A rigorous benchmarking of the odor perception models was done in Kowalewski, Huynh & Ray, Chem. Senses (2021) and in a mini-review published in the same issue of the journal by Gerkin, Chem. Senses (2021). This included a favorable comparison with the two references indicated by the referee: Keller et al., Science (2017) and Gutiérrez et al., Nat. Commun. (2018).

      The fourth reference, Lee et al., Science (2023), describes a neural network approach and was published well after our mosquito behavior studies were completed. Although they used an advanced neural network model, Lee et al. worked with 2-D structures of compounds, in contrast to our 3-D approach. They also did not report cross-study validations, comparisons with Keller et al. (2017), or benchmarks against past studies, so it is difficult to assess what advances, if any, were made. We have added this reference in the VOR.

      The intent of the current study was to move beyond testing approaches, of which there are many, and instead work on a practical use case. As we see it, it is not necessarily the prediction of fragrance character or quality alone that matters but overlap with other predicted bioactivities. From the perspective of human use, a molecule with a pleasing scent that also repels insects is likely to be far more useful than one with an unappealing scent. Accordingly, our task in this study was to select molecules that fit into specific use categories: display strong insect repellency, have pleasing scent profiles, are natural in origin and are potentially repurposed from flavors and fragrances.

      Insect Repellents:

      Wright, R. H. (1956). Physical basis of insect repellency. Nature, 178(4534), 638-638.

      Katritzky, A. R., Wang, Z., Slavov, S., Tsikolia, M., Dobchev, D., Akhmedov, N. G., ... & Linthicum, K. J. (2008). Synthesis and bioassay of improved mosquito repellents predicted from chemical structure. Proceedings of the National Academy of Sciences, 105(21), 7359-7364.

      Bernier, U. R., & Tsikolia, M. (2011). Development of Novel Repellents Using Structure− Activity Modeling of Compounds in the USDA Archival Database. In Recent Developments in Invertebrate Repellents (pp. 21-46). American Chemical Society.

      The Katritzky et al. PNAS (2008) paper is cited in our study, and we have indicated that the chemical analogs reported therein are part of the training data set in our study. We thank the reviewer for pointing us to the book chapter by Bernier & Tsikolia (2011), which reviews the QSAR approaches taken for repellent discovery and in large measure focuses on the Katritzky et al. PNAS (2008) paper. We did cite two relevant studies by Uli Bernier.

      The current study assumes that insect repellents repel via their odor valence to the insect, but this is not accurate. Insect repellents also mask the body odor of humans making them hard to locate. The authors need to consult the literature to understand the localization and landing mechanisms of insects to their hosts. Here, they will understand that heat alone is not the attractant as their behavioral assay would have you believe. I suggest the authors test other behaviour assays to show more convincing evidence of effectiveness. See the following studies:

      De Obaldia, M. E., Morita, T., Dedmon, L. C., Boehmler, D. J., Jiang, C. S., Zeledon, E. V., ... & Vosshall, L. B. (2022). Differential mosquito attraction to humans is associated with skin-derived carboxylic acid levels. Cell, 185(22), 4099-4116.

      McBride, C. S., Baier, F., Omondi, A. B., Spitzer, S. A., Lutomiah, J., Sang, R., ... & Vosshall, L. B. (2014). Evolution of mosquito preference for humans linked to an odorant receptor. Nature, 515(7526), 222-227.

      Wei, J. N., Vlot, M., Sanchez-Lengeling, B., Lee, B. K., Berning, L., Vos, M. W., ... & Dechering, K. J. (2022). A deep learning and digital archaeology approach for mosquito repellent discovery. bioRxiv, 2022-09.

      In this study we took an unbiased approach to compile the training data set, including several known insect repellents of varying chemical structures and volatility, for most of which there is no information on how they are sensed by insects. Not surprisingly, the repellents we identified are varied in structure and in functional groups, and are likely detected in more than one way by the mosquitoes, using olfactory and/or gustatory systems. We did not consider “masking” of skin attraction as a factor in the training data set in this study, which precluded the need to discuss the papers pointed out by the referee. In fact, there is an extremely vast and rich body of literature regarding human skin odor, CO₂ and breath emanations, which includes our own research contributions and review articles, that is not discussed in the current paper.

      We did in fact conduct human arm-in-cage experiments with a few of the compounds reported in this study using female Aedes aegypti mosquitoes; a preprint describes the smaller scale analysis, the results of which show very strong repellency, in Boyle et al. bioRxiv (2016) https://doi.org/10.1101/060178 (Figure 4). That line of experimentation falls outside the scope of this current study and is being pursued in a separate form. We have added the citation for this preprint in the results section of the VOR.

      However, heat with CO₂ as used in this study offers a practical proxy for evaluating prospective repellents in a high-throughput manner. It would certainly be desirable to further evaluate additional candidates from the heat attraction assay with human subjects in the future.

      We thank the reviewer for pointing out the preprint by Wei et al., bioRxiv (2022). Our approaches differ in that Wei et al. do not consider properties such as fragrance and toxicity. We also cannot assume that their newer neural network model is superior because, although the model uses a large training dataset, it does not use 3D chemical structures, which are extremely relevant for biological activity. While very little information is available for the actives reported in Wei et al., we independently evaluated their top compounds similar to or better than DEET (CAS#3731-16-6, 4282-32-0, 2040-04-2, 32940-15-1 and 3446-90-0) and could not find information about toxicity, smell, or natural source. In contrast, the top repellents that we identify here as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes, etc.), and have pleasant smells. The dermal toxicity values in rabbits are known for six of our compounds and are at the best possible levels (≥5000 mg/kg).

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting study that seeks to identify novel mosquito repellents that smell attractive to humans.

      Strengths:

      The combination of standard machine learning methods with mosquito behavioral tests is a strength.

      Weaknesses:

      The study would be strengthened by describing how other modern ML approaches (RF, decision trees) would classify and identify other potential repellents.

      The current approach already shows a success rate >85% for repellency coefficient >0.5 and identifies eight naturally occurring GRAS compounds with repellency as strong as or greater than DEET. This substantially expands the repertoire of strong natural repellents. Since the 1950s only six active ingredients have been registered by US EPA for use in topical repellents, of which only two are natural in origin (Oil of lemon eucalyptus and catmint oil) and they typically do not protect as well as DEET does. That being said, we have since explored other predictive algorithms, for instance Neural Networks. The experimental evaluation of these newer pipelines will take significant resources and time and will be the focus of future grants.

      A comparison of the repellent activity between DEET and the top ten hits identified in this new study indicates little change in repellent activity (~3%), suggesting that DEET remains the gold standard. Without additional toxicity tests, the study is arguably incremental. The study's novelty should be better clarified.

      There is an urgent need to find new insect repellents that have better chances of being adopted by people who avoid DEET, such as in Africa and Asia. Having more natural actives that are effective expands the tools against disease-transmitting mosquitoes. As mentioned above, the top repellents that we identified as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes), and have pleasant smells. The dermal toxicity values in rabbits are known for six of them and are at the best possible levels (≥5000 mg/kg).

      The Methods in the repellency tests are sparse, and more information would be useful. Testing the top repellents at low doses (<<1%) and for long periods (2-12 h) would strengthen the manuscript. Without this information, the manuscript is lacking in depth.

      The US Environmental Protection Agency (EPA) regulates mosquito repellents, and DEET-based commercial products are typically assigned protection times that vary with concentration (10%: ~2 h, 30%: ~5 h, 100%: ~8 h). These would be the relevant concentrations for testing protection times on human volunteers, not lower as suggested. Such studies fall within the realm of EPA registration efforts, involving extensive GLP-testing for safety, physical chemistry, and Human Subjects Board approvals. This is outside the scope of the current study and is typically accomplished during development efforts.

      Testing human subjects on their olfactory perceptions of the repellents would also increase the depth and utility of the manuscript. Without additional experiments, the authors' conclusions lack support and have limited impact on the state-of-the-art.

      This manuscript is a mix of different approaches, which makes it lack cohesion. There is the ML method for classifying new repellents that smell good, but no testing of the repellents on human volunteers. The repellents are not tested at realistic concentrations and durations. And the calcium mobilization test is strange and makes little sense in the context of the other experiments and framing of the manuscript.

      The human olfaction validation that we present in this paper is consistent with most current publications in the field (for example, Keller et al., Gutiérrez et al.). More systematic validation of the human odor character prediction pipelines used was presented in two previous papers, Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021), and in a mini-review published in the same issue of the journal by Gerkin, Chem. Senses (2021).

      Reviewer #3 (Public Review):

      While I am not a specialist in this field, I do have some knowledge of the subject matter and the computational aspects involved. The authors employ simple machine learning techniques (such as SVM) for the following purposes:

      (a) Prediction of aversive valence.

      (b) Predicting anti-repellent chemicals.

      (c) Predicting calcium mobilization.

      The approach is commonplace in chemoinformatics literature.

      Weaknesses:

      All the above models are presented discretely, making it difficult to discern experiment design principles and connectedness.

      The ML work is rudimentary, lacking adequate details. Chemoinformatics has reached great heights, and SVM does not seem contemporary.

      There is significant existing research on finding repellents.

      In the current study, we aimed to showcase how computational research may be combined with basic science to create scalable pipelines that address real-world problems, rather than to demonstrate methodological novelty of chemoinformatics approaches. Specifically, we wanted to use different predictive models to identify compounds that display strong insect repellency, have pleasing scent profiles, are natural in origin and are potentially repurposed from flavors and fragrances. Unfortunately, there is very little existing research on insect repellents that have these types of properties, which would make them better candidates for EPA registration. Most tested compounds are synthetic, are often analogs of known repellents like DEET, and necessitate substantial time and resources to register. Moreover, the identities of chemosensory receptors that are responsible for repellency to DEET and other compounds, and that are conserved across Anopheles, Aedes and Culex mosquitoes, are not known.

      It is true that the field of cheminformatics has experimented with a variety of newer approaches, based in part on neural networks (e.g., Graph Neural Networks and graph embeddings to encode chemical structure rather than a more conventional Extended Connectivity Fingerprint (ECFP)). Importantly, however, novelty does not imply usefulness. The mosquito behavior experiments that we present show a very high success rate (>85%), validating our approach and identifying several excellent candidates already.

      Strengths:

      Authors attempt to make a case for calcium mobilization in the context of repellency. This aspect sounds interesting but is not surprising.

      Behavioral profiling of repellents could be useful.

      We thank the referee for this comment. We have indeed done behavioral profiling for several repellents that evoke calcium mobilization, but we do not see any clear correlation thus far.

    1. eLife Assessment

      This manuscript proposes a valuable idea on how cortical networks may learn a helpful representation of sensory stimuli. The model implementing this idea is tested in multiple experimental paradigms. However, the evidence remains incomplete as to whether the method supports both invariance and equivariance and whether it can estimate the dynamics of the moving object.

    2. Reviewer #1 (Public review):

      Summary:

      The paper describes a biologically plausible version of JEPA using recurrent neural networks called RPL for recurrent predictive learning. Given an embedding z_t, a recurrent neural network processes these inputs with the form: c_{t+1} = RNN(c_t, z_t). Then the predictive network f is predicting the future inputs with the format: min || f(c_t) - stop_grad(z_{t+Δt}) ||^2. I understand that a prediction error is defined as: e = z_{t+Δt} - f(c_t) to model cortical measurements in the oddball task.
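
      To make the notation in this summary concrete, the following is a minimal sketch of the objective as I have written it above, not the authors' implementation. The tanh recurrence, the linear readout f, and the names `rnn_step`, `prediction_error`, `rpl_loss`, `W_c`, `W_z` and `W_f` are all illustrative assumptions. The code only evaluates the loss; in an autodiff framework the target z_{t+Δt} would additionally be wrapped in a stop-gradient.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_step(c, z, W_c, W_z):
    """c_{t+1} = tanh(W_c c_t + W_z z_t): a stand-in for RNN(c_t, z_t) (assumed form)."""
    a = matvec(W_c, c)
    b = matvec(W_z, z)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def prediction_error(c, z_future, W_f):
    """e = z_{t+dt} - f(c_t), with f taken to be a linear readout (assumption)."""
    pred = matvec(W_f, c)
    return [zf - p for zf, p in zip(z_future, pred)]


def rpl_loss(zs, W_c, W_z, W_f, dt=1):
    """Mean of ||f(c_t) - z_{t+dt}||^2 over a sequence of embeddings zs.

    The targets are plain numbers here, so no gradient flows through them;
    this plays the role of stop_grad in the summary's notation.
    """
    c = [0.0] * len(W_c)  # c_0: context before any input has been seen
    total, n = 0.0, 0
    for t in range(len(zs) - dt):
        e = prediction_error(c, zs[t + dt], W_f)  # uses c_t, as in the formula
        total += sum(ei * ei for ei in e)
        n += 1
        c = rnn_step(c, zs[t], W_c, W_z)  # advance the context to c_{t+1}
    return total / n
```

      Under these assumptions, an oddball response would correspond to a large norm of `prediction_error` on the deviant time step relative to the standard ones.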

      The RPL model is also shown to build an internal world model, with "real-world" data like the movement of moving animals or speech signals. The representation is then compared to V1 data and expected prediction error signals in an oddball setting. In a stacked hierarchy of RNN learning with RPL, the higher layers appear to learn high-level latent variables, although gradients are not propagated downward to the lower layers.

      Strengths:

      (1) The paper tackles an open question: Self-supervised learning is thought to be a fundamental principle to explain how computation is structured in the brain. Cortical data suggest qualitatively that prediction error is a core principle of representation learning in the brain, but the field is still looking for a simple yet expressive model that would explain how the cortex learns its representations. RPL contributes in that direction by making a useful link between cortical representation learning in RNN models and the JEPA learning algorithm that was demonstrated to scale to large world model learning from video data by LeCun's group. It is very useful to connect this popular deep learning algorithm to cortical data.

      (2) The model formalism is relatively elegant and simple: Simple next input prediction objectives are conceptually simple but not necessarily trivial to build at scale. There is a clear benefit in comparison with contrastive or IL methods because they are free from dataset-specific data augmentation and negative samples, thereby moving the comp neuro field towards conceptually simpler models of representation in the cortex. Yet predictive-only models (and in particular predictive models in latent space instead of pixel space) are not easy to build in a stable fashion. The JEPA family is basically intended to solve this question; it is very nice and timely to bring this to comp neuro.

      (3) The methodology combining comp neuro and deep learning makes sense: The conceptual and qualitative analogy with cortical prediction errors is relevant and consistent with what is expected as a model of self-supervised learning in cortical models. The methodology to compare RPL with IL and CL is methodologically meaningful and grounded: showing, for instance, how some of the models fail to represent some latent structure in some toy datasets is interesting.

      (4) h-RPL: The h-RPL is perhaps the most creative departure from the JEPA model family. It would be interesting to say more about what was particularly difficult to see in the latent variables emerging in the hierarchical model. I often find it magical that layer-wise learning rules of this type are not learning redundant representations. Any insights why this is not the case here would be potentially insightful.

      Weaknesses:

      In general, I fully support the type of question and ideas that the paper is putting forward. It is, however, very hard in this research field to gain insight into specific conceptual contributions or specific bits of experimental data that the model puts forward. In pointing to the following weaknesses, I am encouraging the authors to lay out more clearly what the unique hypothesis is or the contribution of the RPL model that we should remember it for.

      (1) The devil is in the details:

      1a) Comparison with JEPA variants: JEPA variants are integrating different details into the learning algorithm. Integrating, for instance, "masking" of the latent encoder targets, or EMA in the style of BYOL or Siamese networks, for the predicted representations. It is great that RPL does not seem to need any of those (next input prediction is a natural implementation of masking, and EMA does not seem to be used). It is notoriously hard for the JEPA model to work without these features. Since some of these details are sometimes surprisingly crucial for a simulation to work, it would be good to report which of the other important details were key to live without EMA and masking. Is it the difference in learning rate, for instance? Or maybe the tasks considered are simply easy enough for any model to work; if so, it could be useful to acknowledge to what extent this is true.

      1b) Comparison with IL and CL: On a high level, the comparison with IL and CL algorithms is written as conclusive. I suspect that the failure modes of IL and CL that are described are not due to the algorithms themselves, but rather to the construction of invariance statistics or the choice of negative sample sets (the sets of samples among which variance 1 is requested by VICreg). For instance, if variance (or negative sample set) is taken only across time, the variance object identity is expected to collapse. Similarly, if the variance is taken across the object identity, the variance across time can collapse. So I wonder if the failure of IL and CL is induced by the construction of the variance definition.

      (2) Prediction error: In the comparison to the recordings of cortical activity in Figure 7, it is not obvious from the figure which latent space we are talking about mathematically. Is it the vector z, c, or the prediction error e? This is rather important from a neuroscientific point of view, because the prediction error e is expected to explain the neuronal data. On the other hand, the prediction error e is only used in the learning algorithm to define the loss function, but it is not the communication medium between the RNN units c (or with the encoder z).

      In the brain, since the measurements are recorded as neural activity, they are communication channels between specific units (z or c). It is probably c or z that would already explain the oddball prediction error. I believe that other models, like Forward-forward of Nejad et al., have tried quite hard to address this apparent tension. Whether or not this is resolved by RPL, I think it would be beneficial to state the problem and clarify how the algorithm addresses or ignores the issue.

      (3) Successor representation without value? I believe the term successor representation is historically relevant in a reinforcement learning (RL) setting and has a precise mathematical definition. Without RL, I feel that learning successor representation is conceptually identical to learning a transition matrix (aka, a primitive world model). I therefore wonder if the pitch for high-level framing of the successor representation is appropriately described or trivial.

      (4) Learning in RNN: Learning with recurrent networks appears to be a key in this model presented here (it is in the algorithm name). Yet, this aspect of the model and the literature on biologically plausible learning rules for RNN is not really discussed.

    3. Reviewer #2 (Public review):

      This is a very interesting manuscript, which proposes a novel idea on how cortical networks may learn useful representations of sensory stimuli. The model implementing this idea is thoroughly tested in multiple experimental paradigms. The manuscript is very clearly written. I feel it may have a significant impact on our understanding of cortical circuitry.

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents Recurrent Predictive Learning (RPL), a self-supervised model conceptually similar to Joint-Embedding Predictive Architecture (JEPA) models. RPL sequentially observes dynamic scenes to predict subsequent observations. A central claim of the work is that the model's trained representations are simultaneously invariant and equivariant to transformations, such as movement properties that emerge without explicit supervision. These representational qualities are demonstrated through three experiments utilizing two simulated datasets and one naturalistic dataset. Furthermore, the latent embeddings are qualitatively compared with neural data, showing that the model reproduces the successor representation observed in human V1 and the local/global oddball effect in the monkey Prefrontal Cortex.

      Strengths:

      (1) The paper addresses a fundamental question relevant to both computational neuroscience and machine vision: how the brain learns representations that are simultaneously invariant and equivariant to transformations. The manuscript is well-written, easy to follow, and supported by clear visualizations.

      (2) While JEPA-style models have recently gained significant traction in the artificial intelligence community, this paper nicely bridges the gap to neuroscience. By framing these architectures as a theory for visual learning in the brain, the authors provide valuable insights into how predictive frameworks can explain cortical processing.

      (3) The qualitative alignment with V1 and PFC data is a particularly strong contribution, as it offers a potential mechanistic explanation for observed neural phenomena through the lens of self-supervised learning.

      Weaknesses:

      (1) The central claim, that both invariance and equivariance emerge spontaneously, requires further scrutiny (see Ghaemi et al., NeurIPS, 2025; Garrido et al., arXiv, 2024). In particular, the synthetic "moving animal" dataset used in this paper may be too simple to fully support this claim. In latent space prediction, a model must predict both the scene content and the dynamics of movement. Because movement (whether ego-motion or external) is often highly uncertain (or multi-modal), predictive models in naturalistic settings often "collapse" toward learning purely invariant representations, ignoring the hard-to-predict dynamics. In the provided simulations, the movements are extremely predictable. In more complex scenarios, the model would likely prioritize content (invariance) over dynamics (equivariance) unless aided by action-conditioning or explicit factor estimation (Zhang et al., ICLR, 2026). The authors' results in Figure 5 using naturalistic video seem to reflect this limitation, given the lower performance on the naturalistic videos compared to the synthetic datasets.

      (2) The framing of the RPL model as an entirely new theory of representation learning is slightly overstated. The focus on prediction in representation space rather than input space is the defining characteristic of JEPA and various other Self-Supervised Learning (SSL) models, even sequential prediction. While this paper clarifies the connection between these AI frameworks and cortical circuits, the work would be strengthened by more explicitly positioning RPL within the context of existing JEPA-style models and prior SSL theories of the visual system.

      (3) A significant challenge in latent-space SSL is avoiding "representational collapse" (where the model provides a trivial constant output). While the paper alludes to JEPA-like solutions, it lacks a detailed explanation (in both the text and the architectural schematics) of the specific technique used to prevent collapse. Consequently, it is difficult to evaluate the authors' claim of "biological plausibility," as the biological equivalents of common machine learning techniques (such as stop gradient) are not discussed.

      (4) Recent work has shown that the capacity (size) of the predictor significantly influences the learned representations in a JEPA-type world model (Garrido et al., 2024). In simpler scenarios, a large enough predictor can allow a model to "memorize" dynamics rather than learning generalized equivariant features. It would be beneficial to see how the ratio of predictor size to encoder size affects the emergence of these features.

      Methodological Clarifications:

      (1) The authors mention a contrastive learning comparison but provide few details. Since contrastive learning is primarily a technique to avoid collapse, it would be a more rigorous baseline if implemented within the same architecture as RPL to isolate the effect of the predictive objective.

      (2) In the PFC data comparison (Figure 7f), there appears to be a discrepancy in which the local and global conditions show nearly identical results in PFC but different dynamics in the model. It is unclear if this is a visualization error or a genuine model deviation.

      (3) The criteria for selecting specific model variables for comparison with V1 versus PFC are not explicitly defined. Clarification is needed on whether the same latent variables were used for both brain regions or if different layers were selected.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The paper describes a biologically plausible version of JEPA using recurrent neural networks called RPL for recurrent predictive learning. Given an embedding z<sub>t</sub>, a recurrent neural network processes these inputs with the form: c<sub>t+1</sub> = RNN(c<sub>t</sub>, z<sub>t</sub>). Then the predictive network f is predicting the future inputs with the format: min ||f(c<sub>t</sub>) − stop_grad(z<sub>t+∆t</sub>)||<sup>2</sup>. I understand that a prediction error is defined as: e = z<sub>t+∆t</sub> − f(c<sub>t</sub>) to model cortical measurements in the oddball task.

      The RPL model is also shown to build an internal world model, with “real-world” data like the movement of moving animals or speech signals. The representation is then compared to V1 data and expected prediction error signals in an oddball setting. In a stacked hierarchy of RNN learning with RPL, the higher layers appear to learn high-level latent variables, although gradients are not propagated downward to the lower layers.

      The paper tackles an open question: Self-supervised learning is thought to be a fundamental principle to explain how computation is structured in the brain. Cortical data suggest qualitatively that prediction error is a core principle of representation learning in the brain, but the field is still looking for a simple yet expressive model that would explain how the cortex learns its representations. RPL contributes in that direction by making a useful link between cortical representation learning in RNN models and the JEPA learning algorithm that was demonstrated to scale to large world model learning from video data by LeCun’s group. It is very useful to connect this popular deep learning algorithm to cortical data.

      The model formalism is relatively elegant and simple: Simple next input prediction objectives are conceptually simple but not necessarily trivial to build at scale. There is a clear benefit in comparison with contrastive or IL methods because they are free from dataset-specific data augmentation and negative samples, thereby moving the comp neuro field towards conceptually simpler models of representation in the cortex. Yet predictive-only models (and in particular predictive models in latent space instead of pixel space) are not easy to build in a stable fashion. The JEPA family is basically intended to solve this question; it is very nice and timely to bring this to comp neuro.

      The methodology combining comp neuro and deep learning makes sense: The conceptual and qualitative analogy with cortical prediction errors is relevant and consistent with what is expected as a model of self-supervised learning in cortical models. The methodology to compare RPL with IL and CL is methodologically meaningful and grounded: showing, for instance, how some of the models fail to represent some latent structure in some toy datasets is interesting.

      (1.1) h-RPL: The h-RPL is perhaps the most creative departure from the JEPA model family. It would be interesting to say more about what was particularly difficult to see in the latent variables emerging in the hierarchical model. I often find it magical that layer-wise learning rules of this type are not learning redundant representations. Any insights why this is not the case here would be potentially insightful.

      We thank the reviewer for this comment. Regarding representational collapse in h-RPL: each local circuit independently applies the same collapse-preventing strategy as the single-level RPL model: namely, the asymmetric prediction architecture combined with the stop-grad operator. Since this mechanism operates locally within each circuit, it is sufficient to prevent collapse at every level of the hierarchy independently (see also our response to Point P1.3).

      The more subtle question is why the circuits learn non-redundant rather than identical representations across the hierarchy. We believe two mechanisms are at play here. First, the hierarchical encoder is a stacked convolutional network, meaning that receptive field sizes grow with depth. This architectural inductive bias naturally encourages successive circuits to operate on increasingly spatially integrated features, creating a structural pressure toward learning complementary rather than redundant representations. Second, the growing expressivity of the network with depth means that higher circuits have access to richer, more abstract inputs from which they can extract higher-level latent structure that is not already captured by lower circuits. Together, these factors (the local collapse-preventing mechanism and the depth-dependent growth in receptive field size and network expressivity) presumably explain why h-RPL builds an increasingly refined and non-redundant representational hierarchy.

      What we will do: We will expand our discussion on this point in the revised manuscript. We plan to expand our quantification on how abstractions emerge in h-RPL in future work in which we will also study variations with top-down connections.

      (1.2) In general, I fully support the type of question and ideas that the paper is putting forward. It is, however, very hard in this research field to gain insight into specific conceptual contributions or specific bits of experimental data that the model puts forward. In pointing to the following weaknesses, I am encouraging the authors to lay out more clearly what the unique hypothesis is or the contribution of the RPL model that we should remember it for.

      Thanks for the positive feedback along with the constructive criticism, and we agree that articulating the core contributions more crisply would strengthen the paper.

      At its heart, we believe the paper makes two contributions we hope it will be remembered for. First, while prior work has established that invariant representations can be learned via local Hebbian-like learning rules, we show that learning equivariant representations alongside a latent dynamics model requires something qualitatively different: a local circuit with recurrent dynamics and an asymmetric predictive architecture. RPL provides a minimal concrete instantiation of this principle.

      Second, and perhaps more broadly, the model makes a structural prediction about (cortical) neuronal circuit organization: since the encoder, integrator, and predictor each perform functionally distinct computations, the framework implies the existence of corresponding cell types and connectivity patterns one should look for in experimental data.

      What we will do: We will sharpen these above messages in the revised manuscript to ensure these contributions are prominently highlighted throughout the paper.

      (1.3) Comparison with JEPA variants: JEPA variants are integrating different details into the learning algorithm. Integrating, for instance, “masking” of the latent encoder targets, or EMA in the style of BYOL or Siamese networks, for the predicted representations. It is great that RPL does not seem to need any of those (next input prediction is a natural implementation of masking, and EMA does not seem to be used). It is notoriously hard for the JEPA model to work without these features. Since some of these details are sometimes surprisingly crucial for a simulation to work, it would be good to report which of the other important details were key to live without EMA and masking. Is it the difference in learning rate, for instance? Or maybe the tasks considered are simply easy enough for any model to work; if so, it could be useful to acknowledge to what extent this is true.

      We thank the reviewer for raising this important point. There are two key mechanisms that ensure stable, non-trivial training in RPL. First, using a higher learning rate for the predictor relative to the encoder is crucial for stable training. This prevents the predictor from collapsing the encoder representations and was already noted empirically by Chen et al. (2021).

      Second, and more fundamentally, predicting at the level of the memoryless encoder output, rather than at the level of the recurrent integrator, is essential to prevent a degenerate solution in which the RNN simply learns to generate an internally predictable time series unrelated to the input. By anchoring the prediction target to the encoder, the model is forced to ground its representations in the sensory input. Intuitively, otherwise the RNN can simply “make up” a predictable time series, which satisfies the learning objective, but would not yield useful internal representations.

      Beyond these architectural points, previous work from our group (Srinath Halvagal et al., 2023) has shown mathematically that JEPAs without EMA avoid collapse via an implicit variance regularization mechanism, and we believe RPL benefits from the same principle. Indeed, we now have a more complete theoretical understanding of this, including identifiability proofs for the latent dynamical model under relatively mild assumptions (Mikulasch et al., 2026). This work has recently been accepted at ICML. Other than that, one has to ensure that representations are not already nearly collapsed at the beginning of training. In this paper, we used normalization layers (batchnorm) in the encoder to ensure this.

      Finally, as in all SSL paradigms, the augmentation strength is an important hyperparameter that impacts the quality of learned representations. In the temporal predictive setting, the augmentation strength is fixed by the world itself. The only knob we have to play with is the prediction horizon ∆. While we typically focused on next-time-step (∆ = 1) prediction, we saw a clear effect in the case of the speech dataset where ∆ = 8, but not ∆ = 1, yielded useful representations for the tasks (Fig. 5b).

      What we will do: We will discuss the above points more prominently in the discussion to avoid them being overlooked in the methods. Additionally, we will include a plot on the empirical prediction horizon for the speech dataset in the supplementary material for reference.

      (1.4) Comparison with IL and CL: On a high level, the comparison with IL and CL algorithms is written as conclusive. I suspect that the failure modes of IL and CL that are described are not due to the algorithms themselves, but rather to the construction of invariance statistics or the choice of negative sample sets (the sets of samples among which variance 1 is requested by VICreg). For instance, if variance (or negative sample set) is taken only across time, the variance object identity is expected to collapse. Similarly, if the variance is taken across the object identity, the variance across time can collapse. So I wonder if the failure of IL and CL is induced by the construction of the variance definition.

      We thank the reviewer for this thoughtful point. Both RPL and CL implement an implicit variance regularizer by virtue of being JEPAs (Srinath Halvagal et al., 2023), whereas IL uses an explicit regularizer computed along both the batch and time dimensions to avoid representational and dimensional collapse. The failure modes of IL and CL therefore cannot be entirely attributed to the statistics of the input samples chosen for variance regularization, but are instead primarily determined by the choice of prediction and target representations.

      What we will do: We will clarify this in the Methods section of the revised manuscript.
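
      To make the reviewer's concern about the axis of the variance statistic concrete, the following is a minimal sketch of a VICReg-style hinge penalty on the per-feature standard deviation, evaluated either over time or over the batch. It is not the implementation used in the manuscript: the function names, the target of 1, the epsilon value, and the toy embedding are all assumptions made for illustration.

```python
import math


def hinge_std_penalty(values, target=1.0, eps=1e-4):
    """VICReg-style hinge on the standard deviation of one feature."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return max(0.0, target - math.sqrt(var + eps))


def variance_penalty(z, axis):
    """Average hinge penalty for embeddings indexed as z[b][t][d].

    axis="time":  statistics over time, separately for each sequence b
    axis="batch": statistics over sequences, separately for each time step t
    """
    n_seq, n_time, n_feat = len(z), len(z[0]), len(z[0][0])
    groups = []
    for d in range(n_feat):
        if axis == "time":
            groups += [[z[b][t][d] for t in range(n_time)] for b in range(n_seq)]
        elif axis == "batch":
            groups += [[z[b][t][d] for b in range(n_seq)] for t in range(n_time)]
        else:
            raise ValueError(f"unknown axis: {axis}")
    return sum(hinge_std_penalty(g) for g in groups) / len(groups)


# Hypothetical toy embedding: 3 sequences (objects), 4 time steps, 1 feature.
# The feature tracks time but is identical for every object, i.e. object
# identity has collapsed.
z_identity_collapsed = [[[2.0 * t] for t in range(4)] for _ in range(3)]

penalty_time = variance_penalty(z_identity_collapsed, "time")
penalty_batch = variance_penalty(z_identity_collapsed, "batch")
```

      In this toy case the time-only statistic reports no penalty even though object identity is lost, whereas the batch statistic flags the collapse. This only illustrates the reviewer's general argument; whether it applies to the IL and CL baselines in the manuscript depends on how their statistics are actually computed.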

      (1.5) Prediction error: In the comparison to the recordings of cortical activity in Figure 7, it is not obvious from the figure which latent space we are talking about mathematically. Is it the vector z, c, or the prediction error e? This is rather important from a neuroscientific point of view, because the prediction error e is expected to explain the neuronal data. On the other hand, the prediction error e is only used in the learning algorithm to define the loss function, but it is not the communication medium between the RNN units c (or with the encoder z).

      In the brain, since the measurements are recorded as neural activity, they are communication channels between specific units (z or c). It is probably c or z that would already explain the oddball prediction error. I believe that other models, like Forward-forward of Nejad et al., have tried quite hard to address this apparent tension. Whether or not this is resolved by RPL, I think it would be beneficial to state the problem and clarify how the algorithm addresses or ignores the issue.

      Thanks for pointing out the issue with regards to clarity and for raising the important but subtle point about prediction error representation. To answer the immediate question asking which vector we use in Figure 7, it is the vector c corresponding to the integrator representations. We agree this should be stated explicitly and will update the manuscript accordingly.

      On the more general point, we agree that the tension between recordable neural activity and the computational role of prediction errors is an important issue. We do already briefly engage with it in the Discussion (subsection “Relation to previous modeling work”), where we note that under RPL “inter-areal communication is dominated by representations rather than error signals”. However, we agree that this point should be surfaced more directly.

      To elaborate, under classical predictive coding, prediction errors are the inter-areal communication channel and are therefore expected to be directly observable in neural recordings, e.g., as oddball responses. Under RPL, this is not the case: e is computed locally within a circuit and serves only as a learning signal for synaptic plasticity, not as a signal propagated between circuits or areas. What cortex primarily encodes and communicates in our framework are predictive representations, not reconstruction errors. Accordingly, what should map onto recorded population activity are the representations c (and z), while locally computed prediction errors could in principle remain observable as more circumscribed or transient mismatch-like signals within a circuit.

      We would like to push this point further. The reviewer frames this as a tension that RPL needs to resolve, but growing neurophysiological evidence suggests that classical residual-difference prediction errors may not be a dominant mode of cortical encoding in the first place. Furutachi, Franklin, et al. (2024) showed that V1 responses to unexpected visual stimuli do not encode how input deviates from predictions, but instead selectively amplify the representation of the unexpected stimulus itself. Very recently, Furutachi and Hofer (2026) generalize this into a revised framework in which feedforward pathways transmit sensory representations modulated by prediction-error magnitude, rather than residual differences. Vasilevskaya et al. (2026) constrain the space of plausible cortical algorithms via functional-influence experiments, also concluding that no variant of standard predictive processing is consistent with the full pattern of layer 2/3 ↔ layer 5 interactions; they propose a JEPA-based model, citing RPL as a promising candidate. The model by Nejad et al. (2025) similarly shares with RPL the property that representations, rather than residual errors, propagate between circuit elements.

      Taken together, the apparent tension may be less a problem RPL needs to resolve than one it is well positioned to explain, remaining consistent with the emerging picture of cortex as encoding amplified sensory features rather than transmitting residual errors across areas.

      What we will do: We will add missing information to the main text and sharpen the Discussion with these arguments.

      (1.6) Successor representation without value? I believe the term successor representation is historically relevant in a reinforcement learning (RL) setting and has a precise mathematical definition. Without RL, I feel that learning successor representation is conceptually identical to learning a transition matrix (aka, a primitive world model). I therefore wonder if the pitch for high-level framing of the successor representation is appropriately described or trivial.

      The reviewer makes a valid point on the concept of successor representations. To answer the immediate question, it is not entirely trivial, as we not only observe the emergence of the transition structure (Fig. 6c), but also the encoding of decaying future (but not past) state occupancy (Fig. 6d,e). We largely adopted the terminology “successor-like representations” from the study by Ekman et al. (2023), but we will elaborate a bit further on why we stuck to it. As nicely pointed out by the reviewer, the term “successor representations” was introduced in the RL literature (Dayan, 1993), but was later adopted in neuroscience to describe the idea that a neuronal population encodes a predictive representation that reflects the expected occupancy of future states under a given policy. Ekman et al. (2023) use the term “successor-like representations” to explain the phenomenon in which neural activity in V1 (and hippocampus) represents both current and (discounted) future, but not past, state occupancies in a sequence learning task with no explicitly defined policy or value training. In other words, successor-like representations are simply predictive representations.

      What we will do: To deal with this dichotomy, we will replace “successor-like representations” with the term “predictive representations” in the abstract and clarify this distinction in the Results section of the revised manuscript.
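
      For reference, under a fixed policy with state-transition matrix P and discount γ, the successor representation of Dayan (1993) is M = Σ_k γ^k P^k, so it is determined by P and γ alone. The sketch below computes a truncated version of this sum for a hypothetical four-state sequence A → B → C → D with γ = 0.5. The chain, the discount, and the function names are illustrative assumptions and are not taken from the manuscript or from Ekman et al. (2023).

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def successor_representation(P, gamma, horizon):
    """Truncated successor representation: sum over k = 0..horizon of gamma^k P^k."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    # P^0 is the identity matrix: each state "occupies" itself at lag 0.
    P_k = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(horizon + 1):
        for i in range(n):
            for j in range(n):
                M[i][j] += (gamma ** k) * P_k[i][j]
        P_k = matmul(P_k, P)
    return M


# Hypothetical deterministic sequence A -> B -> C -> D, with D terminal.
P = [
    [0, 1, 0, 0],  # A -> B
    [0, 0, 1, 0],  # B -> C
    [0, 0, 0, 1],  # C -> D
    [0, 0, 0, 0],  # D has no successor
]

# With a terminal state, P^4 is zero, so horizon=3 gives the exact sum.
M = successor_representation(P, gamma=0.5, horizon=3)
row_B = M[1]  # expected occupancies starting from B: [0.0, 1.0, 0.5, 0.25]
```

      The row for state B assigns weight to B itself and geometrically decaying weight to the future states C and D, but none to the past state A. This asymmetry is the "future but not past" signature referred to in the response, and it goes beyond the one-step transition structure contained in P.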

      (1.7) Learning in RNN: Learning with recurrent networks appears to be a key in this model presented here (it is in the algorithm name). Yet, this aspect of the model and the literature on biologically plausible learning rules for RNN is not really discussed.

      We thank the reviewer for raising this concern. While h-RPL is one step toward more biologically plausible and spatially local learning rules, exploring it further in terms of temporal credit assignment is beyond the scope of the present study and would require a more systematic and in-depth analysis. However, moving toward more biologically plausible learning rules is an interesting research direction that we plan to explore, as we also mentioned in the Discussion (“Limitations and future research directions”).

      We think a viable strategy could be to combine a slim spatial credit assignment strategy such as feedback alignment (Nøkland, 2016; Lillicrap et al., 2016) with an online learning rule using eligibility traces for temporal credit assignment such as SuperSpike (Zenke et al., 2018) or e-prop (Bellec et al., 2020). Similar strategies have given promising results for CLAPP (Illing et al., 2021; Zihan et al., 2026).

      What we will do: Following the suggestion, we will discuss biologically plausible learning rules for RNNs in the Discussion.

      Reviewer #2 (Public review):

      This is a very interesting manuscript, which proposes a novel idea on how cortical networks may learn useful representations of sensory stimuli. The model implementing this idea is thoroughly tested in multiple experimental paradigms. The manuscript is very clearly written. I feel it may have a significant impact on our understanding of cortical circuitry.

      Reviewer #3 (Public review):

      This paper presents Recurrent Predictive Learning (RPL), a self-supervised model conceptually similar to Joint-Embedding Predictive Architecture (JEPA) models. RPL sequentially observes dynamic scenes to predict subsequent observations. A central claim of the work is that the model’s trained representations are simultaneously invariant and equivariant to transformations, such as movement properties that emerge without explicit supervision. These representational qualities are demonstrated through three experiments utilizing two simulated datasets and one naturalistic dataset. Furthermore, the latent embeddings are qualitatively compared with neural data, showing that the model reproduces the successor representation observed in human V1 and the local/global oddball effect in the monkey Prefrontal Cortex.

      The paper addresses a fundamental question relevant to both computational neuroscience and machine vision: how the brain learns representations that are simultaneously invariant and equivariant to transformations. The manuscript is well-written, easy to follow, and supported by clear visualizations.

      While JEPA-style models have recently gained significant traction in the artificial intelligence community, this paper nicely bridges the gap to neuroscience. By framing these architectures as a theory for visual learning in the brain, the authors provide valuable insights into how predictive frameworks can explain cortical processing.

      The qualitative alignment with V1 and PFC data is a particularly strong contribution, as it offers a potential mechanistic explanation for observed neural phenomena through the lens of self-supervised learning.

      (3.1) The central claim, that both invariance and equivariance emerge spontaneously, requires further scrutiny (see Ghaemi et al., NeurIPS, 2025; Garrido et al., arXiv, 2024). In particular, the synthetic “moving animal” dataset used in this paper may be too simple to fully support this claim. In latent space prediction, a model must predict both the scene content and the dynamics of movement. Because movement (whether ego-motion or external) is often highly uncertain (or multi-modal), predictive models in naturalistic settings often “collapse” toward learning purely invariant representations, ignoring the hard-to-predict dynamics. In the provided simulations, the movements are extremely predictable. In more complex scenarios, the model would likely prioritize content (invariance) over dynamics (equivariance) unless aided by action-conditioning or explicit factor estimation (Zhang et al., ICLR, 2026). The authors’ results in Figure 5 using naturalistic video seem to reflect this limitation, given the lower performance on the naturalistic videos compared to the synthetic datasets.

      We thank the reviewer for the feedback. We agree that further validation on more complex datasets would strengthen the claims, and we take this point seriously. If the reviewer has a specific alternative dataset in mind, we would welcome the recommendation.

      Regarding the mouse video data specifically, we realized that the lower performance reflects a suboptimal benchmark rather than a shortcoming of our method. The culprit presumably is that the mice remain largely stationary, leading to a heavily imbalanced velocity distribution peaked near zero (Supplementary Fig. S9). This imbalance makes equivariance evaluation unreliable regardless of the learning algorithm. For example, end-to-end supervised training results in an R² of 0.19 compared to 0.08 ± 0.02 for RPL.
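
      To illustrate why a velocity distribution peaked near zero makes R² a fragile measure, the following minimal sketch computes the coefficient of determination on toy labels. The function name `r_squared` and all numbers are hypothetical and chosen for illustration; they are not the mouse video data or the authors' evaluation code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical velocity labels: 95 stationary frames and 5 moving frames.
velocity = [0.0] * 95 + [1.0] * 5

# A decoder that always reports "stationary" is correct on 95% of frames,
# yet its R^2 is below zero because the few moving frames carry the variance.
always_zero = [0.0] * 100
score_zero = r_squared(velocity, always_zero)

# A decoder that misses a single moving frame (1 frame out of 100)
# already loses a sizeable share of R^2.
one_miss = [0.0] * 96 + [1.0] * 4
score_one_miss = r_squared(velocity, one_miss)

print(score_zero, score_one_miss)
```

      In this toy setting, a handful of frames determines the score, which is the sense in which an imbalanced target distribution can make the metric unreliable regardless of the learning algorithm.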

      Regarding the moving animal dataset, we note that the dynamics are not trivial from an SSL perspective: unlike moving MNIST (Srivastava et al., 2015), the dataset includes changes in scale and orientation, both features that invariance-focused SSL models can easily ignore, yet RPL recovers reliably. For example, this discrepancy can be seen in Supplementary Table S1 where we compare to InfoNCE and CPC. That said, we acknowledge the reviewer’s broader concern and will seek to validate RPL on more complex datasets.

      While it would be nice to compare to related work by Ghaemi et al. (2024), this study used 3DIEBench (Garrido et al., 2023). Unfortunately, 3DIEBench’s reliance on pair-based representations with annotated but random augmentations (such as rotations or color changes) precludes the possibility of smooth latent traversals that would be required for RPL to learn from the same dataset. We will look into whether it is computationally feasible to adapt or regenerate a similar dataset that meets the requirements for temporal prediction.

      Regarding stochasticity, we agree that predictive learning in latent space is most natural in approximately deterministic settings, whereas real-world sensory information often comprises non-deterministic elements. While a deeper treatment of such stochastic environments is beyond the scope of the present manuscript, it will be the focus of ongoing and future work. In this regard, recent work from our group (Hauri et al., 2026) demonstrated that RPL’s core objective can replace the reconstruction loss in Dreamer, achieving competitive performance in complex, stochastic environments. While we did not systematically evaluate equivariance in that study, the results suggest that representation-space predictive learning is viable beyond the deterministic regime.

      What we will do: We will make the point about the real-world mouse video dataset being a poor benchmark and include the additional R² values to show that. Further, we will try to identify or generate alternative datasets to back the equivariance claims and discuss our findings in the light of previous work, e.g., Ghaemi et al. (2024). Moreover, we will sharpen our discussion of our model’s limitations in stochastic settings and highlight notable connections to related work.

      (3.2) The framing of the RPL model as an entirely new theory of representation learning is slightly overstated. The focus on prediction in representation space rather than input space is the defining characteristic of JEPA and various other Self-Supervised Learning (SSL) models, even sequential prediction. While this paper clarifies the connection between these AI frameworks and cortical circuits, the work would be strengthened by more explicitly positioning RPL within the context of existing JEPA-style models and prior SSL theories of the visual system.

      Thanks for raising this point. We are unsure what the reviewer refers to. We did not frame our work as “an entirely new theory of representation learning,” as the reviewer suggests. In fact, we highlight quite the opposite already in the title of our article, which reads: “Understanding neural circuit principles for representation learning through joint-embedding predictive architectures.” We do not claim novelty over JEPA as an ML paradigm; we adopt it precisely because it provides a principled, non-generative framework for predictive representation learning, and our goal is to develop a circuit-level instantiation that accounts for neural circuit computation. We already discuss a body of previous work on self-supervised learning and JEPAs at length. Since the reviewer did not specify what they are missing, we will briefly reiterate what is already there.

      Our contribution is a theory of representation learning in the brain, built on JEPAs as the underlying ML framework. The Title and Introduction already position our work quite explicitly this way. Specifically, we mention prior work on JEPAs (CPC, BYOL, SimSiam, I-JEPA, seq-JEPA, V-JEPA, V-JEPA 2), while noting that “most JEPAs developed in machine learning are poor models of cortical computation” because of their reliance on negative sampling, transformers, masking, static images, and/or known parametrized transformations, and motivate RPL as the minimal candidate that “must instead rely on recurrent neural dynamics, learn from streaming sensory input without masking, support both invariant and equivariant representations, and reproduce key neurophysiological observations.”

      The Discussion (“Relation to previous modeling work”) further details the specific novelties of RPL relative to existing sequential JEPA-style and SSL models like CPC (Oord et al., 2018), V-JEPA (Bardes et al., 2024), V-JEPA 2 (Assran et al., 2025), seq-JEPA (Ghaemi et al., 2024). In brief:

      RPL is a recurrent JEPA based on RNN dynamics, not transformers, and learns from streaming sensory input without masking or random negative sampling;

      It explicitly compares three prediction-error topologies (RPL vs. invariance learning vs. context prediction; Fig. 2, Suppl. Fig. S2, S6) and shows that asymmetric recurrent prediction is essential for jointly learning invariant and equivariant representations;

      Importantly, it does so via pure temporal prediction without access to underlying transformations, a property shared by very few JEPAs. The closest exception is VJ-VCR (Drozdov et al., 2024) which uses an explicit variance-covariance regularization (VCReg) in a JEPA, which we will cite in the revised manuscript;

      It provides the first hierarchical JEPA optimizing local prediction errors at multiple levels (h-RPL, Fig. 8), as envisioned by LeCun (2022) but not previously implemented;

      It connects directly to neurophysiological data: successor-like representations in human V1 and abstract sequence representations in macaque PFC, which provides qualitative correspondence between JEPA components and cortical activity that the existing JEPA literature, focused on ML benchmarks, does not address.

      Finally, our article already includes a discussion paragraph on recent self-supervised learning models in the context of the brain where we discuss work by Nejad et al. (2025) and Asabuki et al. (2025). Most other SSL theories of the visual system rely on static images and recognition tasks (Yerxa et al., 2024; Margalit et al., 2024). However, there are two studies that include temporal prediction objectives and are worth mentioning in more detail: First, Bakhtiari et al. (2021) show that representations similar to ventral and dorsal pathways in the visual system can emerge in a two-pathway encoder architecture within the CPC model. Second, Niu et al. (2024) use a “straightening” objective together with VCReg as a practical model of the perceptual straightening hypothesis (Hénaff et al., 2019). Though not a JEPA (i.e., it has no predictor network), it can decode equivariant factors in a sequential MNIST dataset where only single factors change throughout a video.

      What we will do: We will carefully review our discussion of previous work and further discuss Drozdov et al. (2024), Bakhtiari et al. (2021), and Niu et al. (2024) in the revised manuscript.

      (3.3) A significant challenge in latent-space SSL is avoiding “representational collapse” (where the model provides a trivial constant output). While the paper alludes to JEPA-like solutions, it lacks a detailed explanation (in both the text and the architectural schematics) of the specific technique used to prevent collapse. Consequently, it is difficult to evaluate the authors’ claim of “biological plausibility,” as the biological equivalents of common machine learning techniques (such as stop gradient) are not discussed.

      Thanks for pointing this out. Our model avoids collapse through the asymmetric stop-grad / predictor architecture. It does not require an exponential moving average (EMA) target when the predictor learns with a faster learning rate than the rest of the network (see also our response to Point P1.3).
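
      The asymmetry described above can be sketched with a scalar toy model. This is an illustrative assumption, not the RPL implementation: the encoder is a single weight `w`, the predictor a single weight `v`, gradients are written out by hand, and the function name `toy_step` and both learning rates are hypothetical.

```python
def toy_step(w, v, x_t, x_next, lr_enc=0.01, lr_pred=0.1):
    """One update of a scalar predictor/encoder pair with a stop-gradient target.

    Loss: 0.5 * (v * w * x_t - sg(w * x_next))**2, where sg(.) marks the
    target branch as a constant during differentiation.
    """
    z_t = w * x_t                # representation of the current input
    target = w * x_next          # representation of the next input (stop-grad)
    err = v * z_t - target       # prediction error in representation space

    grad_v = err * z_t           # gradient with respect to the predictor weight
    grad_w = err * v * x_t       # encoder gradient: no term from the target branch

    new_w = w - lr_enc * grad_w  # slow encoder update
    new_v = v - lr_pred * grad_v # faster predictor update
    return new_w, new_v, 0.5 * err ** 2

# Hypothetical starting weights and a single pair of consecutive inputs.
w1, v1, loss = toy_step(w=1.0, v=0.5, x_t=1.0, x_next=2.0)
print(w1, v1, loss)
```

      The sketch only shows where the stop-gradient enters and how the two learning rates differ; whether collapse is avoided depends on the full architecture and training regime, which this toy does not capture.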

      The use of stop-grad suggests that a circuit learning with RPL needs to compute a vector-based instructive learning signal. While we do not explicitly model the circuit-level mechanisms of how this could be implemented in the brain, excitation-inhibition balance is one possibility (Rossbroich et al., 2025). Finally, differences in learning rate can be implemented either structurally or functionally in the brain (see Liu et al. (2025), for instance); in addition, activity normalization has been suggested as a canonical computation in biological neural circuits (Carandini et al., 2012).

      What we will do: We will make sure to discuss these putative biological mechanisms in the revised manuscript.

      (3.4) Recent work has shown that the capacity (size) of the predictor significantly influences the learned representations in a JEPA-type world model (Garrido et al., 2024). In simpler scenarios, a large enough predictor can allow a model to “memorize” dynamics rather than learning generalized equivariant features. It would be beneficial to see how the ratio of predictor size to encoder size affects the emergence of these features.

      Thanks for raising this concern. We do not observe a noticeable difference in position and velocity decoding when changing the width or depth of the MLP predictor in the moving animals data. However, performance on rotation speed and orientation decoding scales with changes in the width, but not the depth, of the predictor. This analysis excludes the effect of the integrator’s capacity, as it directly affects the dimensionality of the representations, even though it also effectively contributes to prediction computation in RPL.

      What we will do: We will include a figure showing how task performance varies with the predictor’s width and depth.

      Methodological Clarifications

      (3.5) The authors mention a contrastive learning comparison but provide few details. Since contrastive learning is primarily a technique to avoid collapse, it would be a more rigorous baseline if implemented within the same architecture as RPL to isolate the effect of the predictive objective.

      Thanks for the question. We already use the same network model as in RPL for the contrastive predictive learning (InfoNCE) baseline in Supplementary Table S1, as mentioned in the main text (l. 164).

      What we will do: We will describe the architecture of the non-linear predictor used for the InfoNCE baseline more explicitly in the Methods.

      (3.6) In the PFC data comparison (Figure 7f), there appears to be a discrepancy where the local and global conditions show nearly identical results in PFC but different dynamics in the model. It is unclear if this is a visualization error or a genuine model deviation.

      Thanks for picking up on this subtlety in the experimental results. To clarify, it is a model deviation but an interesting one. The local and global responses do look quite similar in the original PFC data. They differ in that the global oddball (xY|xx and xx|xY) response has a secondary peak that encodes the presence of the global oddball, whereas the initial response is actually dominated by local oddball encoding (xY vs xx). Concretely, this results in the response to the xx|xY condition only showing up weakly in the data and at a time lag with respect to the initial local oddball response. Our model, however, does not show the transient initial response to local oddballs in the decoding direction for global oddballs. In a sense, the network model encodes the global oddball concept more robustly than is seen in the PFC data. That said, whether this indicates a genuine difference in representational strategies that needs to be further accounted for, or whether it is an issue stemming from limited sub-sampling of PFC neurons, remains unclear.

      (3.7) The criteria for selecting specific model variables for comparison with V1 versus PFC are not explicitly defined. Clarification is needed on whether the same latent variables were used for both brain regions or if different layers were selected.

      To clarify, the successor-like representations in human V1 and abstract representations in macaque PFC come from two different experiments, so each has different latent variables requiring different RPL models. The architecture used for each experiment is detailed in the Methods, and the criterion for selecting each architecture was to use the simplest one that should work given the task complexity. Throughout the paper, all representation analysis is done on the output of the integrator (c) unless stated otherwise. We hope this resolves the confusion.

      References

      Chen, Xinlei et al. (2021). “Exploring simple siamese representation learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758.

      Srinath Halvagal, Manu et al. (2023). “Implicit variance regularization in non-contrastive SSL”. In: Advances in Neural Information Processing Systems 36, pp. 63409–63436.

      Mikulasch, Fabian A et al. (2026). Understanding Self-Supervised Learning via Latent Distribution Matching. arXiv: 2605.03517[cs.LG].

      Furutachi, Shohei, Alexis D. Franklin, et al. (Sept. 2024). “Cooperative thalamocortical circuit mechanism for sensory prediction errors”. en. In: Nature 633.8029. Publisher: Nature Publishing Group, pp. 398–406. issn: 1476-4687. doi: 10.1038/s41586-024-07851-w.

      Furutachi, Shohei and Sonja B Hofer (2026). “Rethinking Predictive Processing”. In: Annual Review of Neuroscience 49.

      Vasilevskaya, Anna et al. (2026). “A functional influence based circuit motif that constrains the set of plausible algorithms of cortical function”. In: bioRxiv. doi: 10.64898/2026.01.29.702557. eprint: https://www.biorxiv.org/content/early/2026/01/29/2026.01.29.702557.full.pdf.

      Nejad, Kevin Kermani et al. (July 2025). “Self-supervised predictive learning accounts for cortical layer-specificity”. en. In: Nat Commun 16.1, p. 6178. issn: 2041-1723. doi: 10.1038/s41467-025-61399-5.

      Ekman, Matthias et al. (Feb. 2023). “Successor-like representation guides the prediction of future events in human visual cortex and hippocampus”. In: eLife 12. Ed. by Morgan Barense et al., e78904. issn: 2050-084X. doi: 10.7554/eLife.78904.

      Dayan, Peter (1993). “Improving generalization for temporal difference learning: The successor representation”. In: Neural computation 5.4, pp. 613–624.

      Nøkland, Arild (2016). “Direct feedback alignment provides learning in deep neural networks”. In: Advances in neural information processing systems 29.

      Lillicrap, Timothy P et al. (2016). “Random synaptic feedback weights support error backpropagation for deep learning”. In: Nature communications 7.1, p. 13276.

      Zenke, Friedemann et al. (2018). “Superspike: Supervised learning in multilayer spiking neural networks”. In: Neural computation 30.6, pp. 1514–1541.

      Bellec, Guillaume et al. (2020). “A solution to the learning dilemma for recurrent networks of spiking neurons”. In: Nature communications 11.1, p. 3625.

      Illing, Bernd et al. (2021). “Local plasticity rules can learn deep representations using self-supervised contrastive predictions”. In: Advances in Neural Information Processing Systems 34.

      Zihan, Wu S et al. (2026). “Can Local Learning Match Self-Supervised Backpropagation?” In: arXiv preprint arXiv:2601.21683.

      Srivastava, Nitish et al. (2015). “Unsupervised learning of video representations using lstms”. In: International conference on machine learning. PMLR, pp. 843–852.

      Ghaemi, Hafez et al. (2024). “Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models”. In: NeurIPS 2024 Workshop: Self-Supervised Learning - Theory and Practice.

      Garrido, Quentin et al. (2023). “Self-supervised learning of split invariant equivariant representations”. In: arXiv preprint arXiv:2302.10283.

      Hauri, Michael et al. (2026). “Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction”. In: arXiv preprint arXiv:2603.07083.

      Oord, Aaron van den et al. (July 2018). “Representation Learning with Contrastive Predictive Coding”. In: arXiv:1807.03748 [cs, stat]. arXiv: 1807.03748.

      Bardes, Adrien et al. (2024). V-JEPA: Latent Video Prediction for Visual Representation Learning.

      Assran, Mido et al. (2025). “V-jepa 2: Self-supervised video models enable understanding, prediction and planning”. In: arXiv preprint arXiv:2506.09985.

      Drozdov, Katrina et al. (2024). “Video representation learning with joint-embedding predictive architectures”. In: arXiv preprint arXiv:2412.10925.

      LeCun, Yann (2022). “A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27”.

      Asabuki, Toshitake et al. (2025). “Learning predictive signals within a local recurrent circuit”. In: Proceedings of the National Academy of Sciences 122.27, e2414674122. doi: 10.1073/pnas.2414674122. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.2414674122.

      Yerxa, Thomas et al. (2024). “Contrastive-equivariant self-supervised learning improves alignment with primate visual area it”. In: Advances in neural information processing systems 37, pp. 96045–96070.

      Margalit, Eshed et al. (2024). “A unifying framework for functional organization in early and higher ventral visual cortex”. In: Neuron 112.14, pp. 2435–2451.

      Bakhtiari, Shahab et al. (2021). “The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning”. In: Advances in Neural Information Processing Systems. Ed. by M. Ranzato et al. Vol. 34. Curran Associates, Inc., pp. 25164–25178.

      Niu, Julie Xueyan et al. (2024). “Learning predictable and robust neural representations by straightening image sequences”. In: Advances in Neural Information Processing Systems 37, pp. 40316–40335.

      Hénaff, Olivier J et al. (2019). “Perceptual straightening of natural videos”. In: Nature neuroscience 22.6, pp. 984–991.

      Rossbroich, Julian et al. (2025). “Breaking Balance: Encoding local error signals in perturbations of excitation-inhibition balance”. In: bioRxiv, pp. 2025–05.

      Liu, Peng et al. (2025). “Layer-specific changes in sensory cortex across the lifespan in mice and humans”. In: Nature neuroscience 28.9, pp. 1978–1989.

      Carandini, Matteo et al. (2012). “Normalization as a canonical neural computation”. In: Nature reviews neuroscience 13.1, pp. 51–62.

    1. eLife Assessment

      This valuable study examines how the prelimbic cortex represents learned and generalized threat over time and identifies potentially distinct stable and dynamic subnetworks that may support these functions. The work is conceptually interesting and is strengthened by the longitudinal calcium imaging approach and the inclusion of key control groups. However, the evidence supporting the claims is incomplete, particularly because the interpretations regarding inference, time-dependent representational change, and the dissociation of neural activity from freezing behavior extend beyond what is currently established by the data.

    2. Reviewer #1 (Public review):

      Summary:

      The authors combine discriminative auditory fear conditioning with longitudinal in vivo calcium imaging to ask how prelimbic (PL) representations of learned and generalized threat evolve across recent and remote memory time points. Using two different CS+ frequencies and a no-shock control group, they report that PL population activity tracks graded behavioral generalization, that population similarity is highest for tones eliciting strong threat responding, and that distinct subnetworks can be identified that appear to encode tone-specific sensory features versus learned threat-related response structure.

      To my knowledge, this may be the first study to comprehensively examine neural encoding of fear generalization in prelimbic cortex (PL). The manuscript is ambitious and technically interesting, and several aspects are potentially important. In particular, the suggestion that neurons showing graded, learning-related response patterns become selectively stabilized over time is intriguing. The inclusion of two CS+ training conditions and a no-shock control also strengthens the case that at least some of the reported effects are related to associative learning rather than simple sensory differences. However, in its current form, the manuscript does not yet fully support the strength of the conceptual claims. Several issues limit confidence in the interpretation, including the possibility that repeated testing itself contributes to changes across days, uncertainty about the relationship between neural activity and freezing behavior, limited quantitative documentation of longitudinal cell registration, and a number of problems in figure clarity and statistical framing. Overall, the study contains promising observations, but the claims should be narrowed, and several analyses or controls would be needed to fully support the proposed framework.

      Detailed Comments

      (1) A general concern is that the repeated test procedure itself may contribute to extinction. Because the animals are exposed to multiple CS frequencies across multiple test days, and each tone is presented three times per session, some of the reported changes in behavior and neural activity across days could reflect extinction or repeated nonreinforced retrieval rather than the passage of time per se. This is especially relevant given that the manuscript makes claims about recent versus remote representations and representational drift over 30 days. At a minimum, the authors should discuss this limitation explicitly and temper claims about time-dependent changes. Ideally, they would include a control group in which animals are tested only once or twice (e.g., at an early and later time point with fewer CS frequencies), or a reduced-frequency testing design that minimizes extinction while still allowing evaluation of recent versus remote memory.

      (2) More generally, some of the reported learning-related neural differences may be driven by behavioral differences, particularly freezing, rather than by learning or generalization per se. For example, animals that freeze more to certain frequencies may show corresponding neural response differences simply because freezing alters PL activity. The authors should examine this possibility more directly. Analyses testing whether recorded cells encode freezing behavior, or whether tone frequency-related neural differences remain robust when comparing high- and low-freezing epochs, would help determine whether the reported effects reflect learned stimulus value rather than behavioral state differences.

      (3) A central feature of the manuscript is the analysis of neural response properties over an extended period of time, up to 30 days after learning. However, aside from a brief mention in the Methods that spatial registration was used, the manuscript provides very little quantitative information about this critical aspect of the study. The paper would be strengthened by including explicit metrics describing longitudinal cell tracking, such as the number and proportion of ROIs retained across all sessions, distributions of spatial-footprint correlations or centroid distances across days, and representative examples of matched imaging fields over time. Without this information, it is difficult to assess how strongly the longitudinal claims are supported.
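
      As a concrete illustration of the kind of tracking metrics requested here, the following minimal sketch summarizes ROI retention and centroid displacement across sessions. The data layout (one dictionary of ROI centroids per imaging day), the function names, and the 5-pixel threshold are hypothetical assumptions for illustration and do not describe the authors' registration pipeline.

```python
import math

def centroid_distance(a, b):
    """Euclidean distance between two (x, y) centroids, in pixels."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tracking_summary(sessions, max_dist=5.0):
    """Summarize longitudinal ROI tracking relative to the first session.

    sessions: list of dicts mapping ROI id -> (x, y) centroid, one per day.
    Returns the fraction of first-session ROIs found in every later session
    within max_dist pixels, plus all measured centroid distances.
    """
    reference = sessions[0]
    retained = 0
    distances = []
    for roi_id, ref_xy in reference.items():
        per_day = [centroid_distance(ref_xy, s[roi_id])
                   for s in sessions[1:] if roi_id in s]
        distances.extend(per_day)
        found_everywhere = len(per_day) == len(sessions) - 1
        if found_everywhere and all(d <= max_dist for d in per_day):
            retained += 1
    return retained / len(reference), distances

# Hypothetical centroids for three ROIs on three imaging days.
sessions = [
    {1: (10.0, 10.0), 2: (20.0, 20.0), 3: (30.0, 30.0)},
    {1: (11.0, 10.0), 2: (20.0, 23.0)},                   # ROI 3 not detected
    {1: (10.0, 12.0), 2: (26.0, 28.0), 3: (30.0, 31.0)},  # ROI 2 drifts far
]
fraction, distances = tracking_summary(sessions)
print(fraction, sorted(distances))
```

      Reporting the retained fraction together with the full distance distribution, rather than a single summary number, would let readers judge how conservative the matching criterion was.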

      (4) The text states that "Figs. 1c and 1d show GCaMP6f expression in PL, representative calcium footprints, and activity traces". However, the figure as presented does not clearly show all of these elements, at least not in a way that matches the description in the Results. The correspondence between text and figure should be corrected.

      (5) The labeling of Figure 2a is insufficient for interpretation. The legend states that the panel shows raster plots of sound responsiveness, but the axes and scaling are not clearly defined. It is not clear from the figure what the x-axis represents, whether the y-axis corresponds to individual neurons, where the CS period occurs, or what the activity scale at the right denotes. Also, the term 'rasters' implies that spikes were analyzed. It seems that the spike inference approach (CASCADE) was only used for later analyses. Perhaps 'heat-plot' would be more accurate here? Generally, this figure should be annotated more clearly so that the reader can understand it without referring back to the Methods.

      (6) In relation to Figure 3, the analysis of population-averaged responses across tone frequencies is useful, but the manuscript would be stronger with additional statistical analyses across time and across groups. For example, if the authors want to argue that learning induces graded changes in neural responses and that these evolve across time, they should directly compare within-group responses across days and also compare matched frequencies between the conditioned groups and the no-shock controls. These analyses would help establish whether the observed differences are genuinely learning dependent and whether they change significantly over time.

      (7) The inclusion of two different CS+ frequencies and a no-shock control is a strength of the study and substantially improves the interpretation that graded neural responses are related to learning and generalization rather than to simple sensory processing or passage of time. That said, I am not entirely comfortable with the use of the term "inference" throughout the manuscript. What is being measured here appears closer to sensory generalization than inference in a stronger cognitive sense. The current task does not clearly require that animals infer hidden structure or stimulus value through abstract reasoning; rather, the generalized stimulus may simply be treated as similar to the conditioned cue. The terminology should therefore be reconsidered or softened.

      (8) I also found the use of the term "valence" somewhat problematic. The manuscript appears to use valence to refer to graded responding across tones with different aversive significance, but valence typically refers more broadly to distinctions between appetitive and aversive value. Here, terms such as "threat value" or "aversive value" may be more precise. The authors should consider revising this language throughout.

    3. Reviewer #2 (Public review):

      Summary:

      The following points are those that occurred to me across readings of the paper. They are listed in what I take to be the order of their significance. Many of the points relate to the loose use of language and invocation of concepts that are not warranted, given the study design and results obtained.

      Major Comments:

      (1) The concept of ensemble turnover is interesting - the way it is introduced and discussed implies some type of spontaneous change in the neural underpinnings of fear discrimination and generalization in the PL. But, of course, every trial involves an opportunity to learn about the threat CS or the generalization test stimuli, and I am troubled by the thought that stability in the neural underpinnings of fear discrimination and generalization will actually reflect the level of defensive behaviours evoked on different trial types and/or the discrepancy between those behaviours and the outcome of a given trial in the generalization test. That is, stability in the neural underpinnings may be related to an animal's certainty or uncertainty in the contingency between a stimulus and danger; or, put another way, an animal's confidence that danger will or won't occur given the presence of some stimulus. This is not uninteresting. It is, however, not considered anywhere in the paper, which is overloaded with references to inferred threat values and integration of information across different types of stimuli. The protocol is not one that requires inference about anything or integration across anything.

      (2) I appreciate the link to Gu and Johansen in paragraph 3 of the Introduction, but the type of generalization under investigation here is not the same as the type of 'generalization' studied by Gu and Johansen [who used a sensory preconditioning protocol]. Nonetheless, the authors have forced the language used by Gu and Johansen into their paper, and this has created tension [at least for this reader] as the concepts introduced by Gu and Johansen [inference, integration] are simply not relevant given the generalization protocol used here. Here are a few examples of points where the tension might interfere with a reader's understanding:

      a. 'We hypothesized that generalization to novel stimuli depends on stable subnetwork organization that enables comparisons between learned and inferred valence, as well as population-level features that reduce variability across related representations.'

      I understand the words in the hypothesis, but can't form a representation of what is being said because of the reference to terms that stand in need of clarification [inferred valence, variability across related representations], but, ultimately, won't be clarified. This needs to be re-expressed so that the reader can appreciate what is being said.

      b. 'Our results show that stable cortical subnetworks integrate the emotional "gist" of memory and inferred valence for novel cues over time, despite ongoing ensemble reorganization, and that population-level firing rate similarity across stimulus presentations determines threat generalization.'

      Again, what does this mean? How is the gist of a memory integrated with inferred valence for novel cues over time? The statement simply doesn't make sense. This needs to be rewritten for clarity.

      c. 'In CS⁺15 mice, positively modulated sound-responsive neurons exhibited graded tone activity reflecting the contingency learned valence as well as the inferred valence of novel tones across testing days...'.

      Can this be rewritten as 'In CS⁺15 mice, positively modulated sound-responsive neurons exhibited graded activity to the tone CS and its variants that were used to assess generalization.'? The overloading of the text with references to 'contingency learned valence' and 'inferred valence' is unnecessary and makes it much harder to understand what has been shown in the results.

      (3) Re the same passage of text as in 2c:

      Is it the case that these neurons are simply tracking the expression of freezing to the various tones? The same question applies to the results obtained for the CS+3 mice. If this is the case, then why should the results be taken to support the banner statement that 'Sound-modulated PL population responses encode learned and inferred valence' - these analyses do not support that statement. And, as indicated, I don't believe that the language of learned and inferred valence is appropriate to such statements, given the nature of the protocol used and results obtained. It is a study looking at how populations of neurons in the PL respond during presentations of auditory stimuli that were subject to discriminative conditioning, and during tests of generalized freezing to other [intermediate] auditory stimuli.

      (4) It is stated that:

      'In no-shock controls, although both positive and negative responses were present, population activity was not modulated by tone frequency or valence'.

      What does this mean? I can understand that population activity was not modulated by tone frequency. But what does it mean to say that it was not modulated by valence? Why should it have been when none of the tones were conditioned in this group and, hence, mice were responding to all the tones equally? And given that this is true, I don't understand the use of 'valence' here, or the subsequent statements in this paragraph that 'graded responses require associative learning' and that 'PL population responses encode graded sound-valence associations that reflect both learning and inference, closely matching behavioral generalization.' The latter statement is particularly unwarranted and, again, highlights a major issue with the paper. It could and should be rewritten as 'PL population responses reflect behavioral generalization.' There is nothing in the additional language that adds to the reader's understanding of what has been shown. The reference to 'graded sound-valence associations that reflect both learning and inference' is completely unwarranted, given the nature of this study. It is anathema to the vast literature on stimulus generalization. If the authors wished to make statements of this sort, they should have taken a different approach, perhaps using protocols like those featured in Gu and Johansen.

      (5) The section titled, 'Consistently active neurons preserve valence representations as newly recruited neurons sharpen remote memory traces' ends with the following summary:

      'Together, these results indicate that consistently active neurons maintain stable representations of learned and inferred sound associations across time, whereas neurons recruited after conditioning progressively acquire graded tuning at later retrieval stages. This dynamic refinement suggests that cortical memory representations become increasingly selective during systems consolidation, while a stable neuronal subpopulation preserves the core emotional content of the memory.'

      Once again, the summary is not in keeping with the results obtained. The 'dynamic refinement' of representations is far more likely to reflect the repeated testing across days 1, 15, and 30 than anything to do with systems consolidation - at the very least, this is the simplest interpretation of the results. The impact of repeated testing is evident in the sharpening of generalization gradients over time, which is contrary to what is otherwise observed in the literature - the incredibly well-documented broadening of generalization gradients with time. Given this impact of repeated testing, surely the changes in the neuronal population that underlie performance are more likely to reflect the learning that occurs on days 1, 15, and 30, which is reflected in reduced freezing to the non-conditioned tones. If this is a reasonable take on the results, then I don't see the basis for invoking systems consolidation at all, and I don't see the basis for inferring a stable neuronal subpopulation that preserves the emotional content of the memory. Rather, non-reinforced presentations of 'never-reinforced' tones lead to the recruitment of additional neurons that suppress freezing responses to those stimuli.

      (6) In the section titled, 'Population vector similarity at stimulus onset determines degree of generalization', it is stated that:

      'Because population similarity peaked shortly after stimulus onset, we quantified similarity during the first 5 s after tone onset relative to the CS⁺. In CS⁺15 mice, population similarity was highest for 15/15 and 15/11 tone pairs with no differences between them.'

      Isn't this consistent with the view that the population response in the PL simply reflects the level of freezing? Freezing to the 15-15 and 15-11 tones is most likely to be similar on their first presentation prior to the effects of extinction on the 11 kHz tone; hence the results obtained. That is, these results appear to clearly indicate that neuronal responses in the PL reflect the degree of stimulus generalization, as evidenced in freezing behavior. Given all that we know about the involvement of the PL in expressing fear responses, it is not appropriate to claim that 'population vector similarity at stimulus onset *determines* the degree of generalization'. The PL responses simply reflect the varying levels of performance displayed to the different types of tones. What have I missed that could be taken to support additional statements?

      Later in the same section, it is stated that 'population-level similarity at stimulus onset scales with behavioral threat generalization and is maximal for tones associated with robust threat responses.' For simplicity and, therefore, clarity, this should be rewritten as 'population-level similarity at stimulus onset reflects behavioral threat generalization.'
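
      To make the quantity under discussion concrete, the following is a minimal sketch of a population-vector similarity measure. It assumes, hypothetically, that each tone presentation is summarized as one vector of per-neuron mean activity over the first 5 s after tone onset, and that similarity is the Pearson correlation between such vectors; the manuscript's actual metric is not stated in this review, and the function names are illustrative only.

```python
import math
from itertools import combinations, product


def pearson(x, y):
    """Pearson correlation between two equal-length population vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / math.sqrt(vx * vy)


def mean_similarity(presentations_a, presentations_b, same_tone=False):
    """Average correlation over pairs of presentations.

    Each presentation is a list of per-neuron mean activity (assumed here to
    cover the first 5 s after tone onset). For a same-tone comparison such as
    15/15, only distinct pairs drawn from presentations_a are used, so a
    presentation is never compared with itself.
    """
    if same_tone:
        pairs = list(combinations(presentations_a, 2))
    else:
        pairs = list(product(presentations_a, presentations_b))
    values = [pearson(u, v) for u, v in pairs]
    return sum(values) / len(values)
```

      A measure of this kind is descriptive: a high value for the 15/15 and 15/11 pairs shows that similar populations respond to those tones, which is compatible with the population response tracking freezing and does not by itself show that similarity determines generalization.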

      (7) In the section titled, 'Different subnetworks encode acoustic versus learned properties of sound association', it is stated that:

      'Our previous analyses show that learned and inferred associations are represented at the population level. However, these results do not resolve whether graded responses arise from pooled activity of frequency-selective neurons or from subnetworks encoding integrated learned valence across tones.'

      What does it mean to say 'integrated learned valence across tones'? As it presently stands, the meaning of the phrase is unclear. It only makes sense if one supposes that generalized freezing responses to the 11 and 7 kHz tones reflect separate associations between those tones and the aversive foot shock US. This supposition is inconsistent with the rich literature on generalization of Pavlovian conditioned fear responses. Specifically, it is inconsistent with the many theories of fear generalization, which attribute the reduction in fear as one moves away from the specific conditioned stimulus to a decrement in the ability of the test stimulus to activate the trained CS-US association. My strong impression is that the authors would do well to ground their findings in theories of stimulus/fear generalization, of which there are many. This would better serve the results obtained [and the reader's appreciation of them] - at present, the unnecessary invocation of concepts does very little to enhance the reader's appreciation or understanding of what has been found in the study.

      (8) Another example of what has been a common theme in this review:

      '...we hypothesized that the PL active ensemble segregates into functionally distinct subnetworks: one encoding tone-specific sensory features with dynamic characteristics, and another responding to all frequencies encoding stable core memory content and inferred emotional valence.'

      What does it mean to say 'all frequencies encoding stable core memory content and inferred emotional valence'? Do the authors mean to say '...and another that tracks freezing/defensive responses regardless of whether they were elicited by the trained CS or one of the generalization test stimuli'?

      (9) It is stated that - 'Graded clusters encode emotional valence but constitute only a fraction of the active population; yet valence coding at the population level remains accurate and precise. This indicates that neurons newly recruited into the population - likely frequency-selective and organized within learning-independent clusters - can be shaped by associative processes through modulation of firing activity.'

      What does this mean? Are the authors trying to say that - 'Some clusters of PL neurons track freezing responses. In spite of the fact that these are only a fraction of the total active neuronal population, the population-level response of PL neurons also tracks the levels of fear to the trained tone and its variants used in the test for generalization.' If this is what one wants to say, then the final statement in the reproduced section does not follow. That is, there is no indication that 'neurons newly recruited into the population - likely frequency-selective and organized within learning-independent clusters - can be shaped by associative processes through modulation of firing activity.' As noted, the characteristics of other ensembles that become active across the repeated tests on days 1, 15, and 30 are more likely to reflect learning from non-reinforcement that occurs within and across those sessions. Perhaps this is what is meant by the phrase, 'shaped by associative processes'? If so, it should be stated explicitly instead of left to the reader to work out.

      (10) The following points all relate to the Discussion and reiterate many of the points above.

      a. 'A subset of neurons remains consistently active across sessions, preserving core components of the memory trace and supporting inference of emotional valence for novel sounds, while neurons recruited after conditioning progressively acquire valence selectivity at remote time points.'

      'Inference of emotional valence' is unclear and unwarranted for all of the reasons provided above regarding the use of language.

      b. '...Our data reconcile these views by demonstrating that cortical representations of emotional valence emerge rapidly after learning and persist within stable subnetworks, even as the broader population undergoes substantial turnover. This architecture preserves core mnemonic content while allowing flexibility in the surrounding ensemble.'

      These statements assume that the PL neuronal responses reflect something more than the levels of freezing behavior to the different stimuli; what are the grounds for this assumption?

      c. 'Importantly, these subnetworks encode both learned contingencies and the inferred valence of novel stimuli along a graded representational axis, suggesting that strong recurrent connectivity provides a stable scaffold for emotional memory representations.'

      What is a graded representational axis, and what part of the first statement suggests that 'strong recurrent connectivity provides a stable scaffold for emotional memory representations'? If the authors' goal was to make statements about emotional memory representations vis-à-vis emotional memory content, they should have used protocols that allowed them to probe such content. The auditory fear conditioning protocol used here [followed by tests for generalization to other auditory stimuli that differ in frequency from the conditioned tone] is not one that lends itself to analysis of emotional memory representations or content.

      d. 'Dynamic tone-selective responsive neurons emerge independently of learning, as they are present in both control and experimental mice, reflecting pre-existing PL sensory-driven properties (Hockley & Malmierca, 2024; Zikopoulos & Barbas, 2006).'

      Maybe. They are also likely to have developed as a consequence of the repeated testing on days 1, 15, and 30, which involved intermixed exposures to the tones of different frequencies. That is, rather than 'pre-existing PL sensory-driven properties', the responses of these neurons might reflect the emergence of discrimination between the various tones across testing, and greater suppression of freezing to the non-trained tones compared to the trained tone across the various test intervals.

    4. Reviewer #3 (Public review):

      Summary:

      Normandin et al. explore the coding of stimuli predicting an aversive event in the prelimbic cortex. Stimuli could either be explicitly paired, explicitly unpaired, or novel but with an inferred association with the aversive event (generalization). Long-term tracking of GCaMP-positive neurons allowed them to examine how coding evolves out to a month following training. In general, they found two types of ensemble codes. One was ensembles coding for each stimulus independently, but with enhanced responding to the one eliciting a freezing response. The other was ensembles that responded to all stimuli in proportion to their similarity to the stimulus paired with the aversive event, either increasing or decreasing their activation with the degree of freezing elicited by a stimulus. Importantly, this second set of ensembles was more stable across days, potentially providing a memory trace.

      Strengths:

      (1) The authors track ensembles in prelimbic cortex over long time scales, providing valuable information on the consolidation of neural codes.

      (2) Neural coding of generalization is examined, which is under-examined in the field.

      Weaknesses:

      (1) It is difficult to determine whether responses treated as encoding stimulus valence are instead driven by the behavior that the stimulus elicits, namely freezing.

      (2) The study implies that the identified ensembles are causally related to valence memory, but no experimental interventions are performed to justify this.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors combine discriminative auditory fear conditioning with longitudinal in vivo calcium imaging to ask how prelimbic (PL) representations of learned and generalized threat evolve across recent and remote memory time points. Using two different CS+ frequencies and a no-shock control group, they report that PL population activity tracks graded behavioral generalization, that population similarity is highest for tones eliciting strong threat responding, and that distinct subnetworks can be identified that appear to encode tone-specific sensory features versus learned threat-related response structure.

      To my knowledge, this may be the first study to comprehensively examine neural encoding of fear generalization in prelimbic cortex (PL). The manuscript is ambitious and technically interesting, and several aspects are potentially important. In particular, the suggestion that neurons showing graded, learning-related response patterns become selectively stabilized over time is intriguing. The inclusion of two CS+ training conditions and a no-shock control also strengthens the case that at least some of the reported effects are related to associative learning rather than simple sensory differences. However, in its current form, the manuscript does not yet fully support the strength of the conceptual claims. Several issues limit confidence in the interpretation, including the possibility that repeated testing itself contributes to changes across days, uncertainty about the relationship between neural activity and freezing behavior, limited quantitative documentation of longitudinal cell registration, and a number of problems in figure clarity and statistical framing. Overall, the study contains promising observations, but the claims should be narrowed, and several analyses or controls would be needed to fully support the proposed framework.

      Detailed Comments

      (1) A general concern is that the repeated test procedure itself may contribute to extinction. Because the animals are exposed to multiple CS frequencies across multiple test days, and each tone is presented three times per session, some of the reported changes in behavior and neural activity across days could reflect extinction or repeated nonreinforced retrieval rather than the passage of time per se. This is especially relevant given that the manuscript makes claims about recent versus remote representations and representational drift over 30 days. At a minimum, the authors should discuss this limitation explicitly and temper claims about time-dependent changes. Ideally, they would include a control group in which animals are tested only once or twice (e.g., at an early and later time point with fewer CS frequencies), or a reduced-frequency testing design that minimizes extinction while still allowing evaluation of recent versus remote memory.

      We agree with the reviewer that repeated testing is an inherent limitation of longitudinal memory studies and may itself contribute to some neural changes across sessions. However, several aspects of our behavioral design and results argue against extinction or repeated nonreinforced retrieval as the primary drivers of the observed effects. Importantly, discrimination ratios remained stable or increased across time rather than progressively diminishing as would be expected under extinction (this new analysis will be added to the resubmission). Nevertheless, we will address this important point in the Discussion and explicitly acknowledge that repeated retrieval may contribute to some component of the observed representational changes.
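
      As a minimal sketch of the kind of measure referred to above: the response does not define its discrimination ratio, so the normalized difference used here is an assumption, as are the function and argument names. Under this definition the ratio is 1 when freezing occurs only to the CS+ and 0 when freezing to the two tones is equal.

```python
def discrimination_ratio(freeze_cs_plus, freeze_cs_minus):
    """Normalized difference in freezing between CS+ and CS- (an assumed definition).

    Inputs are freezing levels in the same units (for example, percent of the
    tone period spent freezing). Returns a value in [-1, 1].
    """
    total = freeze_cs_plus + freeze_cs_minus
    if total == 0:
        return 0.0  # no freezing to either tone: treat as no discrimination
    return (freeze_cs_plus - freeze_cs_minus) / total
```

      With such a measure, a ratio that holds steady or rises across test days while freezing to the CS+ is maintained would argue against extinction as the main driver. A ratio can also rise because freezing to the non-conditioned tones falls with repeated non-reinforced exposure, so the raw freezing levels are worth reporting alongside it.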

      (2) More generally, some of the reported learning-related neural differences may be driven by behavioral differences, particularly freezing, rather than by learning or generalization per se. For example, animals that freeze more to certain frequencies may show corresponding neural response differences simply because freezing alters PL activity. The authors should examine this possibility more directly. Analyses testing whether recorded cells encode freezing behavior, or whether tone frequency-related neural differences remain robust when comparing high- and low-freezing epochs, would help determine whether the reported effects reflect learned stimulus value rather than behavioral state differences.

      We thank the reviewer for raising this important point, which was also noted by the other reviewers. To address this issue, we will implement Reviewer 3’s suggested Generalized Linear Model (GLM) analysis using inferred spiking activity derived from the Ca2+ signals, with both tone identity and freezing behavior included as predictors. Because freezing behavior varies across trials whereas stimulus identity is fixed, this approach will allow us to dissociate their respective contributions to neuronal activity. If, after accounting for freezing behavior, responsive neurons continue to exhibit graded coding consistent with inferred threat value, this would strengthen the interpretation that the identified ensembles reflect generalization gradients related to aversive value rather than freezing behavior alone. Otherwise, we will adjust the conclusions according to the interpretation that freezing itself drives the generalization gradients.
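
      The logic of the proposed analysis can be sketched as follows. This is an illustration under stated assumptions, not the analysis that will be run: it assumes a Poisson GLM with a log link fitted to per-trial inferred spike counts, tone identity coded as dummy variables against a reference tone, and freezing entered as the fraction of the tone period spent freezing. All function names are hypothetical.

```python
import math


def design_matrix(tone_khz, freezing, reference=3):
    """Build rows of [intercept, one dummy per non-reference tone, freezing]."""
    levels = sorted(set(tone_khz) - {reference})
    rows = [
        [1.0] + [1.0 if t == lv else 0.0 for lv in levels] + [f]
        for t, f in zip(tone_khz, freezing)
    ]
    return rows, levels


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit log-link Poisson regression by Newton's method.

    Assumes column 0 of X is the intercept; it is initialized at log(mean(y)).
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(max(sum(y) / len(y), 1e-8))
    for _ in range(n_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Gradient and (negated) Hessian of the Poisson log-likelihood.
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

      In this framing, tone coefficients that remain graded once the freezing term is in the model would point to tone-related coding beyond freezing, whereas tone coefficients that collapse toward zero would favor the freezing account. The separation is only as good as the trial-to-trial variation in freezing within each tone, since tone and freezing are strongly correlated by design.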

      (3) A central feature of the manuscript is the analysis of neural response properties over an extended period of time, up to 30 days after learning. However, aside from a brief mention in the Methods that spatial registration was used, the manuscript provides very little quantitative information about this critical aspect of the study. The paper would be strengthened by including explicit metrics describing longitudinal cell tracking, such as the number and proportion of ROIs retained across all sessions, distributions of spatial-footprint correlations or centroid distances across days, and representative examples of matched imaging fields over time. Without this information, it is difficult to assess how strongly the longitudinal claims are supported.

      We thank the reviewer for this suggestion. We will include measures of registration quality in the resubmission.
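
      Two of the metrics named by the reviewer, spatial-footprint correlation and centroid distance, can be sketched as below. This assumes, for illustration only, that each ROI footprint is available as a small 2D grid of non-negative pixel weights with at least one non-zero entry, and that the two sessions are already aligned; the function names are hypothetical and do not refer to any particular registration package.

```python
import math


def centroid(footprint):
    """Weighted centre of mass (row, col) of a 2D footprint given as nested lists."""
    total = sum(sum(row) for row in footprint)
    r = sum(i * v for i, row in enumerate(footprint) for v in row) / total
    c = sum(j * v for row in footprint for j, v in enumerate(row)) / total
    return r, c


def centroid_distance(fp_a, fp_b):
    """Euclidean distance in pixels between the centroids of two footprints."""
    (ra, ca), (rb, cb) = centroid(fp_a), centroid(fp_b)
    return math.hypot(ra - rb, ca - cb)


def footprint_correlation(fp_a, fp_b):
    """Pearson correlation between two footprints of identical shape."""
    a = [v for row in fp_a for v in row]
    b = [v for row in fp_b for v in row]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

      Reporting the distributions of both values for matched ROI pairs, together with the same values for nearest-neighbour non-matched pairs, would let a reader judge how well matched and unmatched cells are separated.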

      (4) The text states that "Figs. 1c and 1d show GCaMP6f expression in PL, representative calcium footprints, and activity traces". However, the figure as presented does not clearly show all of these elements, at least not in a way that matches the description in the Results. The correspondence between text and figure should be corrected.

      We will correct the correspondence between the text and the figure.

      (5) The labeling of Figure 2a is insufficient for interpretation. The legend states that the panel shows raster plots of sound responsiveness, but the axes and scaling are not clearly defined. It is not clear from the figure what the x-axis represents, whether the y-axis corresponds to individual neurons, where the CS period occurs, or what the activity scale at the right denotes. Also, the term 'rasters' implies that spikes were analyzed. It seems that the spike inference approach (CASCADE) was only used for later analyses. Perhaps 'heat-plot' would be more accurate here? Generally, this figure should be annotated more clearly so that the reader can understand it without referring back to the Methods.

      Thank you for this suggestion. We will clarify the labelling of Figure 2a and call the graphs “activity-plots”.

      (6) In relation to Figure 3, the analysis of population-averaged responses across tone frequencies is useful, but the manuscript would be stronger with additional statistical analyses across time and across groups. For example, if the authors want to argue that learning induces graded changes in neural responses and that these evolve across time, they should directly compare within-group responses across days and also compare matched frequencies between the conditioned groups and the no-shock controls. These analyses would help establish whether the observed differences are genuinely learning dependent and whether they change significantly over time.

      We will redo the statistics for Figure 3 to take into account the following variables: group (CS15, CS3, no shocks), frequency (3, 7, 11, 15), and day of testing (2, 15, 30).

      (7) The inclusion of two different CS+ frequencies and a no-shock control is a strength of the study and substantially improves the interpretation that graded neural responses are related to learning and generalization rather than to simple sensory processing or passage of time. That said, I am not entirely comfortable with the use of the term "inference" throughout the manuscript. What is being measured here appears closer to sensory generalization than inference in a stronger cognitive sense. The current task does not clearly require that animals infer hidden structure or stimulus value through abstract reasoning; rather, the generalized stimulus may simply be treated as similar to the conditioned cue. The terminology should therefore be reconsidered or softened.

      We thank the reviewer for appreciating the strengths of the experimental design and for this thoughtful suggestion regarding terminology. We agree that the term “inference” may overstate the cognitive processes engaged by the current task. Accordingly, we will revise the terminology throughout the manuscript to describe these effects as graded generalization of threat value across stimuli.

      (8) I also found the use of the term "valence" somewhat problematic. The manuscript appears to use valence to refer to graded responding across tones with different aversive significance, but valence typically refers more broadly to distinctions between appetitive and aversive value. Here, terms such as "threat value" or "aversive value" may be more precise. The authors should consider revising this language throughout.

      We will correct the language and use “threat value”.

      Reviewer #2 (Public review):

      Summary:

      The following points are those that occurred to me across readings of the paper. They are listed in what I take to be the order of their significance. Many of the points relate to the loose use of language and invocation of concepts that are not warranted, given the study design and results obtained.

      Major Comments:

      (1) The concept of ensemble turnover is interesting - the way it is introduced and discussed implies some type of spontaneous change in the neural underpinnings of fear discrimination and generalization in the PL. But, of course, every trial involves an opportunity to learn about the threat CS or the generalization test stimuli, and I am troubled by the thought that stability in the neural underpinnings of fear discrimination and generalization will actually reflect the level of defensive behaviours evoked on different trial types and/or the discrepancy between those behaviours and the outcome of a given trial in the generalization test. That is, stability in the neural underpinnings may be related to an animal's certainty or uncertainty in the contingency between a stimulus and danger; or, put another way, an animal's confidence that danger will or won't occur given the presence of some stimulus. This is not uninteresting. It is, however, not considered anywhere in the paper, which is overloaded with references to inferred threat values and integration of information across different types of stimuli. The protocol is not one that requires inference about anything or integration across anything.

      We thank the reviewer for these important points, which we address in further detail below.

      Ongoing learning during test sessions: The reviewer correctly notes that unreinforced test presentations may constitute extinction-learning trials and that some neural changes across days could therefore reflect ongoing learning rather than spontaneous ensemble reorganization. However, new analyses indicate that extinction is unlikely to be the primary driver of our findings. Discrimination ratios do not decay over time; instead, they either sharpen or remain stable across sessions (new analyses to be included in the resubmission). These results argue against robust extinction as the primary source of the neural changes observed across sessions. This interpretation is also consistent with the strength of our conditioning protocol, which used 10 CS+ shock pairings and 10 CS− no-shock pairings specifically to minimize extinction across repeated testing sessions. Nevertheless, we acknowledge that the current design cannot fully dissociate time-dependent consolidation from retrieval-induced plasticity, and we will explicitly discuss this limitation in the revised Discussion.

      Stability reflecting behavioral consistency: We agree this alternative cannot be fully excluded. However, the cluster stability analyses assess identity at the level of response profile across all four frequencies, not response magnitude alone. Tone-selective clusters, which also show consistent behavioral correlates (firing rate correlates with threat-value, Fig. S8), do not show equivalent profile stability, suggesting that the stability of graded clusters is not simply a consequence of behavioral consistency. This point will be added to the Discussion in the resubmission.

      Language of "inference" and "integration": The reviewer is correct that responses to novel tones are consistent with graded stimulus generalization. We will substantially revise the manuscript to replace "inference" and "integration" with more precise language describing graded frequency generalization gradients.

      (2) I appreciate the link to Gu and Johansen in paragraph 3 of the Introduction, but the type of generalization under investigation here is not the same as the type of 'generalization' studied by Gu and Johansen [who used a sensory preconditioning protocol]. Nonetheless, the authors have forced the language used by Gu and Johansen into their paper, and this has created tension [at least for this reader] as the concepts introduced by Gu and Johansen [inference, integration] are simply not relevant given the generalization protocol used here. Here are a few examples of points where the tension might interfere with a reader's understanding:

      We thank the reviewer for these specific and constructive criticisms. We will revise the manuscript throughout to remove or redefine terms like "inferred valence" and "integration," replacing them with clearer, more accurate descriptions of gradient generalization of threat value. Below we address each point raised by the reviewer regarding terminology clarifications.

      (a) 'We hypothesized that generalization to novel stimuli depends on stable subnetwork organization that enables comparisons between learned and inferred valence, as well as population-level features that reduce variability across related representations.'

      I understand the words in the hypothesis, but can't form a representation of what is being said because of the reference to terms that stand in need of clarification [inferred valence, variability across related representations], but, ultimately, won't be clarified. This needs to be re-expressed so that the reader can appreciate what is being said.

      The hypothesis will be rewritten as: "We hypothesized that generalization to tones acoustically similar to the CS+ and CS− depends on the emergence of stable ensembles encoding threat value, and that population-level response similarity across stimuli would correlate with the degree of behavioral fear generalization, consistent with prior work in auditory cortex [1]."

      (b) 'Our results show that stable cortical subnetworks integrate the emotional "gist" of memory and inferred valence for novel cues over time, despite ongoing ensemble reorganization, and that population-level firing rate similarity across stimulus presentations determines threat generalization.'

      Again, what does this mean? How is the gist of a memory integrated with inferred valence for novel cues over time? The statement simply doesn't make sense. This needs to be rewritten for clarity.

      The summary statement will be rewritten: "Our results show that stable cortical sub-ensembles preserve the emotional content of the fear memory over time, despite ongoing ensemble reorganization, and that population-level firing rate similarity in response to tones associated with threat correlates with the degree of behavioral threat generalization."

      (c) 'In CS⁺15 mice, positively modulated sound-responsive neurons exhibited graded tone activity reflecting the contingency learned valence as well as the inferred valence of novel tones across testing days...'.

      Can this be rewritten as 'In CS⁺15 mice, positively modulated sound-responsive neurons exhibited graded activity to the tone CS and its variants that were used to assess generalization.'? The overloading of the text with references to 'contingency learned valence' and 'inferred valence' is unnecessary and makes it much harder to understand what has been shown in the results.

      We will adopt the reviewer's suggested rewording: "In CS+15 mice, positively modulated sound-responsive neurons exhibited graded activity to the tone CS and its variants that were used to assess generalization."

      We will systematically review the entire manuscript to ensure consistency with this revised framing.

      (3) Re the same passage of text as in 2c:

      Is it the case that these neurons are simply tracking the expression of freezing to the various tones? The same question applies to the results obtained for the CS+3 mice. If this is the case, then why should the results be taken to support the banner statement that 'Sound-modulated PL population responses encode learned and inferred valence' - these analyses do not support that statement. And, as indicated, I don't believe that the language of learned and inferred valence is appropriate to such statements, given the nature of the protocol used and results obtained. It is a study looking at how populations of neurons in the PL respond during presentations of auditory stimuli that were subject to discriminative conditioning, and during tests of generalized freezing to other [intermediate] auditory stimuli.

      The reviewer is correct that the graded population responses observed in PL could reflect freezing behavior across tone frequencies rather than encoding an abstract threat-value representation. This important concern was also raised by other reviewers. To address it directly, we will follow Reviewer 3’s suggestion and implement a Generalized Linear Model (GLM) using inferred spiking activity derived from the Ca2+ signals, with both tone identity and freezing behavior included as predictors. This analysis will allow us to dissociate the respective contributions of tone frequency and freezing to the graded neural responses. Based on the outcome of this analysis, we will revise and appropriately adjust our conclusions.

      In addition, we will revise the section heading and surrounding text to remove the terminology of “learned and inferred valence.” Instead, the findings will be described more conservatively as: “PL population responses reflect behavioral generalization to auditory stimuli following discriminative fear conditioning.”

      (4) It is stated that:

      'In no-shock controls, although both positive and negative responses were present, population activity was not modulated by tone frequency or valence'.

      What does this mean? I can understand that population activity was not modulated by tone frequency. But what does it mean to say that it was not modulated by valence? Why should it have been when none of the tones were conditioned in this group and, hence, mice were responding to all the tones equally? And given that this is true, I don't understand the use of 'valence' here, or the subsequent statements in this paragraph that 'graded responses require associative learning' and that 'PL population responses encode graded sound-valence associations that reflect both learning and inference, closely matching behavioral generalization.' The latter statement is particularly unwarranted and, again, highlights a major issue with the paper. It could and should be rewritten as 'PL population responses reflect behavioral generalization.' There is nothing in the additional language that adds to the reader's understanding of what has been shown. The reference to 'graded sound-valence associations that reflect both learning and inference' is completely unwarranted, given the nature of this study. It is anathema to the vast literature on stimulus generalization. If the authors wished to make statements of this sort, they should have taken a different approach, perhaps using protocols like those featured in Gu and Johansen.

      The reviewer is correct that controls do not form threat associations; however, these animals still could respond differentially to distinct frequencies, something that is not reflected in the data. We will correct the section indicating that distinct neutral frequencies do not produce graded responses: "graded responses require associative learning" will be retained but reframed simply as: "graded frequency-dependent population responses were absent in animals that did not receive fear conditioning." The concluding statement of the paragraph will be rewritten as: "PL population responses reflect behavioral generalization to acoustically similar stimuli following discriminative conditioning," in line with the reviewer's suggestion.

      (5) The section titled, 'Consistently active neurons preserve valence representations as newly recruited neurons sharpen remote memory traces' ends with the following summary:

      'Together, these results indicate that consistently active neurons maintain stable representations of learned and inferred sound associations across time, whereas neurons recruited after conditioning progressively acquire graded tuning at later retrieval stages. This dynamic refinement suggests that cortical memory representations become increasingly selective during systems consolidation, while a stable neuronal subpopulation preserves the core emotional content of the memory.'

      Once again, the summary is not in keeping with the results obtained. The 'dynamic refinement' of representations is far more likely to reflect the repeated testing across days 1, 15, and 30 rather than anything to do with systems consolidation - at the very least, it is the simplest interpretation of the results. The impact of repeated testing is evident in the sharpening of generalization gradients over time, which is contrary to what is otherwise observed in the literature - the incredibly well-documented broadening of generalization gradients with time. Given this impact of repeated testing, surely the changes in the neuronal population that underlie performance are more likely to reflect the learning that occurs on days 1, 15, and 30, which is reflected in reduced freezing to the non-conditioned tones. If this is a reasonable take on the results, then I don't see the basis for invoking systems consolidation at all, and I don't see the basis for inferring a stable neuronal subpopulation that preserves the emotional content of the memory. Rather, non-reinforced presentations of 'never-reinforced' tones result in recruitment of additional neurons that result in suppression of freezing responses to those stimuli.

      We respectfully disagree with the reviewer’s interpretation. While repeated testing cannot be entirely excluded as a contributing factor, several lines of evidence suggest that it cannot fully account for our observations.

      Regarding extinction: discrimination ratios between CS+ and all other frequencies either remained stable or increased over time (new analysis included in resubmission), indicating that animals continued to discriminate threat value across the testing period rather than showing the progressive suppression expected under extinction — the opposite of what we observe.

      Regarding the recruitment of new neurons: repeated non-reinforced tone exposure would be expected to produce stimulus-specific adaptation — characterized by reduced, less discriminative neural responsiveness and flatter tuning profiles [2]— not the progressive sharpening we observe. The same would be expected if these neurons represent or are associated with new extinction learning.

      Finally, sharpening of generalization gradients during repeated within-subjects testing has been reported previously [3], suggesting that successive exposures may promote more precise discrimination in some cases. Consistent with this, discrimination learning has also been shown to narrow or sharpen fear generalization gradients rather than broaden them [4], supporting the idea that discriminative conditioning enhances stimulus specificity during testing. Although we cannot exclude the possibility that more extended training could eventually broaden the generalization gradient, under the training parameters and temporal window used in our study, the data support a progressive sharpening of the gradient over time. In the revised Discussion, we will present systems consolidation as the primary interpretive framework and further elaborate on why repeated testing is unlikely to account for the full pattern of behavioral and neural findings reported here.

      (6) In the section titled, 'Population vector similarity at stimulus onset determines degree of generalization', it is stated that:

      'Because population similarity peaked shortly after stimulus onset, we quantified similarity during the first 5 s after tone onset relative to the CS⁺. In CS⁺15 mice, population similarity was highest for 15/15 and 15/11 tone pairs with no differences between them.'

      Isn't this consistent with the view that the population response in the PL simply reflects the level of freezing? Freezing to the 15-15 and 15-11 tones is most likely to be similar on their first presentation prior to the effects of extinction on the 11 Hz tone; hence the results obtained. That is, these results appear to clearly indicate that neuronal responses in the PL reflect the degree of stimulus generalization, as evidenced in freezing behavior. Given all that we know about the involvement of the PL in expressing fear responses, it is not appropriate to claim that 'population vector similarity at stimulus onset *determines* the degree of generalization'. The PL responses simply reflect the varying levels of performance displayed to the different types of tones. What have I missed that could be taken to support additional statements?

      The GLM analysis described in our response to reviewers 1 and 3 will directly address the contribution of freezing. We will report these results in the resubmission and revise the interpretive language in the manuscript accordingly.

      However, regarding the analysis of population vector similarity, we need to clarify a point of confusion. The reviewer states “Freezing to the 15-15 and 15-11 tones is most likely to be similar on their first presentation prior to the effects of extinction on the 11 Hz tone; hence the results obtained”. The similarity vectors were calculated by correlating activity across all tone presentations within each testing day, not only the first two presentations. In Fig. 4, “Early” and “Late” refer to the order of a tone within a trial, which we will clarify more explicitly in the resubmission. Notably, repeated-measures analyses did not reveal any effect of the time variable (Fig. 4e,f), indicating that similarity across tone presentations remained high for tones associated with high threat value. Importantly, our data showed no evidence that responses to 11 kHz or 15 kHz in the CS15 group, or to 3 kHz in the CS3 group, exhibited extinction-like patterns at either the behavioral or neural level. Therefore, the persistence of high population similarity across time provides additional evidence against extinction as the primary explanation for our findings.
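
      A minimal sketch of how such a similarity measure could be computed, assuming each tone presentation is summarized as one population vector (mean activity per neuron in the window after tone onset) and that similarity is the Pearson correlation averaged over all presentation pairs within a testing day. The function names, the toy data, and the choice of Pearson correlation are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np


def population_similarity(vec_a, vec_b):
    """Pearson correlation between two population vectors (one value per neuron)."""
    a = vec_a - vec_a.mean()
    b = vec_b - vec_b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity_to_reference(responses, ref_tone):
    """Mean similarity of every tone to a reference tone (e.g. the CS+).

    responses: dict mapping tone label -> array (n_presentations, n_neurons).
    All presentation pairs are used, not only the first ones; a presentation
    is never compared with itself.
    """
    ref = responses[ref_tone]
    out = {}
    for tone, trials in responses.items():
        vals = []
        for i, v in enumerate(trials):
            for j, r in enumerate(ref):
                if tone == ref_tone and i == j:
                    continue  # skip trivial self-correlation
                vals.append(population_similarity(v, r))
        out[tone] = float(np.mean(vals))
    return out


# Toy data (hypothetical): 50 neurons, 4 presentations per tone.
rng = np.random.default_rng(1)
n_neurons, n_pres = 50, 4
template = rng.normal(size=n_neurons)  # stand-in for the CS+ response pattern
responses = {
    "15kHz": template + 0.1 * rng.normal(size=(n_pres, n_neurons)),
    "11kHz": template + 0.3 * rng.normal(size=(n_pres, n_neurons)),
    "3kHz": rng.normal(size=(n_pres, n_neurons)),  # unrelated pattern
}
sim = similarity_to_reference(responses, "15kHz")
print(sim)
```

      Averaging over all presentation pairs, rather than only the first presentation of each tone, is the property the response above emphasizes; a per-presentation breakdown would be needed to test for extinction-like drift within a session.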

      We will remove the word "determines" from the manuscript, as our data cannot conclusively establish a causal relationship.

      Later in the same section, it is stated that 'population-level similarity at stimulus onset scales with behavioral threat generalization and is maximal for tones associated with robust threat responses.' For simplicity and, therefore, clarity, this should be rewritten as 'population-level similarity at stimulus onset reflects behavioral threat generalization.'

      We will make this correction.

      (7) In the section titled, 'Different subnetworks encode acoustic versus learned properties of sound association', it is stated that:

      'Our previous analyses show that learned and inferred associations are represented at the population level. However, these results do not resolve whether graded responses arise from pooled activity of frequency-selective neurons or from subnetworks encoding integrated learned valence across tones.'

      What does it mean to say 'integrated learned valence across tones'? As it presently stands, the meaning of the phrase is unclear. It only makes sense if one supposes that generalized freezing responses to the 11 and 7 kHz tones reflect separate associations between those tones and the aversive foot shock US. This supposition is inconsistent with the rich literature on generalization of Pavlovian conditioned fear responses. Specifically, it is inconsistent with the many theories of fear generalization, which attribute the reduction in fear as one moves away from the specific conditioned stimulus to a decrement in the ability of the test stimulus to activate the trained CS-US association. My strong impression is that the authors would do well to ground their findings in theories of stimulus/fear generalization, of which there are many. This would better serve the results obtained [and the reader's appreciation of them] - at present, the unnecessary invocation of concepts does very little to enhance the reader's appreciation or understanding of what has been found in the study.

      We thank the reviewer for raising this point. The phrase "integrated learned valence across tones" refers specifically to a subpopulation of neurons that respond to all four frequencies in a graded manner, with response magnitude scaling according to threat value. This is distinct from tone-selective neurons, which respond preferentially to a single frequency. The neurons responding to all tones in a graded manner are present only in conditioned animals and not in no-shock controls, demonstrating that their graded response profile is shaped by associative learning.

      We agree, however, that the phrase "integrated learned valence" is unnecessarily opaque and we will replace it with more precise language: these neurons will be described as showing graded frequency-dependent responses whose magnitude scales with threat value. We believe this subpopulation represents a genuinely novel finding that complements the behavioral generalization literature by identifying a specific neural substrate for the generalization gradient within PL.

      (8) Another example of what has been a common theme in this review:

      '...we hypothesized that the PL active ensemble segregates into functionally distinct subnetworks: one encoding tone-specific sensory features with dynamic characteristics, and another responding to all frequencies encoding stable core memory content and inferred emotional valence.'

      What does it mean to say 'all frequencies encoding stable core memory content and inferred emotional valence'? Do the authors mean to say '...and another that tracks freezing/defensive responses regardless of whether they were elicited by the trained CS or one of the generalization test stimuli'?

      As stated in our previous responses, in the resubmission we will determine the contribution of freezing. If we find that freezing predicts graded neural responses, we will adjust the language of the manuscript.

      (9) It is stated that - 'Graded clusters encode emotional valence but constitute only a fraction of the active population; yet valence coding at the population level remains accurate and precise. This indicates that neurons newly recruited into the population, likely frequency-selective and organized within learning-independent clusters, can be shaped by associative processes through modulation of firing activity.'

      What does this mean? Are the authors trying to say that - 'Some clusters of PL neurons track freezing responses. In spite of the fact that these are only a fraction of the total active neuronal population, the population-level response of PL neurons also tracks the levels of fear to the trained tone and its variants used in the test for generalization.' If this is what one wants to say, then the final statement in the reproduced section does not follow. That is, there is no indication that 'neurons newly recruited into the population, likely frequency-selective and organized within learning-independent clusters, can be shaped by associative processes through modulation of firing activity.' As noted, the characteristics of other ensembles that become active across the repeated tests on days 1, 15, and 30 are more likely to reflect learning from non-reinforcement that occurs within and across those sessions. Perhaps this is what is meant by the phrase, 'shaped by associative processes'? If so, it should be stated explicitly instead of left to the reader to work out.

      We thank the reviewer for highlighting the lack of clarity in this passage and agree that the original phrasing was insufficiently precise. What we intended to convey is that only a subset of PL neurons displays graded tuning that tracks behavioral generalization across tones. Nevertheless, despite constituting only a fraction of the total active population, this graded coding is also reflected at the population level. Therefore, we suggest that neurons recruited into the active population after conditioning — likely frequency-selective neurons — contribute to the graded population responses through changes in their firing-rate activity, which is modulated by threat value (Fig. S8). We will rewrite this passage in the resubmission to make this interpretation explicit rather than leaving it to the reader to infer.

      Regarding the reviewer's suggestion that the characteristics of newly recruited neurons more likely reflect learning from non-reinforced exposures during repeated test sessions, we respectfully maintain that this interpretation is difficult to reconcile with two aspects of our data. First, graded-response neurons are absent in no-shock controls that are exposed to nonreinforced repeated testing. Second, as detailed in our responses to previous points, the progressive sharpening of population responses over time is inconsistent with what would be expected from repeated non-reinforced exposure, which would more plausibly produce broader or flatter tuning profiles.

      We agree that the phrase "shaped by associative processes" was ambiguous and will replace it with explicit language clarifying that we refer to fear conditioning as the associative process driving the emergence of graded responses, rather than any learning occurring during the test sessions themselves.

      (10) The following points all relate to the Discussion and reiterate many of the points above. 

      (a) 'A subset of neurons remains consistently active across sessions, preserving core components of the memory trace and supporting inference of emotional valence for novel sounds, while neurons recruited after conditioning progressively acquire valence selectivity at remote time points.'

      'Inference of emotional valence' is unclear and unwarranted for all of the reasons provided above regarding the use of language.

      We will modify the language as stated in the prior points.

      (b) '...Our data reconcile these views by demonstrating that cortical representations of emotional valence emerge rapidly after learning and persist within stable subnetworks, even as the broader population undergoes substantial turnover. This architecture preserves core mnemonic content while allowing flexibility in the surrounding ensemble.'

      These statements assume that the PL neuronal responses reflect something more than the levels of freezing behavior to the different stimuli; what are the grounds for this assumption?

      We will incorporate new analysis (GLM) to better address this point and conclusions.

      (c) 'Importantly, these subnetworks encode both learned contingencies and the inferred valence of novel stimuli along a graded representational axis, suggesting that strong recurrent connectivity provides a stable scaffold for emotional memory representations.'

      What is a graded representational axis, and what part of the first statement suggests that 'strong recurrent connectivity provides a stable scaffold for emotional memory representations'? If the authors' goal was to make statements about emotional memory representations vis-à-vis emotional memory content, they should have used protocols that allowed them to probe such content. The auditory fear conditioning protocol used here [followed by tests for generalization to other auditory stimuli that differ in frequency from the conditioned tone] is not one that lends itself to analysis of emotional memory representations or content.

      We thank the reviewer for this comment and agree that both phrases require clarification or revision.

      By "graded representational axis" we intended to convey that PL population activity varies systematically as a function of stimulus similarity to the conditioned tone — that is, population responses are not categorical but scale continuously with spectral proximity to the CS+. We agree this was not clearly stated and will revise the manuscript accordingly.

      Regarding recurrent connectivity, we agree with the reviewer that nothing in our data directly measures or manipulates connectivity between neurons. This statement was intended as a speculative interpretive hypothesis in the Discussion, motivated by the established literature linking strong recurrent connectivity in prefrontal circuits to stable population-level representations [5]. However, we acknowledge that invoking it in this context, without direct evidence, risks overstating our conclusions. We will revise this sentence to make its speculative nature explicit and ground it more carefully in the cited literature rather than presenting it as an inference from our own data.

      In summary, we will ensure that our conclusions are restricted to population-level coding of learned threat value and its generalization across auditory frequencies. We will revise the relevant passages in the Discussion so that speculative interpretations regarding emotional memory content are either removed or clearly flagged as speculative hypotheses.

      (d) 'Dynamic tone-selective responsive neurons emerge independently of learning, as they are present in both control and experimental mice, reflecting pre-existing PL sensory-driven properties (Hockley & Malmierca, 2024; Zikopoulos & Barbas, 2006).'

      Maybe. They are also likely to have developed as a consequence of the repeated testing on days 1, 15, and 30, which involved intermixed exposures to the tones of different frequencies. That is, rather than 'pre-existing PL sensory-driven properties', the responses of these neurons might reflect the emergence of discrimination between the various tones across testing, and greater suppression of freezing to the non-trained tones compared to the trained tone across the various test intervals.

      We thank the reviewer for this point. Our interpretation that these neurons reflect pre-existing PL sensory-driven properties was based on the observation that tone-selective responses were present in control animals that never received conditioning, consistent with prior reports of sensory responsiveness in PL cortex [6, 7]. Because these responses emerge from the first time we expose mice to the intermediate frequencies, they cannot be explained by repeated exposure. Moreover, we did not observe progressive refinement, emergence of discrimination-like changes, or suppression of responding to non-reinforced tones in control mice. This difference between conditioned and control animals indicates that repeated tone exposure alone is not sufficient to produce the observed dynamics; associative learning is necessary. We therefore maintain that the tone-selective responses of these neurons reflect pre-existing sensory-driven properties of PL cortex that are present independently of conditioning history.

      In summary, we thank the reviewer for suggesting clarifications to our interpretation, for raising the possibility that freezing behavior may contribute to graded neural responses, and for raising the question of whether repeated tone exposure may contribute to the properties of neurons recruited after conditioning. In the revised manuscript, we will include additional analyses to better dissociate the contributions of freezing behavior and tone identity, clarify passages that were insufficiently precise, and include a paragraph in the Discussion addressing potential alternative explanations alongside our own interpretation of the data.

      Reviewer #3 (Public review):

      Summary:

      Normandin et al. explore the coding of stimuli predicting an aversive event in the prelimbic cortex. Stimuli could either be explicitly paired, explicitly unpaired, or novel but with an inferred association with the aversive event (generalization). Long-term tracking of GCaMP-positive neurons allowed them to examine how coding evolves out to a month following training. In general, they found two types of ensemble codes. One was ensembles coding for each stimulus independently, but with enhanced responding to the one eliciting a freezing response. The other was ensembles that responded to all stimuli in proportion to their similarity to the stimulus paired with the aversive event, either increasing or decreasing their activation with the degree of freezing elicited by a stimulus. Importantly, this second set of ensembles was more stable across days, potentially providing a memory trace.

      Strengths:

      (1) The authors track ensembles in prelimbic cortex over long time scales, providing valuable information on the consolidation of neural codes.

      (2) Neural coding of generalization is examined, which is under-examined in the field.

      We thank the reviewer for appreciating our design to track ensembles over time and the relevance of studying the neural substrates of generalization.

      Weaknesses:

      (1) Difficult to determine if responses treated as encoding stimulus valence are driven instead by the behavior that the stimulus elicits, freezing.

      We thank the reviewer for this thoughtful and constructive comment. We agree that an alternative interpretation is that the graded-response ensembles may partially reflect freezing-related activity rather than mnemonic or salience-related representations of the conditioned stimuli themselves. In the revision, we will acknowledge that prior work has identified PL neurons that encode freezing independently of stimulus identity or associative content. Furthermore, we will implement the reviewer’s suggested generalized linear model (GLM) approach using inferred spiking activity derived from the Ca2+ signals. Specifically, we will include both stimulus identity and freezing behavior as predictors. Because freezing varies across trials whereas stimulus presentation is fixed, this analysis will allow us to dissociate the relative contributions of stimulus-related versus freezing-related activity to the graded neuronal responses. We thank the reviewer for this excellent suggestion.

      If graded stimulus coding remains significant after accounting for freezing behavior, this would strengthen the interpretation that these ensembles encode learned salience or associative properties of the stimuli rather than behavioral output alone. Conversely, if freezing explains a substantial proportion of the variance, we will revise our interpretation accordingly.
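
      A minimal sketch of the kind of GLM comparison described above, assuming a Poisson family with a log link fitted to per-trial inferred spike counts, with tone identity (dummy-coded) and per-trial freezing as predictors. The response does not specify the GLM family, the data layout, or the fitting routine, so every name, the simulated data, and the effect sizes below are hypothetical.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=50, tol=1e-8):
    """Fit a Poisson GLM (log link) by Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-9)  # start the intercept at the log mean rate
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)               # score vector
        hess = X.T @ (X * mu[:, None])      # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def poisson_deviance(y, mu):
    """Poisson deviance, using the convention 0 * log(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))


# Simulated trials (assumption): 4 tones, freezing varies from trial to trial.
rng = np.random.default_rng(0)
n_trials = 400
tone = rng.integers(0, 4, size=n_trials)
freezing = np.clip(0.2 * tone + rng.normal(0.0, 0.2, size=n_trials), 0.0, 1.0)

# Design matrix: intercept, 3 tone dummies (tone 0 is the reference), freezing.
tone_dummies = (tone[:, None] == np.arange(1, 4)).astype(float)
X_full = np.hstack([np.ones((n_trials, 1)), tone_dummies, freezing[:, None]])
true_beta = np.array([0.5, 0.3, 0.6, 0.9, 0.8])  # hypothetical effect sizes
y = rng.poisson(np.exp(X_full @ true_beta))       # stand-in for inferred spike counts

X_tone = X_full[:, :4]         # tone identity only
X_freeze = X_full[:, [0, 4]]   # freezing only

beta_full = fit_poisson_glm(X_full, y)
dev_full = poisson_deviance(y, np.exp(X_full @ beta_full))
dev_tone = poisson_deviance(y, np.exp(X_tone @ fit_poisson_glm(X_tone, y)))
dev_freeze = poisson_deviance(y, np.exp(X_freeze @ fit_poisson_glm(X_freeze, y)))

# Deviance lost by dropping each predictor from the full model.
print("unique contribution of freezing:", dev_tone - dev_full)
print("unique contribution of tone identity:", dev_freeze - dev_full)
```

      Comparing the full model against each reduced model gives the variance uniquely attributable to tone identity and to freezing. The dissociation works only to the extent that freezing varies across trials of the same tone, which is the property the response relies on.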

      (2) The study implies that the identified ensembles are causally related to valence memory, but no experimental interventions are performed to justify this.

      We appreciate the reviewer's point. We agree that our data are correlational in nature and that establishing a causal relationship between identified ensembles and valence memory would require experimental interventions such as holographic two-photon manipulations, which are beyond the scope of the present study but represent an important direction for future work.

      To provide an indirect link between ensemble organization and behavior within the constraints of the current dataset, we will examine inter-individual variability in the revised manuscript. Specifically, we will test whether the proportion of neurons participating in stable graded-response ensembles versus dynamic stimulus-specific ensembles predicts individual differences in freezing behavior and fear generalization across retrieval sessions. If animals with a higher proportion of stable graded-response neurons show stronger discrimination and less generalization to non-conditioned tones, this would strengthen the association between ensemble organization and behavioral outcome, while remaining correlational in interpretation.

      We will modify the manuscript terminology accordingly, replacing causal language with phrasing that accurately reflects the associative nature of our conclusions.

      References

      (1) Aschauer, D.F., et al., Learning-induced biases in the ongoing dynamics of sensory representations predict stimulus generalization. Cell Rep, 2022. 38(6): p. 110340.

      (2) Kato, H.K., S.N. Gillet, and J.S. Isaacson, Flexible Sensory Representations in Auditory Cortex Driven by Behavioral Relevance. Neuron, 2015. 88(5): p. 1027–1039.

      (3) Vervliet, B., et al., Generalization gradients in human predictive learning: Effects of discrimination training and within-subjects testing. Learning and Motivation, 2011. 42(3): p. 210–220.

      (4) Dunsmoor, J.E. and K.S. LaBar, Effects of discrimination training on fear generalization gradients and perceptual classification in humans. Behav Neurosci, 2013. 127(3): p. 350–6.

      (5) Mante, V., et al., Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 2013. 503(7474): p. 78–84.

      (6) Hockley, A. and M.S. Malmierca, Auditory processing control by the medial prefrontal cortex: A review of the rodent functional organisation. Hear Res, 2024. 443: p. 108954.

      (7) Zikopoulos, B. and H. Barbas, Prefrontal projections to the thalamic reticular nucleus form a unique circuit for attentional mechanisms. J Neurosci, 2006. 26(28): p. 7348–61.

    1. eLife Assessment

      This short report is an important study showing that visual acuity declines nonlinearly with cone dropout, while eye motion partially compensates by improving sampling from remaining cones. The method for experimentally simulating cone dropout is compelling, leveraging state-of-the-art imaging and testing in human subjects. Inclusion of additional analysis on absolute cone density and eye motion would further strengthen the study.

    2. Reviewer #1 (Public review):

      The authors demonstrate an innovative approach to investigate the effect of cone dropout on visual acuity using their newly developed olo system. By systematically reducing the coverage of real-world input to the cone photoreceptor mosaic ("cone dropout condition"), the authors are able to assess how having fewer cones leads to reduced vision, in comparison to existing approaches ("pixel dropout condition").

      The capture of a rich dataset, including cone imaging and eye motion, is valuable. Benchmarking with the prior literature, suggesting that good visual acuity can be maintained despite a 50% loss in cone density, is impressive. However, it is known that cone density varies dramatically from the peak cone density location in the foveal center to even a location a few degrees outside of the fovea. In addition, there is a high degree of subject-to-subject variation in peak cone density. Given that the C stimulus is hollow in the middle, the stimulus does not actually hit the location of the peak cone density but must land slightly outside of it. Therefore, considering the actual cone density of where the stimulus lands will be important to discuss and/or analyze.

      The observation of visual acuity maintenance with cone dropout has been a longstanding mystery since the 2013/2018 papers by Ratnam and Foote. The authors should be commended for their approach to addressing this important question. However, there are some simplifications and assumptions being applied to make this jump (i.e., that a 50% reduction in cone stimulation in a healthy eye is comparable to a 50% reduction in cone density in a patient). It seems unlikely that, in a patient's eye, with cone dropout, there will be gaps in the mosaic. Not considering any other non-photoreceptor-related reasons for visual acuity loss, which can occur in patients, the cone aperture acceptance angle may be different due to changes in cone size or packing; the sensitivity of individual cones may also be reduced due to deficits in the visual cycle recovery, which could be affected in disease. Some of these limitations could be addressed and acknowledged more explicitly.

      Overall, this is an impressive study incorporating state-of-the-art technology to probe the fundamental limits of human vision.

    1. eLife Assessment

      Fujita et al. examine the effects of AM-2099, a Nav1.7 inhibitor, on the excitability of human dorsal root ganglion neurons and compare these results to their prior study of Nav1.8 inhibition by suzetrigine. They show that the Nav1.7 inhibitor primarily alters action potential threshold and initiation, but not repetitive firing, whereas Nav1.8 inhibition elicits much stronger inhibition on repetitive firing. These complementary roles of Nav1.7 and Nav1.8 provide a plausible cellular explanation for the limited clinical success of Nav1.7 inhibitors compared to Nav1.8 inhibitors for chronic pain. While the conclusions are important and solid, there are some key shortcomings that should be addressed to strengthen the study.

    2. Reviewer #1 (Public review):

      Summary:

      Fujita and colleagues investigated the effects of two selective peripheral nerve voltage-gated sodium channel inhibitors, targeting either Nav1.7 or Nav1.8, on the excitability of human dorsal root ganglion neurons. The authors discovered that Nav1.8 inhibition is more effective at suppressing repetitive firing of DRG neurons, and this may explain the greater clinical efficacy observed for suzetrigine.

      Strengths:

      The study is interesting, and the findings are conceptually satisfying in that they may explain one aspect of Nav1.7 vs Nav1.8 targeting success.

      Weaknesses:

      (1) The use of postmortem human DRG neurons provides translational relevance, but the use of these cells is also a liability, given their high degree of variability. Of note are the 10 to 20-fold differences in baseline properties among cells, which dwarf the effects of the test compounds. The experiments may suffer from undersampling.

      (2) A potential confounder when using post-mortem human DRG neurons is heterogeneity of cell types. The methods clearly state that the cells selected for recording were of 'generally' small size, but specific criteria for what constitutes 'small' or other unstated selection criteria were not provided. A table of individual cell capacitance and input resistance values, along with information about individual donors (age, sex, ethnicity), is important to include. Additionally, some discussion of how DRG neuron heterogeneity impacts the findings would be valuable. This relates to concern #1 about sample size determination and how cell heterogeneity factored into this calculation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors examine the functional role of Nav1.7 voltage-gated sodium channels in human sensory neuron electrogenesis using a Nav1.7 selective inhibitor and human dorsal root ganglion neurons obtained from organ donors. Patch-clamp electrophysiology is used at physiological temperature to measure the impact of Nav1.7 inhibition on sensory neurons' action potential firing. This is an important topic as Nav1.7 and Nav1.8 have been identified as therapeutic targets for the treatment of pain, but there has been mixed success with isoform-specific inhibitors in clinical trials. The data suggest that Nav1.7 and Nav1.8 have overlapping yet complementary functions in nociceptor neurons and that targeting both may be most effective for reducing nociception.

      Strengths:

      The data are of high quality. Action potential properties are measured at 37 degrees Celsius. Threshold is measured using brief pulses. The Nav1.7 inhibitor has been reported to be highly selective for Nav1.7 over Nav1.8 and moderately selective for Nav1.7 over Nav1.1 and Nav1.6. Data are collected using identical conditions and protocols to a previous study on the role of Nav1.8 in similar neurons.

      Weaknesses:

      The study relies on a single Nav1.7 inhibitor that has not been extensively characterized. One prior study indicates that the IC50 is around 140 nM, thus the 600 nM concentration used in this study could be predicted to reduce Nav1.7 currents by 80%. However, there is no voltage-clamp data in the current study to confirm this, and therefore, it is unclear if the batch of AM-2099 is as potent as reported in the paper that initially described its selectivity. The impact of Nav1.7 inhibition is compared to data from a previous study by this lab, and this is a minor concern. It would have been interesting to see if the combined inhibition of Nav1.7 and Nav1.8 completely blocked action potential generation in the human DRG neurons.
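
      The 80% figure quoted above can be checked with a short calculation. The sketch below is only an illustration: it assumes simple one-site binding described by the Hill equation with a Hill coefficient of 1, and it ignores the state dependence of block. The function name `fraction_blocked` is hypothetical and does not come from the manuscript.

```python
def fraction_blocked(concentration_nm: float, ic50_nm: float, hill: float = 1.0) -> float:
    """Fractional inhibition predicted by the Hill equation.

    Assumption: simple, state-independent binding; hill=1.0 is an
    illustrative default, not a value reported in the study.
    """
    ratio = (concentration_nm / ic50_nm) ** hill
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # 600 nM of inhibitor against a reported IC50 of about 140 nM
    print(f"{fraction_blocked(600.0, 140.0):.1%}")
```

      Under these assumptions the predicted block is roughly 81%, in line with the estimate above. The actual block in human DRG neurons could differ, which is why voltage-clamp confirmation is requested.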

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Fujita/Jo/Stewart/Osorno et al. investigate the contribution of Nav1.7 in regulating the excitability and firing properties of human dorsal root ganglion (hDRG) neurons in vitro. The authors characterize the effects of a previously reported Nav1.7-selective blocker AM-2099 in cultured hDRG neurons from postmortem organ donors. The authors observed modest changes in many of the properties expected by inhibiting Nav channels, including decreased action potential upstroke rate and amplitude, while increasing the voltage and current thresholds for spike generation. However, AM-2099 did not change the maximum number of APs in response to suprathreshold stimulation, leading the authors to conclude that Nav1.7 inhibition alone has limited efficacy in reducing the firing properties of hDRG neurons and that Nav1.7 blockers may have limited efficacy as analgesics. This is surprising, given that patients with loss-of-function mutations in Nav1.7 suffer from congenital insensitivity to pain. While it may indeed be true that pharmacological inhibition of Nav1.7 is unlikely to produce analgesia, the present study was limited to a single concentration of AM-2099. The manuscript would be significantly strengthened by a more careful and thorough pharmacological characterization of this compound, which has not been widely used or validated in native human DRG neurons.

      Strengths:

      Experiments are well-designed and executed, and the results presented are convincing. The focus on voltage-gated sodium channels in native human DRG neurons is highly relevant to recent efforts to develop safer analgesic options for chronic pain in people.

      Weaknesses:

      Only a single concentration of AM-2099 was used for all experiments. This compound was reported to be selective for cloned human Nav1.7 channels in heterologous systems, but has not been validated in other studies after the original publication in 2016. Since the original study reported a substantial state-dependent block of recombinant Nav1.7 channels, more detailed pharmacological characterization of AM-2099 is needed in human DRG neurons to fully support these claims. This study would be significantly strengthened by the inclusion of dose-response curves to assess how much of the sodium current is inhibited at this concentration, confirming selectivity in hDRG, and whether maximal inhibition of Nav1.7 still has limited efficacy in reducing the firing of native human sensory neurons.

    1. eLife Assessment

      This valuable study analyses correlations between traits of Chinese frog species and their Red List status, finding differences between adults and larvae and thus pointing to the importance of considering different life-cycle stages in this and possibly other animal groups when assessing species extinction risks. The current study is, however, incomplete because of unclear threat categories for tadpoles, the omission of other key species traits, and insufficient statistical analysis.

    2. Reviewer #1 (Public review):

      The manuscript shows that different traits of adults and larvae correlate with Red List status. The authors argue that this shows a big gap in the conservation of amphibians and that the traits of all life stages should be taken into account in amphibian conservation. Specifically, amphibian conservation should do more for the habitats where the larvae live.

      The manuscript is well written and easy to understand. The methods are sound.

      While the study will make an interesting contribution to conservation science, there are many things that I disagree with.

      I don't think that amphibian larvae and their requirements are a "blind spot" as the title suggests. When reading the manuscript, I didn't learn how conservation practice should change in response to the results.

      I wonder whether the relationship between species traits and extinction risk is of great importance for conservation. If a species is Data Deficient on the IUCN Red List, then species traits could be used to predict its Red List category. However, for other conservation projects, I don't see how this would work. How would traits be linked to captive breeding, conservation translocation, pond construction or habitat management in general? In some cases, I can envision a link between species traits and pond hydroperiod.

      Species traits are body size and morphological traits. That makes sense. However, one of the species traits was microhabitat. I find it far-fetched to call habitat a species trait. This is standard habitat ecology. It is well known that habitats matter and that different habitat types face different threats, and consequently, the species that live in those habitats. Furthermore, habitat and morphology may be confounded. For example, tadpoles in lentic and lotic habitats have very different morphologies. So is it habitat or morphology?

      I don't know how the threat status of Chinese amphibians is determined. IUCN has multiple reasons why a species can be Red Listed. One reason is range size, and another reason is population decline. Personally, I don't think they should be pooled in an analysis because they are fundamentally different reasons why a species has a high extinction risk. A reduction in population size of greater than 30% in 10 years or 3 generations is not the same thing as a small distribution range. Another issue is that IUCN developed the Green Status of species. The Green Status shows that even a species which is LC on the Red List may be significantly depleted.

      The species traits in Table 1 are mostly functional/morphological and body size related (and microhabitat). While there may be correlations between traits and Red List status, it is unknown whether this is correlation or causation. In addition, it is difficult to know the conservation interventions that may be necessary now that we know that relative head width and Red List status are correlated.

      In the discussion, the authors explain why body size and other traits may affect extinction risk and whether there is a causal relationship. I agree that body size may have a direct effect because larger species are harvested more frequently (it was interesting to learn that tadpoles are harvested as well). However, as macroecological studies show, smaller species often have larger populations than larger species. Abundance may matter.

      I found it much harder to understand why relative head length and tympanum size correlated with Red List status. I wasn't convinced by the arguments in the discussion. Tympanum size may be related to hearing and anthropogenic noise. Several studies are cited which show that frogs alter their calling behaviour in response to noise. Crucially, however, they describe changes in behaviour or properties of the advertisement call, yet none show that noise has effects on population viability. If some anthropogenic stressor affects individuals, then this does not mean that it will cause a population decline. When IUCN published the second global amphibian assessment, did they list noise as a major threat to amphibians?

      There are statements that the tadpole stage is the most important stage: "a critical period for amphibian survival" (line 78-79). While there is high mortality in the tadpole stage, tadpole survival is rather unlikely to affect population survival. Many population models show this. See, for example, Biek et al. 2002 in Conservation Biology. Other papers have argued that the postmetamorphic juvenile stage is most important (Petrovan and Schmidt 2009 Biological Conservation).

      The authors repeatedly make the statement that amphibian conservation should focus more on the tadpole stage. I don't understand why this statement is made. For example, a major activity in amphibian conservation is the restoration and de novo construction of ponds (see Calhoun et al. 2014 PNAS, Moor et al. 2022 PNAS). Ponds are habitats for tadpoles. Others removed fish from amphibian breeding sites because fish prey on tadpoles (and adults; see Vredenburg 2004 PNAS). Semlitsch (2002 in Conservation Biology) argued that the management of pond hydroperiod is a critical element of amphibian recovery plans. Ponds should be temporary because this effectively removes predators that consume tadpoles. Clearly, the tadpole stage is not a neglected stage in amphibian conservation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors tried to examine whether there are differences in the association between functional traits and extinction risk in adult and tadpole stages in Chinese anurans.

      Strengths:

      Overall, I think the basic idea of the study is interesting and important. It can be applied to other taxa with complex life cycles throughout the animal kingdom.

      Weaknesses:

      I do not think the authors achieve their aims, as the results only partially support their conclusions. The study has several drawbacks that need to be clarified or revised, including the unclear threat categories for tadpoles, model selection and model averaging, the potential problem of AIC, and the omission of other important species traits.

    1. eLife Assessment

      This work provides a fundamental advance through a detailed and integrative analysis of how the tsetse fly feeds on blood, demonstrating that successful penetration depends on subtle structural adaptations rather than extreme forces or unusual anatomy. By combining high-resolution imaging, innovative biomechanical measurements, and experiments on artificial skin, the study offers complementary and compelling evidence, with clear data supporting a robust mechanistic interpretation. These findings have broad significance as they clarify the biomechanics of vector feeding with implications for the transmission of diseases such as African trypanosomiasis across diverse hosts.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript provides a comprehensive and mechanistic analysis of how tsetse flies feed on blood across a wide range of host skin types. The authors combine detailed anatomical characterization of the feeding apparatus with quantitative measurements of mechanical properties, probing forces, and blood uptake, complemented by experiments using artificial skin. They show that tsetse flies do not rely on extreme forces or uniquely specialized structures, but instead on subtle and highly efficient structural and mechanical adaptations (such as the toothed labellum and coordinated proboscis movements) to achieve effective blood pool feeding. The study successfully moves beyond descriptive anatomy to a quantitative, functional analysis that explains how feeding is accomplished across diverse substrates.

      Strengths:

      A major strength of the work is the impressive integration of multiple complementary approaches. Advanced imaging tools provide a convincing three-dimensional view of the proboscis, labellum, and associated structures, while direct force measurements and blood intake quantification place these observations on a solid quantitative footing. The use of artificial skin with different mechanical properties is particularly powerful, as it allows structure-function relationships to be tested under controlled and reproducible conditions. Together, these datasets provide strong and coherent support for the authors' central conclusions. The quantitative treatment of feeding mechanics represents a significant advance over largely descriptive prior work by others (e.g., Gibson W et al 2017) and establishes a valuable mechanistic insight for studying blood feeding in insect vectors more broadly.

      Weaknesses:

      The study focuses almost entirely on uninfected flies and does not address how infection might alter feeding mechanics or performance. Previous work has shown that trypanosome infection can affect salivary gland function and feeding time (Van Den Abbeele et al 2010), and even cause damage to mouthparts, all of which can influence feeding behavior and efficiency. While this does not detract from the technical quality or the core findings of the study, a more explicit discussion of these biological variables would help place the results in a broader transmission-relevant context and clarify how generalizable the conclusions are to natural infection settings.

      Overall, this is an outstanding and carefully executed study that will have a significant impact on the fields of vector biology and parasite transmission.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents an impressively detailed, multidisciplinary analysis of the mechanics of blood feeding in Glossina spp. Combining SEM, CLSM, µCT, FIB‑SEM, macro‑videography, and quantitative force measurements, the authors characterize the structures and biomechanics of attachment, proboscis deployment, tissue penetration, and blood uptake. They also examine interactions with diverse host‑type substrates, from human skin equivalents to cow, deer, and lizard skin, and integrate these with force measurements to quantify penetration and retraction dynamics.

      The work's key conclusion is that the tsetse fly does not rely on any single exceptional morphological innovation, but rather uses a suite of subtle structural features and retractive forces to feed efficiently across diverse hosts. This result is novel, insightful, and evolutionarily compelling. Overall, this is a strong manuscript that combines methodological sophistication with biological relevance. It should be of high interest to researchers studying vector biology, biomechanics, parasite transmission, and vector-host interactions.

      Strengths:

      (1) The combination of SEM, CLSM, µCT, and FIB‑SEM provides an unusually comprehensive anatomical characterization of the tsetse feeding apparatus.

      (2) The direct measurement of proboscis penetration and retraction forces across diverse substrates is highly original and fills a major knowledge gap in vector-host interaction mechanics.

      (3) The study bridges morphology, mechanics, behavior, and host tissue properties, which strengthens the overall conclusions.

      (4) Imaging of trypanosomes within the hypopharynx and surrounding tissue during feeding provides new information about parasite delivery mechanisms.

      Main Comments:

      (1) The authors conclude that feeding versatility arises from the sum of subtle adaptations. This interpretation is reasonable, but it would help to sharpen which findings most robustly support this statement. For example, the relative similarity of proboscis forces across skin types is compelling evidence that the proboscis is broadly tuned rather than specialized. The observation that tsetse targets softer interscale regions on lizard skin suggests behavioural selectivity, not morphological specialisation. It would strengthen the discussion to highlight which data most directly refute the hypothesis of a unique specialization.

      (2) A central finding is that retraction forces exceed penetration forces across substrates, implying that backward pulling is a key component of wound creation. However, the biological interpretation could be deepened. Specifically, do the authors believe retraction serves primarily to enlarge the pool‑feeding site? How does this compare mechanically to mosquito fascicle oscillation or other blood‑feeding arthropods (especially other flies such as those in the family Tabanidae)? Could retraction forces contribute to anchoring or resisting host grooming behaviors?

      (3) The study analyzes a diverse set of substrates, which is a strength. However, some caveats deserve explicit discussion. Human skin equivalents and dermal equivalents lack the full mechanical complexity of real skin (e.g., innervation, perfusion, tension). Frozen or ethanol‑stored samples, particularly reptile skin, may also exhibit altered mechanical properties compared to live tissues. These limitations do not undermine the findings but should be explicitly acknowledged as they influence the interpretation of absolute force magnitudes.

      (4) The SEM and FIB‑SEM images showing trypanosomes in the hypopharynx and surrounding tissue during penetration are visually striking and suggest rapid dispersal. It would be helpful to connect these observations more clearly to the kinetics of parasite deposition and whether mechanical tissue laceration is likely to increase inoculation efficiency. Without conducting additional experiments, the authors could discuss whether these findings support or modify existing models of salivary-gland-derived parasite release.

      (5) The authors demonstrate that tsetse attachment abilities fall within the range of generalist insects and are far lower than those of obligate ectoparasites. However, the manuscript could discuss how attachment forces relate to the tsetse's ecological context, e.g., whether their attachment is generally brief, whether host shaking strongly selects for grip strength, etc. Is there evidence that other Glossina species or tabanids with different host preferences show variation in attachment performance? This would broaden the relevance of the findings.

      (6) In video 4, could the authors clarify whether the observed maxillary vibrations are hypothesized to reduce penetration resistance or serve another function?

    4. Reviewer #3 (Public review):

      Summary:

      Human and animal trypanosomiasis are fatal illnesses caused by African trypanosomes transmitted by tsetse flies during a bloodmeal. Thus, tsetse fly feeding is the key physical step in disease transmission to mammals. Tsetse fly feeding is not a new story, but it is revisited here through the application of sophisticated imaging techniques and novel biomechanical methods of analysis. The authors aim to provide a high-resolution picture of the structures and forces involved in feeding to provide mechanistic insights into the process of feeding, from attachment, penetration, drinking and retraction of the feeding parts.

      Largely, the authors have achieved their aims. They (i) examine the structures and forces involved in attachment; (ii) they provide detailed multi-image analysis of the proboscis providing insights into its probing ability and physical mechanism of penetration; (iii) they conduct a controlled analysis of the physical forces involved in penetration and report that they are in the low nM range, not especially strong but much higher than the mosquito bite and finally they provide a first analysis of blood uptake during feeding.

      Strengths:

      The study images the tsetse fly feeding structures in unprecedented detail, with resolution to the µm scale, in 3-D, and during feeding. The resulting images are dramatic and insightful (and beautiful and frightening!), so researchers interested in trypanosomes, tsetse flies, or blood feeding by flies in general will want to see them.

      They conclude that flies attach strongly to smooth surfaces because of interactions possible via the array of acanthae of the pulvillus pad at the ends of the tarsi. The estimated attachment forces are similar in male & female flies, in the low mM range (they look impressively strong in video 1). They provide a very striking analysis of the proboscis and labellum and associated tooth structures (Figures 4 & 5). I recall many years ago observing that tsetse flies are messy feeders, and these structures, especially the rasping teeth structures on the reverse folded labial tips, explain why! This seems more like a chainsaw than a jigsaw in action, but the authors are probably correct that these structures and the probing/retraction mechanism explain many features of tsetse fly feeding and their ability to feed on a wide range of hosts with very different skin types.

      The impressive aspect of this paper is the range of imaging techniques (CLSM, SEM, µCT, FIB‑SEM) and the quality of the images, which attests to the obvious care taken with sample preparation. The biomechanical analysis, especially the penetration analysis, is impressive. Finally, the paper is clearly written and presented; it was a very easy read and, overall, a very engaging study.

      Weaknesses:

      I suppose it could be said that the paper is a descriptive study; it doesn't really test a hypothesis, but that is not a prerequisite for sharing it. Perhaps the least convincing parts are the imaging of the flexible versus rigid parts of the structures, which is based on the amount of resilin (flexible) and chitin-protein (stiff), inferred from their autofluorescence. It seems odd that the joints would be less blue (stiffer) in Figure 1i, and it is unclear what the blue structures correspond to in Figure 6B-D.

    1. eLife Assessment

      This useful work addresses a longstanding question of how the extant genetic code came to be selected and conserved almost universally across life. Using a mutational approach and a small set of reporters, the authors demonstrate that the mutational impact was similar for non-standard genetic codes. Considering the limitations of the approach, the data are incomplete in supporting the claim of having provided 'experimental verification of the error minimization theory'.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors investigate the relationship between genetic codes and their robustness to single-point mutations. They construct ten alternative genetic codes by reassigning nine codons to Leu, Ser, or Ala, and assess mutational robustness using three reporter proteins subjected to error-prone PCR. This represents an interesting experimental approach to addressing the hypothesis that the standard genetic code is optimized for mutational robustness.

      Major comment:

      While I find the experimental design valuable, I am not fully convinced by the authors' conclusion that "alterations of the genetic code within the ranges explored in this study have no significant effect on mutational robustness". The current analysis is based on the functional output of three individual reporter proteins. Given that cellular systems involve far more complex interactions, it would be more appropriate to limit this conclusion to mutational robustness at the level of individual protein activity, rather than making broader generalizations.

      Specific comments:

      (1) tRNA modification and expression efficiency (Page 5, line 131).

      The authors attribute the observed inefficiency to the lack of chemical modifications in the tRNAs used. However, gene expression efficiency can also be strongly influenced by DNA sequence design. To better support this claim, it would be helpful to compare luciferase activity when expressed using native E. coli tRNAs. This comparison could clarify whether the observed effects are due to tRNA modification status or other sequence-dependent factors.

      (2) Discrepancy between expression level and activity (Figure S7 vs Figure S8).

      Although GAL expression levels appear similar across different genetic codes (Figure S7), their activities differ substantially (Figure S8), even in the low-mutation library. This discrepancy warrants further investigation. Possible explanations include differences in protein folding efficiency or translational error rates, as mentioned by the authors in the main text.

      To address this, the authors could analyze the protein products using mass spectrometry. If this is not feasible due to low expression levels, alternative approaches such as SDS-PAGE (e.g., with radiolabeling or Western blotting) would still provide valuable information. Additionally, comparing activity after in vitro refolding could help distinguish between folding defects and sequence-level errors. While I understand that the primary aim of this study is to compare mutational robustness across genetic codes, discussing these observations would significantly enhance the mechanistic insight of the work.

      (3) Protein expression analysis for additional reporters.

      Since protein expression levels are critical for interpreting reporter activity, similar analyses should also be performed for luciferase (Luc) and mSG in both high- and low-mutation libraries. This would ensure that differences in activity are not confounded by variations in protein abundance.

    3. Reviewer #2 (Public review):

      Summary:

      The study addresses the long-standing question in molecular biology and genetics: why has nature selected the current genetic code (SGC, or standard genetic code)? The authors have tested 'error minimization theory', one of the prevailing hypotheses to explain this. Their approach is to create a minimum genetic code (MGC) and its variants (3^9 theoretically possible codes). Using three parameters to quantify the effect of mutations (polarity, volume, and hydropathy), they computationally test the cost of these genetic codes (3^9) by simulations. Finally, they test this cost experimentally using an in vitro translation system with 10 select genetic code variants with a range of costs (low to high). They use three randomly mutated reporter genes for this purpose - beta-galactosidase, luciferase, and mSG. They find no correlation between the cost of the genetic code and the reporters' output. Based on these observations, they suggest that error-minimization theory may not explain the current genetic code.
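
      To make the scale of the 3^9 code space concrete, the enumeration and a replacement cost can be sketched as follows. This is a toy illustration and not the authors' procedure: the nine codons listed are hypothetical placeholders (the reassigned codons are not specified here), only hydropathy is used (Kyte-Doolittle values for Ala, Leu, and Ser), and the cost considers single-nucleotide changes among the nine codons only, ignoring the rest of the code.

```python
from itertools import product

# Kyte-Doolittle hydropathy values for the three reassigned amino acids.
HYDROPATHY = {"Ala": 1.8, "Leu": 3.8, "Ser": -0.8}

# Hypothetical placeholder codons; NOT the codons used in the study.
CODONS = ["CUU", "CUC", "CUA", "CUG", "UCU", "UCC", "GCU", "GCC", "GCA"]


def is_neighbour(a: str, b: str) -> bool:
    """True if two codons differ at exactly one nucleotide position."""
    return sum(x != y for x, y in zip(a, b)) == 1


# Index pairs of codons connected by a single point mutation.
PAIRS = [
    (i, j)
    for i in range(len(CODONS))
    for j in range(i + 1, len(CODONS))
    if is_neighbour(CODONS[i], CODONS[j])
]


def cost(assignment: tuple) -> float:
    """Mean squared hydropathy change over single-nucleotide neighbours."""
    total = sum(
        (HYDROPATHY[assignment[i]] - HYDROPATHY[assignment[j]]) ** 2
        for i, j in PAIRS
    )
    return total / len(PAIRS)


# Every way of assigning Ala, Leu, or Ser to each of the nine codons.
ALL_CODES = list(product(HYDROPATHY, repeat=len(CODONS)))

if __name__ == "__main__":
    costs = [cost(code) for code in ALL_CODES]
    print(len(ALL_CODES), min(costs), max(costs))
```

      In this toy setting, a code that assigns the same amino acid to all nine codons has zero cost, whereas codes that place dissimilar amino acids on neighbouring codons score higher. The study's own cost function may be defined differently.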

      The question they are asking is very exciting, and their approach is solid. The authors are very careful in their analyses and conclusions.

      Major Concerns:

      (1) The rationale for using MGC instead of SGC: It is unclear why the authors rely on the MGC for this analysis when the central question concerns the SGC. If the goal is to evaluate whether the SGC minimizes mutational cost, a more direct approach would be to generate alternative variants of the SGC itself and compare their mutational cost distributions. At present, it is difficult to assess whether conclusions drawn from this comparison are fully relevant to the stated biological question.

      (2) The mutational cost analysis appears biologically oversimplified because all amino acid substitutions are treated equivalently. The analysis assumes that all mutations contribute equally to fitness consequences, which does not reflect biological reality. In natural proteins, the impact of an amino acid substitution depends strongly on its structural and functional context. For example, substitutions affecting catalytic residues, ligand-binding interfaces, phosphorylation sites, or other regulatory motifs can severely impair protein function even when associated changes in polarity, hydropathy, or volume are minimal. Conversely, substitutions in structurally permissive or functionally dispensable regions may have little or no measurable effect despite larger physicochemical differences. Therefore, changes in polarity, hydropathy, and volume alone do not necessarily predict functional consequences.

      (3) It is not clear why they increased the concentration of the two tRNAs in near-SGC. Have they maintained the same tRNA concentrations in experiments explained in Fig 5 for all 10 genetic codes tested?

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Miyachi and Ichihashi investigate whether the arrangement of the genetic code affects mutational robustness. Using an in vitro minimal genetic code with vacant codons, they constructed 10 non-standard genetic codes by reassigning Ala, Ser, and Leu, generating codes with replacement costs that were generally higher than those of the standard genetic code across several amino acid property measures. They then tested how random mutations affected the activity of reporter proteins translated under these altered codes. Although error minimization theory predicts that higher-cost codes should make mutations more harmful, the authors report that protein function declined to a similar extent across all codes examined, suggesting that mutational robustness remains largely unchanged within the range of genetic code alterations tested here.

      Strengths:

      This is an interesting study that investigates one of the most fundamental and intriguing questions in molecular evolution: the emergence of the genetic code, which is nearly universal across nature. The in vitro approach is a powerful aspect of the work and provides an opportunity to examine this phenomenon experimentally at a depth that has previously been inaccessible.

      Weaknesses:

      However, the authors' use of random mutation libraries has certain limitations that prevent the study from realizing its full potential to uncover the mechanisms governing the molecular evolution of the genetic code.

      Major points:

      (1) Statistical analyses are missing for several of the manuscript's main claims. This issue applies throughout the paper, including, but not limited to, Figures 1D, 2B, 4B-D, and 5B.

      (2) In Figure 2A, the authors modify the NanoLuc gene by reassigning Ala, Leu, or Ser to new codons and elegantly show that the in vitro availability of the corresponding tRNAs is important for protein function. However, the functional importance of the specific modified positions within NanoLuc is not clear. As a result, it is difficult to determine what the expected consequences of these codon changes should be, which in turn limits the interpretation of the observed changes in protein activity. To improve the interpretability of this experiment, the authors should report exactly how many codons were modified in each variant and, ideally, examine the effect of progressively increasing the number of reassigned codons.

      (3) The calculations presented in Figure 3 raise an interesting conceptual question: why does the near-standard genetic code not exhibit the lowest cost? One possible explanation is that the standard genetic code evolved under multiple competing constraints and is therefore not expected to be optimal for any single cost metric, while still achieving strong overall performance. In this context, it would be informative if the authors combined the three cost measures into a single integrated index and examined whether the near-SGC performs more favorably when all three dimensions are considered together. Such an analysis could add important depth to the study.
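      One simple way to build the integrated index suggested here is to standardize each cost metric across codes and average the resulting z-scores. The sketch below is an assumption-laden illustration, not the authors' analysis: equal weighting of the three metrics is an arbitrary choice, the function names are hypothetical, and the cost values are invented placeholders.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values to mean 0 and (population) SD 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def combined_index(costs):
    """Average the z-scored polarity, volume, and hydropathy costs per code.

    costs maps a code name to a (polarity, volume, hydropathy) tuple.
    Equal weights are an assumption made for illustration only.
    """
    names = list(costs)
    per_metric = [zscores([costs[n][i] for n in names]) for i in range(3)]
    return {n: mean([m[j] for m in per_metric]) for j, n in enumerate(names)}


# Hypothetical placeholder costs, not values from the manuscript.
example_costs = {
    "code_A": (1.0, 2.0, 3.0),
    "code_B": (2.0, 2.0, 1.0),
    "code_C": (3.0, 5.0, 2.0),
}
example_index = combined_index(example_costs)
best_code = min(example_index, key=example_index.get)
```

      A lower index means a lower combined cost relative to the other codes in the set, so the ranking depends on which codes are included in the comparison.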

      (4) It is difficult to assess the consequences of the random mutations presented in Figure 4 on reporter gene function based solely on the reported "error rate/base" parameter. In particular, the x-axis in Figure 4B should be converted into the estimated number of mutations per gene. This would make the results more intuitive and would allow the reader to better evaluate the expected degree of disruption to protein function.
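      The conversion requested here is straightforward arithmetic. As a minimal sketch, assuming mutations occur independently and uniformly along the gene, the expected number of mutations is the per-base error rate times the gene length, and the fraction of variants carrying at least one mutation follows from the complement of "no mutation at any base". The function names, the error rate, and the gene length below are hypothetical placeholders rather than values from the manuscript.

```python
def expected_mutations(error_rate_per_base: float, gene_length_bp: int) -> float:
    """Expected number of mutations per gene copy."""
    return error_rate_per_base * gene_length_bp


def fraction_mutated(error_rate_per_base: float, gene_length_bp: int) -> float:
    """Fraction of gene copies with at least one mutation.

    Assumes each base mutates independently with the same probability.
    """
    return 1.0 - (1.0 - error_rate_per_base) ** gene_length_bp


# Hypothetical example: 0.2% errors per base in a 1000 bp gene.
example_rate = 0.002
example_length = 1000
example_expected = expected_mutations(example_rate, example_length)
example_fraction = fraction_mutated(example_rate, example_length)
```

      With these placeholder inputs the expected load is about two mutations per gene, and most copies carry at least one mutation.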

      (5) A central limitation of the random mutagenesis libraries used in Figure 5, which also underlie one of the manuscript's main claims, is that the exact mutations and their distribution across the reporter genes are not reported. In addition, protein activity is measured only at the level of the entire library, without directly linking individual mutations to their functional consequences. This substantially limits mechanistic interpretation. In my view, this issue can only be addressed convincingly if the authors test a set of defined variants carrying specific mutations and directly evaluate their functional effects.

      (6) Related to the previous point, in Figures 5C, 5E, and 5G, the authors present the ratio between low-mutation-rate and high-mutation-rate libraries. However, because each library contains a different collection of mutations, it is unclear what can be inferred from these comparisons. To overcome this limitation, the authors should assess the effects of altered genetic codes on specific, defined mutations rather than on heterogeneous mutation pools alone.

      (7) Along the same lines, in Figures 5C, 5E, and 5G, it is unclear why the effects of random mutations would be expected to correlate with the three calculated cost metrics, given that the positions, identities, and functional relevance of the mutations within the genes are not known. Without this information, the biological meaning of these correlations remains difficult to evaluate.

      (8) For each mutagenesis library, the number of variants, the average number of mutations per variant, and the distribution of mutation positions should be reported clearly and transparently. These details are important for evaluating the strength of the conclusions.

      (9) Because only three amino acids were manipulated in the non-standard genetic codes, it remains unclear whether these particular amino acids occupy positions in the reporter proteins that are especially important for function and therefore likely to generate strong phenotypic effects. More broadly, it is not clear whether the assay is sufficiently sensitive to detect the effects of only a subset of deleterious variants within a pooled library. This point should be addressed more explicitly.

    1. eLife Assessment

      This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. The work rests on a solid methodological base. Some limitations remain, including uncertainty introduced by pooling different tooth positions, limited dietary interpretation, and the predominantly herbivorous taxonomic focus, which narrows the ecological scope of the conclusions. However, the manuscript provides a substantially strengthened and well-supported contribution, while appropriately inviting further work to clarify dietary trends, broader ecological context, and links between dental trait evolution and environmental change.

    2. Reviewer #2 (Public review):

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis -- mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper and I think the results will be of interest to a broad audience.

      Weaknesses:

      For the original draft of the manuscript, I had four major concerns with the study, especially related to the sampling, diet, and evidence for the 'brawn before bite' hypothesis. I still believe that the original issues that I raised may be weaknesses of the study. For example, there is still limited discussion on diets (even though the dental topographic analyses used in the study are designed for inferring diets). And I find the results a little challenging to interpret because teeth of multiple positions are included in the same samples, which seems problematic. That said, the authors have addressed each of my previous concerns and have made major revisions, including running new analyses, and thus I support the paper.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. However, the findings are incomplete because limitations in sampling design - such as the use of worn or damaged teeth, the pooling of different tooth positions, and the lack of independence among teeth from the same individuals - introduce uncertainties that weaken support for the reported disparity patterns. The taxonomic focus on predominantly herbivorous clades also narrows the ecological scope of the results. Clarifying methodological choices, expanding the ecological context, and tempering evolutionary interpretations would substantially strengthen the study.

      We have now thoroughly revised our manuscript in response to the editor's and reviewers' comments, in particular with regard to:

      (1) Sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentence around line 690:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      (2) Pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.

      For tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S9-11, only Fig. S10 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening then MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. It is not possible to conclude that the other tooth positions display different patterns because of the limited sample sizes.

      (3) Ecological scope of the study: although carnivorans and mesonychids are recorded from some of the time intervals examined in this study, our sampling choice of pantodonts and anagalids reflects the high abundance of available dental specimens in those clades, permitting us to make the strongest statistical inference given the incomplete fossil record. Additionally, all sampled taxa come from archaic clades that have not been determined to be specifically herbivorous; we included an additional paragraph in the introduction to explain this:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leave some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work provides valuable new insights into the Paleocene Asian mammal recovery and diversification dynamics during the first ten million years post-dinosaur extinction. Studies that have examined the mammalian recovery and diversification post-dinosaur extinction have primarily focused on the North American mammal fossil record, and it's unclear if patterns documented in North America are characteristic of global patterns. This study examines dietary metrics of Paleocene Asian mammals and finds that body size disparity increased before dietary niche expansion and that dietary metrics track climatic and paleobotanical trends of Asia during the first 10 million years after the dinosaur extinction.

      Strengths:

      The Asian Paleocene mammal fossil record is greatly understudied, and this work begins to fill important gaps. In particular, the use of interdisciplinary data (i.e., climatic and paleobotanical) is really interesting in conjunction with observed dietary metric trends.

      Weaknesses:

      While this work has the potential to be exciting and contribute greatly to our understanding of mammalian evolution during the first 10 million years post-dinosaur extinction, the major weakness is in the dental topographic analysis (DTA) dataset.

      There are several specimens in Figure 1 that have broken cusps, deep wear facets, and general abrasion. Thus, any values generated from DTA are not accurate and cannot be used to support their claims. Furthermore, the authors analyze all tooth positions at once, which makes this study seem comprehensive (200 individual teeth), but it's unclear what sort of noise this introduces to the study. Typically, DTA studies will analyze a singular tooth position (e.g., Pampush et al. 2018 Biol. J. Linn. Soc.), allowing for more meaningful comparisons and an understanding of what value differences mean. Even so, the dataset consists of only 48 specimens. This means that even if all the specimens were pristinely preserved and generated DTA values could be trusted, it's still only 48 specimens (representing 4 different clades) to capture patterns across 10 million years. For example, the authors note that their results show an increase in OPCR and DNE values from the middle to the late Paleocene in pantodonts. However, if a singular tooth position is analyzed, such as the lower second molar, the middle and late Paleocene partitions are only represented by a singular specimen each. With a sample size this small, it's unlikely that the authors are capturing real trends, which makes the claims of this study highly questionable.

      With regard to sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentence around line 690:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.

      For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening then MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. It is not possible to conclude that the other tooth positions display different patterns because of the limited sample sizes.

      Reviewer #2 (Public review):

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis - mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper, and I think the results will be of interest to a broad audience.

      Weaknesses:

      I have four major concerns with the study, especially related to the sampling of teeth and taxa, that I discuss in more detail below. Due to these issues, I believe that the study is incomplete in its support of the 'brawn before bite' hypothesis. Although my concerns are significant, many of them can be addressed with some simple updates/revisions to analyses or text, and I try to provide constructive advice throughout my review.

      (1) If I understand correctly, teeth of different tooth positions (e.g., premolars and molars), and those from the same specimen, are lumped into the same analyses. And unless I missed it, no justification is given for these methodological choices (besides testing for differences in proportions of tooth positions per time bin; L902). I think this creates some major statistical concerns. For example, DTA values for premolars and molars aren't directly comparable (I don't think?) because they have different functions (e.g., greater grinding function for molars). My recommendation is to perform different disparity-through-time analyses for each tooth position, assuming the sample sizes are big enough per time bin. Or, if the authors maintain their current methods/results, they should provide justification in the main text for that choice.

      With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.

      For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening then MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. It is not possible to conclude that the other tooth positions display different patterns because of the limited sample sizes.

      Also, I think lumping teeth from the same specimen into your analyses creates a major statistical concern because the observations aren't independent. In other words, the teeth of the same individual should have relatively similar DTA values, which can greatly bias your results. This is essentially the same issue as phylogenetic non-independence, but taken to a much greater extreme.

      It seems like it'd be much more appropriate to perform specimen-level analyses (e.g., Wilson 2013) or species-level analyses (e.g., Grossnickle & Newham 2016) and report those results in the main text. If the authors believe that their methods are justified, then they should explain this in the text.
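      The specimen-level approach recommended here can be illustrated by collapsing multiple teeth from one individual into a single value before any disparity statistic is computed. The sketch below is a generic illustration of that idea under stated assumptions, not the procedure used in the cited studies or in the manuscript: averaging is only one possible way to summarize a specimen, and the specimen identifiers and trait values are invented placeholders.

```python
from collections import defaultdict
from statistics import mean


def specimen_means(tooth_records):
    """Collapse per-tooth trait values into one mean value per specimen.

    tooth_records is an iterable of (specimen_id, trait_value) pairs, so
    several teeth from the same individual contribute one observation.
    """
    by_specimen = defaultdict(list)
    for specimen_id, value in tooth_records:
        by_specimen[specimen_id].append(value)
    return {sid: mean(vals) for sid, vals in by_specimen.items()}


# Hypothetical placeholder records: two teeth from S1, one tooth from S2.
example_records = [("S1", 10.0), ("S1", 12.0), ("S2", 20.0)]
example_means = specimen_means(example_records)
```

      After this step each specimen counts once, which reduces the sample size but removes the within-individual correlation raised in the comment above.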

      Based on the per-tooth partition analyses we performed and reported above, the results now show that the overall trends described in the previous draft of the study are a composite of signals from different regions of the dentition. For example, the OPCR, DNE, and FEA trends persist across most tooth positions, whereas the Slope and RFI trends are mainly driven by lower fourth premolar patterns. The tooth size results are also mainly driven by lower fourth premolar patterns, but tooth disparity trends are broadly supported across tooth positions. These observations indicate that the overall trends remain valid, but there are nuances as to which tooth positions are driving which components of the trends. As such, we deem the overall results to be valid, and focused our revision on providing the nuances so readers can assess through-time patterns in more detail than in the previous version of the study.

      (2) Maybe I misunderstood, but it sounds like the sampling is almost exclusively clades that are primarily herbivorous/omnivorous (Pantodonta, Arctostylopida, Anagalida, and maybe Tillodonta), which means that the full ecomorphological diversity of the time bins is not being sampled (e.g., insectivores aren't fully sampled). Similarly, the authors say that they "focused sampling" on those major clades and "Additional data were collected on other clades ... opportunistically" (L628). If they favored sampling of specific clades, then doesn't that also bias their results?

      If the study is primarily focused on a few herbivorous clades, then the Introduction should be reframed to reflect this. You could explain that you're specifically tracking herbivore patterns after the K-Pg.

      We appreciate the reviewer’s suggestion that our sampling may have focused on putative herbivorous clades more than others. However, at the early stage of placental evolution during the Paleocene, and in particular among the endemic forms we studied from south China, it is unclear to us that such clear-cut ecomorphological categories were present amongst the fossil mammals. Thus, we take a more agnostic approach and do not define the dietary categories of the sampled taxa (and by extension, those of the unsampled taxa). Although we recognize that representatives of certain clades, such as Carnivora, may be more reasonably interpreted as carnivores/insectivores/omnivores and, in the current context, remain unsampled, we point out that including tooth samples from rare taxa such as carnivores likely would have biased the analyses temporally. Chinese Paleocene carnivores are known only from one of the three time intervals analyzed (representing only a handful of specimens), and so would potentially inflate the disparity in that time interval relative to the others (if dentitions specialized for carnivory are assumed to be present in the Paleocene). To clarify this point, we added a paragraph in the introduction:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leave some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      (3) There are a lot of topics lacking background information, which makes the paper challenging to read for non-experts. Maybe the authors are hindered by a short word limit. But if they can expand their main text, then I strongly recommend the following:

      a) The authors should discuss diets. Much of the data are diet correlates (DTA values), but diets are almost never mentioned, except in the Methods. For example, the authors say: "An overall shift towards increased dental topographic trait magnitudes ..." (L137). Does that mean there was a shift toward increased herbivory? If so, why not mention the dietary shift? And if most of the sampled taxa are herbivores (see above comment), then shouldn't herbivory be a focal point of the paper?

      We edited the introduction to say that “We used dental topographical traits as indicators of ecomorphological diversity[28] and examined temporal shifts in tooth crown complexity, curvature, and height and their association with tooth performance in terms of deformation resistance using topographic and simulation analyses.” We also added the following to the methods section to clarify that we are using DTA as a general ecomorphological proxy, and not as a direct dietary proxy:

      “Overall, we use these DTA traits as indicators of ecomorphological capacity, but do not link them explicitly to dietary categories. The craniodental morphology of archaic placental clades in general has not been demonstrated to share the same structure-function linkages as crown mammals, so the aforementioned linkages between DTA and dietary ecology in extant species only serve as evidence that DTA is a potentially useful ecomorphological proxy, without the application of those DTA-diet relationships to the Paleocene fossil mammal dataset.”

      b) The authors should expand on "we used dentitions as ecological indicators" (L75). For non-experts, how/why are dentitions linked to ecology? And, again, why not mention diet? A strong link between tooth shape and diet is a critical assumption here (and one I'm sure that all mammalogists agree with), but the authors don't provide justification (at least in the Introduction) for that assumption. Many relevant papers cited later in the Methods could be cited in the Introduction (e.g., Evans et al. 2007).

      We added the following sentence to clarify our usage of tooth crowns as ecomorphological proxies: “Teeth are among the most well-preserved parts of fossil mammals, and the fact that they interface directly with the environment through mastication makes them suitable elements for studying potential ecology-morphology linkages.”

      c) Include a better introduction of the sample, such as explicitly stating that your sample only includes placentals (assuming that's the case) and is focused on three major clades. Are non-placentals like multituberculates or stem placentals/eutherians found at Chinese Paleocene fossil localities and not sampled in the study, or are they absent in the sampled area?

      We modified the following sentence to indicate our sampling focus on placentals: “Our analyses focused on placental mammals from three of the most fossiliferous and biogeographically isolated Paleocene sedimentary sequences in paleotropical Asia: The Nanxiong, Qianshan, and Chijiang Basins in present-day south China[23–27] (Fig. S1).”

      d) The way in which "integration" is being used should be defined. That is a loaded term which has been defined in different ways. I also recommend providing more explanation on the integration analyses and what the results mean.

      If the authors don't have space to expand the main text, then they should at least expand on the topics in the supplement, with appropriate citations to the supplement in the main text.

      We replaced all mentions of “integration” with “covariation” to avoid using the loaded terminology. Covariation more accurately reflects the correlation between two sets of traits (DTA vs FEA) without invoking developmental mechanisms implied by modularity/integration.

      (4) Finally, I'm not convinced that the results fully support the 'brawn before bite' hypothesis. I like the hypothesis. However, the 'brawn before ...' part of the hypothesis assumes that body size disparity (L63) increased first, and I don't think that pattern is ever shown. First, body size disparity is never reported or plotted (at least that I could find) - the authors just show the violin plots of the body sizes (Figures 1B, S6A). Second, the authors don't show evidence of an actual increase in body size disparity. Instead, they seem to assume that there was a rapid diversification in the earliest Paleocene, and thus the early Paleocene bin has already "reached maximum saturation" (L148). But what if the body size disparity in the latest Cretaceous was the same as that in the Paleocene? (Although that's unlikely, note that papers like Clauset & Redner 2009 and Grossnickle & Newham 2016 found evidence of greater body size disparity in the latest Cretaceous than is commonly recognized.) Similarly, what if body size disparity increased rapidly in the Eocene? Wouldn't that suggest a 'BITE before brawn' hypothesis? So, without showing when an increase in body size diversity occurred, I don't think that the authors can make a strong argument for 'brawn before [insert any trait]'.

      Although it's probably well beyond the scope of the study to add Cretaceous or Eocene data, the authors could at least review literature on body size patterns during those times to provide greater evidence for an earliest Paleocene increase in size disparity.

      We added a sentence in the discussion of body size during the Paleocene to note that the largest late Cretaceous fossil mammals in China are shrew- to gopher-sized, whereas the largest early Paleocene Chinese Endemic Pantodonts are dog-sized:

      “Dog-sized CEPs such as Bemalambda reached sizes not seen in late Cretaceous mammals from China such as Zhangolestes and Kryptobaatar, which are shrew- to gopher-sized [Meng 2014]”

      Reference: Meng, J. (2014). Mesozoic mammals of China: implications for phylogeny and early evolution of mammals. Natl. Sci. Rev. 1, 521–542. 10.1093/nsr/nwu070.

      Furthermore, we tempered our discussion to restrict the “brawn before bite” hypothesis to post K-Pg recovery in the Paleocene. Body size patterns shifted in the Eocene as crown clades replaced the archaic endemic clades analyzed in our study, and much larger taxa began to appear after the PETM. Such body size shift patterns are based on different clades and likely different dynamics compared to the 10-million year interval examined in our study, so we refrain from commenting on post-Paleocene times.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In regard to the DTA dataset: Was there a method used to 'fix' these teeth before dental topographic analyses were implemented? If so, this should be explicitly stated. If not, the authors should explain why broken, worn, or abraded teeth were used.

      We excluded the incomplete teeth from our analyses. We added the following sentence for clarification: “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      (2) The authors should explicitly explain why all tooth positions were analyzed together. Again, this is not something that is typically done, and some explanation would be helpful for readers.

      We added a paragraph in the methods section to explain both our pooled sampling approach, as well as the per-tooth analyses added in this revised manuscript:

      “Given the rarity of Paleocene fossil material from China, we combined data from different tooth positions into three pooled samples, one for each of the time intervals examined (early, middle, late Paleocene). We treated the pooled samples as representative of the range of dental topographic features and bite performance traits available to the mammal taxa under study. In this way, the variance estimates are interpreted as measures of the morphological and performance heterogeneity present in each time interval dataset. To further tease out the possibility of specific tooth positions driving the overall trends observed in the pooled samples, we also performed the DTA, FEA, DTA-FEA correlation, and tooth size through-time analyses using per-tooth data partitions.”

      (3) I think the authors should hedge their claims a bit more and recognize the limitations of their study (e.g., sample size and tooth preservation).

      We thank the reviewer for raising this important point. We carefully read through the main text and further tempered our interpretations based on the limitations of our data. Additionally, we added a paragraph in the supplemental text to summarize the major sources of uncertainty in the sample:

      “Sample and methodological limitations

      The highly fragmentary nature of early Cenozoic mammal fossils in Asia means that even the best preserved faunas studied herein contain much missing information. First, the absence of a high-resolution chronological framework prevents the fossil data from being analyzed on a continuous time axis; the binning of the samples into three main intervals within a 10-million-year period hinders additional hypotheses about the environmental and climatic correlations of the dental structure-performance results presented. Second, the uneven sampling of the available mammalian assemblage throughout the Paleocene sites in China limits the breadth of ecomorphological categories included in the analyses; rarer taxa representing more specialized carnivore, insectivore, or herbivore forms were not included in our sampling. Third, the spatial discontinuity of stratigraphically younger (Eocene) and older (Cretaceous) mammal assemblages means that body size and ecomorphological shifts bracketing the Paleocene cannot currently be analyzed alongside the dataset presented. These limitations should be taken into account when considering the interpretations made in the main text.”

      Reviewer #2 (Recommendations for the authors):

      I'm including my Line Comments here as recommendations for the authors. But note that many of my recommendations are also in my Public Review.

      L22: "3% of sites"? Do you mean 3% of global sites?

      Yes, we revised the sentence to indicate 3% of global sites. Thank you for this suggestion.

      L35: This is nitpicky because it's not crucial to your study, but I can't help but point out that the Long Fuse, etc, hypotheses are specifically about the DIVERGENCE TIMES for Placentalia and major subclades, NOT the 'adaptive radiation' of placentals like you imply in your text. Adaptive radiations include ecomorphological diversification and are driven by ecological opportunity (e.g., Schluter 2000). (Emphasis on 'ecological.') The long fuse, short fuse, and explosive models do not include an ecological component - i.e., the diversifications could have occurred without ecological diversification. Instead, for hypotheses that are specifically on the adaptive/ecological radiation of mammals, see the Early Rise, Suppression (or Dinosaur Incumbency; Benevento et al. 2023 Palaeontology), and Late Rise hypotheses (Grossnickle et al. 2019 TREE). These hypotheses apply broadly to all mammals, not just placentals (see Box 1's figure in Grossnickle et al. 2019), but they can still be applied to mammalian subclades like eutherians/placentals (e.g., see Thomas Halliday papers).

      Thank you for helping to clarify the adaptive radiation vs. divergence time concepts. We edited this sentence to mention the adaptive radiation hypotheses instead, adding in the references provided by the reviewer.

      L39-40: I think your comment is probably accurate. But keep in mind that advocates of the Early Rise and Delayed Rise hypotheses (see citations within Grossnickle et al. 2019) might argue that other time periods, other than the Paleocene, are equally or more important.

      We added a reference to Grossnickle et al. 2019 to bring attention to potential arguments otherwise. Thank you for the suggestion.

      L48: I think the inclusion of "at higher latitudes" is a little distracting or misleading and should be erased. It implies that the taxonomic diversification was ONLY rapid at higher latitudes. But many of the references that you cite include analyses at the global or continental scale (e.g., Alroy 1999, Grossnickle & Newham 2016) and don't distinguish patterns at different latitudes. If you want to keep the point about latitudes, then I recommend inserting a separate sentence on that point.

      We removed “at higher latitudes”.

      L50: Isn't "stem lineages and those with no living relatives" somewhat redundant? Or do you mean something like "stem placental/eutherian lineages and extinct placental subgroups"?

      Yes, we adopted the suggested phrasing. Thank you.

      L53: I recommend starting a new paragraph around here (maybe starting with "Distinct from ...") that focuses specifically on introducing the 'brawn before [ecomorphological trait]' hypothesis.

      Done.

      L56: "large herbivores and their predators"? Are you just referring to mammals? Wilson (2013), which you cite, and Grossnickle & Newham (2016) argued that dietary specialists were targeted at the K-Pg, but none of the herbivores were "large" (at least relative to Cenozoic herbivores). And most faunivorous mammals at the time were probably insectivorous and not preying on herbivorous mammals, besides maybe a few outlying taxa (e.g., Altacreodus, Nanocuris). I'd revise your sentence for clarity.

      We removed “disproportionately impacting large herbivores and their predators” for clarity.

      L63: I'd replace "ecometric" with "ecomorphological". Ecometrics commonly refers to using fossil traits to infer paleo environments/climate (e.g., see papers by David Polly, Michelle Lawing, etc), which I don't think is what you're referring to here. (E.g., I don't think that brain size or jaw shape patterns were/are used to infer paleo environments.)

      Revised. Thank you.

      L85: I strongly advise against making conclusions like this: "Dental height and sharpness variability ... [spiked] in the middle Paleocene corresponding to a short-lived negative excursion in global temperature." That implies that the change in dentitions is linked to global temperature changes, which I don't think your results support. Later in the text you highlight the temporal uncertainty of your time bin ages (L650) and say that the middle Paleocene bin could be as old as ~62 Ma (L646), which is well before the negative excursion (and looks to be more in line with a positive excursion!), at least according to the Figure 1 time scale (see comment below). So, I don't think that your results even support your statement.

      We reworded this sentence to say “Dental height and sharpness variability were low in the beginning and end of the time interval, with a peak in the middle Paleocene. This pattern is observed both when dentitions are considered holistically and by tooth position in the lower dentition (Fig. S5; upper teeth display the opposite pattern).”

      L144: Using variance for disparity seems fine. But keep in mind that other disparity metrics, such as range (or sum-of-ranges for multivariate data), might produce different results. For instance, variance of RFI and Slope spike in the middle Paleocene, like you point out, but based on the values in Figure 1A, it looks like the ranges stay relatively constant through the Paleocene (although I realize that the ranges might change with bootstrapping). So, your choice of disparity metric might have a big influence on your conclusions. Alternatively, you could calculate disparity using multiple metrics (e.g., Brusatte et al. 2012 Nature Communications; Grossnickle & Newham 2016 supplemental analyses), even if it's just for supplemental analyses.

      Thank you for bringing the choice of disparity measures to our attention. We conducted a parallel set of bootstrapped disparity calculation and comparison analyses using range lengths (maximum trait value – minimum trait value for a given trait) and summarized the through-time trends as for variance-based results (Fig. S5). Overall, very similar trends are observed, providing support for the variance-based data interpretation presented in the main text. We added explanation of this additional sensitivity testing both in the main text and in the supplemental text.
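
      As an illustration only (this is not the authors' code), the comparison described above can be sketched as a bootstrap that reports both disparity metrics for one trait in one time bin. The function name, the number of resamples, the use of the median as a summary, and the trait values are all assumptions made for the example; none of them come from the manuscript.

```python
import random
import statistics


def bootstrap_disparity(values, n_boot=1000, seed=0):
    """Bootstrap two disparity metrics for a single trait in one time bin.

    Hypothetical helper: resamples `values` with replacement and returns the
    median bootstrap variance and the median bootstrap range length
    (maximum trait value minus minimum trait value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    variances, ranges = [], []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        sample = [rng.choice(values) for _ in values]
        variances.append(statistics.variance(sample))
        ranges.append(max(sample) - min(sample))
    return {
        "variance": statistics.median(variances),
        "range": statistics.median(ranges),
    }


# Placeholder trait values per time bin (invented numbers, not real data).
bins = {
    "early": [0.41, 0.45, 0.48, 0.52, 0.55],
    "middle": [0.30, 0.42, 0.50, 0.61, 0.70],
    "late": [0.44, 0.47, 0.49, 0.51, 0.53],
}
for name, trait_values in bins.items():
    print(name, bootstrap_disparity(trait_values))
```

      Reporting both metrics side by side makes it easy to see whether a through-time trend depends on the choice of disparity measure, which is the concern raised in the comment.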

      L147: "body size disparity ... (Fig. 1B, S6A, Table 1, Data S5)." But I don't see disparity calculated or plotted in any of the figures/tables that you cite. You test for differences in disparity between time bins (Table 1), but that doesn't provide the actual disparity patterns.

      We generated a new figure (Fig. S8) to show the tooth size variance and range levels across time and data partitions, and modified this sentence to say that “Over the same time interval examined, body size disparity and mean were higher in the early Paleocene than in subsequent time intervals (Fig. S8, Table S3; also supported by premolar 4 and upper molar partition analyses), indicating that substantial increases in the disparity of dental complexity, curvature, and height lagged behind maximum tooth size disparity during the Paleocene.”

      L151-153: Maybe. But you're basing this on a much narrower temporal range (Paleocene) than the brain and jaw studies, and I think those studies observed big increases in brain/jaw disparity in the Eocene, which you don't sample. And as I explained elsewhere, I'm not convinced that your results strongly support the same pattern. At a minimum, I recommend tempering your conclusions to better reflect the uncertainty of your results.

      We tempered our statements here to say that “This suggests a ‘brawn before bite’ pattern in endemic Asian mammals, partially mirroring the endocranial and jaw functional morphology patterns identified in their North American and European counterparts [21,22]. These findings raise the possibility that an initial size-driven post-K-Pg recovery followed by ecomorphological radiation was a global phenomenon, even as regional tectonic events such as the initial collision of the Indian subcontinent with Asia and Deccan Traps volcanism influenced local mammal evolution.”

      L170: I'm not well-versed in integration (and modularity) studies, so maybe this reflects my ignorance, but I had trouble understanding sentences like this: "These findings indicate that form-function malleability, the coexistence of distinct topography-performance relationships in each time and taxon partition while overall integration between the two trait groups increases between time bins, was present throughout the Paleocene." If there is space, I recommend revising and/or breaking apart long, jargon-y sentences like that (throughout the paper) so that they're more digestible for readers.

      We simplified complex sentences such as the one the reviewer noted, in order to communicate our findings and interpretations more clearly. Thank you for the suggestion.

      L183: It's probably fine to assume most placental orders arose in the Paleocene based on fossil evidence. But keep in mind that molecular studies often argue that many orders arose in the Late Cretaceous.

      We revised the statement to indicate a “Cretaceous/Paleocene” origin of many modern mammal orders.

      L200-207: Again, this might just reflect my ignorance concerning integration analyses, but I recommend expanding on this text to better explain how your integration results support this conclusion. It seems really interesting, and I like the Garden of Eden hypothesis. It's just not immediately clear to me how your results support that hypothesis. A little more background on how to interpret the integration results would be helpful.

      We expanded the discussion here to say that “Such flexibility in dental form-function linkage permits ‘mix and match’ trait combinations rather than evolutionary change as a single unit, potentially enhancing the evolvability of feeding ecological traits as new environmental conditions arose [Goswami et al. 2015]”

      Reference: Goswami, A., Binder, W.J., Meachen, J., and O’Keefe, F.R. (2015). The fossil record of phenotypic integration and modularity: A deep-time perspective on developmental and evolutionary dynamics. Proc. Natl. Acad. Sci. 112, 4891–4896. 10.1073/pnas.1403667112.

      L218: "reached maximum tooth size disparity early". Again, I don't see size disparity plotted or reported. And without baseline comparisons (Late K or Eocene), it's hard to interpret your results and evaluate what 'maximum' means (Figure 1B).

      We revised the sentence to now say “In response, Paleocene mammal clades in south China between dental topography and bite performance later, all the while maintaining high levels of variability in dental complexity and convexity (Fig. 1).”

      Figure 1A: The time scale in the top left of the figure looks off. Shouldn't the K-Pg be at 66 Ma (not 65 Ma) and the P-E boundary at 56 Ma (not ~54 or 55)?

      We revised Fig. 1 to fix the time scale so that K-Pg is at 65.5 Ma and the P-E boundary at 56 Ma. Thank you for catching this.

      Figure 1A: Is there a different y-axis scale for the variance (red line) results?

      Yes, the y axes for the variance curves were missing. We added them back in. Thank you.

      L628-629: As I explained above, it feels like you focused your sampling just on herbivorous/omnivorous groups, and, if true, this is an important point that should be discussed at the forefront of the paper. Does your sample truly represent the total ecological diversity of the mammalian faunas at the time?

      We agree with the reviewer about the potential partial sampling of the range of ecomorphological diversity when only the most abundant clades are included in the analyses. However, we refrain from interpreting the dietary groupings represented in the dataset using an assumption of functional morphology from crown/extant clades. We added a paragraph in the introduction to bring attention to the inherent uncertainty in the ecological diversity of the dataset:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the stratigraphically limited nature of early Cenozoic sequences that produce fossil mammals. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic placental clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leaves some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having uncharacterized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      L653: Sorry if this is mentioned elsewhere, but did you avoid using teeth with especially worn or broken cusps? You might expand on how you chose teeth for your sample.

      We left out this detail in the original submission. Thank you for pointing this out. We had to exclude a third of the teeth because they were too worn or broken. We added the following explanation to the methods section:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      L654: "specimens" should be "teeth", correct? In the preceding sentence, you say that there are 200 teeth from only 48 specimens.

      Corrected.

    1. eLife Assessment

      This important study links allelic expression imbalance with replication timing, suggesting a stochastic model for haploinsufficiency in dosage-sensitive disease. The integration of allele-specific RNA-seq and replication timing in clonal systems provides solid evidence for an association between asynchronous replication and allelic imbalance, although the scope and generality should be addressed in future work. This study will interest epigeneticists and genome regulation researchers studying replication timing and monoallelic expression, as well as developmental biologists and human geneticists concerned with clonal heterogeneity, haploinsufficiency, and variable disease penetrance.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #2 (Public review):

      Summary:

      The authors pair analysis of replication timing and allele-specific expression in clonal populations of primary human cells. They combine these data with previously published data on clones from transformed human cell lines. They identify a number of genomic regions that display asynchronous replication timing in at least one clone and correlate these regions with allele-specific expression of genes within them. They also observe that several interesting gene sets, including genes that are associated with human diseases, map to asynchronously replicating regions. This is a good experimental approach that builds on already published data demonstrating the connection between allelic imbalance and replication timing.

      - This is a research topic that touches on a few sub-fields of biology, and thus to make the paper more approachable we would recommend a careful edit of the text for clarity and precision of language.

      - Authors point out that this is a decades-old field; we would suggest using terminology established within the field if possible. Allelic imbalance has been referred to as AI, MAE (monoallelic expression), RMAE (random monoallelic expression) etc. The paper whose mouse data the authors make use of uses Asynchronous Stochastic Replication Timing (ASRT) instead of VERT to refer to the same phenomenon.

      - Methods do not provide sufficient detail to fully evaluate or reproduce these experiments.

      - It is helpful to show representative loci as the authors do in Fig 1F and G and Fig 2 but these panels are very densely rendered and thus difficult to process visually - even the cartoon version (1D) is thick with overlapping lines. The point that allelic imbalance is enriched in VERTs would be enhanced if the authors could present the allelic ratio for all genes found in all VERTs, demonstrating how replication timing on either chromosome affects the allelic ratio.

      - The authors make the important point that VERTs are unlikely to be shared among different cell types and tissues (Fig 1i), but then find an enrichment for neuronal and immune genes in VERT regions identified in ACPs. It follows that these same genes are unlikely to be in such regions in the tissues where they are relevant. Some of the GO terms presented are too broad to suggest any biological significance to the result, even if there is statistical significance (for example, the top term for LCL clones 'Cytoplasm' is associated with 12,000 genes, and the second term for mouse clones 'Membrane' is associated with 10,000). It would be helpful to focus on GO terms lower in the GO hierarchy.

      - Figure 3 highlights the association of related gene clusters with VERTs but the VERTs are assigned based on variable replication timing in just 1 or 2 clones. This is an interesting observation, but to make the point that "VERT regions frequently coincide with gene clusters in the human genome" there needs to be a systematic assessment of replication timing at all gene clusters across all clones, and a statistical test for significance.

      - It is an interesting hypothesis that VERTs are conserved between species at syntenic loci. If such regions are really conserved, one would expect that replication timing at these sites would be consistently asynchronous. However the data presented shows that in human clones these VERTs can be specific to an individual donor (as in 5A) or an individual clone (as in 5H).

      - The finding that VERTs coincide with neurodevelopmental disease genes in immune and cartilage cells is at odds with the previous statements and data about the tissue specificity of VERTs. In order to support the claim that neurodevelopmental disease associated genes reside in asynchronously replicating regions, and are thus more prone to allelic imbalance, it would be helpful if the authors demonstrated this phenomenon in neuronal cells.

      - The authors consistently lean on sparse samples (i.e. a single clone) within a modestly sized dataset (4 clones from 2 donors each) to propose a new model for haploinsufficiency in human disease. The model may well be correct, but the consistent focus on limited elements in the data, and perhaps an overreach in the interpretation, make it difficult to appreciate the very good experiments presented.

      - This section refers to the revised version of the paper.

      We would like to thank the authors for the changes and explanations offered. Although we don't fully agree with a few answers offered, overall the answers and changes in the manuscript have significantly improved the work presented. As such it should be of interest to many readers.

    3. Author response:

      The following is the authors’ response to the original reviews

      General Statements

      We thank the reviewers for their thoughtful and constructive comments, which substantially improved our manuscript. In response, we have revised the text and figures throughout to address the points raised. Specifically, we have:

      i. Refined our definition of Inactivation/Stability Centers (I/SCs): We limit this designation to loci where both Allelic Expression Imbalance (AEI) and Variable Epigenetic Replication Timing (VERT) were detected, either in the present study or in previously published work.

      ii. Expanded methodological clarity: We provide detailed descriptions of how VERT regions were identified, annotated, and quantified, including thresholds for allelic imbalance, replication timing variability, and sampling depth. We also justify the ≥80% AEI cutoff, which is based on recently published studies showing that modest allelic biases can have biological and clinical significance.

      iii. Enhanced benchmarking and validation: In addition to the analysis of X inactivation in female ACP cells, we now include comparisons between imprinted and non-imprinted regions to benchmark the magnitude of allelic replication timing imbalance, demonstrating that the magnitude of imbalance observed at non-imprinted VERT regions is comparable to known imprinted regions.

      iv. Addressed tissue specificity and sampling limitations: We now discuss how the data derived from a limited number of clones, tissues, and individuals support the identification of robust AEI and VERT patterns. In the future, additional tissues and individuals will be required to capture the full diversity of I/SC regulation.

      v. Clarified biological relevance: We have expanded our discussion to highlight the consistency of AEI findings across cell types, including examples of genes implicated in neurodevelopmental and neurodegenerative disorders, and we clarify our model of how I/SC regulation contributes to haploinsufficiency, variable expressivity, and incomplete penetrance in human disease.

      vi. Improved figures and supplemental data: We have updated figure legends for clarity, added a new supplementary figure benchmarking imprinted regions, added supplementary tables containing: the full description of our GO analysis, the list of I/SCs where we have detected both VERT and AEI, the ratios of the number of transcripts derived from early and late replicating alleles for the I/SCs illustrated in all figures, and we have cross-referenced all supplementary tables.

      Point-by-point description of the revisions

      Reviewer 1:

      The existence of VERT regions is well supported, but the number of regions called as ISCs may be inflated by permissive thresholds (e.g., AEI ≥ 0.8 or ≤ 0.2 in a single clone). This risks conflating transient stochastic differences with stable ISCs.

      We selected the >80% (or <20%) allelic imbalance threshold, along with the requirement of at least one biallelic clone, as our criterion for significant AEI. This choice was guided by a recent study demonstrating that allelic imbalance as low as 65%/35% is enough to affect disease penetrance in humans (Nature 2025; 637:1186–1197). For completeness, results obtained using more stringent thresholds (>90% and >95% imbalance) are presented in Supplementary Table 2.
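
      As an illustration only, the thresholding described above can be sketched as follows. This is a minimal sketch, not the pipeline used in the study: the function names, the example read counts, and the rule that any clone falling between the two cutoffs counts as biallelic are assumptions made for the example.

```python
from typing import List, Tuple


def allele_fraction(a_reads: int, b_reads: int) -> float:
    """Fraction of informative reads assigned to allele A."""
    total = a_reads + b_reads
    if total == 0:
        raise ValueError("no informative reads for this clone")
    return a_reads / total


def is_imbalanced(fraction: float, threshold: float = 0.80) -> bool:
    """True if one allele contributes more than `threshold` of the reads."""
    return fraction > threshold or fraction < 1.0 - threshold


def gene_shows_aei(clone_counts: List[Tuple[int, int]],
                   threshold: float = 0.80) -> bool:
    """Require at least one imbalanced clone AND at least one biallelic clone.

    Assumption: a clone that is not imbalanced at `threshold` is treated
    as biallelic. The study may define biallelic clones differently.
    """
    fractions = [allele_fraction(a, b) for a, b in clone_counts]
    has_imbalanced = any(is_imbalanced(f, threshold) for f in fractions)
    has_biallelic = any(not is_imbalanced(f, threshold) for f in fractions)
    return has_imbalanced and has_biallelic


# Hypothetical (allele A, allele B) read counts for one gene in three clones.
clones = [(90, 10), (52, 48), (15, 85)]
for cutoff in (0.80, 0.90, 0.95):
    print(cutoff, gene_shows_aei(clones, cutoff))
```

      Raising `threshold` to 0.90 or 0.95 mirrors the more stringent cutoffs mentioned above: a gene is retained only if at least one clone still exceeds the stricter cutoff.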

      Furthermore, it is unlikely that transient stochastic differences in allelic expression, such as those detected by single-cell RNA sequencing assays (Nat. Rev. Genet. 2015; 16:653–664), would be captured by our approach. Each clone in our study was expanded from a single cell to over one million cells before both RNA-seq and Repli-seq analyses, effectively averaging out transient transcriptional and/or replication fluctuations, and thus reflecting stable, mitotically heritable epigenetic states.

      Reviewer 1:

      More robust approaches would include using magnitude of imbalance, annotating VERTs by genomic location, applying stricter thresholds for replication timing, and benchmarking AEI distributions against the X chromosome.

      All VERT regions identified in this study were annotated according to both the magnitude of allelic imbalance and their genomic coordinates, using 250 kb windows for the human samples and 50 kb windows for the mouse samples (see Supplementary Tables 1 and 6). Figure 1c directly compares the magnitude of imbalance, defined as outliers in the standard deviation, for both allelic replication timing and allelic expression across autosomal and X-linked loci in female ACP cells.

      In addition, we detected allelic replication asynchrony at 12 known imprinted loci, and the standard deviation of replication timing at these loci, measured in 250 kb windows, is comparable to that observed across the >350 VERT regions detected at non-imprinted sites. For comparison, we have highlighted the imprinted regions with + symbols in Figures 1e, 2d, 3c, 6g, 7e, and 7g, as well as in Supplemental Table 1 and in the Data Source files. For additional comparison, we have included Supplemental Figure 1 to illustrate the magnitude of replication timing imbalance and allele-specific gene expression at two autosomal imprinted regions.

      Reviewer 1:

      Figures and text would benefit from improved clarity: axis labels are missing in places (e.g., Fig. 1c, Fig. 2g), legends should explain chromosome arm colors, and cluttered figures such as Fig. 1j could be re-visualized for interpretability.

      Figure labels have been added to Figs. 1c and 2g, and legends modified for clarity.

      Reviewer 1:

      “…the claim of cell-type specificity is not convincingly demonstrated given the small sample size (n=4) and strong batch confounding between lymphoblastoid and cartilage progenitors.” And “Hierarchical clustering is confounded by batch and based on presence/absence calls that lack quantitative resolution.”

      We agree that the limited number of individuals and clones, and the comparison of only two distinct tissue types (LCLs and ACPs), limit the quantitative conclusions that can be drawn. Our primary intent was to evaluate whether any I/SCs were shared between independently derived clonal cell lines from different tissues to determine whether there is evidence of tissue-specific I/SC usage, rather than to make quantitative claims about global cell-type specificity.

      To address this concern, we have replaced the hierarchical clustering analysis in Figure 1i with a Venn diagram that more directly illustrates the overlap and tissue-specific distribution of VERT regions detected in the different clonal sets. This revised representation avoids assumptions about clustering relationships and removes batch-driven bias, while still conveying the key observation that many VERT regions are shared across tissues and others appear tissue-restricted.

      Reviewer 1:

      While syntenic VERT regions across mouse and human are intriguing, they complicate interpretation of strong clustering by cell type. Sampling depth may also have exaggerated allelic imbalance calls.

      We note that the human LCLs used in our study are B cells, and immunoglobulin gene rearrangements were used to confirm the clonal uniqueness of each line. Similarly, the mouse replication timing data analyzed here was generated from pre-B cells, which also undergo immunoglobulin gene rearrangements. Thus, both the human LCL and mouse pre-B cell datasets were derived from B-cell lineages, providing a consistent cellular context for comparative analysis.

      Sequencing depth is an important consideration for all variant base calls. Without fully haplotype-resolved genomes, previous studies relied on calculating per-SNP calls of allelic imbalance based on reads covering a single nucleotide locus. To improve sequencing depth supporting the identification of VERT and AEI regions, we utilized haplotype-resolved genomes that allowed all informative allele-specific reads to be pooled across all heterozygous SNPs within genomic windows or expressed genes. For AEI, we required a minimum of 20 informative allele-specific reads per gene, an FDR-corrected p-value ≤0.05, and a minimum of 80% vs 20% allelic imbalance. Importantly, a recent study showed that allelic imbalance as low as 65%/35% is clinically relevant in humans (Nature 2025; 637:1186–1197). We reiterate that more stringent thresholds (>90% and >95% imbalance) are presented in Supplementary Table 2.
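
      The pooling and filtering steps described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the code used in the study: the exact two-sided binomial test against a 50/50 null, the Benjamini-Hochberg correction, the function names, and the example counts are all assumptions, since the response does not specify which test or correction was applied.

```python
import math
from typing import Dict, List, Tuple


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value against p = 0.5 (assumed null)."""
    lo = min(k, n - k)

    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1)
                - math.lgamma(n - i + 1) + n * math.log(0.5))

    tail = sum(math.exp(log_pmf(i)) for i in range(lo + 1))
    return min(1.0, 2.0 * tail)  # the null is symmetric, so double one tail


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_aei(snp_counts: Dict[str, List[Tuple[int, int]]],
             min_reads: int = 20,
             max_fdr: float = 0.05,
             min_major_fraction: float = 0.80) -> Dict[str, bool]:
    """Pool haplotype-assigned reads over all heterozygous SNPs per gene,
    then apply the read-depth, FDR, and imbalance criteria."""
    pooled = {}
    for gene, counts in snp_counts.items():
        hap1 = sum(c[0] for c in counts)
        hap2 = sum(c[1] for c in counts)
        if hap1 + hap2 >= min_reads:  # genes below the depth cutoff are not tested
            pooled[gene] = (hap1, hap2)
    genes = list(pooled)
    pvals = [binom_two_sided_p(pooled[g][0], sum(pooled[g])) for g in genes]
    qvals = benjamini_hochberg(pvals)
    calls = {}
    for gene, q in zip(genes, qvals):
        hap1, hap2 = pooled[gene]
        major_fraction = max(hap1, hap2) / (hap1 + hap2)
        calls[gene] = q <= max_fdr and major_fraction >= min_major_fraction
    return calls


# Hypothetical per-SNP (haplotype 1, haplotype 2) read counts.
example = {
    "GENE_A": [(12, 1), (15, 2), (10, 0)],  # pooled 37 vs 3
    "GENE_B": [(10, 9), (11, 12)],          # pooled 21 vs 21
    "GENE_C": [(5, 0)],                     # only 5 reads: not tested
}
print(call_aei(example))
```

      Pooling across SNPs is what allows a gene such as the hypothetical GENE_A to pass the 20-read minimum even though no single SNP does.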

      Reviewer 1:

      Gene set enrichment analysis should be restricted to avoid inflated significance from overly broad categories.

      Reviewer 2:

      Some of the GO terms presented are too broad to suggest any biological significance to the result, even if there is statistical significance (for example, the top term for LCL clones 'Cytoplasm' is associated with 12,000 genes, and the second term for mouse clones 'Membrane' is associated with 10,000). It would be helpful to focus on GO terms lower in the GO hierarchy.

      We now include our complete Gene Ontology analysis, with more specific biological categories, in Supplemental Table 5.

      Reviewer 2:

      Allelic imbalance has been referred to as AI, MAE (monoallelic expression), RMAE (random monoallelic expression) etc. The paper whose mouse data the authors make use of uses Asynchronous Stochastic Replication Timing (ASRT) instead of VERT to refer to the same phenomenon. Creating unnecessary jargon makes the paper more difficult to read and adds needless complexity to an already complex field.

      While we agree that allelic expression imbalance has been described by different investigators using many different phrases, we believe that MAE, RMAE and AI do not represent an accurate description of the phenomenon. In our study [and our previous study; Nat Commun. 2022; 13(1):6301] we used clonal analysis of allele-specific expression and found that while some clones display equivalent levels of expression between alleles of a given gene (i.e. bi-allelic expression) other clones express only one allele (i.e. mono-allelic expression), and yet other clones have undetectable expression (i.e. silent on both alleles). This pattern of allele-restricted expression indicates that each allele independently adopts either an expressed or silent state. Importantly, because these expression states are mitotically stable, allele-autonomous, and independent of parental origin, we refer to the choice of the expressed allele as stochastic. Given this variability, we believe that the phrase “Allelic Expression Imbalance” (AEI) represents a more accurate descriptor for this phenomenon. We also point out that “Allelic Expression Imbalance” has been used by other investigators >120 times in the PubMed database.

      In addition, the replication asynchrony that exists at these loci is not consistent with purely ASynchronous Replication Timing (ASRT) between alleles. We found that each allele can independently adopt either earlier or later replication timing in different clones. This variability results in some clones exhibiting pronounced asynchrony between alleles, while in others, the two alleles replicate synchronously, with both adopting either the earlier or later timing state. As reported in our previous study (Nat. Commun. 2022; 13:6301), this behavior reflects a stochastic and allele-autonomous process, leading us to describe these loci as exhibiting Variable Epigenetic Replication Timing (VERT), which we believe is a more accurate descriptor of this phenomenon.

      Reviewer 2:

      The point that allelic imbalance is enriched in VERTs would be enhanced if the authors could present the allelic ratio for all genes found in all VERTs, demonstrating how replication timing on either chromosome affects the allelic ratio.

      The stochastic nature of allelic expression and replication timing observed at VERT loci indicates that each allele independently acquires its epigenetic state. In addition, there is typically more than one transcription unit, both protein coding and non-coding, within each VERT region, and each transcription unit also acquires its expressed or silent state independently. Therefore, the expressed or silent status of one allele of a transcription unit does not predict the replication timing or expression status of the same or opposite allele of any other transcription unit within the VERT region. Accordingly, the Early/Late pattern of replication timing that we detect, both in this study and in our previous work (Nat. Commun. 2022; 13:6301), is not correlated with which allele is transcriptionally active. This supports our conclusion that asynchronous replication timing is not a downstream consequence of monoallelic transcription, but rather an independent epigenetic feature of I/SCs. Regardless, because each transcription unit is independent, we provide the expression ratios for all transcripts generated from the coding and non-coding transcription units in the VERT regions shown in Figures 1, 2, and 6 (Supplemental Table 9). This analysis indicated that 4,017 informative reads were derived from the earlier-replicating allele and 3,161 informative reads were derived from the later-replicating allele, generating an allelic ratio of 1.3 (early/late) and a binomial P value of 1.0.
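
      For readers who wish to check the arithmetic, the allelic ratio quoted above follows directly from the two read totals. The short sketch below only reproduces that ratio (about 1.27, reported as 1.3) and the corresponding fraction of reads from the earlier-replicating allele; the function name is illustrative, and the sketch does not attempt to reproduce the reported statistical test.

```python
from typing import Tuple


def allelic_ratio(early_reads: int, late_reads: int) -> Tuple[float, float]:
    """Return (early/late ratio, fraction of reads from the early allele)."""
    if late_reads == 0:
        raise ValueError("late_reads must be non-zero to form a ratio")
    total = early_reads + late_reads
    return early_reads / late_reads, early_reads / total


# Read totals taken from the response text above.
ratio, early_fraction = allelic_ratio(4017, 3161)
print(f"early/late ratio: {ratio:.2f}")
print(f"fraction from earlier-replicating allele: {early_fraction:.3f}")
```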

      In addition, a similar analysis of imprinted loci reveals that even at genomic regions with parent-of-origin–specific expression, the replication timing of each allele does not align with transcriptional activity, i.e. both early- and late-replicating alleles can be transcriptionally active, depending on the gene. This observation is consistent with the complex organization of many imprinted domains, where genes on opposite alleles exhibit reciprocal expression patterns. To illustrate this point, we now include Supplemental Figure 1 demonstrating that imprinted loci harbor genes expressed from both the earlier- and later-replicating alleles. In addition, quantification of the total number of transcripts at the DLK1/MEG8 imprinted locus (Supplementary Figure 1a-1c) indicates that the ratio of transcripts derived from the early versus late replicating alleles is equivalent (i.e. a ratio of 1.0; See Supplemental Table 9).

      Reviewer 2:

      Figure 3 highlights the association of related gene clusters with VERTs but the VERTs are assigned based on variable replication timing in just 1 or 2 clones. This is an interesting observation, but to make the point that "VERT regions frequently coincide with gene clusters in the human genome" there needs to be a systematic assessment of replication timing at all gene clusters across all clones, and a statistical test for significance.

      Our intent in Figure 3 was not to suggest that all gene clusters are subject to VERT and AEI, but rather to highlight that several well-characterized multigene families that are known to exhibit AEI, such as olfactory receptor, protocadherin, and HLA gene clusters, coincide with VERT regions at their genomic locations. These examples serve as representative illustrations demonstrating that I/SC-associated regulation occurs at established AEI loci organized in gene clusters.

      To clarify this point, we have revised the text to explicitly state that Figure 3 presents illustrative examples of known AEI-associated gene clusters overlapping with VERT regions, rather than a comprehensive or statistically exhaustive analysis of all gene clusters across the genome.

      Reviewer 2:

      It is an interesting hypothesis that VERTs are conserved between species at syntenic loci. If such regions are really conserved, one would expect that replication timing at these sites would be consistently asynchronous. However, the data presented show that in human clones these VERTs can be specific to an individual donor (as in 5A) or an individual clone (as in 5H).

      As discussed in our Limitations Section, our analysis was restricted to a limited number of cell types, clones, and individuals, which may not capture the full diversity of I/SC usage across tissues and populations. While our dataset was sufficient to identify robust patterns of AEI and VERT, it likely represents only a subset of the broader landscape of I/SC regulation in both humans and mice. We anticipate that future studies incorporating a wider range of tissues, individuals, and clonal analyses will uncover an even greater degree of conservation and diversity in I/SC usage across genomes.

      Reviewer 2:

      In order to support the claim that neurodevelopmental disease associated genes reside in asynchronously replicating regions, and are thus more prone to allelic imbalance, the authors would need to demonstrate this phenomenon in neuronal cells.

      We make two points that address this critique: First, many of the neurodevelopmental disease genes located within or adjacent to VERT regions are not exclusively expressed in neuronal cells and have previously been shown to exhibit AEI in non-neuronal contexts. For example, Gimelbrant and Chess (Science, 2007; 318:1136–1140) demonstrated AEI of the Parkinson disease genes SNCA and LRRK2 in lymphoblastoid cell lines (LCLs), and in our previous study, we detected AEI of DNAJC6, another Parkinson disease gene, also in LCL cells (Nat. Commun. 2022; 13:6301). In the present study, using cartilage progenitor cells, we identified VERT and AEI of several epilepsy-associated genes, including SCN1A, SCN2A (Fig. 6b), GABRA1 (Fig. 6e), and SAMD12 (Fig. 6j), as well as a gene implicated in autism and neurodevelopmental disorders, SEMA5A (Fig. 5c), indicating that these genes are not exclusive to neuronal cell types.

      Second, independent studies from the laboratory of Dr. E. Heard have provided further evidence that AEI occurs in neuronal lineages. Using mouse neural progenitor cells (NPCs), they identified genes subject to AEI (Dev. Cell, 2014; 28:366–380) and they later evaluated AEI of syntenic human neurodevelopmental disease genes, including Snca, App, Eya4, and Grik2 (Nat. Commun. 2021; 12:5330). In addition, and consistent with our use of AEI, they used the phrase “Allelic Expression Imbalance” to describe the epigenetic expression biases at these genes.

      Together, these findings reinforce that AEI, and by extension I/SC regulation, is not restricted to specific cell types, but rather represents a generalizable mechanism of stochastic epigenetic regulation that includes genes relevant to neurodevelopment and disease.

      Reviewer 2:

      However, the authors consistently lean on thin evidence (i.e. a single clone) within a modestly sized dataset (4 clones from 2 donors each) to propose a new model for haploinsufficiency in human disease. The consistent focus on limited elements in the data and perhaps an overreach in the interpretation makes it difficult to appreciate what is in fact a very good experiment.

      We agree that our analysis was conducted on a modest number of clones and individuals, which we explicitly acknowledge as a limitation of the present study. However, several key points support the robustness and broader relevance of our conclusions:

      i. Clonal Design and Replication: The strength of our approach lies in its clonal resolution. Each clone represents a single-cell–derived population expanded to over a million cells, enabling direct detection of stable, mitotically heritable allele-specific epigenetic states that would not be apparent in population-averaged data. Importantly, many of the VERT regions we identified are shared between independent clones from different donors and across distinct cell types (ACP and LCL), demonstrating reproducibility and biological consistency.

      ii. Cross-Species Validation: We further identified syntenic VERT regions in mouse pre-B cell clones, including at loci known to exhibit AEI in prior studies, providing independent validation and evolutionary conservation of the phenomenon.

      iii. Integration with Published Evidence: Our findings extend prior observations of AEI and VERT (e.g. Gimelbrant et al. Science 2007; Heskett et al. Nat. Commun. 2022) and are fully consistent with known stochastic allelic expression imbalance of autosomal genes. We also draw parallels with the absence of cellular selection mechanisms that dictate dominant inheritance patterns for loss-of-function alleles for X-linked disease genes (reviewed in: J Clin Invest, 2008, 20-23; and Nat Rev Genet. 2025, 26, 571–580). Our proposed model linking I/SC regulation to haploinsufficiency is therefore a synthesis of our results with an extensive body of published data, not an inference drawn from isolated observations.

      iv. Scope and Framing: We have revised the manuscript to clarify that our proposed model represents a mechanistic framework, not a definitive or exclusive explanation, for how stochastic allelic regulation could contribute to dosage-sensitive disease phenotypes. We also explicitly discuss the need for larger datasets and additional tissues to refine and test this model.

      In summary, while we recognize the limited sampling depth inherent to clonal analyses, the consistency of our observations across donors, cell types, and species, together with prior corroborating studies, supports the validity of the conclusions and justifies the broader conceptual implications.

    1. eLife Assessment

      This important study highlights how cell size influences various cellular responses, with a particular focus on ferroptosis. The evidence presented is convincing, employing multiple model systems and experimental approaches to support the conclusions. This work will be of significant interest to the fields of cell size, ferroptosis, and cancer biology.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The study by Zatulovskiy et al. examined how cell size influences cell susceptibility to ferroptosis. The authors found a size dependence specifically for the ferroptosis-inducing drug Era2, but not for other drugs. Using various human cell lines (HMEC, HT-1080, RPE-1), the authors generated populations of small and large G1 cells by FACS, CDK4/6 inhibition (palbociclib), or inducible cyclin D1 knockdown, and measured cell susceptibility to ferroptosis. Larger cells were more resistant than smaller cells. Mechanistically, larger cells showed reduced plasma membrane lipid peroxidation, higher glutathione concentrations, and changes in relevant cellular protein levels, as analyzed using previously published data. Deleting ACSL4, which is involved in ferroptosis, partly eliminated the size dependence of ferroptosis. The work concludes that cell size is a key determinant of ferroptosis susceptibility. Overall, this work expands our understanding of how cell size is correlated with functional properties of cells, which can have implications for biomedical sciences.

      Strengths:

      The study establishes a credible link between cell size and susceptibility to ferroptosis, as induced by Era2. Experimental replication is sufficient, and key conclusions rely on data from multiple cell lines and on multiple approaches to manipulate cell size. This suggests that the conceptual findings made in this paper could reflect a more fundamental feature of mammalian cells. In addition, this work provides an interesting contrast to another recent study about size-dependency of ferroptosis (https://doi.org/10.1016/j.isci.2025.112363), where increased cell size heightened sensitivity to the GPX4 inhibitor RSL3.

      Original Weaknesses:

      Disentangling cell size effects from other confounding factors, such as the cell cycle or overall metabolic rate, is challenging, and the authors have managed to qualitatively prove that cell size influences Era2-induced ferroptosis. However, the quantitative nature of this link between cell size and susceptibility to ferroptosis remains somewhat unclear due to the confounding factors that are present in many of their experiments. Notably, the quantitative nature of this link could also be cell type- and growth condition-dependent, which remains to be investigated in detail. It should also be noted that this work focused on cell culture studies, and it remains unclear how much the findings of this paper could influence therapeutic strategies in vivo.

      Comments on revised version:

      I would first like to emphasize that I find this work solid, and I think the authors have done good work with the revisions.

      My only remaining recommendation is that the authors aim to more carefully examine the magnitude of the observed cell size-dependency in ferroptosis susceptibility. Their manuscript contains several experiments where the quantitative nature of this link remains unclear due to confounding factors, such as the cell cycle. For example, in Fig 2B&C, it seems that accumulation of cells in G1 (from ~60% to ~95%) decreases ferroptosis equally to the effect caused by cell volume doubling (from day 2 to day 4 of palbo treatment), suggesting that cell cycle has a much more pronounced effect on ferroptosis than cell size (especially when considering the size change from day 0 to day 2). However, the magnitude of the cell size effect is not consistent between all experiments shown. This is not surprising, as the authors use different approaches to changing cell size and different cell lines, but it makes the work more qualitative than quantitative. Notably, another confounding factor is the cell's metabolic/biosynthetic rate. It seems reasonable to assume that prolonged palbociclib treatment will decrease metabolic and protein synthesis rates (normalized to cell size), and this could make the cells less susceptible to ferroptosis. The rapamycin treatment results shown by the authors also support this notion. One approach to examining this could be to grow cells in various growth conditions to manipulate their growth & metabolic rate.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to understand how cell phenotypes differ depending on the size of the cell, specifically here how cell size affects cell death. Using human cell lines (HMEC, HT-1080, RPE-1), the authors examined cell size through FACS sorting, CDK4/6 inhibition and inducible cyclin D1 knockdown. They identify that larger cells are more resistant to ferroptosis induced by system xc⁻ inhibition (erastin2), but more sensitive to GPX4 inhibition (RSL3), highlighting pathway-specific size dependencies.

      Mechanistically, larger cells exhibited:

      - Higher glutathione levels, supporting lipid peroxide detoxification

      - Increased ferritin expression, promoting iron sequestration

      - Lower ACSL4 levels, reducing incorporation of peroxidation-prone lipids

      The findings are supported by high-throughput microscopy, flow cytometry (BODIPY-C11 lipid peroxidation assays), and proteomic analyses. The study concludes that cell size influences proteome composition and metabolic capacity, thereby shaping cell death decisions, an insight with implications for aging, cancer, and ferroptosis-based therapies.

      Major Strengths:

      - use of multiple cell lines to validate their findings

      - use of multiple, complementary approaches

      - well designed screen and experiments throughout

      - clearly written, logical flow and easy to follow

      - relevance for multiple fields

      Weaknesses:

      - Lack of in-depth mechanistic investigation

      - Experiments are all in vitro and so, as yet, it is uncertain what the in vivo consequence would be

      General Assessment:

      This study presents a mechanistic link between cell size and ferroptosis susceptibility. Using high-throughput microscopy, proteomics, and genetic perturbations across multiple human cell lines, the authors demonstrate that larger cells are more resistant to ferroptosis induced by system xc⁻ inhibition (erastin2). This resistance is attributed to elevated glutathione production, increased ferritin-mediated iron sequestration, and reduced ACSL4-dependent lipid peroxidation. The experimental design is rigorous and multifaceted, with consistent results across cell types and size manipulation methods. While the study is limited to in vitro systems, its conceptual and mechanistic insights lay the groundwork for future in vivo and translational investigations.

      Advance:

      This work is the first to systematically show that cell size directly influences ferroptosis susceptibility via proteome scaling. It reconciles previous findings that large cells are sensitized to GPX4 inhibition (RSL3) by demonstrating that the ferroptosis pathway targeted (system xc⁻ vs GPX4) determines the direction of size-dependent vulnerability. The study provides a conceptual advance by positioning cell size as a regulatory axis in cell death decisions, and a mechanistic advance by identifying size-dependent changes in glutathione metabolism, ferritin levels, and ACSL4 expression.

      Audience:

      This research will be of interest to specialists in cell death, ferroptosis, redox biology, and cancer biology. It also holds relevance for aging researchers and translational scientists exploring ferroptosis-based therapies. The findings may influence how cell size heterogeneity is considered in therapeutic design, particularly in oncology and senescence-targeting strategies.

      Comments on revised version:

      We have no additional comments after revision. Thank you for addressing our initial queries.

    4. Reviewer #3 (Public review):

      In this manuscript, Zatulovskiy and colleagues elaborate on their previous work describing cell size-dependent changes in the proteome by investigating whether these changes correlate with differences in cell physiology. Using a cleverly designed high-throughput screen, they searched for compounds to which differently sized cells display differential sensitivity. Their primary hit, Era2, is involved in the ferroptosis pathway and serves as the starting point for a detailed study of how excess cell size protects cells from ferroptosis-induced cell death via: 1) lower concentrations of ACSL4 (which produces peroxidation-prone PUFAs), 2) increased ferritin concentrations, and 3) increased GSH concentrations.

      Overall, the experiments in this manuscript are well-designed and interpreted. It is an extremely well-written manuscript with a clear trajectory of logic.

      Comments on the revised version:

      The authors have addressed my original concerns adequately. I do not need to see it again, if there are further revisions.

    5. Author response:

      General Statements

      We were pleased to see that all three reviewers support publication after revision. No one questions the premise that cell size influences ferroptosis susceptibility. The main concerns fall into two categories: (A) disentangling “cell size vs cell cycle”, which is the biggest issue for Reviewer #1 and partially for Reviewer #3; and (B) additional mechanistic tests, including SLC7A11 and ferritin functional tests (Reviewer #2), lysosomal iron measurements (via LysoRhoNox), and some further ACSL4 experiments (Reviewer #3). Other reviewer concerns are more minor.

      In our revision, we have addressed the reviewers’ specific criticisms with additional experiments as described below. We believe the constructive feedback from peer review helped us to significantly extend our mechanistic findings and strengthen the manuscript.

      Point-by-point description of the revisions

      Reviewer #1:

      Summary:

      The study by Zatulovskiy et al. examined how cell size influences cell susceptibility to ferroptosis. The authors found a size dependence specifically for the ferroptosis-inducing drug Era2, but not for other drugs. Using various human cell lines (HMEC, HT-1080, RPE-1), the authors generated populations of small and large G1 cells by FACS, CDK4/6 inhibition (palbociclib), or inducible cyclin D1 knockdown, and measured cell susceptibility to ferroptosis. Larger cells were more resistant than smaller cells. Mechanistically, larger cells showed reduced plasma membrane lipid peroxidation, higher glutathione concentrations, and changes in relevant cellular protein levels, as analyzed using previously published data. Deleting ACSL4, which is involved in ferroptosis, partly eliminated the size dependence of ferroptosis. The work concludes that cell size is a key determinant of ferroptosis susceptibility.

      My major concerns about this work focus on whether many of the results reflect cell size or cell cycle effects, and whether the FACS-based size-scaling analyses have some misleading features to their design & presentation. If these concerns can be addressed with new experiments, then the conclusions of this paper are justified. If these concerns cannot be addressed, then the authors should more directly acknowledge the alternative hypothesis that cell cycle effects may explain many of their results.

      The experiments seem to be replicated sufficiently, and most conclusions rely on data from multiple cell lines. My minor comments focus on needs to provide statistics and method details, and on suggestions on how to improve text clarity, but these edits are easily done and don't require new experiments. Overall, this is an interesting study, and it should be published once the concerns below are addressed.

      Major comments:

      In experiments reported in Fig 1 and 2A, the authors sort small and large cells in G1, plate them, and later start the drug treatments & cell monitoring. Are these cells actively cycling (progressing in the cell cycle), and how fast? The large cells are likely to enter S phase earlier than the small cells, so by the time that the authors start their drug treatments, they may be comparing cells in different cell cycle stages, which could influence drug sensitivity more than cell size (as the authors also suggest later in Fig 2). This needs to be controlled for.

      Furthermore, even if the cells remain in G1 after sorting until the drug treatments are started, the authors should address the fact that the drugs are present for a long time, thus targeting the cells in various cell cycle stages.

      We agree with the reviewer that the cell cycle stage could affect ferroptosis susceptibility and could be a confounding effect in asynchronous cells. One of us (Dixon) reported the cell cycle effects on ferroptosis previously, and we observe them in this manuscript too (Fig. 2B,C,E). We now state this more clearly both in the Results and in the Discussion sections, where we write:

      Line 159: “We note that non-arrested cells had a lower susceptibility to Era2-induced ferroptosis compared to cells that were arrested in G1 for 2-3 days, despite being smaller in size. This is likely due to the difference in the fraction of cells in different cell cycle phases between arrested and non-arrested conditions since cells in S/G2/M phases are known to be more resistant to ferroptosis than cells in G0/G1 phases (Rodencal et al, 2024; Kuganesan et al, 2023)”

      Line 533: “Cells in G1 phase of the cell cycle were reported to be more susceptible to ferroptosis (Rodencal et al, 2024; Kuganesan et al, 2023), which suggested that ferroptosis inducers could be used in combination with cancer drugs, like the CDK4/6 inhibitor palbociclib, that arrest cells in G1 phase of the cell cycle (Herrera-Abreu et al, 2024). However, while CDK4/6 inhibitors arrest cells in G1, they do not inhibit cell growth, such that the longer they are arrested, the larger the cells grow (Lanz et al, 2022; Crozier et al, 2023; Manohar et al, 2023). This results in a complex, nonmonotonic ferroptotic response dynamics in cells treated with CDK4/6 inhibitors (Fig. 2B,E). Just following CDK4/6 inhibitor treatment, as more and more cells are arrested in G1 phase, cells become more sensitive to both RSL3- and erastin-induced ferroptosis (Kuganesan et al, 2023; Rodencal et al, 2024). However, the longer the cells are arrested, the larger they become, which further promotes their susceptibility to RSL3 (Fig. S1B) but reduces their susceptibility to Era2-induced ferroptosis (Fig. 2B). The fact that the cell cycle arrest and cell size increase have opposing effects on Era2-induced ferroptosis susceptibility could explain why different studies reported seemingly contradictory results, where sometimes an increased and sometimes a decreased or unchanged sensitivity to system x<sub>c</sub><sup>-</sup> inhibitors was observed depending on the cell type, duration and type of cell cycle arrest (Lee et al, 2024; Kuganesan et al, 2023; Rodencal et al, 2024). Such complex interplay between the cell cycle and cell size effects on ferroptosis suggests that combination therapies utilizing CDK4/6 inhibitors and ferroptosis inducers would have to carefully choose a dosage schedule.”

      Given the potentially confounding effects of the cell cycle in cycling cells sorted by size, we performed an additional experiment, in which RPE-1 cells were pre-treated with the CDK4/6 inhibitor palbociclib to synchronize them in G1 phase prior to treatment. These cells were then continuously exposed to palbociclib during the Era2 treatment (Fig. 2C-E). RPE-1 cells pretreated with palbociclib for 2 and 4 days had the same cell cycle distribution with 94% of cells being arrested in G1, but with different sizes. Cells treated with palbociclib for 4 days were significantly larger and more resistant to Era2.

      Additionally, in the experiment shown in Fig. 5E,F, where we FACS-sorted WT and ACSL4 KO HMEC cells by cell size, and then measured Era2 susceptibility, we pre-treated the cells with palbociclib for 24 h to synchronize them in G1 prior to the sorting. We then cultured the cells in the presence of palbociclib during the Era2 treatment to avoid the cell cycle effects observed in Fig. 2. In this case, we still observe that larger cells are more resistant to Era2, consistent with our conclusion that cell size protects against Era2-induced ferroptosis.

      Can the G1 arrest-driven changes in drug susceptibility (Fig 2 C-D) be attributed to cell size? Can the authors rescue the palbociclib treatment with rapamycin or other growth inhibitors that allow size to remain small during G1 arrest?

      We attempted these experiments, but when we co-treated the cells with palbociclib and mTORC inhibitors, we observed variable results. This is likely because prolonged mTORC inhibition itself rewires cellular metabolism and reduces cell susceptibility to ferroptosis, as one of us (Dixon) found previously (Armenta et al. (2022), Ferroptosis inhibition by lysosome-dependent catabolism of extracellular protein. Cell Chemical Biology 29: 1588-1600.e7). Our results were consistent with this previous report and are now included in a new supporting figure panel (Fig. S3C).

      Thus, upon palbociclib+rapamycin co-treatment there seems to be a competition between cell-size-mediated and metabolism-mediated effects of mTORC inhibition on ferroptosis, which leads to variable outcomes.

      In Fig 2E-F, is the cell cycle distribution of the samples influenced by CCND1 shRNA induction? Are the drug sensitivity effects due to cell size or cell cycle changes?

      The CCND1 manipulation model is extensively characterized in our recent work cited in this manuscript (You et al. (2025), Cell size-dependent mRNA transcription drives proteome remodeling. 2025.10.30.685141 doi:10.1101/2025.10.30.685141). Indeed, CCND1 shRNA cells have a slightly elongated G1 phase due to a ~30% reduction in Cyclin D1 concentration: the G1 fraction changes from ~70% in wild-type to ~80% in CCND1 shRNA cells, which could potentially affect the ferroptosis susceptibility, but the additional results obtained on synchronized RPE-1 cells, described above (Fig. 2C-E), support the conclusion that the primary effect on Era2 sensitivity is due to cell size.

      Can the authors address the meaningfulness of the FACS-based size-scaling results in cases where cell-to-cell variability is very large? For example, in Fig 4D&G, the results are so variable even in identically sized cells that the importance of the size-scaling pattern seems questionable.

      We do observe variability in fluorescent probe-based measurements of GSH and lipid oxidation, which could be due to biological (natural cell heterogeneity) and/or technical (low sensitivity of the probes) reasons. However, when we look at binned data and compare the mean values ± s.e.m. for each bin, we observe a robust and reproducible trend (black line with dark-grey shaded area), even though the SD is quite broad (lighter shaded area). We believe such trends are meaningful when describing cell death in probabilistic terms as we do. I.e., the GSH measurement might not be precise enough to predict cell death for a given individual cell, but the statistical trend is clear and these measurements help predict cell death probabilities for cells of different sizes.
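
      The distinction drawn above between a broad SD and a tight s.e.m. can be illustrated with a minimal sketch. It assumes one list of per-cell fluorescence values per size bin; the function name `summarize_bin` and the numbers in the usage lines are hypothetical and are not taken from the manuscript.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_bin(values):
    """Return n, mean, sample SD and s.e.m. for the cells in one size bin.

    The SD describes cell-to-cell spread and does not shrink with more cells;
    the s.e.m. (SD / sqrt(n)) describes uncertainty of the bin mean and does.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cells to estimate a spread")
    sd = stdev(values)
    return {"n": n, "mean": mean(values), "sd": sd, "sem": sd / sqrt(n)}


# Hypothetical illustration: the same spread measured with few or many cells.
few = summarize_bin([1.0, 3.0])
many = summarize_bin([1.0, 3.0] * 200)
print(few["sem"], many["sem"])
```

      With several hundred cells per bin, the s.e.m. is more than an order of magnitude smaller than the SD, which is why a trend in bin means can be reproducible even when individual cells vary widely.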

      In Figs 4B-D, the cell size axis seems to have over 4-fold size variability, but when the authors show the analysis of this data (Figs 4E-G) the variability is only 2-fold. What was excluded and on what basis?

      To address this point, we have now clarified in the Methods section how the data were processed and what data points we excluded from this analysis:

      Line 671: “For all binned flow cytometry data plots, the cells below the 2nd and above the 98th cell size percentiles were excluded to remove the extreme outliers. Then, the remaining data were binned by size and plotted as background-corrected average fluorescence intensity for each bin against the bin’s average cell size. Bins with fewer than 200 cells were excluded from the analysis to reduce noise.”

      Typically, such pre-processing reduces the size range, mostly from the large-cell end, because of the long right tail of the size distribution containing a few very large cells.
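
      The pre-processing quoted above (trim to the 2nd-98th size percentiles, bin by size, drop bins with fewer than 200 cells) could be sketched as follows. This is only an illustration under stated assumptions: equal-width bins, a bin count of 10, a single scalar background value, and the helper names `percentile` and `bin_by_size` are all hypothetical, since the Methods excerpt does not specify them.

```python
from statistics import mean


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty data")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bin_by_size(sizes, signals, background=0.0, n_bins=10, min_cells=200):
    """Return [(mean_size, mean_corrected_signal, n_cells), ...] per kept bin."""
    ordered = sorted(sizes)
    lo, hi = percentile(ordered, 2), percentile(ordered, 98)
    if hi <= lo:
        raise ValueError("size range collapsed after trimming")
    # Step 1: exclude cells below the 2nd and above the 98th size percentile.
    kept = [(s, f - background) for s, f in zip(sizes, signals) if lo <= s <= hi]
    # Step 2: assign the remaining cells to equal-width size bins (assumption).
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for s, f in kept:
        idx = min(int((s - lo) / width), n_bins - 1)
        bins[idx].append((s, f))
    # Step 3: drop sparsely populated bins, then average size and signal.
    result = []
    for cells in bins:
        if len(cells) < min_cells:
            continue
        result.append((mean(s for s, _ in cells),
                       mean(f for _, f in cells),
                       len(cells)))
    return result
```

      Because the size distribution has a long right tail, the percentile cut in step 1 removes more of the range at the large-cell end, which matches the narrower size axis in the binned plots.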

      Based on the methods section & figure legends of Fig 4B-I, the RPE cells were not pre-sorted to include only G1 cells, nor did the assay account for cell cycle differences. How can these data be used to explain results from earlier figures, where analyses were exclusively focused on size differences in G1?

      This is a valid point: Cells in the GSH measurement experiment were not gated by Hoechst signal for G1 phase because the channel normally used for Hoechst staining was in this case occupied by the MCB probe. However, given the data in Fig. 4A,B showing that the GSH production machinery is superscaling when measured specifically in G1-phase cells, we believe the flow cytometry data in Fig. 4C-J showing GSH concentration increasing with cell size across the whole cell cycle is very likely true for G1 cells as well.

      Minor comments:

      I recommend clarifying in the early introduction that all size changes discussed are in the absence of DNA content increase.

      We have now clarified this in the introduction (Line 41 and Line 81).

      The introduction seems to cite primary research and review papers in the same sentences, which is a bit misleading as the reviews don't seem to add new evidence.

      We have removed review citations where they did not provide additional context.

      OPTIONAL

      In the second introduction paragraph, consider the classification/description of the three different mechanisms. Currently, it seems that these mechanisms are not independent of each other, and the details provided about each mechanism are inconsistent.

      We have now modified this paragraph to make the description more consistent.

      Please provide statistics for the IC50 values reported based on Fig 1C. Were small and large cells statistically different? Are the IC50 values reported as +/- standard deviation or some other metric?

      This has now been clarified in the text as follows:

      “For example, at the 72 h time point, the Era2 IC50 was 28 ± 11 µM (mean ± SD) for large cells versus 2.0 ± 1.4 µM for small cells (Student’s t-test: p = 0.039) (Fig. 1C).”

      Providing more insight into why Era2 and RSL3 treatments yield opposite responses would be of great interest to the field.

      We agree this is an important point that should be discussed in more detail. In the field of ferroptosis, context-dependent (i.e., cell type-specific) effects are common and multiple groups including our own (Dixon) have published extensively on genes and mechanisms that can lead to differences between erastin2 and RSL3 sensitivity. For example, there are studies showing that the mTOR pathway or the p53 pathway can either prevent or promote ferroptosis, depending on the cell type and/or other currently unknown variables. To address more specifically the differences between Era2 and RSL3 in the context of our observed cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B and the following text:

      Line 359: “While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2.”

      Additionally, in the Discussion we added the following:

      Line 578: “We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc-, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

      Is the BODIPY-C11 labeling specific to plasma membrane, as suggested by the writing of the authors, or do the results shown integrate signals over all cell membranes?

      We thank the reviewer for pointing this out. BODIPY-C11 581/591 stains many membranes in the cell, not just the plasma membrane. We have changed the wording in the manuscript to reflect this.

      How exactly is gating done for the flow cytometry samples? Especially when analyzing size-scaling, the results are likely to be sensitive to outliers, such as those seen in Fig 4C (a subpopulation of very low CFSE stained cells). Can the authors clarify their methods and/or display supplementary figures with gating examples?

      We have now specified our gating strategy in the Methods section (Line 663) and added a corresponding Supplementary Figure S5.

      In Fig 4, total protein staining was used as a control, whereas in Fig 5B β-actin was used as a control. Why did the authors rely on different control approaches for essentially the same measurements? Are these controls comparable?

      In our flow cytometry experiments, we consistently use live-cell total protein stain (CFSE) for live cells, and anti-Tubulin immunofluorescent staining for fixed cells, both of which scale in proportion to cell volume and act as a read-out for total cellular protein content (Lanz and Zatulovskiy et al., Mol Cell 2022; Berenson et al. MBoC 2019), which we use to calculate concentrations of other cellular components (analogous to loading controls). In Fig. 5B, beta-Actin is used as a reference, i.e., a protein whose concentration does not change with cell size, as opposed to ACSL4, whose concentration decreases with cell size. In this plot, both ACSL4 and beta-Actin amounts were normalized to alpha-Tubulin, which is analogous to a concentration calculation using a loading control. This is now explained in more detail in the figure legend.
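
      The normalization logic described above can be shown with a short sketch. It assumes per-bin amounts of a target protein and of a reference that scales with total protein (alpha-Tubulin in the response); the function name `relative_concentration` and all numbers are hypothetical placeholders, not measured values.

```python
def relative_concentration(target_amounts, reference_amounts):
    """Divide each target amount by the matching reference amount.

    If the reference scales in proportion to total protein (and so to cell
    volume), the ratio behaves like a concentration, as with a loading control.
    """
    if len(target_amounts) != len(reference_amounts):
        raise ValueError("amount lists must have the same length")
    return [t / r for t, r in zip(target_amounts, reference_amounts)]


# Hypothetical amounts for small, medium and large size bins.
reference = [1.0, 2.0, 4.0]      # scales with cell size
proportional = [3.0, 6.0, 12.0]  # scales like the reference
subscaling = [2.0, 3.0, 4.0]     # grows more slowly than the reference

print(relative_concentration(proportional, reference))  # flat ratio
print(relative_concentration(subscaling, reference))    # declining ratio
```

      A flat ratio corresponds to a size-independent concentration (the role beta-Actin plays in Fig. 5B), while a declining ratio corresponds to a concentration that decreases with size (as reported for ACSL4).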

      Reviewer #1 (Significance):

      I work in the cell size research field, and I am familiar with other related works in this field. My evaluation reflects a specialist's view of this study. Overall, this study will be of a large interest to a small group of specialists, and specific aspects of the work will also gain some interest from broader basic research audiences studying mechanisms of drug responses and ferroptosis in general. However, I do not see this work gaining very broad interest across larger audiences, simply because the field of cell size research is not of broad interest, and this is not a landmark study for the field.

      The field of cell size research has long searched for size-dependent functions, as these could help explain why cell size matters. This study is a nice addition to our field, helping establish ferroptosis as a size-dependent function. However, the significance of this work relies on how clearly the authors can establish that their results are cell size rather than cell cycle effects (see major comments above). Should the authors address these concerns, then this study will provide some conceptual and mechanistic insight.

      Regarding mechanistic insights, this work is in stark contrast to a recent study about size-dependency of ferroptosis (https://doi.org/10.1016/j.isci.2025.112363), where increased cell size heightened sensitivity to the GPX4 inhibitor RSL3, thus suggesting an opposite conclusion than what the authors observed with the drug Era2. The authors examined this contradiction, and while their results with the drug RSL3 agreed with the recent study, they did not explain why different drug mechanisms yield opposite results. Providing more insights into this discrepancy would increase the impact of this work.

      Regardless of the impact of this work, I want to emphasize that I am fully supportive of seeing this work published once the technical concerns have been addressed. Our field will benefit from this work, and this work could catalyze important future research. The general topic studied here has the potential to become very important.

      We thank the reviewer for their thoughtful assessment and for supporting publication pending resolution of the technical concerns. We respectfully disagree that our audience is likely narrow: Reviewer #2 noted broad relevance to specialists in cell death/ferroptosis, redox biology, cancer biology, aging, and translational efforts in ferroptosis-based therapies, and Reviewer #3 similarly emphasized both cell size and ferroptosis/cell death communities. We therefore believe the work will be of interest across multiple active fields, particularly because it highlights how cell size heterogeneity can shape drug response.

      We agree that the significance hinges on clearly distinguishing cell size from cell-cycle effects, and we have strengthened the corresponding controls/analyses and adjusted language accordingly (see responses to major comments above). We also addressed the reported discrepancy between Era2 and RSL3 size-dependencies by adding new data (Fig. 4B) and expanded discussion. We very much hope that the reviewer appreciates the efforts we have made to strengthen this manuscript and resolve the technical concerns. For these reasons, we believe this work will have an impact on several fields and gain a broad readership.

      Reviewer #2:

      Zatulovskiy et al. demonstrate that cell size modulates susceptibility to ferroptosis, a form of iron-dependent cell death driven by lipid peroxidation. Using human cell lines (HMEC, HT-1080, RPE-1), the authors examined cell size through FACS sorting, CDK4/6 inhibition and inducible cyclin D1 knockdown. They found that larger cells are more resistant to ferroptosis induced by system xc<sup>-</sup> inhibition (erastin2), but more sensitive to GPX4 inhibition (RSL3), highlighting pathway-specific size dependencies.

      Mechanistically, larger cells exhibited:

      - Higher glutathione levels, supporting lipid peroxide detoxification

      - Increased ferritin expression, promoting iron sequestration

      - Lower ACSL4 levels, reducing incorporation of peroxidation-prone lipids

      These findings were supported by high-throughput microscopy, flow cytometry (BODIPY-C11 lipid peroxidation assays), and proteomic analyses. The study concludes that cell size influences proteome composition and metabolic capacity, thereby shaping cell death decisions, an insight with implications for aging, cancer, and ferroptosis-based therapies.

      Major Comments

      (1) Direct evaluation of SLC7A11 abundance and function is needed

      The opposite size-dependent effects of erastin2 and RSL3 strongly suggest a role for SLC7A11/system xc<sup>-</sup> activity in size-dependent ferroptosis resistance. However, SLC7A11 levels were not quantified due to insufficient peptide detection in the proteomic data.

      - Direct measurement of SLC7A11 protein levels (immunoblotting or flow cytometry) in small vs large cells would test whether its expression scales with size.

      - Functional perturbation (siRNA/CRISPR knockdown) followed by erastin2 treatment would provide mechanistic validation.

      - Use of additional SLC7A11 inhibitors (e.g., sulfasalazine, sorafenib) could further test whether the size resistance phenotype is xc<sup>-</sup>-specific.

      We agree that the difference in size-dependent responses to RSL3 and Era2 is an important point that needs further investigation and discussion, as other reviewers also pointed out. To address more specifically the differences between Era2 and RSL3 in the context of cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B measuring SLC7A11 and Cathepsin B scaling with cell size and the following text:

      Line 359: “While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2.”

      Additionally, in the Discussion we added the following:

      Line 578: “We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc<sup>-</sup>, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

      (2) Functional tests of ferritin contribution to resistance are needed

      Although elevated ferritin (FTH1/FTL) levels in larger cells represent a strong correlational signal, definitive experimental evidence establishing causality is currently lacking.

      - Measuring the labile iron pool directly in size-stratified populations would strengthen the link.

      - Knockdown of FTH1 or FTL could reveal whether ferritin upregulation is necessary for the resistance of large cells to ferroptosis.

      We thank the reviewer for raising this point. We have now completed additional experiments, as suggested by the reviewer, and found that iron chelation is unlikely to mediate the size-dependent response to Era2. We have modified the manuscript accordingly and added the following data and discussion to address this point:

      Line 296: “The observed increase in ferritin concentration with cell size could therefore lead to additional Fe2+ ion chelation, which in turn would protect large cells from iron-dependent lipid peroxidation and ferroptosis. However, when we measured the concentration of labile intracellular Fe2+ using a fluorescent probe FerroOrange (Hirayama et al, 2020), we did not observe any size-dependent decrease in labile iron concentration (Fig. S2A). Previous work suggests a link between increased sequestration of ferrous iron in lysosomes and resistance to ferroptosis. It was reported that senescent cells, which are also large (Fig. S3A,B), gain resistance to ferroptosis through lysosomal alkalinization and sequestration of ferrous iron in lysosomes (Loo et al, 2025). We therefore tested whether the superscaling of lysosomes observed in large cells (Lanz et al, 2022; You et al, 2025) promotes Era2 resistance through lysosomal iron sequestration. To do this, we stained the cells with the lysosomal iron detection probe Lyso-FerroRed (Saimoto et al, 2025) and measured its scaling using flow cytometry (Fig. S2B). We observed that the amount of Lyso-FerroRed, and therefore, the amount of lysosomal iron, scaled in direct proportion to cell size, just like the total cellular protein content (Fig. S2B). These results indicate that iron chelation by ferritin and its sequestration in lysosomes are unlikely to play a crucial role in size-dependent decrease in Era2 sensitivity.”

      (3) Relevance to senescence should be addressed experimentally or explicitly discussed

      Given that senescent cells are enlarged and accumulate in aged and tumour tissues, testing senescent models for erastin2 resistance would greatly strengthen the physiological significance.

      We agree that an increase in cell size contributing to the resistance of senescent cells to ferroptosis is intriguing. We have now added a Supplementary Figure S3 and discussion of this point in the manuscript as follows:

      Discussion line 552: “…our data suggest that previously reported resistance of senescent cells to ferroptosis can at least partially be due to the increased cell size, a well-established hallmark of senescence.”

      Minor Comments

      (1) Mechanistic nuance regarding RSL3 should be included

      RSL3 has been reported to induce ferroptosis independently of GPX4 (PMID: 37087975, PMID: 40392234) and may target other selenoproteins such as TXNRD1. This nuance would help explain the observed divergence between RSL3 and erastin2 sensitivity across sizes.

      We have now added this in the Discussion as suggested by the reviewer (line 583):

      “In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

      (2) Dynamic range of BODIPY-C11 assays needs commentary

      Despite high erastin2 doses, the oxidized BODIPY signal remains close to DMSO levels. The authors should comment on whether this reflects high GSH buffering capacity, probe limitations, or other factors.

      We believe there are both technical (narrow dynamic range of the probe) and biological reasons for the relatively small (2-3 fold) difference in Oxidized-to-Non-oxidized BODIPY-C11 ratios between DMSO and Era2-treated cells. The biological reason is that the cells continue producing GSH until they fully deplete the cystine pool, which happens ~20-24 h after Era2 addition. Once the cystine pool is depleted, the cells very rapidly deplete GSH and initiate cell death. Therefore, there is only a short time window where cells are strongly depleted of GSH before dying. We see this small fraction of cells with a high Oxidized BODIPY-C11 signal in our flow cytometry experiments and in previous microscopy analysis of BODIPY-C11 (Murray et al., Protocol for detection of ferroptosis in cultured cells. STAR Protoc. 2023), but at our chosen time point (20 h Era2) most cells are not as bright because we aimed to analyze the population before the onset of widespread cell death.

      (3) Western blot for shCycD1 depletion should be included

      CycD1 depletion usually causes cells to stop proliferating, which is not the case here. Therefore, depletion must be partial. The level of depletion should be shown by immunoblotting.

      The CCND1 manipulation model is extensively characterized in our recent work cited in this manuscript (You et al. (2025), Cell size-dependent mRNA transcription drives proteome remodeling. 2025.10.30.685141 doi:10.1101/2025.10.30.685141). CCND1 shRNA cells do not fully arrest in G0/G1 because the concentration of Cyclin D1 protein in this system is only partially decreased, as the reviewer noted. As a result, the cells have a slightly elongated G1 phase due to a ~30% reduction in Cyclin D1 concentration, but continue to proliferate. The G1 fraction changes from ~70% in wild-type to ~80% in CCND1 shRNA cells.

      Reviewer #2 (Significance):

      General Assessment: This study presents a mechanistic link between cell size and ferroptosis susceptibility. Using high-throughput microscopy, proteomics, and genetic perturbations across multiple human cell lines, the authors demonstrate that larger cells are more resistant to ferroptosis induced by system xc<sup>-</sup> inhibition (erastin2). This resistance is attributed to elevated glutathione production, increased ferritin-mediated iron sequestration, and reduced ACSL4-dependent lipid peroxidation. The experimental design is rigorous and multifaceted, with consistent results across cell types and size manipulation methods. While the study is limited to in vitro systems, its conceptual and mechanistic insights lay the groundwork for future in vivo and translational investigations.

      Advance: This work is the first to systematically show that cell size directly influences ferroptosis susceptibility via proteome scaling. It reconciles previous findings that large cells are sensitized to GPX4 inhibition (RSL3) by demonstrating that the ferroptosis pathway targeted (system xc<sup>-</sup> vs GPX4) determines the direction of size-dependent vulnerability. The study provides a conceptual advance by positioning cell size as a regulatory axis in cell death decisions, and a mechanistic advance by identifying size-dependent changes in glutathione metabolism, ferritin levels, and ACSL4 expression.

      Audience: This research will be of interest to specialists in cell death, ferroptosis, redox biology, and cancer biology. It also holds relevance for aging researchers and translational scientists exploring ferroptosis-based therapies. The findings may influence how cell size heterogeneity is considered in therapeutic design, particularly in oncology and senescence-targeting strategies.

      Field of Expertise: Translational cancer biology, cell cycle regulation, proteomics, therapy resistance, molecular mechanisms of cell death.

      We thank Reviewer #2 for their careful and constructive assessment of our manuscript. We were happy that they appreciated the rigor of our multifaceted approach. We are also grateful for their thoughtful perspective on the conceptual and mechanistic advances, and for highlighting the broader relevance of this work to ferroptosis biology, redox regulation, cancer and aging research.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript, Zatulovskiy and colleagues elaborate on their previous work describing cell size-dependent changes in the proteome by investigating whether these changes can be correlated with differences in cell physiology. Using a cleverly-designed high throughput screen, they searched for compounds that differently-sized cells display differential sensitivity towards. Their primary hit, Era2, is involved in the ferroptosis pathway and serves as the starting point for a detailed study of how excess cell size protects cells from ferroptosis-induced cell death via: 1) lower concentrations of ACSL4 (which produces peroxidation-prone PUFAs), 2) increased ferritin concentrations, and 3) increased GSH concentrations.

      Overall, the experiments in this manuscript are well-designed and interpreted. It is an extremely well-written manuscript with a clear trajectory of logic. I have only a few major concerns that should be addressed before publication:

      We thank Reviewer #3 for their careful reading of the manuscript and for the clear summary of our study and its central findings. We appreciate their positive assessment of the experimental design, interpretation, and overall clarity of the writing and logical flow. We are also grateful for their constructive feedback and take their major concerns seriously; we have addressed each point in detail below.

      Major concerns:

      (1) In Figure 3E, the authors gate their flow cytometry data using SYTOX so that they are only analyzing live cells. Based on their gating scheme, it seems like there are a lot of dead cells. Presumably the cells that died were the most sensitive to Era2, so it seems an oversight to discard these cells. Of course, it is not appropriate to analyze dead cells, but this could potentially be solved by using a shorter treatment duration than 24 hours wherein fewer cells die.

      This is a good point. To address it, we have now replaced this panel with a time point where most cells are still alive (20 h, 0.2 µM Era2), as suggested by the reviewer (Fig. 3E,F). This did not change the conclusion that BODIPY-C11 oxidation decreases with cell size.

      (2) In Figure 5, are the small, medium, and large bins for ACSL4 KO cells the same as for WT cells? If the ACSL4 KO cells are just bigger to begin with, this could explain why the "small" bin has greater cell survival than the WT small bin. Moreover, is the overlap between the three bins the same in the WT and KO cells?

      This is an important point. We have now added Supplementary Figure S4B to show the relative sizes of small, medium, and large WT and ACSL4 KO HMEC cells. As seen from this graph, the ACSL4 KO cells are not bigger than WT cells. Importantly, the fold-range between the small and large FACS-sorted cells is similar for WT and KO cells (~1.9- to 2-fold).

      (3) Loo, et al. Nat Comms 2025 similarly found that senescent cells (which are enlarged) are resistant to ferroptosis using the same inhibitor as the authors. In contrast to the authors, they show that this is due to lysosomal alkalinization and sequestration of ferrous iron in lysosomes. Given that Lanz et al. 2022 found that lysosomal components super-scale with cell size, it seems like this would be an important hypothesis to address. Free lysosomal iron can be easily measured with the LysoRhoNox stain. Loo et al. was able to restore ferroptosis sensitivity in senescent cells using the V-ATPase activator EN6, so it would be important for the authors to address whether this (or similar) treatment would have the same effect in enlarged cells.

      This is an excellent point. We have now performed this experiment and added it to the manuscript, as suggested by the reviewer. Based on the Lyso-FerroRed staining (another brand name for the LysoRhoNox probe), we do not see an increase in lysosomal iron sequestration in large cells (Fig. S2B):

      Line 301: “Previous work suggests a link between increased sequestration of ferrous iron in lysosomes and resistance to ferroptosis. It was reported that senescent cells, which are also large (Fig. S3A,B), gain resistance to ferroptosis through lysosomal alkalinization and sequestration of ferrous iron in lysosomes (Loo et al, 2025). We therefore tested whether the superscaling of lysosomes observed in large cells (Lanz et al, 2022; You et al, 2025) promotes Era2 resistance through lysosomal iron sequestration. To do this, we stained the cells with the lysosomal iron detection probe Lyso-FerroRed (Saimoto et al, 2025) and measured its scaling using flow cytometry (Fig. S2B). We observed that the amount of Lyso-FerroRed, and therefore, the amount of lysosomal iron, scaled in direct proportion to cell size, just like the total cellular protein content (Fig. S2B). These results indicate that iron chelation by ferritin and its sequestration in lysosomes are unlikely to play a crucial role in size-dependent decrease in Era2 sensitivity.”

      Minor concerns:

      (1) It would be helpful if this manuscript were re-submitted with line numbers to more easily reference the text.

      We have added line numbers for convenience.

      (2) In Figure 5A and other figures that reproduce data from Lanz et al. 2022, it would be helpful to have a summary curve for the overall abundance of each protein rather than only the individual peptide curves. These plots (particularly Figure 5A) are difficult to interpret since some peptides were presumably more abundant / measured with higher confidence than others.

      We have added the average ACSL4 protein slope line to Fig. 5A.

      (3) In Figure 5, the authors show the validation of the ACSL4 KO HT-1080 cell line but not HMEC, even though both are used in this figure. It would be useful to show both. Additionally, the authors switch back and forth between the two cell lines for this figure, and it is not clear why.

      We have added the HMEC ACSL4 KO validation Western blot in Fig. S4A.

      For the BODIPY oxidation experiment (Fig. 5D), we used HT-1080 instead of HMEC because HT-1080 cells are sensitive to lower concentrations of Era2. We could therefore better optimize the Era2 concentrations and treatment durations to measure BODIPY oxidation at a time point when most cells are still alive but show a pronounced oxidized BODIPY signal.

      (4) In Figure 5B, the authors use antibody-based staining of ACSL4 and flow cytometry to correlate a loss of ACSL4 expression with increased cell size, validating the proteomics data in Figure 5A. This does not seem like a good way to do this. Firstly, fixing cells with formaldehyde alters their size (is this proportional across differently sized cells? It's impossible to know), which makes it inappropriate to use SSC as a proxy for size in this particular situation. Secondly, the normalization scheme here doesn't make sense. If actin was used as a reference protein, why was tubulin used to normalize ACSL4 abundance? Overall, this seems like a very round-about experiment that could have just been addressed by doing a simple western blot with the four size bins sorted from live cells (as it was in the proteomics). If the issue is that ACSL4 is not detectable by western in the HMEC cells, another solution would be plating the live, sorted bins on coverslips and measuring by IF (or using the HT-1080 cells).

      We prefer IF flow cytometry to Western blotting for protein scaling analysis because it is more quantitative and provides cell size and protein content information for each individual cell. While in principle, different-sized cells might change their size differently during fixation, the cells that were larger or smaller prior to the fixation remain larger or smaller after fixation as well.

      Therefore, the SSC measurement after fixation still provides reliable information on size ranking, even if SSC does not perfectly linearly scale with cell volume. We do not use the SSC information to calculate protein concentrations here. Instead, we divide the amount of our protein of interest in the cell by the amount of constitutively-expressed Tubulin, which acts as an analogue of a loading control in this experiment. In Fig. 5B, both ACSL4 and Actin were normalized to Tubulin to estimate their concentrations. Actin is used just as a reference protein to show how the concentration of a perfectly scaling protein remains constant across cell size, as opposed to the sub-scaling ACSL4. Tubulin in this case was used as a proxy for total cellular protein content, which scales linearly in proportion to cell volume. This approach for determining the scaling behaviors of different proteins was previously validated in Lanz et al., Mol Cell 2022.

      (5) In Figure 5E/5F, the authors pre-arrest the cells in G1 with palbociclib before size-sorting them. The pre-arrest is not done in other experiments using this cell line for size-sorting, so it would be important for the authors to comment on why this was done for this experiment but not others.

      As we found in Fig. 2B-E, the cell cycle has confounding effects on size-dependent ferroptosis susceptibility measurements (as discussed in detail in our response to the first major point of Reviewer #1 above). Briefly, to avoid these confounding effects and isolate the effects of cell size from the effects of the cell cycle, we pre-synchronized the cells with 24 h treatment with palbociclib in Fig. 5E,F. This is now better clarified in the text, as follows:

      Line 456: “In this experiment, we synchronized cells in G1 phase using palbociclib prior to cell sorting and also incubated the sorted cells in the presence of palbociclib during Era2 treatment to isolate cell size effects from the previously observed confounding effects of the cell cycle on ferroptosis (Fig. 2B,E).”

      (6) Conceptually, it is difficult for me to understand why large cell size sensitizes cells to GPX4 inhibition but confers resistance to Era2 treatment. Particularly given the pathway described in Figure 3A, I am having trouble understanding why these would convey such opposing phenotypes. Shouldn't the extra ferritin in the bigger cells also help them cope with GPX4 inhibition if, as the authors state in the discussion, the increased sensitivity to the GPX4 inhibitor is reported to be mediated by (among other things) iron accumulation? A deeper discussion of this seeming incongruity would be helpful for contextualizing the broader role of cell size in determining ferroptosis sensitivity.

      We agree this is an important point, which was also raised by the other reviewers. As such, we note that context-dependent (i.e., cell type-specific) effects are common in the ferroptosis field, and multiple groups including our own (Dixon) have published extensively on genes and mechanisms that can lead to differences between erastin2 and RSL3. For example, there are studies showing that the mTOR pathway or the p53 pathway can both prevent and promote ferroptosis, depending on the cell type or some other hidden variable.

      To better address the differences between Era2 and RSL3 in the context of the cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B and the following text:

      Line 359: “While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2.”

      Additionally, in the Discussion we added the following:

      Line 578: “We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc-, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

    1. eLife Assessment

      This study thoroughly assesses tactile acuity on women's breasts, for which no dependable data currently exists. The study provides two important contributions, by convincingly showing that tactile acuity on the breast is poor in comparison to other body parts, and that acuity is worst in larger breasts, indicating that the number of tactile sensors is fixed. This study will be of interest to the broader community of touch, as well as those interested in breast reconstruction and sexual function.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Senior Editor without further input from the original reviewers. The authors have moderated their claims and discussed the limitations of their experimental design more transparently. The previous reviews are included for reference.]

      Comments on previous version:

      The authors investigated tactile spatial perception on the breast using discrimination, categorization, and direct localization tasks. They reach four main conclusions:

      (1) The breast has poor tactile spatial resolution.

      This conclusion is based on comparing just noticeable differences, a marker of tactile spatial resolution, across four body regions, two on the breast. The data compellingly support the conclusion; the study outshines other studies on tactile spatial resolution that tend to use problematic measures of tactile resolution, such as two-point-discrimination thresholds. The result will interest researchers in the field and possibly in other fields due to the intriguing tension between the finding and the sexually arousing function of touching the breast.

      The manuscript incorrectly describes the result as poor spatial acuity. Acuity measures the average absolute error, and acuity is good when response biases are absent. Precision relates to the error variance. It is common to see high precision with low acuity or vice versa. Just noticeable differences assess precision or spatial resolution, while points of subjective equality evaluate acuity or bias. Similar confusions between these terms appear throughout the manuscript.
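
      As a point of reference for the distinction drawn above, the two quantities can be written out under one common modelling assumption: a cumulative-Gaussian psychometric function, with the JND defined at the 75% point. Neither the function nor the criterion is taken from the manuscript; both are assumptions made here purely for illustration.

```latex
P(\text{``above''} \mid x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right),
\qquad \mathrm{PSE} = \mu,
\qquad \mathrm{JND} = \sigma\,\Phi^{-1}(0.75) \approx 0.674\,\sigma
```

      Under this sketch, $\mu$ shifts the curve along the stimulus axis (bias) while $\sigma$ flattens it (resolution), so the two can vary independently of each other.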

      A paragraph within the next section seems to follow up on this insight by examining the across-participant consistency of the differences in tactile spatial resolution between body parts. To this aim, pairwise rank correlations between body sites are conducted. This analysis raises red flags from a statistical point of view. 1) An ANOVA and its follow-up tests assume no variation in the size of the tested effect but varying base values across participants. Thus, if significant differences between conditions are confirmed by the original statistical analysis, most participants will have better spatial resolution in one condition than the other condition, and the difference between body sites will be similar across participants. 2) Correlations are power-hungry, and non-parametric tests are power-hungry. Thus, the number of participants needed for a reliable rank correlation analysis far exceeds that of the study. In sum, a correlation should emerge between body sites associated with significantly different tactile JNDs; however, these correlations might only be significant for body sites with pronounced differences due to the sample size.

      (2) Larger breasts are associated with lower tactile spatial resolution

      This conclusion is based on a strong correlation between participants' JNDs and the size of their breasts. The depicted correlation convincingly supports the conclusion. The sample size is below that recommended for correlations based on power analyses, but simulations show that spurious correlations of the reported size are extremely unlikely at N=18. Moreover, visual inspection rules out that outliers drive these correlations. Thus, they are convincing. This result is of interest to the field, as it aligns with the hypothesis that nerve fibers are more sparsely distributed across larger body parts.

      (3) The nipple is a unit

      The data do not support this conclusion. The conclusion that the nipple is perceived as a unit is based on poor tactile localization performance for touches on the nipple compared to the areola. The problem is that the localization task is a quadrant identification task with the center being at the nipple. Quadrants for the areola could be significantly larger due to the relative size of the areola and the nipple; the results section seems to suggest this was accounted for when placing the tactile stimuli within the quadrants, but the methods section suggests otherwise. Additionally, the areola has an advantage because of its distance from the nipple, which leads to larger Euclidean distances between the centers of the quadrants than for the nipple. Thus, participants should do better for the areola than for the nipple even if both sites have the same tactile resolution.

      To justify the conclusion that the nipple is a unit, additional data would be required. 1) One could compare psychometric curves with the nipple as the center and psychometric curves with a nearby point on the areola as the center. 2) Performance in the quadrant task could be compared for the nipple and an equally sized portion of the areola and tactile locations that have the same distance to the border between quadrants in skin coordinates. 3) Tactile resolution could be directly measured for both body sites using a tactile orientation task with either a two-dot probe or a haptic grating.

      Categorization accuracy in each area was tested against chance using a Monte Carlo test, which is fine, though the calculation of the test statistic, Z, should be reported in the Methods section, as there are several options. Localization accuracies are then compared between areas using a paired t-test. It is a bit confusing that once a distribution-approximating test is used, and once a test that assumes Gaussian distributions when the data is Bernoulli/Binomial distributed. Sampling-based and t-tests are very robust, so these surprising choices should have hardly any effect on the results.
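
      To make concrete what a sampling-based test against chance can look like, here is a minimal sketch. It assumes a four-quadrant task (chance = 0.25) and uses the raw number of correct trials as the test statistic, which is only one of the several options mentioned above and is not necessarily the Z statistic the authors computed. The function name and the trial counts are hypothetical.

```python
import random

def monte_carlo_p_value(n_correct, n_trials, chance=0.25, n_sims=10000, seed=0):
    """One-sided Monte Carlo p-value for performing above chance.

    Simulates n_sims experiments in which every trial is a pure guess and
    counts how often the simulated number of correct trials is at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        simulated = sum(rng.random() < chance for _ in range(n_trials))
        if simulated >= n_correct:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_sims + 1)

# Hypothetical participants: 40 trials each, 12 vs. 25 correct.
print(monte_carlo_p_value(12, 40))
print(monte_carlo_p_value(25, 40))
```

      Because the statistic here is a count, the simulated null distribution is binomial by construction and no Gaussian assumption is involved.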

      A correlation based on N=4 participants is dangerously underpowered. A quick simulation shows that correlation coefficients of randomly sampled numbers are uniformly distributed at such a low sample size. This likely spurious correlation is not analyzed, but quite prominently featured in a figure and discussed in the text, which is worrisome.
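
      The kind of quick simulation referred to here, and in point (2) for N=18, can be sketched as follows. It assumes independent standard-normal samples and Pearson's r; the thresholds of 0.5 and 0.8 are arbitrary illustrations, not the correlation values reported in the manuscript.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def null_correlations(n, n_sims, rng):
    """Pearson r for n_sims pairs of unrelated Gaussian samples of size n."""
    rs = []
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rs.append(pearson_r(xs, ys))
    return rs

def fraction_exceeding(rs, threshold):
    """Share of simulated correlations whose magnitude exceeds threshold."""
    return sum(abs(r) > threshold for r in rs) / len(rs)

rng = random.Random(0)
for n in (4, 18):
    rs = null_correlations(n, 20000, rng)
    print(n, fraction_exceeding(rs, 0.5), fraction_exceeding(rs, 0.8))
```

      With N=4, about half of the unrelated samples yield |r| > 0.5, as expected for a uniform distribution on [-1, 1]; with N=18, such values become rare.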

      (4) Localization of tactile events on the breast is biased towards the nipple

      The conclusion that tactile percepts are drawn toward the nipple is based on localization biases for tactile stimuli on the breast compared to the back. Unfortunately, the way participants reported the tactile locations introduces a major confound. Participants indicated the perceived locations of the tactile stimulus on 3D models of these body parts. The nipple is a highly distinctive and cognitively represented landmark, far more so than the scapula, making it very likely that responses were biased toward the nipple regardless of the actual percepts. One imperfect but better alternative would have been to ask participants to identify locations on a neutral grey patch and help them relate this patch to their skin by repeatedly tracing its outline on the skin.

      Participants also saw their localization responses for the previously touched locations. This is unlikely to induce bias towards the nipple, but it renders any estimate of the size and variance of the errors unreliable. Participants will always make sure that the marked locations are sufficiently distant from each other.

      The statistical analysis is again a homebrew solution and hard to follow. It remains unclear why standard and straightforward measures of bias, such as regressing reported against actual locations, were not used.
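
      A regression of reported against actual locations, as suggested here, could look like the sketch below. Everything in it is hypothetical: positions are expressed in mm along a single axis with the landmark at 0, and the data are simulated with a known compression factor rather than taken from the study.

```python
import random

def fit_line(actual, reported):
    """Ordinary least-squares fit: reported = slope * actual + intercept."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_r = sum(reported) / n
    sxx = sum((a - mean_a) ** 2 for a in actual)
    sxy = sum((a - mean_a) * (r - mean_r) for a, r in zip(actual, reported))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_a
    return slope, intercept

def simulate_reports(compression, noise_sd, repeats, rng):
    """Simulated reports pulled toward 0 by `compression`, plus Gaussian noise."""
    positions = [-40, -30, -20, -10, 0, 10, 20, 30, 40]
    actual, reported = [], []
    for _ in range(repeats):
        for pos in positions:
            actual.append(pos)
            reported.append(compression * pos + rng.gauss(0.0, noise_sd))
    return actual, reported

rng = random.Random(0)
actual, reported = simulate_reports(compression=0.7, noise_sd=3.0, repeats=20, rng=rng)
slope, intercept = fit_line(actual, reported)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

      In this framing, a slope below 1 indicates that reports are compressed toward the origin of the coordinate system, a non-zero intercept indicates a constant shift, and the residual scatter reflects localization variability.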

      Null-hypothesis significance testing only lets scientists either reject the null hypothesis or not. The latter does NOT mean the Null hypothesis is true, i.e., it can never be concluded that there is no effect. This rule applies to every NHST test. However, it raises particular concerns with distribution tests. The only conclusion possible is that the data are unlikely from a population with the tested distribution; these tests do not provide insight into the actual distribution of the data, regardless of whether the result is significant or not.

    3. Reviewer #2 (Public review):

      Summary:

      The authors tested tactile acuity on the breast of females using several tasks.

      Results:

      Tactile acuity, assessed by just-noticeable differences in judging whether a touch was above or below a comparison stimulus, was lower on both the lateral and medial breast than on the hand and back. Acuity also scaled inversely with breast size, echoing earlier findings that larger hands exhibit lower acuity, presumably because a similar number of tactile receptors must be distributed over larger or smaller body surfaces. Observing this principle in the breast as on the hand strengthens the view that fixed innervation is a general organizing principle of the tactile system. Both methodology and analysis appear sound.

      Most participants were unable to localize touch to a specific quadrant of the nipple, suggesting it is perceived as a single tactile unit. However, the study does not address whether touches to the nipple and areola are confused; conceptualizing the nipple as a perceptual (landmark) unit would suggest that such confusion should not take place. Aside from this limitation, the methodology and analysis appear sound.

      Absolute touch localization, assessed by asking participants to indicate locations on a 3D rendering of their own torso, revealed a bias toward the nipple. The authors interpret this as evidence that the nipple serves as a landmark attracting perceived touch. However, as reviewers noted during review, alternative explanations cannot be fully ruled out: because the stimulus array was centered on the nipple, the observed bias may stem from stimulus distribution rather than landmark status. Aside from this caveat, the methodology and analysis appear sound.

      Overall assessment:

      The study offers a welcome exception to the prevailing bias in tactile research that limits investigation to the hand and arm. Its support for the fixed innervation hypothesis and its suggestion that the nipple may serve as a potential landmark, though requiring further scrutiny, illustrate the value of extending research to other body regions. By employing multiple tasks, the authors address several key aspects of tactile perception and create links to earlier findings.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript incorrectly describes the result as poor spatial acuity. Acuity measures the average absolute error, and acuity is good when response biases are absent. Precision relates to the error variance. It is common to see high precision with low acuity or vice versa. Just noticeable differences assess precision or spatial resolution, while points of subjective equality evaluate acuity or bias. Similar confusions between these terms appear throughout the manuscript.

      While I do not agree with the reviewer's usage of the word “acuity” and a cursory Google search does not agree with the provided definition, I have replaced acuity with precision as appropriate to improve clarity.

      A paragraph within the next section seems to follow up on this insight by examining the across-participant consistency of the differences in tactile spatial resolution between body parts. To this aim, pairwise rank correlations between body sites are conducted. This analysis raises red flags from a statistical point of view. 1) An ANOVA and its follow-up tests assume no variation in the size of the tested effect but varying base values across participants. Thus, if significant differences between conditions are confirmed by the original statistical analysis, most participants will have better spatial resolution in one condition than the other condition, and the difference between body sites will be similar across participants. 2) Correlations are power-hungry, and non-parametric tests are power-hungry. Thus, the number of participants needed for a reliable rank correlation analysis far exceeds that of the study. In sum, a correlation should emerge between body sites associated with significantly different tactile JNDs; however, these correlations might only be significant for body sites with pronounced differences due to the sample size.

      We have entirely removed this result from both the text and supplement.

      The data do not support this conclusion. The conclusion that the nipple is perceived as a unit is based on poor tactile localization performance for touches on the nipple compared to the areola. The problem is that the localization task is a quadrant identification task with the center being at the nipple. Quadrants for the areola could be significantly larger due to the relative size of the areola and the nipple; the results section seems to suggest this was accounted for when placing the tactile stimuli within the quadrants, but the methods section suggests otherwise. Additionally, the areola has an advantage because of its distance from the nipple, which leads to larger Euclidean distances between the centers of the quadrants than for the nipple. Thus, participants should do better for the areola than for the nipple even if both sites have the same tactile resolution.

      We agree with this interpretation and have updated the language throughout.

      Categorization accuracy in each area was tested against chance using a Monte Carlo test, which is fine, though the calculation of the test statistic, Z, should be reported in the Methods section, as there are several options. Localization accuracies are then compared between areas using a paired t-test. It is a bit confusing that once a distribution-approximating test is used, and once a test that assumes Gaussian distributions when the data is Bernoulli/Binomial distributed. Sampling-based and t-tests are very robust, so these surprising choices should have hardly any effect on the results.

      Excellent point. We have replaced the paired t-test with a signed rank test and added text to the methods to expand upon this.

      A correlation based on N=4 participants is dangerously underpowered. A quick simulation shows that correlation coefficients of randomly sampled numbers are uniformly distributed at such a low sample size. This likely spurious correlation is not analyzed, but quite prominently featured in a figure and discussed in the text, which is worrisome.

      We have removed this panel to reduce this concern.

      The conclusion that tactile percepts are drawn toward the nipple is based on localization biases for tactile stimuli on the breast compared to the back. Unfortunately, the way participants reported the tactile locations introduces a major confound. Participants indicated the perceived locations of the tactile stimulus on 3D models of these body parts. The nipple is a highly distinctive and cognitively represented landmark, far more so than the scapula, making it very likely that responses were biased toward the nipple regardless of the actual percepts. One imperfect but better alternative would have been to ask participants to identify locations on a neutral grey patch and help them relate this patch to their skin by repeatedly tracing its outline on the skin.

      While I wholeheartedly agree with the sentiments of the reviewer, in our experience performing these tests across many women we have found that the variability of the morphology of the breast makes it incredibly hard for women to perform this task in the way the reviewer is describing. Consequently, there is likely no perfect version of the task. That said, we have endeavored to acknowledge the limitations of the approach in the discussion.

      Participants also saw their localization responses for the previously touched locations. This is unlikely to induce bias towards the nipple, but it renders any estimate of the size and variance of the errors unreliable. Participants will always make sure that the marked locations are sufficiently distant from each other.

      I again respectfully disagree with this interpretation. If the participants were to always make sure marked locations were sufficiently distant from each other, then the degree of error and bias would be similar between regions, given that the visual pattern would be almost identical. As this is not true in the data, I disagree with the premise, though we hope the changes to the discussion acknowledge limitations with the data collection method.

      Null-hypothesis significance testing only lets scientists either reject the null hypothesis or not. The latter does NOT mean the Null hypothesis is true, i.e., it can never be concluded that there is no effect. This rule applies to every NHST test. However, it raises particular concerns with distribution tests. The only conclusion possible is that the data are unlikely from a population with the tested distribution; these tests do not provide insight into the actual distribution of the data, regardless of whether the result is significant or not.

      Thank you for this comment. We have updated the language to make it explicit that failing to deviate from the null distribution does not mean that the data are in fact null in nature.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I am wondering whether the interpretation of "the nipple as a sensory unit" is also supported by localization performance as reported in the analysis around Fig. 3 and supplementary Fig. 2. I cannot really see the error lines in that figure, and cannot tell whether any of the touches were on the nipple proper. Specifically I am wondering whether touch to the nipple is reliably attributed to the nipple, and touch to the areola to the areola, or whether confusion exists between the two. The description of the nipple as a sensory unit implies reliable attribution of touch to the respective area. Also the discussion (lines 309ff) is ambiguous about this.

      Thank you for this comment. We have removed language about the nipple being a unit and reframed the text in the discussion. We have also clarified that touches were indeed on the nipple.

      typos etc.

      lines 68-71 - implied causality is not backed up by evidence and could be the other way around than stated here

      line 82 grammar is inconsistent

      lines 199-200, "on the nipple" occurs twice

      Thank you for catching these. We have addressed the typos and grammar. We have also added a citation to the sentence where this exact hypothesis is stated. We have also relaxed the language to imply it is indeed a hypothesis.

    1. eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. The new findings raise the intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous rounds of review.]

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about the publication of this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at Ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

    4. Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module, and of Ser5 phosphorylation on the CTD of Pol II, is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

    5. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. The new findings raise the intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes.

      After consultation with the referees, we would like to suggest that you insert text into the RESULTS section acknowledging two limitations of your findings remaining in the revised manuscript, as follows:

      (i) It remains possible that Kin28 abundance was reduced by splitting Tfb3, which could be a factor in reducing its occupancies at gene promoters.

      In response, the paper now contains the following sentence:

      “Kin28 levels in extracts were below the limit of detection for our antibody, so we cannot rule out that the drop in ChIP signal is partly due to reduced Kin28 levels in the split Tfb3 strains. However, the viability of the cells (Figure 2) and the Tfb3-TAP purifications (Figure 3) argue against a complete loss of Kin28.”

      (ii) Lower than wild-type expression of the Tfb3 truncations might contribute to their mutant phenotypes shown in Figs. 2 & 5.

      In response, the paper now contains the following sentence:

      “There was some variation in protein expression levels (Figure 3A, left panel, lanes 1-4), and reduced levels of the split Tfb3 may contribute to the slow growth phenotypes.”

      Public Reviews:

      Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about the publication of this manuscript and offer a few minor comments below that may help to further strengthen the study.

      We appreciate the reviewer’s positive assessment of our work and suggestions for improvement.

      Page 4 PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Fig. 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not other fully-engaged PIC structures.

      Thanks for clarifying. We note that some structures of TFIIH alone also see the long helix. Accordingly, we modified this section to read:

      “In many TFIIH and PIC structures the linker is not visible, presumably due to flexibility. However, when it is seen (Abril-Garrido et al., 2023; Greber et al., 2019), the linker emerges from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits…”

      Page 8 Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function on the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3.

      We are not experts on NER, but in reviews of the field this appears to be a widely held assumption. A 2008 paper from the Egly lab (Coin et al., DOI 10.1016/j.molcel.2008.04.024) is usually cited, which shows that the interaction between XPD (metazoan Rad3) and XPA is likely incompatible with XPD-MAT1 interaction. In addition to the Yu 2023 review, we now also cite a more recent publication that more extensively reviews the models for core TFIIH interactions (van Sluis et al, 2025). We looked at the multiple recently published structures of various TCR-NER and GG-NER intermediate complexes, and none of them show the CAK module or even the Tfb3/Mat1 N-term, even though those proteins were typically included during assembly. We also consulted with our colleagues Johannes Walter and Lucas Farnung, who are studying various TC-NER intermediates biochemically and structurally. Although the CAK module is included in their assembly reactions, it is not visible in their cryoEM structures. They tell me that the presence of CAK would be compatible with early TC-NER intermediates, but is predicted to overlap with later interactions of XPD with the TC-NER factor STK19 (see Mevissen et al., Cell 2024). To be conservative, we modified the sentence to say “Recent structures … suggest” rather than “show”.

      Because the yeast strains used in Fig. 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      We agree that our experiment only shows that the connection between Tfb3 N- and C-term domains is not necessary for NER. The individual domains might still be able to function independently. Accordingly, we changed the heading of that section from “Disconnected core TFIIH does not cause an NER defect” to “Split Tfb3 does not cause an NER defect.” This more closely matches the figure legend title.

      Page 11. Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      That is true. But please note that this sentence was meant to describe movement of the kinase module AFTER release from Mediator (see previous sentence). Re-reading the passage, we realized the confusion arises because we propose multiple possible pathways in that paragraph. In the first half, we suggest the capture of the kinase module by Mediator might trigger the conformational changes in the linker. In the second half (where it says “Alternatively….”) we suggest the Mediator-CAK interaction could instead come first, and the release of this contact could free the CAK module to move around. We have modified the paragraph to make it clear these are two distinct models.

      Comments on revisions:

      Revised ms clarified all my points, including those I previously misunderstood.

      Thanks again for helping us improve the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at Ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Comments on revisions:

      The revised version with revisions to figures, text and new data has addressed all of our prior comments.

      We thank the reviewer for helping us improve the paper.

      Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module, and of Ser5 phosphorylation on the CTD of Pol II, is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      We appreciate that the reviewer finds that our main conclusions are convincing.

      Weaknesses:

      The work is limited in scope and does not provide major insights into the mechanism of transcription. The main addition to current models of transcription is that tethering of Kin28 to Tfb3 may limit kinase action from occurring downstream from the initiation site.

      The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3 is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript, although the experiment apparently motivated the subsequent studies reported here.

      We elected not to do this control experiment for several reasons. As reviewer 3 points out, this kinase fusion experiment turned out to be somewhat disconnected from the rest of the paper. Even though it didn’t work, we included it in the paper because the results led us to the realization that the Tfb3 C-term was actually not fully essential for viability as reported, which in turn led us to the idea of splitting Tfb3. Structural studies (https://doi.org/10.1126/sciadv.abd4420, https://doi.org/10.1073/pnas.2009627117, https://doi.org/10.7554/eLife.44771) show that, in addition to providing linkage to the core module, the C-term of Tfb3 induces a conformation change in Kin28/Cdk7 necessary for full kinase activity (which is likely why the strains without C-term are just barely viable). If we were to pursue why the fusions didn’t work, we could tether Kin28 directly to the Tfb3 linker (and may try this in the future), but then would need to also express the C-term separately for its activating function. Even then, this would be an imperfect control for the fusion experiments in Figure 1. Because we were trying to best mimic Kin28 being tethered via the accessory subunit Tfb3/Mat1, in the Figure 1 experiment we did not directly attach the kinases to Tfb3. For Ctk1/Cdk12, we fused the Tfb3 linker to the Ctk3 accessory subunit (analogous to Tfb3), and for Bur1/Cdk9, we fused to the cyclin subunit Bur2 (there is no known third subunit in this complex). The one exception was Mpk1, which has no partner subunits and is not a CDK. There are many reasons why this high-risk protein fusion experiment may not have worked, but we chose not to pursue it further at this time.

      Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. It will be interesting to have this idea tested more thoroughly as more molecular evolutionary data becomes available.

      Comments on revisions:

      For the most part, the authors have satisfactorily addressed my previous critique. In particular, they have added to their discussion of evolutionary implications, and performed an experiment casting doubt on the assertion of a dominant negative effect, and as a consequence removed this claim from the manuscript. I also pointed out that the fusion experiments that lead off the Results section are missing the crucial control of including a Tfb3-Kin28 fusion. The authors have elected not to perform this control experiment, pointing out that even this control would be imperfect in some respects, and agreeing that this experiment is somewhat disconnected from the rest of the paper. The reason for including it, in spite of its somewhat tangential nature, is that it provides something of a rationale for the experiments that follow. I don't so much mind their retaining the experiment, as the absence of this control (and indeed, the results) does not so much impact the later results. However, I think if it is to be included, this shortcoming should be explicitly recognized, especially as a service to younger scientists who could benefit from an exposition that includes a thorough consideration of potential control experiments.

      We thank the reviewer for helping us improve the paper.

    1. eLife Assessment

      This manuscript reports a high-quality genome assembly of the European cuttlefish, Sepia officinalis, a representative species of the Cephalopod lineage. This solid work relies on current best practices in genome sequencing and assembly, combining PacBio HiFi long reads and Hi-C chromatin conformation capture, and on state-of-the-art comparative genomic analyses, including chromosome number evolution and analyses of expanded gene families. The resulting genome will be a valuable resource for researchers interested in cuttlefish biology and comparative genomics in general.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have carefully considered all the reviewers' comments. The newly added analyses, figures, and text sections are of high quality, and we commend the authors for their in-depth revision of the manuscript.]

      This manuscript presents a high-quality, chromosome-level genome assembly of the European cuttlefish (Sepia officinalis), a representative species of the cephalopod lineage. Using state-of-the-art sequencing and scaffolding technologies, including PacBio HiFi long reads and Hi-C chromatin conformation capture, the authors deliver a genome assembly with exceptional contiguity and completeness, as evidenced by high BUSCO scores. This genome resource fills a significant gap in cephalopod genomics and offers a valuable foundation for studies in neurobiology, behavior, and evolutionary biology. However, there are several major aspects that need to be strengthened.

    3. Reviewer #2 (Public review):

      This paper concerns an interesting organism, Sepia officinalis. However, in the opinion of this reviewer, the paper reads somewhat like a genome report. The authors have used 23x PacBio HiFi in conjunction with relatively low coverage (11x) Hi-C to scaffold the genome into a karyotype of 47 chromosomes. They have used a combination of short- and long-read RNA-seq to annotate the genome in what looks like a very good annotation. The paper offers basic analyses of the BUSCO evaluation, some descriptive analyses of gene family and repeat content, and a somewhat more focused analysis of synteny among sequenced squids. Generally, the data will be useful.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, authors Simone Rencken and co-authors present and investigate the genome of the common cuttlefish Sepia officinalis.

      Strengths:

      The authors explain in a detailed yet concise manner the main steps for a genome assembly, with very robust methods for validation, and according to current best practices. In addition to the chromosomal assembly, the authors confirmed the presence of 47 chromosomes using Hi-C data and multiple species synteny. They also generated a comprehensive gene annotation, with assessments of gene completeness, providing a useful resource for the community of researchers interested in cuttlefish biology and comparative genomics.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public review):

      Summary:

      This manuscript presents a high-quality, chromosome-level genome assembly of the European cuttlefish (Sepia officinalis), a representative species of the cephalopod lineage. Using state-of-the-art sequencing and scaffolding technologies, including PacBio HiFi long reads and Hi-C chromatin conformation capture, the authors deliver a genome assembly with exceptional contiguity and completeness, as evidenced by high BUSCO scores. This genome resource fills a significant gap in cephalopod genomics and offers a valuable foundation for studies in neurobiology, behavior, and evolutionary biology. However, there are several major aspects that need to be strengthened.

      Major Revisions Recommended:

      (1) Single-individual genome limitation

      The genome assembly is based on a single individual, which appears to be male. While this approach is common in genome projects, it does not capture the full genetic diversity of the species. As S. officinalis exhibits a wide geographical range and possible population structure, future efforts (or discussion in this manuscript) should consider re-sequencing multiple individuals - of both sexes and from diverse geographic origins - to characterize population-level variation, sex-linked features, and structural polymorphisms.

      We thank the reviewer for this summary and the important point raised. While sequencing additional individuals, unfortunately, lies outside the scope of our study, we used the published data from the DToL assembly (from a male individual from a different geographical origin) to begin to investigate their differences.

      First, we attempted to create a mixed assembly from both datasets, as also suggested by Reviewer 2, to increase data coverage and genetic information. Even though the heterozygosity estimate is quite low (ca. 1%), the mixed assembly produced severely inflated and fragmented results, yielding an assembly ca. 3× larger than expected, with the top 46 contigs covering only ~5% of the total length, a sign of over-duplication and failed haplotype collapse.
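
      The contiguity figures quoted above (share of total length in the longest contigs) can be computed from a list of contig lengths. The following is a minimal sketch for illustration only, not the authors' pipeline; the function names and the two toy length lists are hypothetical and do not reproduce the actual MPIBR or DToL assemblies.

```python
def top_n_fraction(contig_lengths, n):
    """Fraction of the total assembly length held by the n longest contigs."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    ordered = sorted(contig_lengths, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical toy data (arbitrary units), chosen only to contrast the two cases:
# a well-collapsed assembly dominated by 46 chromosome-scale sequences ...
collapsed = [100] * 46 + [1] * 400
# ... and an inflated, fragmented assembly made of many short contigs.
fragmented = [5] * 3000

collapsed_share = top_n_fraction(collapsed, 46)    # most of the length
fragmented_share = top_n_fraction(fragmented, 46)  # only a small share
```

      In this toy setting the 46 longest sequences hold most of the collapsed assembly but only a few percent of the fragmented one, which is the pattern the response describes for the mixed assembly.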

      This result is not surprising when considering the assembly algorithms: most programs, including hifiasm used in this study, assume a single diploid individual (or a trio assembly including data from both parents), so using multiple individuals breaks this assumption. Assembly pipelines infer homozygous/heterozygous coverage cutoffs from the k-mer histogram. Mixing individuals raises apparent heterozygosity far above true diploid levels, turning the expected bimodal k-mer profile into a complex multimodal distribution. This misleads the phasing and purging steps in the assembly pipeline, causing over-expansion and fragmentation of the assembly.
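
      The effect on the k-mer histogram can be illustrated with a toy example. This is a simplified sketch under stated assumptions, not how hifiasm works internally: each haplotype string stands in for one unit of error-free sequencing coverage, k-mers are not canonicalized (no reverse complements), and the sequences and the `kmer_spectrum` helper are hypothetical.

```python
from collections import Counter


def kmer_spectrum(sequences, k):
    """Map k-mer multiplicity -> number of distinct k-mers with that multiplicity."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return dict(sorted(Counter(counts.values()).items()))


# Individual 1: two haplotypes differing at one heterozygous site (index 5, G/C).
hap1 = "ACGTAGCTTGCA"
hap2 = "ACGTACCTTGCA"
# Individual 2: one haplotype identical to hap1, one with a different variant (index 8, T/G).
hap3 = hap1
hap4 = "ACGTAGCTGGCA"

# One diploid individual: k-mers fall into two classes (heterozygous 1x, homozygous 2x).
single = kmer_spectrum([hap1, hap2], k=4)
# Two individuals pooled: additional multiplicity classes appear (1x, 2x, 3x, 4x).
mixed = kmer_spectrum([hap1, hap2, hap3, hap4], k=4)
```

      With one individual, the spectrum has the two classes that coverage cutoffs are designed to separate; pooling a second individual spreads the same k-mers over four classes, so a cutoff inferred under the diploid assumption no longer cleanly separates heterozygous from homozygous sequence.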

      Second, we created separate assemblies from the raw data sets of MPIBR and DToL using the exact same pipeline and parameters to avoid the technical problem described above. These assemblies are directly comparable, and after aligning them, it is possible to build a pangenome graph that we believe would help to address the points raised by the reviewer. Pangenome graphs can represent cross-individual variation more accurately and improve read alignment in regions of high genomic variation, which can aid population-level analyses [1]. We agree on the importance of this work, yet collecting data from more individuals and the construction and analysis of a pangenome graph lies beyond the scope of this manuscript and should be part of future efforts by the cephalopod genomics field.

      (2) Limited experimental validation of chromosomal inferences

      The study reports chromosome-scale scaffolding using Hi-C data and proposes a revised karyotype for S. officinalis. However, these inferences would be significantly strengthened by orthogonal validation methods. In particular, fluorescence in situ hybridization (FISH) or karyotyping from cytogenetic preparations would provide direct confirmation of chromosome number and structural arrangements. The reliance solely on Hi-C contact maps for inferring chromosomal organization should be acknowledged as a limitation or supplemented with such validations.

      We appreciate the reviewer’s point regarding the value of orthogonal validation methods to support the chromosome-scale scaffolding and proposed karyotype. We acknowledge that relying solely on Hi-C contact maps to infer chromosome number and structure presents limitations, as also becomes apparent in our detailed analysis of both S. officinalis genome assemblies (in Figure 2 and Supplementary Figure 3 of the revised manuscript). We attempted to complement these analyses with cytogenetic approaches. Unfortunately, the availability of suitable mitotic tissue was limited. Moreover, our karyotyping trials proved challenging: resolving the ≥92 (2n) chromosomes in situ was not feasible due to their high number and the small size of the nuclei (approximately 5 µm in diameter on average).

      We now highlight this point as an important direction for future work in our discussion (lines 456-466):

      “Additional methods such as cytogenetic karyotyping or optical mapping such as BioNano [141] (imaging of fluorescently tagged, linearized DNA) could be used to validate chromosome numbers. However, whereas karyotypes of octopuses have been consistent throughout the literature (1n=30) [142,143], those measured in decapods vary greatly. For example, 1n=46 chromosomes have been reported for two species of cuttlefish (A. esculentum and A. lycidas) and three loliginid squids [85]; 1n=36 has been reported for A. arabica [86] and 1n=24 in A. pharaonis [87]. In S. officinalis, a karyotype of 1n=52 is reported for testis samples [88]. Combining cytogenetic preparations with fluorescent labeling of centromeric or telomeric sequences, as demonstrated in the octopus A. areolatus [143], could help resolve these issues. Establishing a routine staining protocol would enable comprehensive tests at the species- and population-level.”

      (3) Shallow discussion of chromosomal evolution

      The manuscript briefly mentions chromosomal number differences among cephalopods but does not explore their evolutionary or functional implications. A more thorough comparative analysis - linking chromosomal rearrangements (e.g., fusions, fissions) with ecological adaptation, life history, or neural complexity - would greatly enhance the impact of the findings. Referencing chromosomal dynamics in related taxa and possible links to behavioral innovations would contextualize these results more effectively.

      We agree with the reviewer that this is a fascinating topic of research that demands further attention and have extended our discussion, which now reads (lines 476-501):

      “In addition to studying chromosomal topology in phylogenetic reconstructions, some of the most interesting aspects of these rearrangements relate to changes of and innovation in regulatory elements that underlie phenotypic diversity. In coleoid cephalopods, it is thought that an ancient large-scale genome rearrangement was combined with lineage-specific changes and repeat expansions [48–50]. This restructuring gave rise to hundreds of tightly linked, evolutionarily unique microsyntenies, corresponding to distinct topological compartments with specialized regulatory architectures that contribute to complex, tissue-specific expression patterns in the nervous system and elsewhere [43]. Extending this, chromosomal conformation analyses in E. scolopes revealed that co-regulated eye and light-organ genes cluster at topologically associating domain (TAD) boundaries, and that an evolutionarily recent rearrangement at the dachshund (DAC) locus may have been instrumental in the emergence of the symbiotic light organ in Euprymna - directly linking specific chromosomal topology to morphological innovation [44].

      To understand the broader functional impact of these changes across coleoids, a recent study investigating Micro-C, RNA-seq, and ATAC-seq data from multiple species revealed broadly conserved chromatin domains, but also many lineage-specific chromatin loops that form novel regulatory signatures and impact expression profiles across species and tissues [149].

      Despite the observed small-scale regulatory changes, the chromosomes of decapods are considered to be more closely related to the ancestral coleoid karyotype than those of octopods. The derived octopod karyotype becomes apparent when comparing it to the genome of the vampire squid, an early-branching octopodiform (sister to all octopods) which retained features of the decapod, ancestral karyotype [150]. Taken together, the conserved karyotype of decapods accommodates fine-scale regulatory diversity that might underlie morphological diversity among species, which suggests that many regulatory innovations are still being evolutionarily explored through rearrangements within the existing chromosomes.”

      (4) Underdeveloped gene family and pathway analysis

      While the authors identify expansions in gene families such as protocadherins and C2H2 zinc finger transcription factors, the functional significance of these expansions remains speculative. The manuscript would benefit from:

      (a) Functional enrichment analyses (e.g., GO, KEGG) targeting these gene families.

      (b) Expression profiling across tissues or developmental stages to infer regulatory roles.

      (c) Comparison with expression or expansion patterns in other cephalopods with known behavioral complexity (e.g., Octopus bimaculoides, Euprymna scolopes).

      (d) Potential integration of transcriptomic or epigenomic data to support regulatory hypotheses.

      We thank the reviewer for these constructive suggestions and have substantially expanded the functional characterization of expanded gene families in the revised manuscript.

      To address points (a) and (b), we performed GO enrichment analyses for all expanded gene families (orthogroups), both for the largest gene families and the most significantly expanded families identified from our CAFE5 analysis. Further, we cross-referenced all S. officinalis members of each expanded orthogroup against differentially expressed genes in our bulk RNA-seq data from multiple tissues (initially collected to improve the gene modeling), allowing us to infer tissue-specific expression patterns for the expanded families.

      To address point (c), the species-resolved copy-number profiles from our orthogroup analysis directly situate the S. officinalis expansions within the broader coleoid context, including O. bimaculoides, O. vulgaris, E. scolopes, and D. pealeii, enabling direct comparison of expansion scale and lineage specificity across species with varying degrees of behavioural complexity. We note that the C2H2 zinc finger and protocadherin expansions show distinct phylogenetic profiles consistent with independent radiations in octopods and decapodiforms, in agreement with recent studies.

      Regarding point (d), no epigenomic data for S. officinalis was publicly available at the time of writing, thus we focused on the transcriptomic data from this study, as described above.

      We describe this analysis in two additional results paragraphs, accompanied by one modified figure (Figure 4) and two new figures (Figure 5 and Supplementary Figure 7). The new paragraphs are reproduced below (lines 294-400):

      “Analysis of expanded gene families

      We sought to investigate the S. officinalis gene annotation and place it in the context of gene repertoires from other cephalopod or molluscan species. First, we collected available genome annotations from 12 other molluscan species (Table 2) and clustered them using OrthoFinder v.3.1.0 [122], resulting in 23,658 orthogroups, hereafter named gene families.

      First, we investigated 36 of the gene families that contain more than 100 genes in any of the species, with 17 of these families containing at least one gene of S. officinalis, that reflect large-scale gene family expansions (Figure 4E). We used the InterProScan and eggNOG-mapper annotations to infer functional roles of these genes, selecting the most common gene annotation as the name of the gene family.

      The zinc finger C2H2-type transcription factors (TFs) were grouped into three of the large gene families, with the largest family (OG0000000) only present in decapod cephalopods. This likely reflects the largely independent expansions in the octopod and decapod lineages that date back to a burst of transposon activity ca. 25 million years ago [46,48,49]. The largest expansion across mollusks occurs in the cadherin-like family (OG0000001): 310 in S. officinalis, 283 in D. pealeii, 209 in A. lycidas, 102 in O. vulgaris, 55 in O. bimaculoides, with low but non-zero counts in bivalves (C. virginica, M. gigas). This profile is consistent with the protocadherin expansion first described in O. bimaculoides [46] and subsequently shown to be present across cephalopods [48,49,123].

      HPGDS (OG0000005, hematopoietic prostaglandin D synthase) is a glutathione-S-transferase family member that catalyzes the conversion of prostaglandins, which have well-described roles in immune responses in vertebrates and insects [124,125]. This family shows a broad expansion in decapods, with a lesser expansion in octopods. Additionally, members of the glutathione-S-transferase families have been co-opted as S-crystallins, structural proteins found in the lens of cephalopods that may, or may not, retain enzymatic functions [126,127].

      Two large families are mostly lineage-restricted. The RING-type zinc finger family (OG0000058) has 103 copies in S. officinalis and 26 in A. lycidas but is absent in all other species except for E. scolopes. Conversely, OG0000002 (unknown function) has 479 copies in E. scolopes and only a few copies in the other species. This interesting Sepiolid-specific expansion warrants further characterization.

      We estimated gene family evolution rates using CAFE5 [128] for all families with fewer than 100 copies in any species (this excludes the families described above, as very large copy-number differences between species preclude likelihood calculations under the applied birth-death model). After comparing different model parameters, we chose a gamma model with three rate categories, allowing for evolutionary rate variation among gene families. Out of the 12,895 gene families analyzed, 1,813 showed a significant (p < 0.05) expansion or contraction in at least one of the species. We focused our analysis on the 30 most significantly expanded families; among them were several retrotransposon-associated domains that have expanded specifically in S. officinalis: five families carrying Retrovirus-related Pol polyprotein domains, two Reverse transcriptase domain families, and four Ribonuclease H-like families (Supplementary Figure 7A). There was no coordinate-based overlap of the coding sequences with annotated TEs from the RepeatMasker output (Methods).

      In addition to the three large gene families of C2H2 zinc finger expansions, 45 gene families containing this TF type showed a significant change in the CAFE5 analysis. Notably, eight of the significant gene families, as well as four of the largest gene families, were annotated as CCHC-type zinc fingers, which contain a “zinc knuckle” motif that is characteristic of retroviral nucleocapsid proteins [129] and is functionally integrated in the genomes of several species, including humans [130].

      Some gene families without any relationship to retrotransposons were also expanded. For example, the UGT2A1-related family is a UDP-glucuronosyltransferase, a class of enzymes central to phase II detoxification and conjugation of metabolites, reported in other mollusks in the context of environmental chemical tolerance [131], and in insects in the context of pigmentation [132]. We also detected a family of homeodomain-like proteins, representing an expansion of this important TF family.

      Tissue-specific expression of expanded gene families

      To place the identified gene families in a functional context, we profiled their expression in the bulk RNA-seq data (taken from multiple tissues of S. officinalis) used originally for gene modeling (Figure 5A). Principal component analysis (PCA) revealed the largest axis of variation in gene expression to separate brain tissues from peripheral tissues, with skin being the most transcriptomically distinct (Figure 5A), consistent with the high number of tissue-specific differentially expressed (DE) genes identified in non-neural tissues (Figure 5B). We identified the genes belonging to expanded families that were differentially expressed across tissues and enriched gene ontology [133,134] (GO) terms for them to gain additional insight. The large families excluded from CAFE5 modelling and the significantly expanded families identified by CAFE5 were analyzed separately.

      Eleven of the largest gene families were expressed in our data (Figure 5C) and five had enriched GO terms (Figure 5D,E). Among them, the cadherin family showed brain-restricted expression and GO terms related to cell–cell adhesion and calcium binding, consistent with their role in neuronal connectivity and circuit formation [46,135]. Two C2H2 zinc finger gene families were expressed in the optic and vertical/subvertical lobes of the brain and in the skin, with GO terms related to DNA-binding, transcriptional regulation or development. The RING-type zinc finger family was expressed specifically in the skin, with GO terms including zinc binding and ubiquitin protein ligase activity, the canonical function of RING-domain E3 ligases [136]. Genes of the HPGDS/S-crystallin family were expressed in the brain (basal and optic lobes and posterior subesophageal mass) and skin, with GO terms related to glutathione metabolism, matching their described enzymatic function. We did not find expression in the retina, which is expected given that S-crystallins are expressed in lentigenic cells of the eye [42,137] and these cells were not included during sampling.

      Among the 30 most significantly expanded families examined (out of 1,813 total), expression was widespread (20/30) and tissue-specific differential expression was common (17/30), suggesting that a substantial proportion of expanded paralogs represent functional coding sequences with specialized spatial deployment (Supplementary Figure 7B). Ten of the retrotransposon-associated families were differentially expressed in the brain (optic and vertical/subvertical lobes) and skin, arguing against these loci being inactive repeat fragments and supporting their inclusion as transcribed gene models. Two significantly expanded families showed both differential expression and enriched GO terms (Supplementary Figure 7C). The first was the UGT2A1-related family, which had the largest number of differentially expressed genes overall, with expression concentrated in the skin, retina and posterior subesophageal mass of the brain. Enriched GO terms matched the described enzymatic function for this family, namely UDP-glycosyltransferase activity. The second gene family was the homeodomain-like family with enrichment for DNA binding terms consistent with their role as transcription factors, and was preferentially expressed in the vertical and subvertical brain lobes with weaker expression in other areas.

      Collectively, many differentially expressed genes from expanded families were restricted to specific tissues or brain subregions (Figure 5F and Supplementary Figure 7D), indicating that paralogs within an expanded family have adopted distinct spatial expression domains and possibly, specialized functions.”
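As an illustration of the size-based split described in the quoted text (very large families examined separately, the remainder passed to rate modelling), a minimal Python sketch follows. It is not the authors' code. The function name `split_by_size`, the table layout, and the rows `OG_small` and `OG_single` are hypothetical. The three counts for OG0000001 are the ones quoted above, and the threshold of "100 or more copies in any species" follows the wording of the methods.

```python
from typing import Dict, Tuple

# Hypothetical layout: orthogroup -> {species: gene count}.
CopyTable = Dict[str, Dict[str, int]]


def split_by_size(table: CopyTable, threshold: int = 100) -> Tuple[CopyTable, CopyTable]:
    """Separate very large families from those passed on to rate modelling.

    Orthogroups present in a single species are dropped, mirroring the
    filter described in the methods.
    """
    large: CopyTable = {}
    modelled: CopyTable = {}
    for og, counts in table.items():
        present_in = sum(1 for n in counts.values() if n > 0)
        if present_in < 2:
            continue  # found in only one species: excluded entirely
        if max(counts.values()) >= threshold:
            large[og] = counts  # examined descriptively, not modelled
        else:
            modelled[og] = counts
    return large, modelled


example: CopyTable = {
    # Counts quoted in the text for the cadherin-like family.
    "OG0000001": {"S_officinalis": 310, "D_pealeii": 283, "O_bimaculoides": 55},
    # Invented rows, for illustration only.
    "OG_small": {"S_officinalis": 12, "D_pealeii": 9, "O_bimaculoides": 4},
    "OG_single": {"S_officinalis": 7, "D_pealeii": 0, "O_bimaculoides": 0},
}

large, modelled = split_by_size(example)
print(sorted(large), sorted(modelled))
```

Keeping the two sets separate matters because the birth-death likelihood cannot be computed for families with extreme copy-number differences, as the response notes.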

      Reviewer 2 (Public review):

      Summary:

      This paper concerns an interesting organism, Sepia officinalis. However, in the opinion of this reviewer, the paper reads somewhat like a genome report. The authors have used 23x PacBio HiFi in conjunction with relatively low coverage (11x) Hi-C to scaffold the genome into a karyotype of 47 chromosomes. They have used a combination of short and long read RNA seq to annotate the genome in what looks like a very good annotation. The paper offers basic analyses of the Busco evaluation, some descriptive analyses of gene family and repeat content, and a bit more focused analysis on synteny among sequenced squids. Generally, the data will be useful.

      Strengths:

      This is a high-quality annotation, and the data ultimately will be useful to other researchers. I appreciate trying to understand what's happening between assemblies of S. officinalis.

      Weaknesses:

      I don't believe the data at hand makes a strong case for the argument of 47 chromosomes. This is my biggest sticking point with the paper, and it is for a few reasons:

      (1) The authors point to assembly differences between the DToL assembly and the one presented in the manuscript and seem to claim that DToL is incorrect. However, the DToL assembly (xcSepOffi3.1) is based on much deeper HiFi and HiC coverage than the one at hand (51x and 80+x respectively). There are many things to try here, including:

      (a) Downloading the DToL data and reassembling using a common pipeline.

      (b) Downsampling the DToL data to similar coverage as what the authors have achieved.

      (c) Combining your data and that of DToL for even deeper coverage (heterozygosity is low enough that I don't imagine this impeding things too badly).

      We thank the reviewer for these helpful suggestions and want to clarify that we did not seek to point out errors in the DToL assembly, but rather to investigate the unexpected discrepancies between the two assemblies. It is correct that the DToL data has a much higher coverage than our data. We followed the individual suggestions and incorporated them into the revised manuscript. We reproduce the relevant sections below, and provide additional information:

      (a) Downloading the DToL data and reassembling using a common pipeline.

      We downloaded the DToL data and reassembled it using a common pipeline, yielding the results listed in Author response table 1. The DToL assembly is more contiguous, which is mainly due to its higher HiFi coverage. It also receives slightly better BUSCO scores (computed using odb12 as recommended by Reviewer 3).

      Author response table 1.

      Full statistics of S. officinalis assemblies from two independent datasets, assembled using a common pipeline.

      The updated manuscript now reads (lines 146-159):

      “A chromosome-scale assembly for Sepia officinalis was released recently by the Wellcome Sanger Institute’s Darwin Tree of Life project [75] (DToL, GCA_964300435.1). That genome was assembled from a male individual using high coverage PacBio Sequel II (~51x) and Arima2 Hi-C (~80x) data, with a final assembly size of 5.8 Gb. The haploid chromosome number was estimated to be 49. To compare both S. officinalis datasets directly, we downloaded the DToL data and created two new assemblies using the pipeline described above (hifiasm using PacBio HiFi and Hi-C data). The resulting assemblies were overall very similar, with the DToL assembly having a slightly higher contiguity (N50 length, see Table 1) and BUSCO completeness (Supplementary Figure 2A,B) due to their higher sequencing coverage.”

      To further compare the two datasets, we added a new Figure 2 to the revised manuscript and the following paragraph to the results (lines 160-169):

      “After scaffolding with YAHS, both datasets reached the previously identified chromosome numbers (1n=47 for MPIBR and 1n=49 for DToL, Figure 2A,B). To further investigate this surprising discrepancy, we aligned both assemblies using Winnowmap [89] to locate the differences between them (Figure 2C). We observed four “breakpoints” (BP) of chromosome scaffolds: one in the MPIBR assembly compared to DToL (BP1: DToL_5 = MPIBR_40+44) and three in the DToL assembly compared to MPIBR (BP2: DToL_31+40 = MPIBR_2, BP3: DToL_41+46 = MPIBR_6, BP4: DToL_44+45 = MPIBR_7). We also aligned the assemblies to the chromosome-scale genome of another cuttlefish Acanthosepion esculentum (1n=46, GCA_964036315.1). In this alignment, all four breakpoints were collinear with single A. esculentum chromosomes (Figure 2D).”
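The breakpoint notation above can be turned into a small bookkeeping check. The Python sketch below is not taken from the manuscript: it only counts how many scaffolds each assembly would have if every listed breakpoint were a technical split and the pieces were joined. The scaffold counts and breakpoint memberships are copied from the quoted paragraph; the function name `merged_counts` is hypothetical. It makes no claim about the true karyotype.

```python
# Scaffold counts and breakpoints as quoted in the paragraph above.
scaffold_counts = {"MPIBR": 47, "DToL": 49}

# Each breakpoint: (assembly in which the chromosome appears split, its pieces).
breakpoints = {
    "BP1": ("MPIBR", ["MPIBR_40", "MPIBR_44"]),
    "BP2": ("DToL", ["DToL_31", "DToL_40"]),
    "BP3": ("DToL", ["DToL_41", "DToL_46"]),
    "BP4": ("DToL", ["DToL_44", "DToL_45"]),
}


def merged_counts(counts, bps):
    """Scaffold count per assembly if every breakpoint were joined."""
    out = dict(counts)
    for assembly, parts in bps.values():
        out[assembly] -= len(parts) - 1  # joining k pieces removes k-1 scaffolds
    return out


print(merged_counts(scaffold_counts, breakpoints))
```

With the values quoted above, both assemblies come out at 46 scaffolds, the same number as the 1n=46 reported for A. esculentum. This is an arithmetic consistency check only.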

      (b) Downsampling the DToL data to similar coverage as what the authors have achieved.

      Instead of downsampling the DToL data, we decided to analyze the Hi-C and HiFi data for both assemblies, focusing on the four “breakpoints” between the assemblies and the A. esculentum genome that we described above. First, we performed a QC analysis of the Hi-C reads using pairtools [2]; the result is visualized in Author response image 1. We report the percentage of valid Hi-C read pairs, i.e., cis pairs with insert distances of more than 1 kb and trans pairs, following the Dovetail Genomics QC manual (https://dovetail-analysis.readthedocs.io/en/latest/whole_genome/qc.html). When Hi-C pairs were aligned to the primary contigs from hifiasm (as used for scaffolding with YAHS), the DToL Hi-C data contained fewer valid read pairs (11.4%) than the MPIBR data (43.1%), possibly due to the use of a different tissue (eye vs. optic lobe) and Hi-C kit (Arima 2 vs. Dovetail OmniC) for library preparation. Nonetheless, due to the much higher overall coverage, the number of valid read pairs is still 2.35x higher for DToL (144,014,368 pairs) than for MPIBR (61,318,955 pairs). The trans fraction (i.e., Hi-C pairs across contigs) depends on the length of the primary contigs, so the higher trans fraction for the MPIBR data can be explained by the lower contiguity of its primary contigs. It is conceivable that for both assemblies, the low numbers of valid read pairs introduce a technical fragmentation of certain chromosomes, as indicated by the identified breakpoints (Figure 2).

      Author response image 1.

      Analysis of Hi-C read pairs from both S. officinalis assemblies. Hi-C reads were aligned to the primary contigs from hifiasm (as is used for scaffolding with YAHS) and analyzed using pairtools. Note the higher fraction of long-range contacts (at least 1 kb cis pairs or trans pairs) in the MPIBR data (top) compared to DToL (bottom). Due to overall higher coverage, the absolute number of read pairs is higher for DToL than for MPIBR data.
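To make the "valid pair" definition and the quoted 2.35x figure concrete, here is a minimal Python sketch. It is not the pairtools implementation. The function `is_valid_pair` and its arguments are hypothetical, and the "implied totals" are back-calculated from the rounded percentages in the text, so they are approximate.

```python
def is_valid_pair(contig1: str, pos1: int, contig2: str, pos2: int,
                  min_cis_dist: int = 1000) -> bool:
    """Trans pairs always count; cis pairs only if separated by more than 1 kb."""
    if contig1 != contig2:
        return True  # trans: mates map to different contigs
    return abs(pos2 - pos1) > min_cis_dist


# Numbers quoted in the response.
valid_pairs = {"DToL": 144_014_368, "MPIBR": 61_318_955}
valid_fraction = {"DToL": 0.114, "MPIBR": 0.431}

# Ratio of valid pairs between the two datasets.
ratio = valid_pairs["DToL"] / valid_pairs["MPIBR"]

# Approximate total pairs implied by the rounded percentages (an inference,
# not a figure reported by the authors).
implied_total = {name: valid_pairs[name] / valid_fraction[name] for name in valid_pairs}

print(f"valid-pair ratio DToL/MPIBR: {ratio:.2f}")
for name, total in implied_total.items():
    print(f"{name}: ~{total / 1e6:.0f} M aligned pairs implied")
```

The sketch shows why a library with a lower valid fraction can still contribute more usable contacts when its total depth is much higher.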

      Second, we performed a detailed analysis of read coverage along the breakpoint junctions of the discrepant chromosomes/scaffolds between both assemblies. We included a description of the results and a new Supplementary Figure 3 in the manuscript (lines 171-207):

      “To better understand the potential cause of these divergent chromosome numbers, we analyzed the Hi-C and HiFi coverage in the breakpoint regions (Supplementary Figure 3A). First, we aligned the HiFi reads to the scaffolds and extracted all alignments along the 200 kb terminal scaffold windows to find any notable drops in coverage, or reads spanning any of the scaffold junctions. We detected no spanning reads. This is not surprising given that no contigs were assembled at these sites, resulting in the observed scaffold junctions. More interestingly, we noted a ~5-fold decrease in HiFi coverage along the DToL scaffold_40 (part of BP2) relative to its flanking regions, indicating a highly repetitive, low-mappability region at this boundary.

      Next, we realigned the Hi-C data to the scaffolded assemblies using bwa-mem2 [91] and extracted all trans HiC pairs (between-scaffold contacts) using pairtools [92]. We normalized trans HiC contacts to the scaffold length and compared contact rates between breakpoint scaffolds to the baseline contact rate (computed from pairs of scaffolds with a clear 1-to-1 match between assemblies), and the contact rate within scaffolds (intra-scaffold pairs) (Supplementary Figure 3B,C). The contact rates within breakpoints were consistently lower than within scaffolds, likely falling below the threshold to be merged during assembly. However, the contact rates at three of four breakpoints (BP1, BP3, BP4) were significantly elevated above the genome-wide background distribution (empirical p = 0.010, 0.005, 0.005 respectively), suggesting that they may represent intra-chromosomal contacts disrupted by a misassembly. Notably, BP2 was not significant (empirical p = 0.170), likely due to the low coverage and mappability around the DToL scaffold_40 boundary. Considered jointly, the three DToL breakpoint scaffold pairs showed significantly higher trans contact rates than the background (Wilcoxon rank-sum, one-tailed, U = 1771, p = 0.004).

      Lastly, we analyzed the repeat landscape around the 200 kb scaffold ends using RepeatMasker [93] and the custom repeat library that we had generated for Sepia officinalis (described further below). Compared to control scaffolds of the same assembly, we observed consistently elevated repeat content at the breakpoint junctions (mean 71.5% vs 67.6% masked bases), with an enrichment of unclassified repeats (32.1% vs 30.0%), which could explain a repeat-driven assembly fragmentation or scaffolding failure. The BP2 DToL scaffold_40 junction window was 99.99% masked (99.2% unclassified repeats), providing a likely mechanistic explanation for both the HiFi coverage drop and the absence of a significant trans Hi-C signal at this breakpoint. Taken together, these analyses suggest that the different chromosome numbers across the two S. officinalis assemblies are due to technical reasons, caused by repeat-rich scaffold boundaries that impair HiFi and Hi-C read alignment and in turn, correct assembly in these regions.”
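The comparison of breakpoint contact rates against a background distribution can be sketched as follows. This is a hedged illustration, not the authors' code: the text states only that trans contacts were normalized to scaffold length and compared with a genome-wide background, so the exact normalization (here, contacts per Mb of combined scaffold length) and the p-value convention (here, the plain fraction of background values at least as large, without a pseudocount) are assumptions. The function names are hypothetical and the background values are synthetic.

```python
from typing import Sequence


def contact_rate(n_trans_pairs: int, len_a: int, len_b: int) -> float:
    """Trans contacts per Mb of combined scaffold length (assumed normalization)."""
    return n_trans_pairs / ((len_a + len_b) / 1e6)


def empirical_p(observed: float, background: Sequence[float]) -> float:
    """One-sided empirical p-value: share of background rates >= observed."""
    if not background:
        raise ValueError("background must not be empty")
    return sum(1 for b in background if b >= observed) / len(background)


# Synthetic background of 200 scaffold-pair contact rates (illustration only).
background = [float(i) for i in range(200)]

# A hypothetical breakpoint pair: 5,955 trans contacts between scaffolds of 20 and 10 Mb.
observed = contact_rate(5_955, 20_000_000, 10_000_000)
print(observed, empirical_p(observed, background))
```

An empirical p-value of this kind is bounded below by one over the number of background pairs, which is worth keeping in mind when reading very small values.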

      (c) Combining your data and that of DToL for even deeper coverage (heterozygosity is low enough that I don't imagine this impeding things too badly).

      When combining the data to achieve a higher coverage, we ran into the assembly fragmentation issues detailed above in response 1) to Reviewer 1.

      (2) Looking at Figure 1, there appears to be a misjoin at chromosome 42. Looking carefully at Figure S1, that misjoin does not appear on any of the panels - this is confusing. Given the size of that chromosome and the authors' chromosome numbering, I'm guessing this is a manual merge (as it's larger than most of the numerically close chromosomes: 40, 41, 43, etc.). Further, staring closely at Figure 1, there appear to be cross-scaffold contacts between 42 and 43 and 42 and 44. Secondarily there are contacts between 43 and 44. This bit of the assembly seems potentially problematic.

      This is a great observation: the Hi-C maps do indeed differ between Figure 1 and Figure S1. Figure 1 is the result of scaffolding with YAHS and manual curation, whereas Figure S1 was scaffolded using HapHiC. We updated the figure legend to clarify this important difference. HapHiC produces very clean contact maps without the need for manual curation, but when analyzed at a higher resolution, the tool broke many contigs and ultimately compromised the assembly quality, possibly due to our comparatively low Hi-C coverage. Thus, we preferred to use YAHS and manual curation, which is perhaps inherently error-prone, as becomes apparent in the regions of the assembly that are pointed out by the reviewer.

      Reviewer 3 (Public review):

      Summary:

      In this study, authors Simone Rencken and co-authors present and investigate the genome of the common cuttlefish Sepia officinalis.

      Strengths:

      The authors explain in a detailed yet concise manner the main steps for a genome assembly, with very robust methods for validation, and according to current best practices. In addition to the chromosomal assembly, the authors confirmed the presence of 47 chromosomes using Hi-C data and multiple species synteny. They also generated a comprehensive gene annotation, with assessments of gene completeness, providing a useful resource for the community of researchers interested in cuttlefish biology and comparative genomics.

      Weaknesses:

      While the study touches upon the subjects of gene content, TE activity, or species-level comparisons, the study does not provide in-depth investigations of these.

      We thank the reviewer for their positive assessment of our manuscript. We acknowledge the descriptive nature and limitations of our previous analyses of gene content, TE distribution, and species comparisons. Our focus for the initial submission was to provide a high-quality assembly that could serve as a resource for anyone interested in Sepia officinalis or related species. However, we agree that greater insight into genome content is valuable as well. In the revised manuscript, we included a more detailed analysis of expanded gene families and a GO enrichment analysis of our bulk RNA-seq data, which we summarized in response (4) to Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Revisions Recommended:

      (1) Figure and legend clarity

      Several figures lack sufficient annotation. All figures, including supplementary ones, should include:

      (a) Clear axis labels.

      (b) Descriptions of statistical measures (n values, error bars, statistical tests).

      (c) Legends that allow the figure to be understood independently of the main text.

      We updated the figures accordingly.

      (2) Terminology and formatting

      (a) Consistency in gene and species nomenclature should be maintained throughout (e.g., italicizing gene names and Latin binomials).

      (b) Ensure that abbreviations (e.g., Hi-C, BUSCO, FISH) are defined upon first use.

      We updated the nomenclature throughout the text and checked the definition of abbreviations used in the text. Further, we updated the names of several cuttlefish species according to the recent revision of genera, e.g. Sepia esculenta was changed to Acanthosepion esculentum [3].

      (3) Literature coverage

      The references primarily focus on earlier studies from 2010-2020. It would strengthen the context to include recent high-impact studies on cephalopod genomics and chromosomal biology published in the last 3 years (e.g., 2022-2024).

      We apologize for this oversight and have extended the manuscript to discuss more of these recent studies.

      (4) Clarify methods

      While the methods section is generally detailed, some critical aspects are underspecified:

      (a) Parameters used in genome annotation tools (e.g., BRAKER, RepeatMasker).

      We thank the reviewer for bringing our attention to this shortcoming, and have added the missing parameters to the methods section. Additionally, the full code is available at https://gitlab.mpcdf.mpg.de/mpibr/laur/cuttlefishomics/soffgenome

      (b) Criteria for ortholog clustering and gene family expansion analysis.

      The details have been added to the methods section, which now reads (lines 828-853):

      “Orthogroups were inferred across 13 molluscan species (Table 2), including S. officinalis, using OrthoFinder v3.1.0 [122] with default parameters. The input proteomes included the longest protein isoform per gene for each species. The rooted species tree from OrthoFinder [182,184] was converted to an ultrametric tree using the R package ape [183] v5.8.1.

      Gene families were filtered by removing orthogroups present in only a single species, and by separating orthogroups containing 100 or more gene copies in any species, as extreme copy-number differences in gene families prevent likelihood calculation under the applied birth-death model.

      Gene family evolution rates were estimated using CAFE5 [128] v5.1.1 on the filtered orthogroups, using the ultrametric species tree as input. Four models were evaluated: the base model (single global lambda), and Gamma models with k = 2, 3, and 4 rate categories, which allow evolutionary rate variation among gene families. The Gamma k = 3 model was selected based on the best (lowest) final log-likelihood score. All subsequent statistical inferences were performed under this model.

      For families showing statistically significant expansion or contraction (p < 0.05 after Bonferroni correction), branch-specific copy-number changes were extracted from the CAFE5 output. Families were categorized as S. officinalis-specific, coleoid-specific, or broad expansions based on the distribution of significant changes across the phylogeny.

      To assess whether expanded gene families in S. officinalis contained genes derived from or embedded within repetitive elements, a coordinate-based overlap analysis was performed. For each gene in an expanded orthogroup, the overlap between its coding sequence (CDS) coordinates and RepeatMasker annotations was computed using bedtools intersect v2.30 [185]. To avoid double-counting when multiple repeat annotations overlapped the same coding bases, overlapping repeat intervals were merged per gene prior to summing covered bases, and the overlap fraction was computed as merged covered bases divided by total CDS length.”
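The overlap-fraction calculation described in the last quoted paragraph (merge overlapping repeat intervals, then divide covered bases by CDS length) can be sketched in plain Python. This is an illustration under assumptions, not the authors' bedtools workflow: coordinates are taken to be half-open, BED-style intervals on a single sequence, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_overlap_fraction(cds: List[Interval], repeats: List[Interval]) -> float:
    """Fraction of CDS bases covered by at least one repeat annotation."""
    cds_merged = merge_intervals(cds)
    rep_merged = merge_intervals(repeats)
    total_cds = sum(end - start for start, end in cds_merged)
    if total_cds == 0:
        return 0.0
    covered = 0
    for c_start, c_end in cds_merged:
        for r_start, r_end in rep_merged:
            # Length of the intersection, or zero if the intervals are disjoint.
            covered += max(0, min(c_end, r_end) - max(c_start, r_start))
    return covered / total_cds


# Toy gene with two CDS exons and three repeat hits, two of which overlap each other.
cds_exons = [(100, 200), (300, 400)]
repeat_hits = [(150, 250), (180, 220), (390, 500)]
print(repeat_overlap_fraction(cds_exons, repeat_hits))
```

Without the merge step, the two overlapping repeat hits in the toy example would be counted twice over bases 180 to 200, inflating the fraction from 0.3 to 0.4.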

      (c) Thresholds or cutoffs for synteny or duplication detection.

      We included the details in the updated methods (lines 755-781):

      “Synteny analyses between all chromosomes of the compared species were performed using the R package GENESPACE v.1.2.3 [175] with default parameters, described briefly below. Protein sequence similarity was first estimated using DIAMOND2 [109] in fast mode, and orthogroups and pairwise orthologues were inferred using OrthoFinder v2.5 [176] with hierarchical orthogroups (HOGs) enabled. Prior to synteny inference, tandem arrays were condensed to their most central representative gene, and gene rank order was recalculated on these array-representative genes to reduce confounding effects of tandem duplication on collinearity detection.

      Syntenic blocks were identified pairwise between all genome combinations using MCScanX [177], constrained to DIAMOND hits where both query and target genes belonged to the same orthogroup (onlyOgAnchors = TRUE). Initial anchor hits were clustered into large syntenic regions using a density-based spatial clustering approach (dbscan [178]), with a minimum block size of five anchor genes (blkSize = 5) and a maximum of five intervening non-anchor genes permitted within a block (nGaps = 5). Anchor clustering used a search radius of 25 gene-rank positions (blkRadius = 25). All hits falling within a syntenic buffer of 100 gene-rank positions around confirmed block anchors (synBuff = 100) were retained as syntenic. No secondary syntenic hits were included (nSecondaryHits = 0). Syntenic orthogroups were integrated across all pairwise comparisons and collapsed into a pan-genome annotation anchored to S. officinalis, which was used as the reference genome.

      Syntenic relationships were visualized as riparian plots and pairwise dotplots using the built-in plotting functions of GENESPACE v1.2.3. Riparian plots were constructed using physical chromosomal coordinates (useOrder = FALSE) with S. officinalis as the reference, displaying all three genomes. A second riparian plot was generated highlighting a region of interest. Pairwise dotplots were produced for the S. officinalis–D. pealeii and S. officinalis–E. scolopes genome comparisons, displaying only synteny-validated hits (type = "syntenic") with a minimum synteny score of 10 (minScore = 10) and a minimum of 10 genes per chromosome pair required for display (minGenes2plot = 10).”

      Reviewer #2 (Recommendations for the authors):

      Line 153 should be supplemental Figure 3B.

      The text was referring to the correct Figure 2B (three species synteny comparison). It is now updated to Figure 3B in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) L37: Perhaps add a comparison with other species (mammals, Drosophila, etc.) to put this number in context.

      We agree with this recommendation and added numbers for Drosophila and mouse to the text (lines 40-45):

      “Coleoid cephalopods (octopus, squid, cuttlefish) are a highly derived group of mollusks, characterized by the largest nervous systems among all invertebrates (ca. 500 million neurons in an adult octopus of which 200 million are in the central brain [1,2], compared to ca. 140,000 in the fruit fly [3] or 70 million in the mouse [4]) and specializations with a great historical importance for neuroscience (e.g., “giant axons” [5] and “giant synapses” [6–8]).”

      (2) L51, 279: "Octopodiformes" is a superorder, not a genus or a species name. It should not go in italics.

      We updated this throughout the text.

      (3) L53: "even smaller" seems odd here, because the argument of the sentence is to stress the large genome size of Octopodiformes. Perhaps start the sentence by stating that it is sometimes smaller, but often larger.

      We rephrased the sentence for clarity; it now reads (lines 55-58):

      “While the genomes of Octopodiformes (Octopus, Eledone, Argonauta) are either smaller than (1.1 Gigabases or Gb [45]) or comparable in size to that of humans (around 3 Gb [46,47]), the typical genomes of Decapodiformes (squids and cuttlefish) often reach 6 Gb [48,49].”

      (4) L90: What tool was used to estimate the k-mer distribution of the long reads? Jellyfish? FastK? It's not mentioned anywhere in the text.

      (5) L95: What k-mer size did the authors use to estimate k-mer distribution?

      We thank the reviewer for pointing out this missing information, and have included the details in the methods (lines 692-694):

      “The k-mer distribution was estimated using Meryl [165] within the Merfin [166] package with a k-mer size of 21, and genome size was estimated using GenomeScope [77] from Illumina short reads and PacBio HiFi data.”

      (6) L99: What about using the most recent BUSCO databases? odb12?

      We thank the reviewer for this question, which prompted us to compute BUSCO scores using the more recent odb12 database. The results are shown in Supplementary Figure 2C. Both gene sets have been refined by including more species and using a more stringent filtering approach, so the more recent database contains fewer and more conserved genes [4]. For the mollusca gene sets, a great improvement in completeness was observed between odb10 and odb12 (Supplementary Figure 2C); the metazoan completeness was marginally increased. Therefore, we evaluated all new assemblies produced since the first submission with the odb12 database.

      (7) L107: How many scaffolds were obtained in total? After manual curation, how many of the scaffolds were placed in the "correct" chromosomes? How many scaffolds were in the shrapnel? Were these scaffolds mostly repetitive regions? Or did they contain important genetic information?

      These are important questions. To evaluate the content of the “shrapnel”, we split the manually curated assembly into the 47 chromosomes and the 1840 residual scaffolds, and computed BUSCO scores for both. While the 47 chromosome scaffolds contain the majority of conserved genes: C:92.9%[S:92.7%,D:0.1%],F:4.0%,M:3.1% with metazoa_odb12 and C:88.7%[S:88.0%,D:0.7%],F:4.4%,M:6.9% with mollusca_odb12, the unplaced scaffolds still contain a few BUSCOs: C:2.5%[S:2.4%,D:0.1%],F:2.4%,M:95.1% from metazoa_odb12 and C:1.9%[S:1.7%,D:0.2%],F:1.2%,M:96.9% from mollusca_odb12. Even if only a few BUSCOs are present on these scaffolds, it means they contain important genetic information. Additionally, we observed low, but non-zero alignment of RNA reads to these scaffolds. We observed a slightly elevated repeat content in the unplaced scaffolds (Author response image 2), and a variable base composition (Figure 1C) compared to the chromosome scaffolds.
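
      For readers comparing the four BUSCO summaries above, a minimal Python sketch of how the short-summary notation can be parsed is given here. The function name and regular expression are our own illustrative choices and are not part of BUSCO itself.

```python
import re

# Matches the BUSCO one-line notation, e.g.
# C:92.9%[S:92.7%,D:0.1%],F:4.0%,M:3.1%
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%"
)


def parse_busco(summary):
    """Return the percentages for Complete (C), Single-copy (S),
    Duplicated (D), Fragmented (F) and Missing (M) as floats."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


# Values copied from the response text (metazoa_odb12).
chromosomes = parse_busco("C:92.9%[S:92.7%,D:0.1%],F:4.0%,M:3.1%")
unplaced = parse_busco("C:2.5%[S:2.4%,D:0.1%],F:2.4%,M:95.1%")
```

      Complete, fragmented, and missing percentages should sum to roughly 100 within rounding, which gives a quick sanity check on transcribed values.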

      Author response image 2.

      Quantification of repeat content in chromosome scaffolds and unplaced residual scaffolds. Density plot showing fraction of repeat masked bases in total sequence length for chromosome scaffolds (i.e. scaffolds 1-47) in teal and all remaining small scaffolds (1840 scaffolds) in purple. Median repeat fraction is shown as vertical lines.
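
      The per-scaffold quantity plotted here, the fraction of repeat-masked bases, can be sketched as follows. This assumes a soft-masked assembly in which repeats appear as lowercase bases; that convention and the function names are assumptions for illustration, not a description of the pipeline actually used.

```python
from statistics import median


def masked_fraction(sequence):
    """Fraction of bases that are soft-masked (lowercase) in one scaffold."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return masked / len(sequence)


def median_masked_fraction(sequences):
    """Median repeat fraction across a group of scaffolds
    (raises statistics.StatisticsError if the group is empty)."""
    return median(masked_fraction(seq) for seq in sequences)
```

      Applying `median_masked_fraction` separately to the chromosome scaffolds and to the residual scaffolds would give the two vertical lines of a density plot like the one described.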

      The slightly elevated repeat content in the unplaced scaffolds provides a likely explanation for their fragmented state: repeat-rich regions are inherently difficult to assemble and scaffold, as repetitive sequences cause ambiguous read alignments that prevent contigs from being confidently joined or anchored to chromosomal scaffolds during HiC-based scaffolding. This is consistent with the near-complete absence of BUSCO genes from the unplaced scaffolds - not because these fragments lack biologically relevant sequence entirely, as evidenced by the residual BUSCO hits and RNA read alignments, but because the gene-rich portions of the genome are largely captured in the 47 chromosome scaffolds. The unplaced scaffolds instead likely represent fragmented contigs from repetitive or low-complexity genomic regions, such as centromeres, telomeres, and transposable element clusters, where assembly graph complexity and collapsed repeats prevent confident placement. The variable base composition further supports this interpretation, as GC-extreme or low-complexity sequences are disproportionately represented in assembly shrapnel. Together, these observations suggest that the unplaced scaffolds contain limited unique coding content but reflect genuine repeat-rich genomic sequence that cannot currently be placed without additional long-range information, such as optical mapping or ultra-long reads.

      (8) L33, 53, 240, 255, 279: Decapodiformes, not in italics.

      We changed this throughout the text.

      (9) L228: Can you put this expansion in perspective with other taxa?

      We added a more detailed comparison of our gene family expansion with different species to the revised manuscript, as detailed in response 4 to reviewer 1.

      (10) L251: "However, our results show how difficult it still is to assemble large genomes with high karyotype numbers." Can you clarify how your results show this, because it is equally spectacular to assemble the karyotype with only PacBio and Hi-C data (and no linkage mapping).

      Indeed, it is correct that the recent improvements in data quality and scaffolding algorithms enable these “spectacular” chromosome-scale assemblies without the need for linkage mapping. This sentence reflected our expectation to resolve a clear karyotype, as has been demonstrated for multiple cephalopod genomes in recent years (Octopus bimaculoides, Octopus vulgaris, Euprymna scolopes, Euprymna berryi, and the two cuttlefish species Acanthosepion lycidas and Acanthosepion esculenta). To our knowledge, none of these publications used linkage mapping or cytogenetic methods to confirm the karyotype. In this light, our resulting chromosome number and its discrepancy with a second assembly of the same species led us to this conclusion. We updated the section in the revised discussion as follows (lines 466-473):

      “Taken together, our results illustrate the difficulty of assembling large genomes with high repeat content and large karyotypes, at least from sequencing data alone. Internal validation methods and genome comparisons across species are therefore important. Convergence of reliable estimates will, in turn, help identify chromosomal fusion-with-mixing events (FWM; fusion of two ancestral chromosomes followed by extensive shuffling of their gene content) that are clade specific. Early branching order in Decapodiformes has been notoriously unstable [53,84,94,144–147]; thus, such rare and irreversible FWM characters could be useful in further phylogenetic analysis of this clade [51,148].”

      (11) L419: Why use the phased haplotype 1 instead of the primary assembly generated by hifiasm?

      We thank the reviewer for this important question. We used the phased haplotype assembly because it provides a biologically coherent representation with the least amount of duplication by avoiding allele-collapsing and haplotype-switching that can be present in the primary assembly. We reasoned that this would result in clearer gene models and a more accurate representation of structural variation. However, we acknowledge that this comes at the cost of reduced contiguity and completeness, as becomes apparent in our BUSCO comparison shown in Supplementary Figure 2, where the phased haplotypes have fewer duplicated genes than the primary assembly, but more missing genes in turn. When reassembling both datasets for our comparison, we used the primary assembly to use the longest contigs as input for scaffolding.

      (12) L444: It is unclear from what tissues and life stages RNA-seq data were used or were available from other species.

      This is an important detail. RNA-seq data was collected from two adult Sepia officinalis, from various tissues (whole brain, retina, skin, mantle, arm, tentacle). For the long-read PacBio Isoseq data, tissue was taken from the animal used for genome sequencing (6 months old), and tissue for short-read Illumina RNA-seq was taken from another adult (8 months old). The data have been released on SRA (study accession SRP570862), where all sample details are listed as well. We added the SRA accession to the data availability section of the revised manuscript. We clarified the relevant sections in the methods:

      lines 628-629:

      “RNA was isolated from various flash-frozen tissues (different brain areas, mantle/epidermis, arm/tentacle; 5-10 mg each).”

      lines 678-680:

      “For short-read RNA sequencing, tissue from another animal (8-month-old adult, F0 from eggs collected in Normandie, France) was used. RNA was isolated from various flash-frozen tissues (different brain areas, skin and retina; 5 mg each).”

      (13) L454, 469: Why is minimap2 in italics? It wasn't formatted like this before. Same for StringTie.

      We thank the reviewer for their detailed methods review. In the updated methods section, the formatting of all software names was harmonized.

      (14) L461: Lophotrochozoa is a clade, not a genus or species. Not in italics.

      This is now changed throughout the revised manuscript.

      (15) Figure 1D: Axes labels are hard to read.

      We have now increased the axis label size.

      (16) Figure 2: Consider increasing font sizes. Many chromosome orientations seem to be flipped across species, which makes it harder to see smaller-scale rearrangements or notice less conserved chromosomes. Would it make sense to standardize these?

      We increased the font sizes and plotted only fully collinear syntenic blocks (instead of aggregated syntenic regions, the default of GENESPACE) for improved readability.

      References:

      Below are references cited in our responses. References from the reproduced manuscript sections are included in the revised manuscript.

      (1) Secomandi, S., Gallo, G.R., Rossi, R., Rodríguez Fernandes, C., Jarvis, E.D., Bonisoli-Alquati, A., Gianfranceschi, L., and Formenti, G. (2025). Pangenome graphs and their applications in biodiversity genomics. Nat. Genet. 57, 13–26. https://doi.org/10.1038/s41588-024-02029-6.

      (2) Open2C, Abdennur, N., Fudenberg, G., Flyamer, I.M., Galitsyna, A.A., Goloborodko, A., Imakaev, M., and Venev, S.V. (2023). Pairtools: from sequencing data to chromosome contacts. Preprint at bioRxiv, https://doi.org/10.1101/2023.02.13.528389.

      (3) Lupše, N., Reid, A., Taite, M., Kubodera, T., and Allcock, A.L. (2023). Cuttlefishes (Cephalopoda, Sepiidae): the bare bones—an hypothesis of relationships. Mar. Biol. 170, 93. https://doi.org/10.1007/s00227-023-04195-3.

      (4) Tegenfeldt, F., Kuznetsov, D., Manni, M., Berkeley, M., Zdobnov, E.M., and Kriventseva, E.V. (2025). OrthoDB and BUSCO update: annotation of orthologs with wider sampling of genomes. Nucleic Acids Res. 53, D516–D522. https://doi.org/10.1093/nar/gkae987.

    1. eLife Assessment

      This study presents valuable evidence of sex differences in oxycodone relapse-related behavior alongside novel characterization of synaptic adaptations in the paraventricular thalamus - nucleus accumbens shell circuit. The authors show that females exhibit heightened cue-induced seeking after 14 days, but not 1 day, of abstinence, while both sexes display similar time-dependent strengthening of paraventricular thalamus - nucleus accumbens shell glutamatergic transmission. The revised manuscript strengthens the work through improved statistical analyses, clearer interpretation, and expanded integration with prior literature. The strength of evidence is solid. However, association among experiments is incomplete, as the sex-specific behavioral effect is not reflected in circuit-level plasticity, and no causal manipulations test pathway involvement in relapse. Future work could link these circuit adaptations to sex-specific relapse vulnerability.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Alonso-Caraballo et al. is a novel piece of work that examines the impact of oxycodone self-administration on neural plasticity within the paraventricular thalamus (PVT) to nucleus accumbens shell (Shell) pathway, which links two regions each shown to play a key role in cue-induced drug seeking, and whether this plasticity varies based on abstinence period and biological sex.

      Strengths:

      The authors show that a clinically relevant long-access model of opioid self-administration promotes dependence and acute withdrawal in both male and female rats. During subsequent cue-induced relapse tests at 1 or 14 days following the conclusion of self-administration, data show that while both males and females demonstrate drug-seeking behavior at both time points, females show a further elevation in responding on day 14 versus day 1 that is not observed in males. When accounting for past work showing elevations in drug seeking in males after 30 days, these data indicate that craving-induced relapse for opioids may develop faster and may be more pronounced in females compared to males.

      These behavioral findings were paralleled by use of ex vivo acute slice electrophysiology and circuit-specific ex vivo optogenetics to examine the impact of oxycodone self-administration on synaptic strength within the paraventricular thalamus (PVT) to nucleus accumbens shell (NAcSh) pathway. Data support a time-dependent but sex-independent strengthening of glutamatergic signaling at PVT-to-NAcSh medium spiny neurons (MSNs) that is only present following a relapse test at 14 days post abstinence in both males and females, providing the first evidence that opioid self-administration and/or cue-induced drug-seeking augments this pathway. Using an extensive set of physiological measures, the authors show that this increased synaptic strength reflects an upregulation of presynaptic release probability. Further, this upregulation of excitatory signaling aligned temporally with an increase in MSN excitability, as assessed by increases in action potential firing frequency. Finally, the authors provide the first evidence that, similar to other inputs to the NAcSh, PVT projections innervate both MSNs and local interneurons, promoting a GABA-A specific feedforward inhibitory circuit. Interestingly, unlike direct excitatory inputs to MSNs, no changes were observed within this feedforward circuit, highlighting a selective enhancement of excitatory drive and output of MSNs with protracted abstinence.

      Overall, these data highlight a potential role for heightened synaptic strength within the PVT-NAcSh pathway in cue-induced relapse behavior during protracted abstinence and identify a potential therapeutic target during abstinence to reduce relapse risk in abstaining individuals.

      Weaknesses:

      Overall, the experimental approach and data provided appear rigorous, support the overall conclusions, and achieve the goal of understanding how opioid self-administration impacts synaptic strength within the PVT-NAcSh pathway. Although not undermining these data, there are a few potential weaknesses that reduce the impact of the work. For example, the inability to directly assess whether cue-induced drug-seeking is in fact augmented compared to daily intake during self-administration in the maintenance phase only permits the authors to denote that re-exposure to cues and the context is sufficient to promote active lever pressing without demonstrating whether seeking behavior is in fact elevated further during a cue test. This is notably understandable as drug-available sessions were 6 hours versus a 1-hour relapse test. Importantly, it is clearly demonstrated that drug seeking is higher on average in female rats after 14 days versus 1 day.

      With regard to interpretation of electrophysiology findings, the lack of inclusion of an abstinence-only group does not permit interpretations to parse out whether observed increases in synaptic strength (or the lack of) reflect abstinence or an interaction between abstinence period and re-exposure to the operant chamber, as slices were taken 30-45 min post relapse test. While much literature has shown that drug-induced adaptations in the NAc require a post-drug period for plasticity to measurably emerge, studies have also shown that re-exposure to heroin-associated cues following abstinence seemingly "reverses" increases in cell excitability in prelimbic-NAc pyramidal neurons (Kokane et al., 2023) and that morphine-induced increases in synaptic strength in the NAc shell can be depotentiated by drug re-exposure, an effect also observed with cocaine re-exposure (Madayag et al., 2019). Notably, the lack of effect at 14 but not 1 day supports the likelihood that the relapse test does not in fact influence the plasticity within the PVT-NAcSh circuit.

      While the lack of effect on AMPAR:NMDAR ratio and rectification indices does support the notion that enhanced EPSC amplitudes in input-output curves do not reflect a change in AMPAR subunit expression (i.e., increased GluA2-lacking receptors that exhibit inward rectification at depolarized potentials) nor a change in postsynaptic sensitivity to glutamate, without direct assessment of AMPAR-specific and NMDAR-specific input-output curves, it doesn't definitively exclude the possibility that both AMPA and NMDA receptor currents are being upregulated, thus negating an observable change in postsynaptic strength.

      Overall, these findings provide novel insight into how the PVT-NAcSh pathway is altered by opioid self-administration and whether this is unique based on abstinence period and sex. Importantly, these were the primary objectives stated by the authors. Data highlight a potential role for the observed adaptations in relapse behavior and identify a potential therapeutic target during abstinence to reduce relapse risk in abstaining individuals. However, it should be noted that no causal link is demonstrated without experiments to reduce/prevent relapse.

      Comments on revisions:

      The authors addressed previous concerns brought up, specifically by clarifying data interpretation as well as text modifications related to potential caveats of these interpretations. However, I recommend that the title be changed to not focus on sex differences to avoid misunderstanding. The authors should also address the lack of difference physiologically compared to the behavior as a caveat more clearly in the discussion (i.e. likely suggests this isn't the pathway driving the difference).

    3. Reviewer #2 (Public review):

      Summary:

      This is an interesting paper from Alonso-Caraballo and colleagues that examines the influence of opioid use, acute and prolonged abstinence, and sex on cue-induced relapse and paraventricular thalamus (PVT) to nucleus accumbens shell (NAcSh) medium spiny neuron (MSN) circuit physiology. The study presents a valuable finding that following prolonged, but not acute, abstinence from oxycodone self-administration, female rodents exhibit higher relapse rates to drug-paired cues. Additionally, the study presents the useful finding that prolonged abstinence increased PVT-NAcSh MSN synaptic strength in both sexes, an effect that is likely due to presynaptic adaptations. While the evidence to support these two findings is solid, further experiments are required to determine the functional role of the PVT-NAcSh MSN circuit in relapse following prolonged oxycodone abstinence, and the mechanism underlying the heightened relapse vulnerability in females in this model of opioid use disorder.

      Strengths:

      The paper is interesting, well written and presented, and the experiments are well designed and conducted. The revised analysis of spike count data that models the hierarchical structure of the data is appropriate to overcome low animal numbers and the potential for oversampling. The authors are transparent in reporting the results related to this analysis in Figure 5 and acknowledge the study is underpowered to confirm the trend of increased intrinsic excitability in male MSNs following prolonged oxycodone abstinence.

      Weaknesses:

      A major weakness of this study is the disconnect between the behavioral and neurophysiological data reported. While a striking sex difference in relapse-like behavior is observed, there are no statistically significant sex differences in any of the neurophysiological data reported. Moreover, without an experiment to functionally test the role of the PVT-NAc projection in relapse-like behavior following prolonged oxycodone abstinence, these two arms of the study seem divorced.

      While the authors don't directly conclude that the PVT-NAc MSN circuit is required for relapse following prolonged oxycodone abstinence, in the introduction the authors state they aim to test the hypothesis that increased synaptic strength in PVT-NAcSh projections is necessary for drug-seeking. This study does not include the required experiments to test this hypothesis.

      Impact:

      The topic is of interest to the field of substance use disorders and gives solid evidence for the need to consider targeted therapeutics aimed at relapse prevention in opioid use disorder.

    4. Reviewer #3 (Public review):

      Summary:

      Alonso-Caraballo et al. use behavioral testing and ex vivo patch-clamp electrophysiology combined with circuit-specific optogenetic stimulation of PVT terminals to examine how oxycodone self-administration and abstinence duration shape cue-induced relapse and PVT-NAcSh synaptic transmission in male and female rats. In the revision, the authors reanalyzed intrinsic excitability using nested hierarchical GLMMs, acknowledged the low power in the male prolonged-abstinence group, and expanded the discussion of relevant PVT-NAc literature. These changes improve the manuscript. That said, most of the revisions are textual and the main experimental gap remains. Both sexes show increased oxycodone seeking compared to saline at 14 days, but only females show a time-dependent incubation from 1 to 14 days, and the PVT-NAcSh synaptic strengthening is the same in both sexes. Nothing in the revision brings those two observations closer together. The excitability data also come from NAcSh MSNs with no confirmation of PVT connectivity, which limits what circuit-specific conclusions can be drawn. The study is a solid characterization of abstinence-related synaptic changes in this pathway, but some of the conclusions still go further than the data allow.

      Strengths:

      The behavioral characterization is thorough and well-executed, covering self-administration, somatic withdrawal, and cue-induced relapse across two abstinence durations in both sexes. The sex-specific escalation in oxycodone seeking from 1 to 14 days in females but not males is a clear and compelling finding. The use of circuit-specific ex vivo optogenetics to isolate PVT terminal inputs onto NAcSh neurons is a genuine methodological strength, and the demonstration of feedforward inhibitory recruitment through local GABAergic interneurons adds meaningful novelty to the circuit characterization. The reanalysis of intrinsic excitability using nested hierarchical GLMMs appropriately accounts for the non-independence of cells recorded within the same animal and is a real improvement over the original approach. The expanded discussion of prior PVT-NAc work, particularly the more accurate treatment of Keyes et al. (2020) and Paniccia et al. (2024), better situates the findings within the existing literature.

      Weaknesses:

      The core limitation of the study remains unchanged after revision. The PVT-NAcSh synaptic strengthening after prolonged abstinence is statistically indistinguishable between sexes, while females but not males show a time-dependent escalation in oxycodone seeking from 1 to 14 days of abstinence. The Discussion proposes hormonal modulation or differences in upstream inputs as possible explanations, but none of these are tested and the gap is left unresolved. The intrinsic excitability recordings come from NAcSh MSNs with no confirmation that those neurons receive direct PVT input, which was raised in the original review, acknowledged in the revision, and not experimentally addressed. The male prolonged-abstinence excitability trend has approximately 20% statistical power and is non-significant, yet the Discussion interprets it as a potential neuroadaptation that could facilitate signal flow through the PVT-NAcSh circuit and contribute to relapse, which goes well beyond what the data support. The failure to distinguish between D1 and D2 MSNs remains a significant limitation given that cell-type-specific plasticity at PVT-NAc synapses has been shown to be directly relevant to opioid seeking in prior work. Finally, the Conclusion builds a mechanistic framework around D2 MSNs, PV interneurons, and D1 MSNs that is drawn from studies using different drugs or experimental designs, and none of these cell-type-specific mechanisms are tested in the present experiments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Although not undermining these data, there are a few potential weaknesses that reduce the impact of the work. For example, the inability to directly assess whether cue-induced drug-seeking is in fact augmented compared to daily intake during self-administration in the maintenance phase only permits the authors to denote that re-exposure to cues and the context is sufficient to promote active lever pressing without demonstrating whether seeking behavior is in fact elevated further during a cue test. This is notably understandable as drug-available sessions were 6 hours versus a 1-hour relapse test. Importantly, it is clearly demonstrated that drug seeking is higher on average in female rats after 14 days versus 1 day.

      We agree that the current design does not allow us to directly assess whether cue-induced drug-seeking is augmented relative to the average self-administration intake. However, this comparison was not a question examined in the manuscript and was not an intended interpretation of the data. Our analyses and interpretations focused on comparisons between saline and oxycodone groups tested under identical cue-induced relapse conditions. While it does not change or contradict the reviewer’s point, we would also like to clarify that the relapse test was 2 hours long.

      (2) With regard to the interpretation of electrophysiology findings, the lack of inclusion of an abstinence-only group does not permit interpretations to parse out whether observed increases in synaptic strength (or the lack of) reflect abstinence or an interaction between abstinence period and re-exposure to the operant chamber, as slices were taken 30-45 min post relapse test.

      The inclusion of an abstinence-only control group would have been required to definitively dissociate synaptic changes driven by abstinence alone from those arising from an interaction between abstinence and re-exposure to the operant context during the relapse test. In the present study, electrophysiological recordings were intentionally performed 30 to 45 minutes following the relapse test to capture synaptic modifications associated with cue-induced drug-seeking after abstinence. Accordingly, we interpret these findings as reflecting the neural state following relapse rather than abstinence alone, and we have revised the text accordingly to clarify this point.

      (3) With regard to the interpretation of electrophysiology findings, the lack of inclusion of an abstinence-only group does not permit interpretations to parse out whether observed increases in synaptic strength (or the lack of) reflect abstinence or an interaction between abstinence period and re-exposure to the operant chamber, as slices were taken 30-45 min post relapse test. While much literature has shown that drug-induced adaptations in the NAc require a post-drug period for plasticity to measurably emerge, studies have also shown that re-exposure to heroin-associated cues following abstinence seemingly "reverses" increases in cell excitability in prelimbic-NAc pyramidal neurons (Kokane et al., 2023) and that morphine-induced increases in synaptic strength in the NAc shell can be depotentiated by drug re-exposure, an effect also observed with cocaine re-exposure (Madayag et al., 2019). Notably, the lack of effect at 14 but not 1 day supports the likelihood that the relapse test does not in fact influence the plasticity within the PVT-NAcSh circuit.

      We thank the reviewer for highlighting relevant literature showing that drug or cue re-exposure can modify or reverse drug-induced plasticity in NAc-related circuits. We want to clarify that, in our dataset, synaptic changes in the PVT-NAcSh pathway are seen after 14 days of abstinence, but not after 1 day. Therefore, the lack of effect at the earlier time point and its appearance after extended abstinence support the idea of time-dependent plasticity. Although electrophysiological recordings were taken soon after the relapse test, this temporal pattern argues against relapse testing alone as the primary driver of the observed synaptic changes. We have updated the text to clarify this point.

      (4) While the lack of effect on AMPAR:NMDAR ratio and rectification indices does support the notion that enhanced EPSC amplitudes in input-output curves do not reflect a change in AMPAR subunit expression (i.e., increased GluA2-lacking receptors that exhibit inward rectification at depolarized potentials) nor a change in postsynaptic sensitivity to glutamate, without direct assessment of AMPAR-specific and NMDAR-specific input-output curves, it doesn't definitively exclude the possibility that both AMPA and NMDA receptor currents are being upregulated, thus negating an observable change in postsynaptic strength.

      We agree that unchanged AMPAR/NMDAR ratios and rectification indices argue against altered AMPAR subunit composition or simple postsynaptic sensitivity changes. Although receptor-specific input-output analyses would be necessary to definitively rule out proportional increases in both AMPA and NMDA receptor currents, we have updated the manuscript to clarify that our conclusions are limited to the synaptic measures we obtained. The revised text now states that acute or prolonged abstinence “might have no detectable postsynaptic effects as assessed by these synaptic measures” at PVT-NAcSh synapses.

      Reviewer #2 (Public review):

      (5) While this paper is certainly interesting, and well-written, and the experiments seem to be well performed, the behavioral and physiological effects observed are somewhat divorced. Specifically, what accounts for the heightened relapse in females? Since no opioid-related sex differences were observed in PVT-NAcSh neurophysiology, it is unclear how the behavioral and neurophysiological data fit together. Furthermore, the lack of functional manipulation of PVT-NAcSh circuitry leaves one to wonder if this circuit is even important for the behavior that the authors are measuring. I would be more positive about this study if the authors were able to resolve either of the two issues noted above.

      A key challenge in circuit-based studies of motivated behavior is connecting circuit-level plasticity to complex, sex-dependent behavioral phenotypes. In this study, we do not mean to imply that synaptic plasticity within the PVT-NAcSh projection alone explains the increased relapse seen in females. Instead, our electrophysiological data indicate that this projection experiences time-dependent, abstinence-dependent changes in synaptic strength, offering important insights into when and where circuit-level adaptations may occur. We also believe that the lack of obvious sex differences in PVT-NAcSh synaptic strength does not rule out this circuit's role in sex-specific behavior. Growing evidence suggests that sex differences in relapse and motivated behaviors may stem from different modulation of shared circuits (for example, via ovarian hormones, neuromodulatory tone, or upstream inputs), rather than from significant differences in baseline synaptic properties within a given projection. Regarding circuit relevance, extensive previous research has identified the PVT-NAcSh pathway as a critical regulator of cue-induced reward seeking and relapse. Our findings expand on this by showing that this projection displays abstinence-dependent synaptic strengthening after oxycodone self-administration. Although functional manipulation of this circuit is needed to confirm its causal role, such experiments were beyond the scope of this study.

      (6) There are insufficient animals in some cases. For example, in Figure 4, the Male Saline 14-day abstinence group (n = 3 rats) has less than half of the excitability as compared to the Male Saline 1-day abstinence group (n = 7 rats). This is likely due to variance between animals and, possibly, oversampling. Thus, more rats need to be added to the 14-day abstinence group. Additionally, the range of n neurons/rat should be reported for each experiment to ensure readers that oversampling from single animals is not occurring.

      We appreciate the reviewer's concern regarding the number of animals and the potential for oversampling. We take this concern seriously and have substantially revised our statistical approach in response.

      All spike count data were reanalyzed using nested hierarchical Poisson generalized linear mixed-effects models (GLMMs), fitted separately for each sex and abstinence duration. Each model included injected current (mean-centered), drug condition, and their interaction as fixed effects, with random intercepts and slopes for injected current at the animal level, and random intercepts for cells nested within animals. Importantly, this reanalysis changed several of our original conclusions. Effects that appeared significant under the conventional cell-level analysis were no longer statistically significant once the hierarchical structure of the data was properly modeled. We report these corrected results transparently throughout the revised manuscript.
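
      As a reading aid, the model structure described above can be written out explicitly. This is only a sketch under stated assumptions: the symbols, the log link (the conventional choice for a Poisson model), the 0/1 coding of drug condition, and the correlated animal-level intercept and slope are introduced here for illustration and are not taken from the manuscript.

      ```latex
      % y_{ijk}: spike count of cell k (nested in animal j) at current step i
      % c_i = I_i - \bar{I}: mean-centered injected current
      % D_j: drug condition of animal j (assumed coding: 0 = saline, 1 = oxycodone)
      \begin{align}
        y_{ijk} &\sim \mathrm{Poisson}(\mu_{ijk}) \\
        \log \mu_{ijk} &= \beta_0 + \beta_1 c_i + \beta_2 D_j + \beta_3\, c_i D_j
                          + a_{0j} + a_{1j} c_i + b_{k(j)} \\
        (a_{0j},\, a_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \Sigma_a), \qquad
        b_{k(j)} \sim \mathcal{N}(0,\, \sigma_b^2)
      \end{align}
      ```

      In this notation, the fixed effects are the current, drug, and current-by-drug terms; a_{0j} and a_{1j} are the animal-level random intercept and slope for injected current; and b_{k(j)} is the random intercept for cells nested within animals. Because drug condition varies between animals, the drug effect is effectively judged against animal-to-animal variability rather than against the number of recorded cells.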

      However, in males after prolonged abstinence, oxycodone-treated animals showed a higher spike output than controls, with a large effect size. Post-hoc analysis showed only 20% power with the current sample (3 saline, 4 oxycodone rats). To reach 80% power, 13 rats per group would be needed. We report this as a trend that warrants further study and have revised related sections to reflect this. The data suggest a possible neuroadaptation in males that the study is underpowered to confirm, not a null effect.

      In response to this comment, we have updated Figure 5, the Results and Discussion sections, and the Statistics/Methods section to clearly describe the nested hierarchical modeling approach, report corrected statistical values, and acknowledge the power limitation for the male prolonged abstinence group. The figure legend now reports the number of neurons recorded per rat, showing the distribution across animals rather than individual subjects.

      (7) The IPSC data, for example in Figure 4, is one of the more novel experiments in the manuscript. However, it is quite challenging to see the difference between males and females, saline and oxycodone, at low stimulation intensities within the graph. Authors should expand this so that reviewers/readers can see those data, especially considering other work suggesting that PVT synaptic input onto select NAc interneurons is disrupted following opioid self-administration. Additional comment: It's also interesting that the IPSC amplitude seems to be maximal at ~2mW of light, whereas ~11 mW is required to evoke maximal EPSC amplitude. It would be interesting to know the authors' thoughts on why this may be.

      While visual separation between conditions at low light levels is subtle, we addressed this directly using linear mixed-effects modeling, which evaluates IPSC amplitudes across the full range of stimulation intensities while accounting for repeated measurements from cells nested within animals. This approach provides greater sensitivity than visual inspection alone and avoids overinterpretation of noise at individual stimulation levels.

      Using this framework, we observed robust main effects of light intensity in both males and females, indicating preserved recruitment of inhibitory synaptic responses as stimulation increased. Importantly, no significant Light × Condition interactions were detected in either sex, indicating that the scaling of IPSC amplitudes with light intensity was not altered by oxycodone exposure.
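
      For orientation, one way to write a linear mixed-effects model of this kind is sketched below. The notation is hypothetical, and the treatment of light intensity as a single continuous predictor and the intercept-only random effects are assumptions made for illustration; the manuscript may specify these terms differently.

      ```latex
      % A_{ijk}: IPSC amplitude of cell k (nested in animal j) at light intensity L_i
      % D_j: condition of animal j (assumed coding: 0 = saline, 1 = oxycodone)
      \begin{align}
        A_{ijk} &= \gamma_0 + \gamma_1 L_i + \gamma_2 D_j + \gamma_3\, L_i D_j
                   + a_j + b_{k(j)} + \varepsilon_{ijk} \\
        a_j &\sim \mathcal{N}(0,\, \sigma_a^2), \qquad
        b_{k(j)} \sim \mathcal{N}(0,\, \sigma_b^2), \qquad
        \varepsilon_{ijk} \sim \mathcal{N}(0,\, \sigma^2)
      \end{align}
      ```

      Under this parameterization, the main effect of light intensity corresponds to \gamma_1, and the Light × Condition interaction corresponds to \gamma_3, which captures whether the scaling of amplitude with light differs between conditions.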

      With respect to the observation that IPSC amplitudes appear to reach near-maximal levels at lower light intensities (~2 mW) compared to EPSCs (~11 mW), we agree that this distinction is intriguing. One possible explanation is that the IPSCs depend on the recruitment of local interneurons. Because the number of interneurons activated by PVT inputs is limited, inhibitory responses may reach a plateau at relatively low light intensities once these interneurons are fully recruited.

      On the other hand, the increased intensity of photostimulation would result in an increase of monosynaptic EPSC amplitude over a wider range of stimulation (light) intensities, as increased intensity of light would recruit more ChR2-expressing PVT fibers, resulting in larger EPSCs.

      (8) There is an inadequate description of what has been done to date on the PVT-NAc projection regarding opioid withdrawal, seeking, disinhibition, and the effects on synaptic physiology therein. For example, a critical paper, Keyes et al., 2020 Neuron, is not cited. Additionally, Paniccia et al., 2024 Neuron is inaccurately cited and insufficiently described. Both manuscripts should be described in some detail within the introduction, and the findings should be accurately contextualized within the broader circuit within the discussion.

      In the revised manuscript, we expanded the Discussion to give a more thorough overview of previous research on the PVT-NAc pathway in relation to opioid-related behaviors and synaptic changes. Specifically, we added more detail about Keyes et al., 2020 and Paniccia et al., 2024, clarifying their findings and placing them within the context of the circuit mechanisms studied in our work. We also revised the text to ensure the descriptions of these studies are accurate and that their conclusions are properly related to our findings.

      (9) Related to the above, the authors should provide a more comprehensive description of how PVT synapses onto cell-type specific neurons in the NAc which expand beyond MSNs, especially considering that PVT has been shown to influence drug/opioid seeking through the innervation of NAc neurons that are not MSNs. For example, see PMIDs 33947849, 36369508, 28973852, 38141605.

      In the revised manuscript, we expanded the Discussion to describe the diversity of PVT projections within the NAc and the potential role of non-MSN neuronal populations in drug-related behaviors. We added discussion on the broader circuit context and other cell types where relevant to the focus on synaptic transmission onto MSNs. Since our experiments specifically examined synaptic physiology in MSNs, we focused the literature discussion on studies most directly related to MSN-targeted PVT inputs and opioid-related behaviors.

      Reviewer #3 (Public review):

      (10) Additional experiments could strengthen the results and help clarify synaptic mechanisms underpinning behavioral sex differences.

      We agree that additional experiments focused on identifying cell-type-specific mechanisms within the PVT-NAcSh circuit would further enhance understanding of the neural substrates behind the observed behavioral sex differences. In the revised manuscript, we have expanded the Discussion to explicitly acknowledge these limitations and clarify the scope of our current study. Specifically, we discuss the possibility that sex-specific adaptations might occur in particular neuronal subpopulations or circuit components that were not resolved in the present experiments. We also mention that future research using cell-type–specific approaches will be necessary to determine if such mechanisms contribute to the increased oxycodone seeking seen in females after prolonged abstinence. We appreciate the reviewer’s suggestions and have incorporated this perspective into the revised manuscript to better contextualize our findings and outline future directions.

    1. eLife Assessment

      This study investigates the role of the Z-disc protein Zasp52 in Drosophila flight muscles and provides evidence that an intrinsically disordered region (IDR) helps to stabilize and promote the localization of the protein to the Z-disc. Overall, this represents an important study that provides insights into Z-disc function and maintenance. The data are convincing, supported by strong genetic evidence and behavioral tests, well-controlled experiments, and detailed statistical analyses. Additional functional analyses designed to tease out specialized regions within the newly described isoform of Zasp52 would further strengthen models regarding the function of the protein.

    2. Reviewer #1 (Public review):

      The manuscript by Ho and Schock investigates the role of the Z-disc protein Zasp52 during Drosophila flight muscle development. It was known before, mainly by findings from this group, that Zasp52 is required for normal sarcomere morphogenesis, specifically Z-disc morphogenesis in indirect flight muscles. But the exact molecular mechanism by which Zasp52 contributes, apart from the fact that it is localised there and is somehow involved in multimerization/cross-linking, was not clear. This paper proposes that an intrinsically disordered region (IDR) in Zasp52 is needed for some of its functions, by stabilising Zasp52 localisation at the Z-disc. Specifically, the IDR in Zasp52 is proposed to be required for Z-disc maintenance during the mechanical challenges of flight, while being dispensable for the initial morphogenesis during development. This hypothesis is supported by strong genetic evidence and behavioural tests: deleting Zasp's IDR impairs flight from mid-age onwards, while a block in flight activity lifts the phenotype.

      However, some of the phenotypic analysis, in particular the bending of the sarcomere, likely upon mechanical challenge by muscle contractions, needs more detailed investigations to be fully convincing.

      Strengths:

      (1) The linker in the alternatively spliced exon 15 of Zasp52 was deleted with a state-of-the-art genetic editing strategy. Surprisingly, flies are homozygous viable, showing that this long part of the Zasp52 protein is not essential for animal survival or sarcomere morphogenesis.

      (2) The observed sarcomere phenotypes with age, especially the bending Z-discs, are new and exciting.

      (3) The displayed EM images document interesting phenotypes.

      (4) Most of the observed phenotypes can be rescued by re-expression of the long Zasp52 isoform, which does contain the IDR region, but not by a shorter one without it, suggesting that IDR is important.

      (5) FRAP data measure the local turnover of a short-ZaspGFP and show that this is increased in the Zasp mutant lacking the IDR domain, suggesting that Zasp-IDR might stabilise Zasp at the Z-disc.

      (6) Interestingly, flight and sarcomere morphology phenotypes can be rescued by preventing the flies from flying, suggesting that they are mechanically induced.

      Weaknesses:

      (1) The western blot quantifications of Zasp isoform expression are weak. No error bars are indicated in the quantifications; the quantifications appear to be more qualitative than quantitative. According to band intensities, the long Zasp isoforms seem to be less present compared to the shorter ones, even in the flight muscles.

      (2) The phenotypic analysis of the sarcomere appears somewhat superficial throughout the paper. Only Zasp52 and phalloidin are shown; no other Z-disc or thick filament proteins. At least myosin stainings and overview images are important to better judge the phenotypic variations. Are the variants between individuals or regional in the same muscle?

      (3) EM images would benefit from better quantification.

      (4) Other proteins were not analysed with the FRAP-based turnover assay for comparison in wild type and mutant. All Z-proteins might turn over faster in the mutant with the defective Z-disc.

    3. Reviewer #2 (Public review):

      Summary and Strengths:

      This in-depth genetic analysis of Zasp52 function in Drosophila indirect flight muscle (IFM) provides an interesting perspective regarding the role of a partially disordered region (IDR) in exon 15e. This exon seems to be exclusively present in IFM and contributes to the prevention of myofibril disintegration during aging, likely due to interactions of this region with Z-disc insertion and/or stability. The addition of an isoform (PR) that lacks exon 15e serves as a nice control to illustrate the necessity of exon 15e in muscle structure and function. Overall, the manuscript is exceptionally well-written, logical, with nicely controlled experiments and detailed statistical analysis that largely support the conclusions drawn by the authors. While exon 15e is clearly involved in preventing muscle degeneration, a solid role for thin filament stability is not clearly shown (as mentioned in the abstract). In addition, which regions/how the proteins of the IDR may contribute are unclear.

      Weaknesses:

      (1) It is not clear in Figure S1A where exon 15e fits within the Zasp52 locus schematic. This is important as a premise of this paper describes this region to be key, and proof from multiple prediction programs would lend more weight to the prediction of the exon being largely disordered. Inclusion of the discussed short linear motifs, comparison with Canoe or LDB3 for similarities and/or an Alphafold structure would help make the authors' point (colorized with known domains).

      (2) Interesting that immobilization rescues the deterioration phenotypes. The authors should explain in more detail how this was done to avoid dehydration/starvation of the flies.

      (3) There is a lot of discussion about the potential function of the IDR region, specifically a putative actin binding motif or other 'ordered' regions that may contain short linear motifs. It would strengthen the findings to show which of these may be essential for Zasp52 function in the IFM. The ability to bind actin could be tested biochemically, and/or smaller deletions could be made to unequivocally test the role of the ABD vs other predicted motifs using genetics. If some of these regions are more ordered, where do they lie within, and do they form a predicted fold or structure that gives insight into function?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Ho and Schock investigates the role of the Z-disc protein Zasp52 during Drosophila flight muscle development. It was known before, mainly by findings from this group, that Zasp52 is required for normal sarcomere morphogenesis, specifically Z-disc morphogenesis in indirect flight muscles. But the exact molecular mechanism by which Zasp52 contributes, apart from the fact that it is localised there and is somehow involved in multimerization/cross-linking, was not clear. This paper proposes that an intrinsically disordered region (IDR) in Zasp52 is needed for some of its functions, by stabilising Zasp52 localisation at the Z-disc. Specifically, the IDR in Zasp52 is proposed to be required for Z-disc maintenance during the mechanical challenges of flight, while being dispensable for the initial morphogenesis during development. This hypothesis is supported by strong genetic evidence and behavioural tests: deleting Zasp's IDR impairs flight from mid-age onwards, while a block in flight activity lifts the phenotype.

      However, some of the phenotypic analysis, in particular the bending of the sarcomere, likely upon mechanical challenge by muscle contractions, needs more detailed investigations to be fully convincing.

      Strengths:

      (1) The linker in the alternatively spliced exon 15 of Zasp52 was deleted with a state-of-the-art genetic editing strategy. Surprisingly, flies are homozygous viable, showing that this long part of the Zasp52 protein is not essential for animal survival or sarcomere morphogenesis.

      (2) The observed sarcomere phenotypes with age, especially the bending Z-discs, are new and exciting.

      (3) The displayed EM images document interesting phenotypes.

      (4) Most of the observed phenotypes can be rescued by re-expression of the long Zasp52 isoform, which does contain the IDR region, but not by a shorter one without it, suggesting that IDR is important.

      (5) FRAP data measure the local turnover of a short-ZaspGFP and show that this is increased in the Zasp mutant lacking the IDR domain, suggesting that Zasp-IDR might stabilise Zasp at the Z-disc.

      (6) Interestingly, flight and sarcomere morphology phenotypes can be rescued by preventing the flies from flying, suggesting that they are mechanically induced.

      Weaknesses:

      (1) The western blot quantifications of Zasp isoform expression are weak. No error bars are indicated in the quantifications; the quantifications appear to be more qualitative than quantitative. According to band intensities, the long Zasp isoforms seem to be less present compared to the shorter ones, even in the flight muscles.

      We will work on including quantifications with error bars for the Western blots in our resubmission. It is important to keep in mind that the main point in figure 1B is that there are plenty of exon15e-containing isoforms in IFM, in contrast to other tissues with very limited exon15e-containing isoforms. This is confirmed by the analysis of RNAseq data in figure 1C, and of course, by the flightless phenotype of the exon15e mutant.

      (2) The phenotypic analysis of the sarcomere appears somewhat superficial throughout the paper. Only Zasp52 and phalloidin are shown; no other Z-disc or thick filament proteins. At least myosin stainings and overview images are important to better judge the phenotypic variations. Are the variants between individuals or regional in the same muscle?

      Our images are representative of the observed phenotypes. We aim to provide overview images and other stainings to better illustrate the phenotypic variations in the revised version. Phenotypes are consistently present across all individuals, as reflected in our replicates. Interestingly, they appear to not be randomly interspersed among the sarcomeres but concentrated in certain regions of muscle more than others.

      (3) EM images would benefit from better quantification.

      We do not believe that EM images can be meaningfully quantified, because of the many selection steps preceding image acquisition.

      (4) Other proteins were not analysed with the FRAP-based turnover assay for comparison in wild type and mutant. All Z-proteins might turn over faster in the mutant with the defective Z-disc.

      This is the point we are trying to make. The Zasp52 IDR acts like a glue stabilizing all Z-disc proteins. We performed this experiment as a first step to explore whether an exon15e-lacking system exhibited modified dynamics, and we aim to provide more data in the revised version.

      Reviewer #2 (Public review):

      Summary and Strengths:

      This in-depth genetic analysis of Zasp52 function in Drosophila indirect flight muscle (IFM) provides an interesting perspective regarding the role of a partially disordered region (IDR) in exon 15e. This exon seems to be exclusively present in IFM and contributes to the prevention of myofibril disintegration during aging, likely due to interactions of this region with Z-disc insertion and/or stability. The addition of an isoform (PR) that lacks exon 15e serves as a nice control to illustrate the necessity of exon 15e in muscle structure and function. Overall, the manuscript is exceptionally well-written, logical, with nicely controlled experiments and detailed statistical analysis that largely support the conclusions drawn by the authors. While exon 15e is clearly involved in preventing muscle degeneration, a solid role for thin filament stability is not clearly shown (as mentioned in the abstract). In addition, which regions/how the proteins of the IDR may contribute are unclear.

      Weaknesses:

      (1) It is not clear in Figure S1A where exon 15e fits within the Zasp52 locus schematic. This is important as a premise of this paper describes this region to be key, and proof from multiple prediction programs would lend more weight to the prediction of the exon being largely disordered. Inclusion of the discussed short linear motifs, comparison with Canoe or LDB3 for similarities and/or an Alphafold structure would help make the authors' point (colorized with known domains).

      We will add a bar below figure S2A to show the region corresponding to exon 15e. We used three disorder prediction programs and one structure (order) prediction program. The majority of exon15e is completely disordered and of very low confidence score, and thus uninformative to display as an Alphafold structure. Likewise, IDRs are very difficult to classify; therefore, we cannot say much more than that LDB3, Zasp52, and Canoe contain IDRs, with Zasp52 and Canoe both having an actin-binding domain within the IDR. We will provide more data on the function of the ABD in the revised version.

      (2) Interesting that immobilization rescues the deterioration phenotypes. The authors should explain in more detail how this was done to avoid dehydration/starvation of the flies.

      We will provide more details in the revised version.

      (3) There is a lot of discussion about the potential function of the IDR region, specifically a putative actin binding motif or other 'ordered' regions that may contain short linear motifs. It would strengthen the findings to show which of these may be essential for Zasp52 function in the IFM. The ability to bind actin could be tested biochemically, and/or smaller deletions could be made to unequivocally test the role of the ABD vs other predicted motifs using genetics. If some of these regions are more ordered, where do they lie within, and do they form a predicted fold or structure that gives insight into function?

      We will provide data on the function of the ABD in the revised version.

    1. eLife Assessment

      This important study identified Mex3a protein with dual RNA-binding protein/ubiquitin ligase function as a pivotal regulator of olfactory sensory neurons (OSN) differentiation and lineage fidelity. The authors employed a combination of systems biology approaches (e.g., single-cell RNA sequencing, proteomics) and newly developed animal models (e.g., HyperTRIBE) to provide solid evidence that abrogation of Mex3a disrupts cilia structure and polarity of OSNs. Although this article is of broad potential interest across different biomedical disciplines ranging from RNA to developmental biology, additional mechanistic data connecting identified Mex3a mRNA targets and ensuing OSN phenotypes would further strengthen this study.

    2. Reviewer #1 (Public review):

      The study by Escamilla del Arenal et al. utilized a conditional knockout mouse model to study the role of Mex3a in immature olfactory sensory neurons (OSN). Mex3a is a dual-functional protein that has RNA-binding function and ubiquitin-E3 ligase activity. The results revealed that Mex3a expression is critical for proper OSN differentiation and contributes to cell surface protein trafficking and translation, cilia structure, and planar cell polarity in mature neurons. Moreover, Mex3a enforces lineage fidelity, selectively repressing sustentacular programs in neurons and neuronal programs in sustentacular cells.

      In addition, the authors established an in vivo HyperTRIBE mouse model to identify Mex3a RNA targets and incorporated UbiFast into the Mex3a conditional knockout (cKO) model to find its protein targets to investigate how Mex3a regulates OSN differentiation. The experimental systems are laborious and comprehensive, which allowed the authors to identify new Mex3a putative targets in OSN.

      The phenotypic results derived from the conditional Mex3a cKO mice are solid. Mechanistic findings also revealed that, in addition to facilitating protein degradation, Mex3a may confer K27 ubiquitin linkage on its target proteins, which has a non-proteolytic role but affects target protein activity, other post-translational modifications, or protein-protein interactions. However, among all Mex3a putative targets, the authors decided to emphasize the Mex3a-mediated K27 ubiquitination of the stress granule protein Serbp1 and the ribosomal protein Rps7, and the association between Mex3a expression and Serbp1 and p-eEF2 ribosome recruitment. This Mex3a-Serbp1-p-eEF2 ribosome recruitment axis, although it can be important in Unfolded Protein Response (UPR) signaling, seems rather general and cannot explain the striking lineage-specific phenotypes observed in the mouse model. The authors need to provide more solid evidence to demonstrate that K27-Ubiquitinylation of Serbp1 is a key step of Mex3a function in OSN differentiation to strengthen the relation between the phenotypes and mechanism presented in this study.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Arenal and colleagues demonstrate that loss of Mex3a leads to defects in cell surface protein trafficking, translation, ciliary structure, and planar cell polarity in mature neurons. Through proteomic analyses, the authors show that Mex3a depletion alters the abundance of proteins involved in vesicular transport, lipid metabolism, and ribosome biogenesis. Using the HyperTRIBE approach, the authors further identify targets of Mex3a and provide evidence supporting a role for K27-linked ubiquitination in regulating these substrates. Mechanistically, the study suggests that Mex3a levels influence the recruitment of SERBP1 and phosphorylated eEF2 (p-eEF2) to ribosomes, contributing to translational repression.

      Strengths:

      Overall, this is a very interesting and well-written manuscript that significantly advances our understanding of Mex3a function and its role in neuronal development, particularly in olfactory sensory neurons. The data are clearly presented and thoughtfully interpreted.

      Weaknesses:

      I have a few minor comments that may further strengthen the manuscript and improve its accessibility to a broader readership.

      (1) In Figure 3B, the authors describe Mex3a localization to cytoplasmic granules. However, it is unclear how these compartments were defined. It would strengthen the conclusions if the authors included co-localization experiments using established cytoplasmic granule markers (e.g., stress granule markers) to define the identity of these structures more precisely. This would clarify whether Mex3a associates with stress granules, RNA processing bodies, or another class of ribonucleoprotein granules.

      (2) Functional validation of K27-linked ubiquitination on SERBP1: To further define the functional significance of K27-linked ubiquitination, it would be informative to mutate the relevant lysine residue(s) on SERBP1 and examine whether this alters its recruitment to ribosomes or affects translational repression. Such an experiment would provide more direct evidence that K27-linked ubiquitination of SERBP1 mediates the observed translational effects.

      (3) Discussion of vesicular trafficking and lipid metabolism targets: The identification of Mex3a targets involved in vesicular trafficking and lipid metabolism, including COPII coat components such as Sec31a and lipid regulatory proteins such as Sec14 and PIP5K1A, is particularly intriguing. The authors may wish to expand the Discussion to address how regulation of these proteins could contribute to defects in plasma membrane trafficking and planar cell polarity. Integrating these findings with the observed cell surface trafficking phenotypes would further enhance the mechanistic framework of the study.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of the KH and RING domain-containing protein Mex3a in the differentiation and maturation of olfactory sensory neurons. Using conditional knockout of Mex3a in immature neurons, they show that mature olfactory sensory neurons display defects in membrane protein trafficking, including olfactory receptors and Adcy3, together with abnormalities in ciliary radial organization and planar cell polarity. Through single-cell RNA sequencing and quantitative proteomics, the authors further show that Mex3a-deficient neurons fail to properly resolve the unfolded protein response and exhibit transcriptomic features suggestive of lineage mixing with sustentacular cells. The study also introduces a methodological advance by adapting HyperTRIBE for use in transgenic mice, which enables the identification of in vivo Mex3a RNA targets, including components of Wnt signaling that appear to be under translational repression by Mex3a. The authors then pursue one of these targets to further explore the role of Mex3a in translational repression.

      Strengths:

      First, it addresses an important biological and conceptual question. Mex3a is a multifunctional protein with the potential to couple RNA regulation, protein homeostasis, and key cellular processes, yet its in vivo role in neuronal differentiation remains poorly understood. By focusing on Mex3a in olfactory sensory neurons, the manuscript asks a timely and important question of how post-transcriptional regulation contributes to the maturation of highly specialized neurons, including the establishment of ciliary architecture, membrane protein trafficking, and cell polarity. Second, the generation and validation of an inducible in vivo mouse HyperTRIBE system represents a technical advance. By incorporating the Adar deaminase domain into a transgenic mouse model, the authors establish a rigorous and useful approach for identifying Mex3a RNA targets in vivo, which is likely to be valuable to the wider RNA biology community. Third, the study integrates the Mex3a knockout model with single-cell RNA sequencing, quantitative mass spectrometry-based proteomics, ubiquitin profiling, and ribosome-related analyses, providing a broad and multilayered view of the Mex3a knockout phenotype. Finally, the imaging analyses revealing altered ciliary content and organization in olfactory sensory neurons identify an interesting and potentially important link between Mex3a, cilia biology, and vesicular trafficking. More broadly, the manuscript reflects a very substantial experimental effort, and each individual dataset has the potential to be useful for the field.

      Weaknesses:

      A main weakness of the manuscript is that the mechanistic links between the major findings remain somewhat correlative, and the biological narrative is not fully sustained through the later figures. The study documents defects in membrane trafficking, ciliary radial organization, and planar cell polarity, and it identifies candidate targets with clear relevance to these processes, including factors linked to vesicle trafficking. However, the manuscript then shifts its mechanistic focus toward translational regulators such as Serbp1 and Rps7, without adequately connecting these later analyses back to the core phenotypes established earlier. As a result, there is a noticeable disconnect between the phenotypic emphasis of the study and the mechanistic validation that follows.

      A second weakness is that, given the breadth and potential importance of the datasets generated, validation remains limited for several of the major conclusions. This reduces confidence in the interpretation of the single-cell, proteomic, ubiquitin-related, and ribosome-associated analyses, and also limits the future value of these datasets as a resource for the field. Because the manuscript aims to address several major questions at once, stronger validation and clearer integration across the different experimental arms are needed for the conclusions to feel fully supported.

      Finally, the HEK293T overexpression experiments are less solid than the in vivo analyses and do not provide equally strong support for the proposed mechanisms. In this context, some of the observed effects on cytoskeletal organization, membrane-less granule formation, and ribosome profiles may be indirect, which makes it difficult to weigh these findings alongside the much stronger in vivo phenotypes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Weaknesses:

      There are quite a few weaknesses, some related to the actual study and some more strongly related to the reporting about the study in the manuscript. The concerns are listed roughly in the order in which they appear in the manuscript.

      We truly appreciate the time and effort you dedicated to reviewing our manuscript. The weaknesses you raised are all well founded, and we agree with almost all of the suggestions detailed below, particularly those on clarifying the statistics and the sample size determination. Please see our specific responses below.

      Major Comments

      (1) In the introduction, the authors present procrastination nearly as if it were the most relevant and problematic issue there is in psychology. Surely, procrastination is a relevant and study-worthy topic, but that is also true if it is presented in more modest (and appropriate) terms. The manuscript mentions that procrastination is a main cause of psychopathology and bodily disease. These claims could possibly be described as 'sensationalized'. Also, the studies to support these claims seem to report associations, not causal mechanisms, as is implied in the manuscript.

      Thank you for this very practical suggestion. We agree that our original statements on the importance of procrastination were overreaching. In the revision, we have toned down these claims by explicitly describing them as “associative evidence”, and have rewritten several passages in a more modest and balanced style. Please see the specific revisions in the main text below:

      Introduction Section (Page 5, Line 64-81)

      “Procrastination is increasingly becoming a prevalent behavioral problem around the world, which reflects the irrational voluntary postponement of scheduled tasks albeit being worse off for such delays (Blake, 2019; Steel, 2007). In the epidemiological investigations, more than 15% of adults were identified as having chronic procrastination problems, and the situation for students was worse as 70-80% of undergraduates engaged in procrastination (American College Health Association, 2022; Ferrari et al., 2005). Moreover, the behavioral genetic evidence indicates a certain heritability of procrastination in human beings as well (Gustavson et al., 2017; Gustavson et al., 2014, 2015). In addition to its prevalence, the undesirable associations between procrastination behavior and health also warrant cautions. There is cumulative evidence to show the close associations between procrastination behavior and working performance, financial status, interpersonal relationships, and subjective well-being (Ferrari, 1994; Pychyl & Sirois, 2016; Steel et al., 2021). Further, as the prospective cohort studies indicated, many mental health problems emerge alongside procrastination, particularly in sleep problems, depression, and anxiety (Hairston & Shpitalni, 2016; Johansson et al., 2023). Even worse, chronic procrastination behavior has been observed to impair general health, as manifested by the intimate associations with close system disruption, gastrointestinal disturbance, as well as a high risk of hypertension and cardiovascular disease (Sirois, 2015; Sirois, 2016). ... ”

      (2) It is laudable that the study was pre-registered; however, the cited OSF repository cannot be accessed and therefore, the OSF materials cannot be used to (a) check the preregistration or to (b) fill in the gaps and uncertainties about the exact analyses the authors conducted (this is important because the description of the analyses is insufficiently detailed and it is often unclear how they analyzed the data).

      We are sorry that a serious technical barrier has made our preregistration invisible and inaccessible. OSF disabled my OSF account after it claimed to detect “suspicious user’s activities” in the account (please see the screenshot below). As a result, none of the materials deposited in this account, including the preregistration, can be accessed. We have contacted the OSF team but have not received a workable technical solution for recovering the preregistered report. We suspect that this may have been triggered by my change of affiliation to the Third Military Medical University of the People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported the situation in the main text and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because a preregistration that cannot be accessed no longer meets best practices, we have also removed the other statements regarding preregistration throughout the revised manuscript. Furthermore, we fully understand that the inadequate methodological detail in our reporting made the statistics of this study difficult to follow. We have therefore added extensive details to the Methods section to clarify how the analyses were conducted, so that our conclusions can be evaluated more easily. Please see what we have added below (Comments #4-9).

      Methods Section (Page 5, Line 186-191)

      “This study fully adhered to CONSORT reporting guidelines, and was originally preregistered in the OSF repository (10.17605/OSF.IO/Y3EDT). However, due to a technical constraint related to the OSF account service (see SM), this OSF page is no longer accessible. For transparency and best practices of open science, a preregistration statement has been reconstructed from the original protocol documentation to clarify the a priori hypotheses, sample size determination, and analysis plans for this study (Table S1).”

      (3) Related to the previous point: I find it impossible to check the analyses with respect to their appropriateness because too little detail and/or explanation is given. Therefore, I find it impossible to evaluate whether the conclusions are valid and warranted.

      Again, we apologize for the confusion caused by inadequate statistical and methodological details. This manuscript was previously reviewed at Nature Human Behaviour, whose editorial requirements constrained the paper length, so a substantial number of details had to be omitted. As you kindly suggested, we have added extensive descriptions to clarify how the statistical analyses in the present study were carried out. Please see the specific instances below.

      (4) Why is a medium effect size chosen for the a priori power analysis? Is it reasonable to assume a medium effect size? This should be discussed/motivated. Related: 18 participants for a medium effect size in a between-subjects design strikes me as implausibly low; even for a within-subjects design, it would appear low (but perhaps I am just not fully understanding the details of the power analysis).

      Thank you for raising this crucial question. We have determined this a priori effect size based on the existing work we published previously (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In our pilot study (Xu et al., 2023), we identified a significant interaction effect between the single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in the laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori. To clarify, we have explicitly justified the selection of this effect size in the Methods section.

      Methods Section (Page 5, Line 206-215)

      “A full randomized block design was used to assign participants to both groups (active neuromodulation group, NM; sham-control group, SC) (see Fig. 2C). As the pilot study probing into the effect of single-session tDCS stimulation to change procrastination willingness indicated (t = 2.38, p = .02, 95% CI [0.14, 1.49]; Xu et al., 2023), statistical power was predetermined by G*Power at a relatively medium effect size (1-β err prob = 0.80, f = 0.25), yielding the total sample size at 18 to reach acceptable power (see SM Methods and Fig. S1)....”

      We fully understand that this sample size seems low for detecting a medium effect size, and that the 18 participants for each group are limited in any case. Upon double-checking the power analyses, we confirmed that this sample size requirement is indeed correct. Please see the G*Power outputs in Author response image 1.

      Author response image 1.

      Although we found no computational errors in the power analysis, we are aware that this limited sample size may hamper statistical robustness. To address this weakness, we have explicitly noted this caution in the Limitations section:

      Limitations Section (Page 12, Line 637-640)

      “... In addition to technical limitations, given the apparently limited size of the sample (total N = 46), it warrants caution in generalizing these findings elsewhere, and necessitates further validations in a large-scale cohort.”

      (5) It remains somewhat ambiguous whether the sham group had the same number of stimulation sessions as the verum stimulation group; please clarify: Did both groups come in the same number of times into the lab? I.e., were all procedures identical except whether the stimulation was verum or sham?

      Yes, we fully followed the CONSORT pipeline to carry out this double-blind trial, and thus confirmed that all the participants in both groups had the same number of stimulation sessions in our lab. That is to say, except for the stimulation type (verum vs sham), all the procedures, equipment and even the room were identical for all the participants. For clarification, we have clearly stated this in the main text:

      Results Section (Page 9, Line 419-423)

      “In both groups, almost all participants (93.2%, 41/44) reported perceiving acceptable pain stemming from current stimulation, and believed they were receiving treatment (91.30% (21/23) for active neuromodulation group (NM), 86.95% (20/23) for sham control group (SC), x<sup>2</sup> = 0.224, p = .636). All participants underwent identical experimental procedures except for the stimulation type (active vs sham). ...”
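
      As a minimal sketch of how the between-group comparison quoted above can be checked from the reported counts (21 of 23 vs 20 of 23 participants believing they received treatment), the Python snippet below computes a Pearson x<sup>2</sup> statistic for a 2×2 table. The function name is hypothetical, and the choice of an uncorrected test (no continuity correction) with a two-sided p-value is an assumption of this sketch, not something stated in the manuscript.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p), where p is the two-sided p-value from the
    chi-square distribution with 1 degree of freedom.
    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function reduces to erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: NM, SC; columns: believed they received treatment, did not.
chi2, p = pearson_chi2_2x2(21, 2, 20, 3)
print(round(chi2, 3), round(p, 3))
```

      Under these assumptions, the sketch reproduces the statistic and p-value quoted in the passage to three decimals.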

      (6) The TDM analysis and hyperbolic discounting approach were unclear to me; this needs to be described in more detail, otherwise it cannot be evaluated.

      We apologize for the inadequate details, which hindered a precise understanding of the TDM and the hyperbolic discounting model. The Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021). It conceptualizes procrastination as a failed trade-off between task outcome value (i.e., the motivation to act now in pursuit of the task reward) and task aversiveness (i.e., the motivation to refrain from acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this theoretical model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang et al., 2019). Considering the nonlinear dynamics inherent in this hyperbolic discounting, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) to strengthen curve-fitting performance (please see the schematic diagram (https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time)):

      Specifically, based on the log-spaced temporal sampling rule, five time points were first selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00, sampling occurred at 10:00, 16:00, 18:00, 19:30, and 20:00). At each time point, participants reported task aversiveness (A) on a 0–100 Visual Analog Scale (VAS). Then, task aversiveness discounting was calculated as 1 - (A<sub>t</sub> / A<sub>earliest</sub>), where t<sub>earliest</sub> was the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), we estimated the AUC from these five data points based on the Myerson algorithm (Myerson et al., 2001), computed as the trapezoidal integration of task aversiveness discounting over time. With this modelling method, a higher AUC reflects stronger temporal discounting of task aversiveness: participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. That is to say, a participant who shows greater discounting of task aversiveness, as reflected by a higher AUC, experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination. As you kindly suggested, we have added these details to clarify how the hyperbolic discounting approach was used to determine the sampling time points and to calculate the AUC of task aversiveness discounting.
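
      The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the ratings are hypothetical, and the scaling of the time axis (hours since the earliest sampling point, optionally normalized to the 0–1 range) is an assumption of this sketch, since the response does not state how time was scaled in GraphPad Prism.

```python
def discounting_auc(times_h, aversiveness, normalize_time=True):
    """Trapezoidal AUC of task aversiveness discounting, 1 - A_t / A_earliest.

    times_h: sampling times in hours, earliest first (e.g. 10.0 for 10:00).
    aversiveness: 0-100 VAS ratings recorded at those times.
    normalize_time: if True, rescale time so the deadline sits at 1.0
                    (an assumption of this sketch).
    """
    if len(times_h) != len(aversiveness) or len(times_h) < 2:
        raise ValueError("need matching sequences with at least two points")
    a_earliest = aversiveness[0]
    if a_earliest == 0:
        raise ValueError("earliest aversiveness rating must be non-zero")
    span = times_h[-1] - times_h[0]
    if span <= 0:
        raise ValueError("times must increase from earliest to deadline")

    # Discounting relative to the earliest sampling point.
    discount = [1 - a / a_earliest for a in aversiveness]
    # Time since the earliest sampling point, optionally normalized.
    x = [(t - times_h[0]) / span if normalize_time else t - times_h[0]
         for t in times_h]

    # Sum the trapezoid areas between consecutive sampling points.
    auc = 0.0
    for i in range(1, len(x)):
        auc += (x[i] - x[i - 1]) * (discount[i] + discount[i - 1]) / 2
    return auc

# Hypothetical ratings at 10:00, 16:00, 18:00, 19:30 and 20:00.
times = [10.0, 16.0, 18.0, 19.5, 20.0]
ratings = [80, 60, 40, 30, 20]
print(discounting_auc(times, ratings))
```

      With these hypothetical ratings, aversiveness falls as the deadline nears, so the discounting values rise from 0 and the AUC is positive; a steeper fall would give a larger AUC.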

      Methods Section (Page 6, Line 268-283)

      “On the Task day, a mobile app that we developed was used to implement the experience sampling method (ESM) for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits brought about by completing the task before the deadline (Zhang & Feng, 2020). As conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted as the deadline approaches, being discounted sharply when far from the deadline but slowly once nearing the deadline (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, the five recording moments of ESM were selected per task a priori by using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, such as moments of 10:00 (earliest), 16:00, 18:00, 19:30, and 20:00 (deadline). Five sampling points meet the statistical prerequisite for hyperbolic model fitting, which requires ≥ 4 points (Green & Myerson, 2004). Accordingly, recording moments were individually tailored for each task and each participant in this ESM procedure.”

      Methods Section (Page 7, Line 318-334)

      “... As articulated in the temporal decision model above, the task aversiveness evoked by executing a task is temporally dynamic and follows a hyperbolic discounting pattern, being discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang & Feng, 2020). To quantitatively characterize task aversiveness with consideration for its dynamics, the model-free area under the curve (AUC) was calculated. Specifically, based on the log-spaced temporal sampling rule, task aversiveness was measured by a 100-point visual analog scale at the five sampling moments. Then, the task aversiveness discounting (A) was calculated as 1 - (A(t) / A(earliest)), where t(earliest) was the earliest sampling point, serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), the AUC was computed as the trapezoidal integration of task aversiveness discounting over time across the five data points, based on the Myerson algorithm (Myerson et al., 2001). Accordingly, a higher AUC reflects stronger temporal discounting of task aversiveness as the deadline nears, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. As for the task outcome value, it was theoretically posited as a relatively stable evaluation of the task (Zhang & Feng, 2020; Zhang et al., 2021).”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the Experimental Analysis of Behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of Experimental Psychology: General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome Value and Task Aversiveness Impact Task Procrastination through Separate Neural Pathways. Cerebral Cortex, 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley Interdisciplinary Reviews: Cognitive Science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of Experimental Psychology: General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (7) Coming back to the point about the statistical analyses not being described in enough detail: One important example of this is the inclusion of random slopes in their mixed-effects model, which is unclear. This is highly relevant, as omission of random slopes has repeatedly been shown to lead to extremely inflated Type 1 errors (e.g., inflating Type 1 errors by a factor of ten, so that a significant p value of .05 might be obtained when the true p value is .5). Thus, if indeed random slopes have been omitted, then it is possible that significant effects are significant only due to inflated Type 1 error. Without more information about the models, this cannot be ruled out.

      Thank you for this very timely and crucial comment. After careful scrutiny, we identified the statistical flaw you pointed out: participants had been modeled with random intercepts only, not with random slopes. As you kindly suggested, we have reanalyzed all the statistics by adding random slopes (i.e., (1 + day|SubjectID)). Results showed a statistically significant interaction effect for both procrastination willingness (β = -7.8, SE = 1.8, DF = 45.6, p < .001) and actual procrastination rates (β = -7.4, SE = 2.4, DF = 46.6, p = .004), indicating the effectiveness of multi-session neuromodulation in mitigating procrastination. In the post-hoc simple effect analyses, participants who received active neuromodulation (NM) showed a significant increase in task-execution willingness (i.e., decreased procrastination willingness; NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction) and a decrease in actual procrastination rates (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), while no such effects were identified for participants in the sham control group (for willingness, SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction; for actual procrastination, SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction). Taken together, we appreciate your pointing out this crucial statistical weakness, and have confirmed that our findings remain reliable after adding random slopes to guard against inflated Type 1 error. Moreover, as you kindly suggested, we have incorporated these statistical details, particularly those concerning the GLMM, into the main text to facilitate your evaluation. Please see the specific revisions below:

      Methods Section (Page 8, Line 381-401)

      “To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) was constructed with a full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the standard issued by the National Bureau of Statistics (China) (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers from 1 (poor) to 7 (rich) on the basis of per capita annual household income. To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do feel for your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The averaged score of the self-rated emotions at the two time points was then entered into the GLMM as a covariate of no interest, yielding the final expression of “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)” in the statistical model. This analysis was implemented using the “lme4” and “lmerTest” packages. Employing the “emmeans” package, simple effects were also tested at baseline and post-last-intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and random-effects structure. To validate statistical robustness, instead of continuous outcomes for parametric tests, we also conducted a between-group comparison of the number of tasks on which procrastination emerged by using the nonparametric x<sup>2</sup> test with φ correction or Fisher exact test....”

      Results Section (Page 9, Line 428-449)

      “To identify whether ms-tDCS targeting the left DLPFC can alleviate subjective procrastination willingness and actual procrastination behavior, a generalized linear mixed-effects model with Satterthwaite algorithm was built, with task-execution willingness and actual procrastination rates (PR) as primary outcomes, respectively. For procrastination willingness, results showed a statistically significant interaction effect between multi-session neuromodulations and groups (β = -7.8, SE = 1.8, DF = 45.6, p < .001; Fig. 3A). The post-hoc simple effect analysis demonstrated a significantly increased task-execution willingness (i.e., decreased procrastination willingness) after neuromodulation in the active neuromodulation group (NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction), but no such effects were identified in the sham control group (SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction) (Fig. 3B-C). A linear uptrend for task-execution willingness was further observed across multiple sessions in the active NM group, indicating gradually increasing neuromodulation effects (Fig. 3D; p < .01, Mann-Kendall test). For actual procrastination behavior, changes to actual procrastination rates across all the sessions are detailed in Fig. 3E. Similarly, a statistically significant interaction effect was identified here (β = -7.4, SE = 2.4, DF = 46.6, p = .004), and the simple effect analysis further revealed decreased actual procrastination rates after ms-tDCS in the active neuromodulation group (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), but no such prominent changes were found in the sham control group (SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction) (Fig. 3F-G). Also, a significant downtrend for procrastination rates across all the sessions was identified in the active NM group (Fig. 3H; p < .01, Mann-Kendall test).”
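
      For readers unfamiliar with the Mann-Kendall trend test cited in the passage above, the following Python sketch shows the standard computation (the S statistic, its tie-corrected variance, and a two-sided p-value from the normal approximation). The function name and the session-wise values are hypothetical and serve only as an illustration; the software the authors used for this test is not stated here.

```python
import math
from collections import Counter

def mann_kendall(values):
    """Mann-Kendall trend test using the normal approximation.

    Returns (s, z, p): the S statistic, the continuity-corrected Z score,
    and the two-sided p-value.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")

    # S counts later-minus-earlier pairs: +1 for increases, -1 for decreases.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S with the usual correction for tied values.
    tie_sizes = Counter(values).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18

    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p

# Hypothetical mean task-execution willingness over seven sessions.
willingness = [36, 41, 47, 55, 62, 70, 80]
s, z, p = mann_kendall(willingness)
print(s, round(z, 3), p < 0.01)
```

      A positive S with a small p-value indicates a monotonic upward trend, and a negative S indicates a downward trend.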

      (8) Related to the previous point: The authors report, for example, on the first results page, line 420, an F-test as F(1, 269). This means the test has 269 residual degrees of freedom despite a sample size of about 50 participants. This likely suggests that relevant random slopes for this test were omitted, meaning that this statistical test likely suffers from inflated Type 1 error, and the reported p-value < .001 might be severely inflated. If that is the case, each observation was treated as independent instead of accounting for the nestedness of data within participants. The authors should check this carefully for this and all other statistical tests using mixed-effects models.

      Thank you for this very timely and helpful comment. As you correctly pointed out, we did not include random slopes in the original GLMM, which carried a high risk of inflating the false-positive rate (i.e., Type-I error). After adding the random slopes, we reanalyzed all the statistics from the GLMM and confirmed that all the findings remain reliable in the new GLMMs with random slopes. Again, thank you for this crucial statistical advice; please see the response above for full details of the revisions made to address this comment.
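
      For clarity, the random-slope specification “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)” can be written out as below. This is a sketch that assumes a Gaussian linear mixed model, as fitted by lme4's lmer; the symbols are introduced here for illustration and do not appear in the manuscript. Here y<sub>ij</sub> is the outcome for participant i on occasion j, G<sub>i</sub> is the group indicator, D<sub>ij</sub> is the treatment day, and x<sub>ij</sub> collects the covariates (age, gender, SES, emotions).

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 G_i + \beta_2 D_{ij} + \beta_3\, G_i D_{ij}
          + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij}
          + u_{0i} + u_{1i} D_{ij} + \varepsilon_{ij}, \\
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
  &\sim \mathcal{N}\!\left(\mathbf{0},\,
     \begin{pmatrix} \tau_0^{2} & \rho\,\tau_0\tau_1 \\
                     \rho\,\tau_0\tau_1 & \tau_1^{2} \end{pmatrix}\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

      The random slope u<sub>1i</sub> lets each participant have an individual change over treatment days, so the interaction coefficient β<sub>3</sub> is tested against between-participant variability in slopes rather than against the pooled residual alone. An intercept-only model corresponds to fixing u<sub>1i</sub> = 0.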

      (9) Many of the statistical procedures seem quite complex and hard to follow. If the results are indeed so robust as they are presented to be, would it make sense to use simpler analysis approaches (perhaps in addition to the complex ones) that are easier for the average reader to understand and comprehend?

      Thank you for this practical and helpful comment. In the original manuscript, we incorporated a joint model of longitudinal and survival data (JM-LSD), in conjunction with machine learning algorithms, to strengthen the robustness of our statistical findings. Nevertheless, we agree with you on this point: there is no need to complicate the analyses by repeatedly probing the same research question for the sake of methodological robustness, at the expense of readability and intelligibility for a broader audience. As you suggested, we have removed these complicated statistical methods and retained only the primary ones (the GLMM and the X<sup>2</sup> cross-tab test), together with a complementary one (the Mann-Kendall linear trend test). As a result, we have rewritten almost the whole Results section. Please see the specific instances below:

      Results Section (Page 9, Line 468-485)

      “Ms-tDCS changes task aversiveness and task-outcome value

      Both task aversiveness and task outcome value serve as key pathways determining whether one would procrastinate. To this end, we further utilized a generalized linear mixed-effects model to examine the effects of ms-tDCS on changes in task aversiveness and task outcome value. Task aversiveness changes across all the sessions are shown in Fig. 4A and 4C. We demonstrated a statistically significant decrease in task aversiveness and an increase in task outcome value via ms-tDCS in the neuromodulation group (Task aversiveness: interaction effect, β = -0.12, SE = 0.04, DF = 46.7, p = .002; simple effect, NM-before <sub>(AUC)</sub>: 1.13 ± 0.53, NM-after <sub>(AUC)</sub>: 1.95 ± 0.85, t.ratio = 4.5, p < .001, Tukey correction; Outcome value: β = -6.8, SE = 1.74, DF = 46.2, p < .001; simple effect, NM-before: 35.86 ± 27.82, NM-after: 73.08 ± 23.33, t.ratio = 5.0, p < .001, Tukey correction; see Fig. 4B), but not in the sham control group (Task aversiveness: SC-before <sub>(AUC)</sub>: 1.07 ± 0.51, SC-after <sub>(AUC)</sub>: 1.28 ± 0.46, t.ratio = 1.3, p = .20, Tukey correction; Outcome value: SC-before: 34.00 ± 25.17, SC-after: 40.13 ± 28.94, t.ratio = 0.8, p = .41, Tukey correction; see Fig. 4D). In the neuromodulation (NM) group, task aversiveness steadily decreased with the cumulative number of stimulation sessions, while perceived task outcome value increased significantly (see Fig. 4E-F, p < .05, Mann-Kendall test). Thus, these results provide causal evidence that neuromodulation of the left DLPFC simultaneously reduces task aversiveness and enhances task outcome value.”

      Results Section (Page 10, Line 525-542)

      “Long-term effects of ms-tDCS

      We also conducted a follow-up investigation to test the long-term retention of the effect of ms-tDCS in reducing actual procrastination. All participants except one in the neuromodulation group completed the follow-up 6 months after the last neuromodulation (N<sub>NM</sub> = 22, N<sub>SC</sub> = 23). The GLMM was then constructed, with the PR before the first neuromodulation vs. the PR 6 months after the last neuromodulation as covariates of interest. Results showed a statistically significant group*time interaction effect (β = 16.5, SE = 9.9, p = .049). The simple-effect model demonstrated a decrease in actual procrastination rates in the active neuromodulation group 6 months after the last stimulation compared to baseline (β = -22.05, SE = 10.0, p = .038, Tukey correction; NM-before: 40.68 ± 37.96, NM-after<sub>6-months</sub>: 18.63 ± 29.80), and revealed null effects in the SC group (β = 1.26, SE = 9.78, p = .99, Tukey correction; SC-before: 46.47 ± 40.75, SC-after<sub>6-months</sub>: 47.73 ± 39.18) (see Fig. 6). Furthermore, using a nonparametric x<sup>2</sup> test to compare differences in the number of procrastinated tasks, we still found a statistically significant reduction in procrastination frequency in the NM group 6 months after neuromodulation compared to baseline (x<sup>2</sup> = 3.30, p = .035, NM-before: 68.19% (15/22), NM-after<sub>6-months</sub>: 40.91% (9/22)), while no significant changes were observed in the SC group (x<sup>2</sup> = 0.11, p = .74, SC-before: 69.56% (16/23), SC-after<sub>6-months</sub>: 73.91% (17/23)). Therefore, beyond the short-term effects, the benefits of ms-tDCS neuromodulation in reducing procrastination show long-term retention.”

      (10) As was noted by an earlier reviewer, the paper reports nearly exclusively about the role of the left DLPFC, while there is also work that demonstrates the role of the right DLPFC in self-control. A more balanced presentation of the relevant scientific literature would be desirable.

      We are grateful to you for noticing the unbalanced presentation of the literature on left DLPFC. As you kindly suggested, we have added literature to support the association between self-control and the right lateralization of the DLPFC. Please see below for what we have revised:

      Introduction Section (Page 4, Line 137-143)

      “...In addition to the left lateralization, there is solid evidence for significant associations between self-control and the right DLPFC, particularly given that this region functions in top-down regulation, future self-continuity representation and social decisions (Huang et al., 2025; Lin and Feng, 2024; Knoch & Fehr, 2007). Nevertheless, Xu and colleagues demonstrated null effects of anodal stimulation of the right DLPFC on either value evaluation or emotional regulation for changing procrastination willingness (Xu et al., 2023).”

      (11) Active stimulation reduced procrastination, reduced task aversiveness, and increased the outcome value. If I am not mistaken, the authors claim based on these results that the brain stimulation effect operates via self-control, but - unless I missed it - the authors do not have any direct evidence (such as measures or specific task measures) that actually capture self-control. Thus, that self-control is involved seems speculation, but there is no empirical evidence for this; or am I mistaken about this? If that is indeed correct, I think it needs to be made explicit that it is an untested assumption (which might be very plausible, but it is still in the current study not empirically tested) that self-control plays any role in the reported results.

      We truly appreciate your pointing out this weakness with regard to conceptualization. Yes, you are correct in understanding this causal chain: we conceptually speculate that HD-tDCS stimulation over the left DLPFC operates via self-control to change procrastination, rather than empirically validating this component in the chain: brain stimulation→increased self-control→increased task outcome value→decreased procrastination. We did not collect data to directly measure self-control at either baseline or post-neuromodulation. Therefore, we fully agree with your suggestion to state this explicitly in the main text. Following this advice, we have revised a portion of the Conclusion to clearly point out the hypothesis-generating role of self-control in mitigating procrastination, and have further stated this in the Limitation section:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized roles of self-control in procrastination, and offers a validated, theory-driven strategy for interventions.”

      Results Section (Page 10, Line 489-492 and 520-522)

      “Given the dual neurocognitive pathways identified above—reduced task aversiveness and increased task-outcome value—we proposed that these changes, conceptually driven by enhanced self-control via ms-tDCS over left DLPFC, account for how neuromodulation reduces procrastination. ...”

      “In summary, these findings demonstrated a mechanistic pathway underlying procrastination: the self-control that was conceptualized to be governed by the left DLPFC mitigates procrastination by plausibly increasing task-outcome value.”

      Discussion Section (Page 13, Line 642-645)

      “Moreover, this study did not collect data for assessing participants’ self-control at either baseline or post-neuromodulation, thereby limiting our ability to determine whether the effects on procrastination were uniquely attributable to neuromodulation-induced changes in self-control. ...”

      (12) Figures 3F and 3H show that procrastination rates in the active modulation group go to 0 in all participants by sessions 6 and 7. This seems surprising and, to be honest, rather unlikely that there is absolutely no individual variation in this group anymore. In any case, this is quite extraordinary and should be explicitly discussed, if this is indeed correct: What might be the reasons that this is such an extreme pattern? Just a random fluctuation? Are the results robust if these extreme cells are ignored? The authors remove other cells in their design due to unusual patterns, so perhaps the same should be done here, at least as a robustness check.

      Thank you for raising this highly important and helpful comment. Indeed, we fully understand that this result is somewhat extraordinary, a fact that was equally striking to us when unblinding the data. After carefully scrutinizing the data and statistics, we are thrilled to confirm that this pattern is true. In support of this observation, we were gratified to receive numerous thank-you letters from participants who engaged in active neuromodulation. They expressed gratitude to us, and reported that they have substantially ameliorated procrastination behavior in real-life activities after completing the trial. While this does not constitute formal scientific evidence, we are also glad to see the benefits of this neuromodulation for those procrastinators.

      Two reasons could account for this pattern. One interpretation is to attribute this pattern to “scalar inflation”. In the present study, the procrastination rate was calculated as 1 minus the task-completion rate (e.g., 80%, 60%, 40%) by the deadline. At sessions #6 and #7, all the participants completed their real-life tasks before the deadline, yielding a 0% (1 minus 100% completion rate) procrastination rate, without any between-individual variation. Thus, rather than there being no individual variation in procrastination, this scalar - the procrastination rate - is too insensitive to capture subtle differences. For instance, although Participants #1 and #2 both showed a 0% procrastination rate - meaning that both completed their tasks before the deadline - Participant #1 might have completed it 3 hours before the deadline, whereas Participant #2 might have completed it only 10 minutes before. In this case, “scalar inflation” leads us to perceive both participants as having equivalent procrastination rates, although Participant #2 may have a higher procrastination level than Participant #1. As conceptually defined in the field, procrastination is contextualized as “not completing a task before the deadline”. Thus, if a task is completed before the deadline, regardless of whether it was finished close to or far in advance of the deadline, this case is defined as “no procrastination”. In the present study, the primary outcome is whether a participant procrastinated on a real-life task before the deadline in real-world settings, irrespective of when she/he completed this task. Thus, this scalar - procrastination rate - fits our conceptualization of procrastination.

      Another reason is the potential cumulative effect of sequential multi-session tDCS stimulation. As shown in the Mann-Kendall trend tests, the procrastination rates show a significant monotonic downtrend in the active neuromodulation group across sessions, even after removing sessions #6 and #7. This indicates that the improvement in procrastination may accumulate as the number of sessions increases, implying a potential “dose-dependent effect”. Although this interpretation is speculative, such a “dose-dependent effect” in neuromodulation has been well documented in previous studies, which show a robust linear association between the number of sessions and effectiveness (c.f., Cole et al., 2020; Hutton et al., 2023; Sabé et al., 2024; Schulze et al., 2018). Therefore, although this extreme pattern is somewhat extraordinary compared to previous observations, it is plausible.
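
      As an illustration of the trend test referred to above, the following is a minimal sketch of a Mann-Kendall test in plain Python. It assumes no tied values and uses the normal approximation with a continuity correction; the function name `mann_kendall` and the five session means are hypothetical and are not taken from the study data or the authors' analysis code.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test (no tie correction, normal approximation).

    Returns (tau, z, two_sided_p). A negative tau indicates a downward trend.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")

    # S counts concordant minus discordant pairs in time order.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S without ties

    # Continuity-corrected z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return tau, z, p


# Hypothetical mean procrastination rates (%) for sessions 1-5 only,
# i.e. with sessions 6 and 7 removed. These numbers are invented.
session_means = [40.0, 31.0, 25.0, 18.0, 9.0]
tau, z, p = mann_kendall(session_means)
print(f"tau = {tau:.2f}")  # tau = -1.00
```

      With only five to seven sessions the normal approximation is coarse, so an exact permutation p-value may be preferable in practice.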

      Yes, it is definitely a great idea to carry out a robustness check by removing session #6, session #7, or both. We believe that this analysis supports the statistical robustness of our findings against potential biases from extreme cells. By doing so, we found that all the group*treatment_day interaction effects remained significant when removing either session #6 or session #7 (or even both, all p-values < .05), indicating high statistical robustness. Please see Supplementary Tables S3 and S4.

      Taken together, in spite of their being extraordinary, we confirm that those findings are statistically robust to extreme outliers. As you kindly suggested, we have added those findings of the robustness check into the revised Supplemental Materials section.

      References

      Cole, E. J., Stimpson, K. H., Bentzley, B. S., Gulser, M., Cherian, K., Tischler, C., Nejad, R., Pankow, H., Choi, E., Aaron, H., Espil, F. M., Pannu, J., Xiao, X., Duvio, D., Solvason, H. B., Hawkins, J., Guerra, A., Jo, B., Raj, K. S., Phillips, A. L., … Williams, N. R. (2020). Stanford Accelerated Intelligent Neuromodulation Therapy for Treatment-Resistant Depression. The American journal of psychiatry, 177(8), 716–726. https://doi.org/10.1176/appi.ajp.2019.19070720

      Hutton, T. M., Aaronson, S. T., Carpenter, L. L., Pages, K., Krantz, D., Lucas, L., Chen, B., & Sackeim, H. A. (2023). Dosing transcranial magnetic stimulation in major depressive disorder: Relations between number of treatment sessions and effectiveness in a large patient registry. Brain stimulation, 16(5), 1510–1521. https://doi.org/10.1016/j.brs.2023.10.001

      Sabé, M., Hyde, J., Cramer, C., Eberhard, A., Crippa, A., Brunoni, A. R., Aleman, A., Kaiser, S., Baldwin, D. S., Garner, M., Sentissi, O., Fiedorowicz, J. G., Brandt, V., Cortese, S., & Solmi, M. (2024). Transcranial Magnetic Stimulation and Transcranial Direct Current Stimulation Across Mental Disorders: A Systematic Review and Dose-Response Meta-Analysis. JAMA network open, 7(5), e2412616. https://doi.org/10.1001/jamanetworkopen.2024.12616

      Schulze, L., Feffer, K., Lozano, C., Giacobbe, P., Daskalakis, Z. J., Blumberger, D. M., & Downar, J. (2018). Number of pulses or number of sessions? An open-label study of trajectories of improvement for once-vs. twice-daily dorsomedial prefrontal rTMS in major depression. Brain stimulation, 11(2), 327–336. https://doi.org/10.1016/j.brs.2017.11.002

      (13) The supplemental materials, unfortunately, do not give more information, which would be needed to understand the analyses the authors actually conducted. I had hoped I would find the missing information there, but it's not there.

      We apologize for the uninformative supplemental materials (SM) in the original submission. As you suggested, we have added a substantial number of details to the main text to clarify how we conducted the data analyses, and have also tightened the whole SM section to improve readability and comprehensibility. We do hope that this revised manuscript offers clear and adequate information for broader readers to understand the methods and statistics.

      In sum, the reported/cited/discussed literature gives the impression of being incomplete/selectively reported; the analyses are not reported sufficiently transparently/fully to evaluate whether they are appropriate and thus whether the results are trustworthy or not. At least some of the patterns in the results seem highly unlikely (0 procrastination in the verum group in the last 2 observation periods), and the sample size seems very small for a between-subjects design.

      Thank you for this very helpful summary. As you kindly suggested above, we have overhauled this manuscript to address those points that you listed here, particularly where we added relevant literature to balance our claims, added a huge amount of details to sufficiently/transparently report statistics, and conducted a robustness check to confirm the statistical robustness of our findings to those plausible extreme patterns (sessions #6 and #7), as well as justified how we determined this sample size fulfilling medium statistical power in a priori. Please see above for full details regarding how we addressed those comments, point-by-point.

      Reviewer #2 (Public Review):

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct stimulation targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They report that tDCS effects on task-execution willingness and procrastination are mediated by task outcome value and claim that this neuromodulatory intervention reduces procrastination rates quantified by their task. Although the study addresses an interesting question regarding the role of DLPFC on procrastination, concerns about the validity of the procrastination moderate enthusiasm for the study and limit the interpretability of the mechanism underlying the reported findings.

      Strengths:

      (1) This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The approach is solid and aims to address an important question regarding the putative role of DLPFC in modulating chronic procrastination behavior.

      (2) The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Thank you for taking your invaluable time to review our manuscript, warmly applauding the strength in research design and the conceptualization of scaling task aversiveness, as well as kindly sharing such helpful and insightful evaluations. As you correctly pointed out, we are aware of the absence of detailed, clear and understandable reporting of measures (e.g., real-world procrastination), statistics and methods, in the original manuscript. Following all your suggestions, we have thoroughly revised this manuscript to address those comments that you kindly made, point-by-point. Please see the full response underneath.

      Weaknesses:

      (1) The lack of specificity surrounding the "real-world measures" of procrastination is problematic and undermines the strength of the evidence surrounding the DLPFC effects on procrastination behavior. It would be helpful to detail what "real-world tasks" individuals reported, which would inform the efficacy of the intervention on procrastination performance across the diversity of tasks. It is also unclear when and how tasks were reported using the ESM procedure. Providing greater detail of these measures overall would enhance the paper's impact.

      We genuinely appreciate your raising this very crucial comment. We are sorry for omitting a tremendous number of methodological details to comply with the editorial requirement on the manuscript’s length, which hampered the comprehension of how we measure “real-life tasks” and “real-world procrastination”.

      As shown in the schematic diagram for experimental procedure (Fig. 1), the experimental protocol alternated between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On each Neuromodulation Day, participants received either active or sham HD-tDCS, and—critically—before stimulation—were instructed to specify a real-life task they were required to complete the following day, with a deadline between 18:00 and 24:00. This ensured ≥24 hours between neuromodulation and task execution, isolating offline after-effects. For instance, on Day #2 (Neuromodulation Day), before carrying out stimulation, participants were asked to report a real-life task that has a deadline within 18:00 - 24:00 for tomorrow’s “task day” (Day #3) (please see the schematic diagram in Author response image 2).

      Author response image 2.

      There are some real-life tasks that they reported in our experiment as examples: “Complete and submit a homework assignment”, “Complete a standardized English proficiency test”, “Complete an online course module required for applying a Class C driver’s license”, “Prepare slides for a seminar presentation”, “Practice guitar”, “Practice Chinese calligraphy”, and “Do the laundry”. Reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating a plausible task diversity.

      On each “task day”, participants engaged in an intensive Experience Sampling Method (iESM) protocol via a custom-built mobile app. Using this app, participants were required to report a subjective task-execution willingness score (i.e., a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”; procrastination willingness = 100 – the task-execution willingness score), the subjective task aversiveness (i.e., a one-item 100-point visual analog scale), the subjective task outcome value (i.e., a one-item 100-point visual analog scale), and the objective procrastination rate, respectively.

      Rather than self-reported scores from those one-item visual analog scales, we asked participants to report real “task completion rate” for the objective quantification of the “real-world procrastination behavior”. Specifically, at the deadline, each participant was asked to report whether she/he had completed this task. If she/he reported not having yet completed the task (i.e. procrastination behavior emerged), she/he was further required to report the percentage of the task completed (1% - 99%), which was defined as the task completion rate. By doing so, we could calculate the real-world procrastination rate for the real-life task as the “1 – the task completion rate”. For instance, if a participant did not complete her/his real-life task before the deadline (i.e. she/he procrastinated this task) and reported completing 75% of this task at the deadline, her/his real-world procrastination rate was computed as the 25% (1 - 75%) (Please see the schematic diagram in Author response image 3).
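
      The two outcome conversions described above can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic (procrastination rate = 1 - completion rate; procrastination willingness = 100 - task-execution willingness); the function names are hypothetical and do not come from the authors' app or analysis code.

```python
from typing import Optional


def procrastination_rate(completed: bool, percent_done: Optional[float] = None) -> float:
    """Procrastination rate in percent at the deadline.

    A task completed by the deadline scores 0. Otherwise the rate is
    100 minus the self-reported completion percentage (1-99).
    """
    if completed:
        return 0.0
    if percent_done is None or not 1 <= percent_done <= 99:
        raise ValueError("an unfinished task needs a completion rate between 1 and 99")
    return 100.0 - percent_done


def procrastination_willingness(task_execution_willingness: float) -> float:
    """Convert the 0-100 task-execution willingness score to procrastination willingness."""
    if not 0 <= task_execution_willingness <= 100:
        raise ValueError("willingness must lie on the 0-100 scale")
    return 100.0 - task_execution_willingness


# Worked example from the text: 75% completed at the deadline.
print(procrastination_rate(False, 75))  # 25.0
# A task finished before the deadline, however early, scores 0.
print(procrastination_rate(True))       # 0.0
```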

      Moreover, rather than merely a self-reported task completion rate, each participant was also asked to upload proof (e.g., screenshots of submitted assignments, photos of printed documents, system timestamps) to the ESM digital system for validation.

      Author response image 3.

      To determine the sampling time points for this mobile app in the ESM, we capitalized on both the conceptual temporal decision model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), which theoretically conceptualizes procrastination as the failure of the trade-off between task outcome value (i.e., motivation to take actions now for pursuing task reward) and task aversiveness (i.e., motivations for avoiding taking action now for avoiding negative experiences). Once task aversiveness overrides the pursuits of task outcome values, procrastination emerges. One overarching hypothesis in this theoretical model is that the task aversiveness is hyperbolically discounted when approaching the deadline: it would be discounted sharply when far from the deadline but discounted slowly when nearing the deadline (Zhang et al., 2019). To maximize statistical power to fit dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) (please see the schematic diagram in https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time):

      By this fitting algorithm (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00). Once the task-specific five sampling time points were determined per participant, this mobile app sent a digital message to ask her/him to immediately report the task aversiveness and the task outcome value then. As the primary outcomes, the procrastination rate (i.e., 1 – the task completion rate) and the procrastination willingness were sampled at the deadline point.
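
      To make the idea of increasing sampling density toward the deadline concrete, here is a minimal sketch that spaces the remaining time to the deadline geometrically. The halving ratio, the function name `sampling_times`, and the calendar date are assumptions made for illustration; this sketch does not reproduce the exact schedule quoted above (10:00, 16:00, 18:00, 19:30, 20:00), and the authors' actual selection rule may differ.

```python
from datetime import datetime


def sampling_times(start, deadline, n_points=5, ratio=0.5):
    """Return n_points prompt times between start and deadline.

    The time remaining before the deadline shrinks by `ratio` at each
    step, so prompts become denser as the deadline approaches. The last
    prompt is placed exactly at the deadline.
    """
    if n_points < 2 or not 0 < ratio < 1 or deadline <= start:
        raise ValueError("invalid sampling configuration")
    window = deadline - start
    times = [deadline - window * (ratio ** k) for k in range(n_points - 1)]
    times.append(deadline)
    return times


# Hypothetical task: first prompt at 10:00, deadline at 20:00.
day = datetime(2024, 1, 1)
schedule = sampling_times(day.replace(hour=10), day.replace(hour=20))
print([t.strftime("%H:%M") for t in schedule])
# ['10:00', '15:00', '17:30', '18:45', '20:00']
```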

      Furthermore, yes, we fully concur with you on this great idea, that is, transparency about task diversity strengthens the generalizability of our findings. In response, we have tabulated these real-life tasks that were reported in this experiment in the independent Appendix 1, with automatic translations from Chinese to English via Qwen GPT. Please see below for what we have added to the main text:

      Methods Section (Page 6-7, Line 238-308)

      “Nested cross-sectional longitudinal design

      This study used a nested cross-sectional longitudinal design to investigate whether the multiple-session anodal HD-tDCS targeting the left DLPFC could reduce actual procrastination behavior and to probe how this effect manifests. To assess procrastination in daily life, we implemented a 15-day protocol alternating between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On the Neuromodulation days, the 20-min anodal HD-tDCS neuromodulation targeting the left DLPFC was performed for the active HD-tDCS group at intervals of 2 days, while the sham-control group received sham HD-tDCS training. This HD-tDCS training was repeated for a total of seven sessions, and lasted 15 days (see Fig. 1a). Crucially, to capture procrastination in ecologically valid contexts, prior to receiving either active or sham HD-tDCS (administered between 09:00–18:00), participants were instructed to specify a real-life task they were personally obligated to complete the following day, with a self-defined deadline strictly constrained to 18:00–24:00 to ensure ≥24 hours between stimulation offset and task deadline, thereby isolating offline after-effects. This task should meet the following three criteria: (a) it should be already assigned in real-world settings; (b) the deadline should be constrained to 18:00-24:00 (see above); (c) it should be likely to induce procrastination. By doing so, more than 300 real-life tasks were collected, spanning academic (e.g., “submit a statistics homework assignment”), occupational (e.g., “draft and email a project proposal”), administrative (e.g., “complete online application for Class C driver’s license”), self-improvement (e.g., “practice guitar for ≥30 minutes”), domestic (e.g., “do laundry”), and health-related domains (e.g., “running 2,000m for exercise”). The full task list is tabulated in Appendix 1.

      As primary outcomes, all the participants were required to report task-execution willingness (TEW) (Zhang & Feng, 2020; Zhang, Liu, et al., 2019) for a real-life task 24 hours post-neuromodulation. Thus, procrastination willingness was quantified as 100-TEW score (see underneath for details). Furthermore, we asked participants to report the actual task completion rate (CR) of the task at the deadline (e.g., participant A had finished 90% of a homework assignment at the deadline and reported this to us at the deadline). In this vein, the actual procrastination rate (PR) was quantified as 1-CR.

      On the Task day, we used a purpose-built mobile app to implement the experience sampling method (ESM) for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As theoretically conceptualized by the temporal decision model (TDM) of procrastination, the perceived task aversiveness is hyperbolically discounted when approaching the deadline, being discounted sharply when far from the deadline but slowly once nearing the deadline (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, the five recording moments of ESM were selected per task a priori by using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, such as moments of 10:00 (earliest), 16:00, 18:00, 19:30, 20:00 (deadline). The five sampling points meet the statistical prerequisite for hyperbolic model fitting (requiring ≥ 4 points; Green & Myerson, 2004). To do so, recording moments were individually tailored for each task per participant in this ESM procedure. To obviate the confounds of daily emotions in task aversiveness evaluation, we used the averaged scores of PANAS at 10:00 (noon) and 16:00 (afternoon) as anchoring points to quantify one’s daily emotions by using this ESM app. Before each session of HD-tDCS training, each participant was required to report a real-life task whose deadline is tomorrow. To obtain the long-term effect of HD-tDCS (i.e., the interval between HD-tDCS and task completion is at least 24 hours), the task deadline that participants reported was required to be between 18:00 - 24:00.

      Once a sampling time was reached, the app sent a digital message requiring participants to fill in an online form for data collection.

      Quantification of covariates of interests

      Outcome variables of this study were twofold: one is task-execution willingness and another is procrastination rate (PR). Task-execution willingness is used to evaluate one’s subjective inclination to avoid procrastination (Zhang & Feng, 2020). In this vein, we used a 100-point scale to require participants to report their task-execution willingness (0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). This metric was recorded 24 hours after neuromodulation to examine its long-term effects. PR is used to quantify the extent to which one task has been procrastinated, and was calculated as 1 - CR (task completion rate). Critically, at the precise deadline, the app prompted participants to (a) indicate task completion status (yes/no), and if incomplete, (b) report the percentage completed (1–99%), defined as the Task CR, while simultaneously uploading objective evidence (e.g., screenshots of submitted files, photos of physical outputs, system-generated logs, or app-exported records). If the task was actually completed before the deadline, the CR would be 100% and the PR would be calculated as 0% (1-CR). PR was recorded at the actual task deadline for each participant. We were also interested in re-investigating their actual procrastination by using PR 6 months after the last neuromodulation to test the long-term retention of this neuromodulation effect.”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the experimental analysis of behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of experimental psychology. General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome Value and Task Aversiveness Impact Task Procrastination through Separate Neural Pathways. Cerebral cortex (New York, N.Y. : 1991), 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley interdisciplinary reviews. Cognitive science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of experimental psychology. General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (2) Additionally, it is unclear whether the reported effects could be due to differential reporting of tasks (e.g., it could be that participants learned across sessions to report more achievable or less aversive task goals, rather than stimulation of DLPFC reducing procrastination per se). It would be helpful to demonstrate whether these self-reported tasks are consistent across sessions and similar in difficulty within each participant, which would strengthen the claims regarding the intervention.

      Thank you for raising this very crucial comment. We indeed agree with you on this point that the reported effects may vary with task difficulties and task-execution proficiency, which potentially confound the effects of stimulation on mitigating procrastination. As you correctly comment, given no data collection on difficulties or other relevant characteristics of tasks, we cannot completely rule out this confounder in interpreting our findings on the one hand. As a result, we have explicitly claimed this limitation in the Discussion section.

      On the other hand, despite the lack of quantitative evidence, this risk of confounding the main effects with disparities in task characteristics was controlled experimentally. As we reported above, all the reported tasks were required to meet three criteria: (a) they were already assigned in real-world settings; (b) the deadline was constrained to 18:00-24:00; (c) they were likely to lead to procrastination. To this end, each participant was clearly instructed to report a real-life task that was likely to be procrastinated in real-world settings, and was not allowed to report easy, achievable and costless tasks. In support of this, the reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating plausible task diversity and difficulty. This was echoed by the high within-subject task homogeneity that we observed. For instance, Participant #5 reported tasks that almost all concerned academic activities across all the sessions. Therefore, as the task list shows (please see Appendix 1), these self-reported tasks were plausibly consistent across sessions and similar in difficulty within each participant.

      In addition, as we tested, almost all the participants reported that they were receiving treatment, with 91.30% (21/23) in the active neuromodulation group (NM) and 86.95% (20/23) in the sham control group (SC) (χ<sup>2</sup> = 0.224, p = .636), indicating the effectiveness of the double-blinding methods. If participants had learned across sessions to report more achievable or less aversive task goals, their procrastination willingness and procrastination rates for their reported tasks would have decreased progressively, irrespective of whether they were in the active neuromodulation group or the sham group. However, no such progressive decrease in procrastination willingness or procrastination rates across sessions was observed in the sham control group (Mann-Kendall test, for procrastination willingness, tau = 0.60, p = .13; for procrastination rate, tau = 0.61, p = .13), indicating no statistically significant learning effect or strategic effect on task performance. Again, thank you for this very crucial comment, and we do hope these clarifications could address it.
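
      The blinding comparison reported above (21/23 versus 20/23 participants believing they received treatment) can be checked with a short sketch. It assumes an uncorrected Pearson chi-square test on the 2×2 table with one degree of freedom, which reproduces the reported statistic; the function name `chi_square_2x2` is hypothetical, and the authors' actual software is not stated.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (chi2, p) with one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the upper tail of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: NM group, SC group. Columns: believed treatment, did not.
chi2, p = chi_square_2x2(21, 2, 20, 3)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # chi2 = 0.224, p = 0.636
```

      Because two of the expected cell counts are below 5 here, an exact test is a common alternative for a table of this size.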

      Limitations Section (Page 12, Line 637-640)

      “In addition, despite instructing participants to report valid real-life tasks with a high probability of being procrastinated, we did not measure task difficulty or consistency across sessions for each participant. Consequently, interpreting the effects of neuromodulation in mitigating procrastination as “unique contributions” warrants caution. ...”

      (3) It would be helpful to show evidence that the procrastination measures are valid and consistent, and detail how each of these measures was quantified and differed across sessions and by intervention. For instance, while the AUC metric is an innovative way to quantify the temporal dynamics of task-aversiveness, it was unclear how the timepoints were collected relative to the task deadline. It would be helpful to include greater detail on how these self-reported tasks and deadlines were determined and collected, which would clarify how these procrastination measures were quantified and varied across time.

      We do appreciate your highlighting the importance of clarifying how to measure procrastination, substantially helping readers to interpret these findings. As reported above, the primary outcomes of this experiment included subjective procrastination willingness and objective actual procrastination rate. For the subjective procrastination willingness, using the purpose-built mobile app, participants were required to report subjective task-execution willingness score (i.e., one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). Thus, the procrastination willingness was computed as “100 – the task-execution willingness score”. For the objective procrastination rate, rather than self-reported scores from those one-item visual analog scales, we asked participants to report the real “task completion rate from 1% to 99%” for the objective quantification of the “real-world procrastination behavior”. Full details can be found in Response #1.

      To determine the sampling time points for quantifying the AUC, we capitalized on both the conceptual Temporal Decision Model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), and theoretically conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., motivation to act now in pursuit of task reward) and task aversiveness (i.e., motivation to avoid acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this theoretical model is that task aversiveness is hyperbolically discounted when approaching the deadline: it is discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang et al., 2019). To maximize statistical power to fit dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001). Using this fitting algorithm (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00).

      Once the five task-specific sampling time points were determined for each participant, the mobile app sent a message asking her/him to immediately report the task aversiveness and the task outcome value at that moment. After capturing the task aversiveness at those five time points, the task aversiveness discounting was calculated as 1 − A(t) / A(t<sub>earliest</sub>), where A(t) is the task aversiveness reported at time t and t<sub>earliest</sub> is the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Subsequently, using GraphPad Prism software (v9, 525), we estimated the AUC from those five data points based on the Myerson algorithm (Myerson et al., 2001), computed via trapezoidal integration of task aversiveness discounting over time. Under this modelling method, a higher AUC reflects stronger temporal discounting of task aversiveness, meaning that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. That is, if a participant shows greater discounting of task aversiveness, as reflected by a higher AUC, she/he experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination.
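
      The AUC computation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the GraphPad Prism procedure the authors used: time is rescaled to the interval [0, 1] (a common normalization in Myerson-style AUC measures, assumed here), the aversiveness ratings are invented for the example, and the function name is hypothetical.

```python
def aversiveness_auc(hours, aversiveness):
    """Trapezoidal AUC of task-aversiveness discounting over normalized time.

    hours        -- sampling times in decimal hours, earliest first
    aversiveness -- aversiveness ratings at those times (first one is the reference)
    """
    if len(hours) != len(aversiveness) or len(hours) < 2:
        raise ValueError("need matching sequences with at least two points")
    reference = aversiveness[0]
    if reference <= 0:
        raise ValueError("reference aversiveness must be positive")
    span = hours[-1] - hours[0]
    # Assumption: rescale time so the sampling window runs from 0 to 1.
    x = [(h - hours[0]) / span for h in hours]
    # Discounting as defined in the response: 1 - A(t) / A(t_earliest).
    y = [1 - a / reference for a in aversiveness]
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

# Sampling scheme from the example in the response (task due at 20:00).
hours = [10.0, 16.0, 18.0, 19.5, 20.0]
ratings = [80, 60, 40, 20, 10]  # hypothetical ratings, for illustration only
print(aversiveness_auc(hours, ratings))
```

      With constant ratings the discounting is zero at every point and the AUC is 0; ratings that fall faster toward the deadline give a larger AUC, which is the direction of interpretation used in the response.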

      Taken together, following your suggestion, we have added a substantial number of details to clarify how to measure procrastination, when to sample the data and how to estimate the AUC into the revised manuscript. Please see them in Response #1.

      (4) There are strong claims about the multi-session neuromodulation alleviating chronic procrastination, which should be moderated, given the concerns regarding how procrastination was quantified. It would also be helpful to clarify whether DLPFC stimulation modulates subjective measures of procrastination, or alternatively, whether these effects could be driven by improved working memory or attention to the reported tasks. In general, more work is needed to clarify whether the targeted mechanisms are specific to procrastination and/or to rule out alternative explanations.

      Yes, we fully agree with you on this consideration: we should tone down the conclusions currently claimed in the main text, given the inherent shortcomings mentioned above. As you helpfully suggested, we have moderated our overall claims regarding the effects of multi-session neuromodulation in alleviating chronic procrastination. Please see specific instances below:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized roles of self-control in procrastination, and potentially offers a validated, theory-driven strategy for interventions.”

      Conclusion Section (Page 13, Line 657-664)

      “In conclusion, this study potentially provides an effective way to reduce both procrastination willingness and actual procrastination behavior by using neuromodulation on the left DLPFC. Furthermore, such effects have been observed for 2-day-interval long-term after-effects, and were also found for 6-month long-term retention in part. More importantly, this study identified that the ms-tDCS neuromodulation could decrease task aversiveness and increase task outcome value, and further demonstrated that the increased task outcome value could predict decreased procrastination, a relationship conceptually driven by enhancing self-control. In this vein, the current study enriches our understanding of the neurocognitive mechanism of procrastination by showing the prominent role of increased task outcome value in reducing procrastination. Also, it may provide an effective method for intervening in human procrastination.”

      Moreover, as we clarified above, in addition to the objective measure of procrastination behavior, we also used a one-item 100-point visual analog scale (“How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”) to measure subjective procrastination willingness. Results demonstrated that subjective procrastination willingness significantly decreased across neuromodulation sessions in the active group, but not in the sham control group, consistent with the observed reduction in the objective procrastination measure. In addition, we consider it important to note that we cannot conclude that the effects of neuromodulation on mitigating procrastination are attributable uniquely to increased task outcome value. Given that we have no measures or evidence of other factors, such as working memory and attention, we cannot rule out other neurocognitive pathways. To address this point, we have removed or rephrased such statements throughout the revised manuscript, and explicitly constrained our interpretation of this neurocognitive mechanism (i.e., increased task outcome value) to the theory-driven framework of the temporal decision model.

      Reviewer #3 (Public review):

      This manuscript explores whether high-definition transcranial direct current stimulation (HD-tDCS) of the left DLPFC can reduce real-world procrastination, as predicted by the Temporal Decision Model (TDM). The research question is interesting, and the topic - neuromodulation of self-regulatory behavior - is timely.

      Many thanks for kindly dedicating time to review our manuscript, and for the helpful comments detailed below. Thank you for appreciating the novelty of this study.

      However, the study also suffers from a limited sample size, and sometimes it was difficult to follow the statistics.

      Thank you for pointing out these crucial concerns. As you correctly noted, the sample size is somewhat small, but we confirm that it is adequate to achieve sufficient statistical power for detecting a medium effect size.

      For estimating the sample size, we determined the a priori effect size based on the existing work we published (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In this pilot study, we identified a significant interaction effect between single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori.

      Using the G*Power software with an estimated medium effect size, we determined that a total sample size of N<sub>total</sub> = 34 could reach adequate statistical power. Please see the G*Power output in Author response image 1.

      As for the statistics, we genuinely acknowledge that the vague methodological descriptions and complex algorithms indeed complicated the understanding of the methods and statistics. To address this, echoing the comment raised by Reviewer #1, we have removed the complicated statistics and methods, and further clarified how we used the generalized linear mixed-effect model (GLMM) for statistical analysis. Please see the specific revisions below:

      Methods Section (Page 8, Line 378-403)

      “Statistics

      All the statistics were implemented by R (https://www.rstudio.com/) and R-dependent packages.

      To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) was constructed with a full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the standard issued by the National Bureau of Statistics (China) (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided, on the basis of per capita annual household income, into seven hierarchical tiers from 1 (poor) to 7 (rich). To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The averaged score of those self-rated emotions at the two time points was then entered into the GLMM as a covariate of no interest, yielding the final expression “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)” in the statistical model. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were also tested at baseline and post-last-intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and random-effects structure. To validate statistical robustness, instead of continuous outcomes for parametric tests, we also conducted a between-group comparison of the number of tasks on which procrastination emerged, using the nonparametric χ<sup>2</sup> test with φ correction or the Fisher exact test.

      Regarding the 6-month follow-up investigation, this GLMM was also built to examine the long-term retention of the effect of neuromodulation on reducing actual procrastination.”
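
      For readers less familiar with lme4 formula syntax, the model expression quoted above can be written out as an equation. This is a sketch under assumptions that the quoted text does not confirm: an identity link with Gaussian residuals (as in a linear mixed model fitted with lmerTest), Treatment_Day entered as a numeric predictor, and Group coded as a 0/1 indicator. Here i indexes participants, j indexes sessions, G is Group and D is Treatment_Day.

```latex
\[
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 G_i + \beta_2 D_{ij} + \beta_3 G_i D_{ij}
        + \beta_4 \mathrm{Age}_i + \beta_5 \mathrm{Gender}_i
        + \beta_6 \mathrm{SES}_i + \beta_7 \mathrm{Emotions}_{ij} \\
       &\quad + u_{0i} + u_{1i} D_{ij} + \varepsilon_{ij}, \\
(u_{0i}, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\end{aligned}
\]
```

      Under this reading, the Group × Treatment_Day coefficient β<sub>3</sub> carries the intervention effect of interest, while u<sub>0i</sub> and u<sub>1i</sub> are the participant-specific random intercept and random slope implied by “(1 + Treatment_Day | SubjectID)”.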

      The preregistration and ecological design (ESM) are commendable, but I was not able to find the preregistration, as reported in the paper.

      We are sorry to encounter a serious technical barrier that has rendered our preregistration invisible and inaccessible. The OSF has disabled my OSF account, as it claimed to detect “suspicious user’s activities” in my account. This has prevented access to all materials deposited in this OSF account, including this preregistration. We have contacted the OSF team, but received no valid technical solution to recover this preregistered report (please see the screenshot below). We reckon that this may be due to my affiliation change to the Third Military Medical University of People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text, and added the “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Also, because this no longer meets best practices for preregistration, in addition to transparently reporting this case, we have removed the statements regarding preregistration elsewhere in the revised manuscript.

      Overall, the paper requires substantial clarification and tightening.

      We are grateful for your evaluation, and we fully agree with you. In response, we have added a tremendous number of details to clarify how to measure procrastination, how to conduct the statistical analyses, and how to collect real-life tasks, as well as other experimental materials. Please see the revisions in the Methods section of the revised manuscript. Again, thank you for those helpful suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Supplemental Materials, page 4, lines 163 to 167 seem to be from a different manuscript (as the section talks about neural markers, significant clusters, and brain networks).

      We are sorry for erroneously embedding this irrelevant section here. We have removed it, and have double-checked the document to avoid such mistakes.

      (2) I'm no expert here, but some of the trace and density plots in the SOM look problematic (e.g., Figure S5 top panel). But it's not made clear to which model/analysis these plots belong, so they are not very helpful without that information.

      Thank you for bringing these potentially problematic plots to our attention. Following your suggestion, these results have been removed from the SM to improve readability and comprehensibility.

      (3) Table S1 reports side effects "from the neurostimulation" (this is also the language used in the main manuscript), but having the flu is rather unlikely to be a side effect from the stimulation, isn't it? Thus, this language is highly confusing, and when reading the main text, it's not clear that these are just life events that are most likely unrelated to the stimulation, but have the potential to affect the measured variables (i.e., ultimately, they seem a source of noise).

      We apologize for this confusing wording. Here, the “side effects” are defined as confounding effects deriving from unexpected life events that uncontrollably disrupt task execution and task performance, such as “having the flu”, or “an unexpected mandatory CCP (Communist Party of China) meeting assignment”. To obviate misunderstanding, we have rephrased “side effects” as “unexpected life events disrupting task execution” in both the main text and the SM section.

      (4) The use of the English language could be improved.

      Thank you for your very practical suggestion. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include greater detail about the ESM procedure and details of the self-reported tasks. This would help rule out potential confounds of difficulty or learning (e.g., participants may have learned to identify more achievable and less difficult tasks across the sessions, which would mean they are learning to perform the task better rather than to procrastinate less). Further elaboration on the quantification of procrastination measures would help clarify the mechanism underlying this behavior, which is important for clarifying how these effects arise and what aspect of procrastination behavior is being targeted by the tDCS intervention (and rule out alternative explanations).

      We wholeheartedly appreciate your sharing this very crucial recommendation. As we mentioned above, we fully followed your helpful suggestions, particularly by adding massive details to fully report how to collect real-life tasks (with consistent and plausible difficulty across sessions), how to determine sampling time points, and how to quantify metrics (e.g., subjective procrastination willingness score, objective procrastination rate, AUC of task aversiveness, and task outcome value) to the revised manuscript. We do believe that these revisions and clarifications are imperative and necessary. By including these details, we do believe that the readability and clarity have been substantially improved in the current form. Please see the specific revisions and clarifications above.

      (2) It would be helpful to proofread for grammatical and spelling typos (e.g., DLPFC is spelled incorrectly in line 140, Satterwaite is spelled incorrectly in Line 415).

      Thank you for your kind suggestion. Both spelling typos have been corrected, and we have double-checked the revised manuscript to ensure no such typos remain. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      (3) Please clarify in Figure 4 that a higher AUC is associated with lower task aversiveness (which is stated in the methods but not clearly in the figure).

      Many thanks to you for your helpful suggestion. As you kindly suggested, we have clarified this case in the figure legend.

      Reviewer #3 (Recommendations for the authors):

      I want to see the preregistration.

      Thank you for your helpful recommendation. As we replied above, a serious technical issue on OSF occurred, making our preregistration invisible and inaccessible. OSF has disabled my account, claiming to detect “suspicious user’s activities” in my account. As a result, there is no access to all materials that were already deposited in this OSF account, including this preregistration. We have reconstructed this preregistration based on archived documents, and reported it in the SM. As we reported above, although this partially addresses the problem, it no longer fulfills the best practices of preregistration. Consequently, in addition to transparently reporting this case, we have removed all the preregistration statements throughout the revised manuscript.

    1. Author response:

      Both reviewers noted that some published studies question the association of HPV types with cervical cancer survival {PMIDs 36207323 and 33117670}, while others did not observe that {REFS 69-74 in Chakravarty}. We appreciate both reviewers pushing us to discuss and hypothesize (even speculate) on our finding that HPV types outside phylogenetic clade α9 (including HPV18) had more recurrences than α9 types (including HPV16). The most likely explanation is that we analyzed 225 HPV types, not just the most prevalent types. Specifically, each of the 5 recurrences in our pilot study had a different HPV type (α7’s: 18, 39, 45, 70 & α5: 69). Similarly, on re-examination of the TCGA data set, we found that 80% of the 181 α9 samples had HPV16, while 52.5% of the non-α9 samples had HPV18, consistent with a broader variety of types in the latter. We note that PMID: 36207323 did assess a broad number of HPV types, but these were classified into three non-cladistic categories, HPV16, HPV18 and Other, for comparison. More in line with the main point of that study, HPV18 was enriched, though not significantly, in the more pathogenic C2 group (which was defined by a deep analysis of specific genomic alterations). It can be speculated that perhaps α9 types are less proficient at effecting or interacting with some C2 characteristic(s). Overall, we suggest that these observations emphasize the importance of examining the full spectrum of HPV types, including phylogenetic relationships, in cervical cancers induced by these viruses.

      Reviewer #1:

      The detection of “non-tumor HPVs” was noted as a potential limitation. The highly multiplexed, HC+SEQ methodology that we use obviously detects many HPV types and thus can identify lesions with multiple HPV types as occurred in Patient 16 and in other HPV cancers. It is unclear what role multiple HPV types might play in tumorigenesis if any. Regardless of whether broad detection of HPV types proves to be a limitation or an advantage, it will be interesting. Our approach in this study focused on integration of HPV DNAs into human DNA, as this is a key event in cervical tumorigenesis. We believe that detection of clonally expanded cells with an integrated URR-E6-E7 DNA segment of any HPV type (whether high-risk, low-risk, or intermediate, or even perhaps non α-clade {PMID:40742260}) should be viewed with suspicion. For the small fraction of cervical cancers that contain only unintegrated HPV DNA, it will be interesting to see if these viral DNAs share any particular properties.

      The reviewer asked for details of the HPV DNA capture probes used. All were from the proprietary Roche Nimblegen SeqCap EZ System. They encompassed all HPV types from HPV1 through HPV225.

      The reviewer questioned why the data verifying the viral-human DNA junctions in primary tumor tissue by the orthogonal approach of PCR assays were not shown. The data summary and the approach used for PCR are in Figure 1, Table 1 and Supplementary Table 1. Only the dozens of agarose gel photographs were not shown. PCR assays that addressed key issues comparing primary and metastatic sites and confirming HPV16 + HPV18 coinfection are shown in Figure 2 and Figures 4A & 4B, respectively.

      Reviewer #2:

      The reviewer raised general issues about data quantification and statistical adequacy. Regarding data quantification, we used a strict, conservative guideline of a 10 read minimum per junction in the DNA from tumor samples. This was based on the sequence analysis pipeline design and on our requirement that some clonal expansion of cells containing specific junctions must have occurred. Extensive complications to comparing quantified read counts in different studies are detailed below in the responses to specific comments. The statistical methods used were based on the dichotomous variable of detection versus no detection of integrated HPV DNA. For this study, we also used the orthogonal method of verifying every junction by PCR with one primer in viral DNA and the other in flanking human DNA followed by Sanger sequencing. The statistical methods used were entirely appropriate for this dichotomous variable and time to event analyses. Nonetheless, we concur that quantification of HPV DNA integration would be an interesting variable to consider once carefully controlled methodologies are applied considering the issues detailed below.
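
      The dichotomous variable described above (detection versus no detection of integrated HPV DNA, with a 10-read minimum per junction) can be illustrated as follows. This is a hypothetical sketch, not the authors' sequence analysis pipeline: the junction labels and read counts are invented, and the counts are assumed to be already deduplicated.

```python
MIN_JUNCTION_READS = 10  # minimum reads per junction stated in the response

def integration_detected(junction_reads, min_reads=MIN_JUNCTION_READS):
    """Classify a sample as integration-positive or negative.

    junction_reads -- mapping from a junction label to its read count
                      (assumed deduplicated; labels are placeholders)
    Returns (detected, supported), where supported holds the junctions
    that meet the read threshold.
    """
    supported = {j: n for j, n in junction_reads.items() if n >= min_reads}
    return bool(supported), supported

# Invented example: one well-supported junction and one low-level candidate.
sample = {"junction_A": 57, "junction_B": 4}
detected, supported = integration_detected(sample)
print(detected, supported)
```

      In this toy example the low-count junction is excluded, which mirrors the point made in the response that the threshold leaves out low-level, subclonal variants.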

      Regarding the first point about variability in HPV-human junction number in different studies: The HPV DNA genome and junction read counts obtained from a sample are subject to numerous technical and biological variables. Extensive caution should be applied when comparing quantitative results among different studies, and this particularly includes the number of HPV-human DNA junctions detected. Among the factors that can be involved among different studies are the following: 1) inadequate deduplication of sequence reads; 2) “barcode-hopping” or “bleed-through” from one sample to another and thus cross-contamination of one sample with another during multiplexed short-read sequencing; 3) variation in the fraction of cells that are tumor cells in the post-clinical analysis sample of tissue obtained; 4) artifactual ligation of HPV and human DNA segments occurring at the adaptor ligation step of short-read sequencing; 5) variability in the mismatch settings of computational sequence aligners used; 6) perhaps most importantly, the level of genomic instability of each particular integration locus; and 7) subclonal variation in proliferation or survival of cells containing specific junctions within a lesion. The reviewer correctly implied that our requirement for a minimum of 10 sequence reads at each junction excludes low level, subclonal variants. Nonetheless, one tumor did have two integrations (Table 1). More importantly, we emphasize that all five tumor-recurrences at distant metastatic sites in our study had the exact same integration event as the primary tumor (determined at single nucleotide resolution at both ends). We judge this to be compelling evidence that the approach we use correctly identifies the key integration event underlying each cancer.

      Regarding the second point about ratios between genomic DNA copy numbers and junction read counts: Both human genome and HPV genome copy numbers deserve mention in regard to this issue. HPV HC+SEQ highly enriches for viral DNA, with the advantage gained of high read depth for viral sequences, but with human DNA largely excluded (except for the junction reads). Thus, ratios of junctions to the rest of the human genome cannot be assessed as they can be with whole genome sequencing methodologies. While HPV genome read depth can be ascertained with HC+SEQ reads (as in Figure 1C, 1D, 1E), and the reviewer’s suggestion raises the possibility of using junction to viral read ratios to normalize data to compare different integration loci and even perhaps different studies, there are nonetheless additional, biomedically relevant complications. HPV DNA segments are often present as tandem units with or without human DNA segments in tumors (Figure 1E shows the former), and this affects the ratio of junctions to viral genomes. Thus, using the suggested ratios would require additional normalization for tandem copy numbers, and it would therefore be difficult to use them in a manner analogous to gene-specific read counts per million total read ratios in RNA-seq.

      Regarding the third point about comparing read counts from primary tumor tissue with those from cfDNA: Ours was a retrospective study using archived samples that were available, and the HPV genome coverage obtained by HC+SEQ using cfDNA varied (Table 1). Assessment of viral DNA genome and human junction reads in a quantitatively reliable manner by HC+SEQ will require application of precise collection, storage, and processing of cfDNA samples. Nonetheless, the results presented in this study, while variable among the different samples, were entirely sufficient to test the dichotomous variable analyzed. We note that this included orthogonal, PCR verification of junctions, based on the straightforward, abundant identification of the junctions by HC+SEQ in the primary tumor samples. We emphasize that examination of HPV DNA integration directly interrogates a key, likely causal event in HPV cervical tumorigenesis.

      Regarding the fourth point about many of the initial cancer samples harboring no junction breakpoints: 100% of the 16 initial, cervical, primary tumor tissue samples harbored an integration (one sample had two). The reviewer is correct that many of the initial cfDNA samples lacked HPV DNA integration as assessed by HC+SEQ and by PCR based on the junctions detected in the primary tumor tissue. We interpret this to mean that these cancers were not spilling genomic DNA containing the integrated HPV DNA into serum at sufficient levels to be detected, and judge this to be due to underlying, unidentified, biomedically-relevant effects.

      Regarding the fifth point about HPV-human DNA junctions being used as a measure of tumor heterogeneity and subclonal variation: We concur with the reviewer that this is an interesting, important issue. We noted it in the response to the “first” point (numbers 6 and 7) above. Again, one of the samples had two integrations, and this patient did not suffer a recurrence (Table 1, Figure 1). Based on our ongoing experience, to take findings of junction subclonality beyond just detection of multiple integration junctions, we believe that development of in situ, single cell approaches is necessary to reveal the full meaningful picture of subclonality.

      Beyond these quantitative issues that we raise in response to Reviewer #2’s comments, the Reviewers’ comments point at important, incompletely understood aspects of HPV tumorigenesis. Our finding of the identical viral DNA insertions in primary tumors and metastases points to a central, constant role for these structures in viral tumorigenesis. Nonetheless, the issues raised point to key questions concerning subclonality, detailed structures and quantification of HPV and human tandem DNA units, intrachromosomal DNA vs. ecDNA, genomic instability of integrated HPV DNA loci, and cell-to-cell variation, and what roles these might play in tumorigenesis.

      Regarding the point about cell-free DNA breakpoints, we note the field of circulating tumor DNA fragmentomics that examines the sequences and a host of structural properties of circulating DNAs derived from tumors including specific, short sequences at the ends (breakpoints) of DNA fragments circulating in blood. These are of emerging significance as biomarkers for cancer {PMIDs:40038442 and 41043439}. We note that cell free DNA breakpoints are not synonymous with DNA junctions. We stress again that the main point of our manuscript was to investigate HPV-human DNA junctions in cfDNA, as this directly addresses a likely causal mechanism underlying HPV cervical tumorigenesis. Additional, future studies would be required to assess the effectiveness of our targeted, individualized approach relative to other aspects of fragmentomics in cervical cancer.

      In summary, we restate one of the reviewers’ points. “This study provides important foundational evidence for further evaluating the clinical utility of HPV DNA detection from cfDNA and specifically assessing for integration junctions.” Both reviewers raised thoughtful points about DNA integration and HPV tumorigenesis, and prospective studies are required to refine and evaluate clinical utility of the new findings presented here.

    1. eLife Assessment

      This important study probes the long-standing failure to resolve evolutionary relationships between the classical "spiralian" taxa (i.e., annelids, molluscs, brachiopods, platyhelminths and nemerteans) and provides convincing evidence that the branches leading to them are so short as to be unreliable guides to their relationships. This, in turn, has wide-ranging implications for our understanding of animal body plan evolution and the interpretation of early animal fossils.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The revised version adequately addresses the relatively minor comments from the previous round of review.]

      Summary:

      This interesting paper probes the problematic relationships between the classical "spiralian" taxa, i.e., annelids, molluscs, brachiopods, platyhelminths and nemerteans, and shows that the branches leading to them are so short as to be unreliable guides to their relationships. This, in turn, has important implications for how we view the origin of the animal phyla.

      Strengths:

      A very careful analysis of a famous old problem with quite significant results. The results seem to be robust and support their conclusions.

      It often passes uncommented that many different trees are published about animal relationships, yet some parts of the tree seem extremely difficult to resolve; the spiralians are perhaps the most difficult case. More recently, problems about sponges or ctenophores as sister groups to the rest of the animals have alerted us to major areas of uncertainty in large-scale phylogenetic reconstruction; this paper is a welcome reminder that other, perhaps even harder, problems exist which may be difficult to ever resolve with the (molecular) data we have.

    3. Reviewer #2 (Public review):

      Summary:

      The relationships among the phyla making up Spiralia - a major clade of animals including molluscs, annelids, flatworms, nemerteans and brachiopods - have been challenging from a phylogenomic perspective despite decades of molecular phylogenetic effort. Every topology uniting subsets of these phyla has been recovered with apparent support in at least one study, yet no consensus has emerged even from large-scale genomic datasets. Serra Silva and Telford set out to determine whether this instability reflects a genuine biological signal being obscured by analytical limitations, or whether it reflects a rapid, near-simultaneous origin of these phyla that has left behind in modern genomes far too little phylogenetic information to resolve. They focused deliberately on five phyla, reducing the problem to a tractable set of 15 unrooted and 105 rooted topologies, and applied a suite of complementary approaches across two independent datasets and multiple substitution models to test whether any topology is significantly preferred over alternatives.
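      The counts of 15 unrooted and 105 rooted topologies quoted above follow from the standard double-factorial formulas for fully bifurcating trees: (2n-5)!! unrooted and (2n-3)!! rooted trees for n labelled tips. The following is a minimal sketch for checking those numbers and for seeing how quickly the problem grows if further phyla are added; the function names are illustrative and are not taken from the authors' scripts.

```python
def odd_double_factorial(k: int) -> int:
    """Return k * (k - 2) * ... * 1 for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_topologies(n_tips: int) -> int:
    """Number of fully bifurcating unrooted trees on n labelled tips: (2n - 5)!!"""
    if n_tips < 3:
        raise ValueError("unrooted binary trees need at least 3 tips")
    return odd_double_factorial(2 * n_tips - 5)


def n_rooted_topologies(n_tips: int) -> int:
    """Number of fully bifurcating rooted trees on n labelled tips: (2n - 3)!!"""
    if n_tips < 2:
        raise ValueError("rooted binary trees need at least 2 tips")
    return odd_double_factorial(2 * n_tips - 3)


if __name__ == "__main__":
    # Tabulate how the search space grows with the number of focal phyla.
    for n in range(5, 9):
        print(n, n_unrooted_topologies(n), n_rooted_topologies(n))
```

      Each unrooted tree on five tips has 2n-3 = 7 branches on which a root can be placed, which is why the rooted count is seven times the unrooted one. Adding a single further phylum already raises the counts to 105 unrooted and 945 rooted topologies.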

      Strengths:

      (1) The conceptual framing of the problem is excellent, and the study makes a convincing case across several lines of evidence. By enumerating all possible topologies and demonstrating empirically that every one of the 15 unrooted arrangements has been recovered as the preferred solution in at least one published study, the authors make a strong argument about the state of the field. The use of two entirely independent datasets as a consistency check is great, and convergence between them, where it occurs, substantially strengthens confidence in the conclusions.

      (2) It is my view that the simulation framework is a particular strength. Generating data on a fully unresolved star tree and scoring those data under both correctly-specified and misspecified substitution models provides convincing evidence that the strong preference for rooting Spiralia on the flatworm branch is, at least partly, an analytical artefact driven by the exceptionally long branch in combination with compositional heterogeneity across sites. This is an important methodological demonstration with implications beyond spiralian phylogenetics, as the same issue is likely to affect other deep, long-branched lineages in the animal tree of life.

      (3) The randomised taxon-jackknifing approach is a very nice addition here. The demonstration that preferred topologies shift depending on which species happen to be sampled (even within the same phylum) is a convincing indicator of weak signal, and provides a practical caution for future studies that may report strong support for a particular spiralian arrangement based on a fixed taxon sample.

      (4) The branch-length analyses, benchmarking internal interphylum branches against the already disputed and extremely short branch uniting deuterostomes (work also by this group), are well-conceived and solid.

      (5) I think it is worth highlighting the notable intellectual honesty throughout the paper: the authors do not overstate their results, correctly acknowledging that while the unrooted topology grouping molluscs with brachiopods and flatworms with nemerteans emerges most consistently, this preference is not statistically significant under more adequate substitution models and may itself carry some artefactual component.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This interesting paper probes the problematic relationships between the classical "spiralian" taxa, i.e., annelids, molluscs, brachiopods, platyhelminths and nemerteans, and shows that the branches leading to them are so short as to be unreliable guides to their relationships. This, in turn, has important implications for how we view the origin of the animal phyla.

      Strengths:

      A very careful analysis of a famous old problem with quite significant results. The results seem to be robust and support their conclusions.

      It often passes uncommented that many different trees are published about animal relationships, yet some parts of the tree seem extremely difficult to resolve; the spiralians are perhaps the most difficult case. More recently, problems about sponges or ctenophores as sister groups to the rest of the animals have alerted us to major areas of uncertainty in large-scale phylogenetic reconstruction; this paper is a welcome reminder that other, perhaps even harder, problems exist which may be difficult to ever resolve with the (molecular) data we have.

      Weaknesses:

      The paper could have perhaps drawn out some of the implications of its results in a clearer manner.

      Reviewer #2 (Public review):

      Summary:

      The relationships among the phyla making up Spiralia - a major clade of animals including molluscs, annelids, flatworms, nemerteans and brachiopods - have been challenging from a phylogenomic perspective despite decades of molecular phylogenetic effort. Every topology uniting subsets of these phyla has been recovered with apparent support in at least one study, yet no consensus has emerged even from large-scale genomic datasets. Serra Silva and Telford set out to determine whether this instability reflects a genuine biological signal being obscured by analytical limitations, or whether it reflects a rapid, near-simultaneous origin of these phyla that has left behind in modern genomes far too little phylogenetic information to resolve. They focused deliberately on five phyla, reducing the problem to a tractable set of 15 unrooted and 105 rooted topologies, and applied a suite of complementary approaches across two independent datasets and multiple substitution models to test whether any topology is significantly preferred over alternatives.

      Strengths:

      (1) The conceptual framing of the problem is excellent, and the study makes a convincing case across several lines of evidence. By enumerating all possible topologies and demonstrating empirically that every one of the 15 unrooted arrangements has been recovered as the preferred solution in at least one published study, the authors make a strong argument about the state of the field. The use of two entirely independent datasets as a consistency check is great, and convergence between them, where it occurs, substantially strengthens confidence in the conclusions.

      (2) It is my view that the simulation framework is a particular strength. Generating data on a fully unresolved star tree and scoring those data under both correctly-specified and misspecified substitution models provides convincing evidence that the strong preference for rooting Spiralia on the flatworm branch is, at least partly, an analytical artefact driven by the exceptionally long branch in combination with compositional heterogeneity across sites. This is an important methodological demonstration with implications beyond spiralian phylogenetics, as the same issue is likely to affect other deep, long-branched lineages in the animal tree of life.

      (3) The randomised taxon-jackknifing approach is a very nice addition here. The demonstration that preferred topologies shift depending on which species happen to be sampled (even within the same phylum) is a convincing indicator of weak signal, and provides a practical caution for future studies that may report strong support for a particular spiralian arrangement based on a fixed taxon sample.

      (4) The branch-length analyses, benchmarking internal interphylum branches against the already disputed and extremely short branch uniting deuterostomes (work also by this group), are well-conceived and solid.

      (5) I think it is worth highlighting the notable intellectual honesty throughout the paper: the authors do not overstate their results, correctly acknowledging that while the unrooted topology grouping molluscs with brachiopods and flatworms with nemerteans emerges most consistently, this preference is not statistically significant under more adequate substitution models and may itself carry some artefactual component.

      Weaknesses:

      (1) The restriction to five phyla is the most significant limitation. The authors acknowledge this and give a clear computational justification, but readers should be aware that the paper's convincing conclusions apply specifically to the five focal phyla and that the evidence remains incomplete with respect to spiralian phylogeny as a whole.

      (2) The treatment of substitution model adequacy, while commendably thorough for site-heterogeneous models, is necessarily bounded. The authors note that models accounting for non-stationarity, across-lineage compositional heterogeneity, or mixtures of tree histories might yield different results, and that even the most sophisticated currently available approaches have not produced consistent spiralian topologies across studies. This is not a criticism of what has been done here - the analytical scope is reasonable and well-implemented - but it means the paper cannot be read as a definitive demonstration that no model will ever resolve these relationships. The distinction between a true hard polytomy and a radiation that is effectively unresolvable given current data and methods could be drawn more sharply in the discussion.

      (3) The reticulation-aware coalescent analyses are presented somewhat briefly relative to the likelihood-based topology scoring. The finding that flatworms are recovered within a paraphyletic jaw-bearing animal clade in both summary trees - interpreted as long-branch attraction - is striking, and its implications for gene-tree-based approaches to spiralian rooting deserve more discussion than they currently receive.

      (4) The central conclusions - that interphylum branches in Spiralia are extraordinarily short, that topological preferences are strongly model-dependent and taxon-sampling-sensitive, and that an ancient rapid radiation is the most parsimonious explanation - are convincingly supported by the evidence presented. The identification of flatworm long-branch attraction as an important confounding factor in rooting analyses is itself an important and well-demonstrated result.

      Conclusion:

      This paper clearly makes an important contribution to the ongoing debate about spiralian relationships and, more broadly, to methodological discussions about how to handle anciently diversified clades where phylogenetic signal is genuinely limited. The exhaustive topology-scoring framework combined with taxon-jackknifing and simulation under unresolved trees is a valuable methodological template that could usefully be applied to other notoriously difficult nodes in the animal tree. I thoroughly enjoyed the discussion of the implications of these findings for interpreting Cambrian fossils and the evolutionary history of shells, segmentation, larval types and other characters - it is both thoughtful and thought-provoking and will be of broad interest well beyond the phylogenomics and zoology communities. From a very practical perspective, the data and scripts provided make the work useful to researchers wishing to apply similar approaches to other groups.

      Reviewer #3 (Public review):

      Summary:

      This paper addresses the controversial internal relationships within the Spiralia, a major clade of invertebrate animals including molluscs, annelids, brachiopods and flatworms.

      Strengths:

      Performs a range of empirical analyses and simulations that address the core question. Although a favoured unrooted topology finds some support, this is not strongly endorsed in the paper.

      Weaknesses:

      (1) Only considers a subset of relevant phyla (e.g. gastrotrichs are relevant to the phylogenetic position of Platyhelminthes), although how this would change the scale of the analyses (i.e. number of topologies) is addressed in the paper.

      (2) Discussion of Spiralia evolution and broader context, particularly the relevance for the fossil record. Line 448: our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction, which have unusual character combinations, have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      (3) This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae, but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense; see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We thank the reviewers for their kind comments. Please see below for detailed responses to all identified weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some minor comments that might help improve the paper:

      (1) Abstract L17. "Most analyses on the 15 unrooted trees showed a preference for the same topology but the support over other solutions was non significant" - I don't really understand this sentence in the context of the paper; it makes it sound as if the tree is, after all, well resolved! Non-significant, or not significant better than non significant?

      Having read the rest of the paper I see what this refers to (uT4), but still I don't understand the second clause.

      Re-written to clarify.

      (2) Introduction L31. This makes it sound as if phoronids are actually part of brachiopods, and while that was recovered by Cohen and Weydmann 2005, I'm not sure if it's really a general result. In addition, rather than using "brachiopods plus phoronids" everywhere, you could use "Brachiozoa" (Cavalier-Smith 1998, Biol. Rev).

      We have updated our text and figures to use Brachiozoa.

      (3) L36-37. Yes, but the presence of Chaetognatha in this clade suggests that their primitive body size is not small.

      Have made clear that chaetognaths are not all tiny.

      (4) L85. Kumar et al. may have claimed that Spiralia are as old as 670, but many other analyses would suggest a range of different results. Why choose just this one? In addition, this age seems rather incompatible with your results.

      We agree this maximum age is highly improbable (the principal point remains the deep age of the protostomes). We have used a different reference and refer to a generally acceptable minimum age only.

      (5) L88. The key part of this sentence, "proving a hard polytomy", comes at the end of a long set of references that makes it hard to connect to the lead-in "given the age of", so I would suggest rephrasing.

      Rephrased for clarity.

      (6) L109. It is unclear what this means in the context: "and even support multiple topologies".

      Re-worded for clarity.

      (7) Figure 1. Why did you choose to indicate brachiopods plus phoronids as a larval form, unlike the other clades? Perhaps it's because we don't know what the last common ancestor of the two looked like (unless P is an ingroup of B), but that's arguably true for some of the other clades as well!

      Apologies, this was laziness as we already had a line drawing of an actinotroch larva. Have improved the images in figures 1 and 5 where required.

      (8) L164. Reticulation-aware analyses. As I understand it, this would include introgression, hybridization, etc. However, incomplete lineage sorting has also been invoked, not just for Cambrian-explosion age events but also for other major radiations, such as for angiosperms and birds. How significant might ILS be for generating the results you get?

      Section title amended. Results section updated to reflect this. We now explicitly mention the potential impact of ILS and introgression on spiralian relationships in our discussion.

      Unrooted trees analysis:

      (9) L405 on. Maybe it would be worth including a figure showing the relative branch lengths of uT4. All the images of trees show similar-length branches, which gives off the wrong impression within the context of the paper!

      We understand the motivation, but we worry that showing uT4 as the sole phylogram may end up with this being interpreted by a casual reader as being the main result of the paper. Hopefully the figures with branch lengths encompass this information well enough and with no danger of misinterpretation.

      (10) L430 on. Why is this a "conservative" interpretation?

      Yes agreed not clear. Have changed to “We interpret our results as showing that…”

      (11) You mention synapomorphy accumulation time and implicitly equate shortness of branches with shortness of time. However, other options are available under varying diversification rate models (e.g. ClaDs, Barido-Sottani et al. 2023 Syst. Biol.; CET, Budd and Mann 2025, Syst.Biol.). In particular, the latter paper shows that when unusually large clades are selected for study (as is arguably the case here), then those clades are likely to have started with very high "evolutionary tempo", which speeds up all aspects of evolution, including diversification rates.

      In the Budd and Mann scenario large clades begin with high tempo of cladogenesis, high substitution rate and high diversification rate (rapid origin of new characters). This would suggest that the period of the radiation was extra rapid (even less time than in a ‘normal’ period during which smaller clades emerge) so we feel the point stands.

      (12) L449. Maybe refer to the Song et al. paper again here on scaphopods plus bivalves, as it makes the same sort of points, albeit in a slightly different context.

      We thank the reviewer for the suggestion and have added the citation where relevant.

      (13) Finally, to return to L20. You mention implications for the Cambrian fossil record, but then fail to deliver any!

      We have hopefully addressed this remark in the discussion better (at least to the extent we are qualified to).

      Yet if you are correct, then synapomorphy accumulation would unite groups of phyla, and would surely lead to a scenario highly incompatible with clock models suggesting deep origins of clades (as they would all be more fossilisable).

      Apologies but we don’t completely understand this point as ‘synapomorphy accumulation would unite groups of phyla’ is a little ambiguous. Of course, this is generally true, but our results suggest there was little opportunity to accumulate identifiable synapomorphies linking pairs, triplets or quartets of our 5 spiralian phyla.

      In addition, clock results suggest rather long periods of time leading to the phyla, which would imply that there would have to be extremely slow rates of molecular evolution to yield the short early branches here. Also, it might be worth referring to papers compatible with this view, such as Wernström, J.V. et al., EvoDevo 13, 17 (2022). https://doi.org/10.1186/s13227-022-00202-8 or some of the palaeo literature, such as Budd and Jackson 2016, Phil Trans.

      The referee refers to clock results suggesting a (deep) Ediacaran origin of Lophotrochozoa/Spiralia. We interpret the spiralian radiation itself as rapid but, in the absence of a clock analysis, we cannot comment on when it took place.

      Reviewer #2 (Recommendations for the authors):

      (My not very) Major points - as I feel this is an excellent paper.

      (1) The coalescent-based summary tree analyses warrant expansion. The recovery of flatworms within a paraphyletic jaw-bearing animal clade in both summary trees is a striking result attributed to long-branch attraction, but this interpretation would be strengthened by examining whether pruning or downweighting the longest-branching taxa within those groups affects the outcome, or by reporting per-node quartet scores more fully. This would make the reticulation-aware results more directly informative and would bring this section into better balance with the detailed likelihood-based analyses.

      We thank the reviewer for the suggestion of the expanded analyses. We have now done these, and they yielded essentially the same results as the unpruned analyses. Additionally, while not discussed, we ran the Astral analyses on the subset of gene-trees where all groups of interest (spiralian phyla and superphyletic Ecdysozoa, Deuterostomia, etc.) were monophyletic and found no changes to interphylum quartet scores beyond those due to enforced (super)phylum monophyly, with Platyhelminths still recovered within Gnathifera.

      We have expanded our description of the results slightly as well as our discussion. Location of the tables with detailed quartet scores and local posterior probabilities has been added to Fig. S1’s legend.

      (2) It would strengthen the paper to include at least a brief analysis or explicit discussion of whether any currently available models accounting for non-stationary or across-lineage compositional heterogeneity show any change in the pattern of support, even if only tested on a subset of topologies. A null result here would itself be informative and would make the conclusions more robust to the concern that unexamined model classes might behave differently.

      We thank the reviewer for the suggestion, but this represents a considerable amount of new work and we think it falls outside the scope of the present work. We have, as suggested, included this as a discussion point.

      (3) The authors note that topologies grouping flatworms with ribbon worms appear among the higher-scoring arrangements even under model misspecification in simulations. It would be helpful to comment explicitly on whether the apparent signal for this grouping should therefore be regarded with particular scepticism, or whether it survives artefact correction in any of the analyses, as this is a grouping that has appeared repeatedly in the literature and readers will want guidance on how to interpret it.

      We do state that the nemertean+platyhelminth grouping seems likely to be at the least emphasised by an artefact (as the referee points out it is common to the higher scoring trees in the star tree simulations). We state that this suggests “…that this grouping derives some support from systematic errors.” We now return briefly to this in the discussion.

      Writing and presentation

      (1) The abstract states that rooting Spiralia on the flatworm branch "is a long-branch artefact" - this is slightly stronger than the language used in the body of the paper, where the authors correctly write that this preference is "at least enhanced by" the artefact. The abstract phrasing should be softened to reflect the more nuanced conclusion in the text.

      Good point. Done.

      (2) A brief signposting sentence near the start of the Results, setting out the overall analytical logic before the individual sections begin, would help orient readers. The strategy - score all topologies, test robustness to model choice and taxon sampling, then use simulation to identify artefactual signals - is clear in retrospect but would benefit from being made explicit upfront.

      We have taken this suggestion on board. The summary seemed in the end better placed as the final part of the introduction.

      (3) Figure 3 is complex and would be easier to interpret with a brief explanatory note in the legend clarifying what a wide versus narrow range of log-likelihood scores across topologies means in practical terms for statistical resolution between trees.

      Added sentence to legend.

      Minor Corrections:

      (1) The Figure 2 legend contains a typographical error: "shorter than the short, disputed deuterostome branch" should read "shorter than."

      Done

      (2) At least one reference appears to carry a future publication year (Ishii et al., 2026) and should be verified for accuracy before final submission.

      This reference is correct per the journal’s website. We did find Google Scholar to list it as being from 2025.

      Reviewer #3 (Recommendations for the authors):

      (1) Abstract/SI definitions of Spiralia/Lophotrochozoa

      While I don't have strong feelings about this, if Spiralia is being used as an apomorphy-based name, then it still might be equivalent to Lophotrochozoa, as spiral cleavage in Gnathostomula jenneri was illustrated by Riedl (1969). Although no other studies have replicated this observation, this should at least be mentioned.

      Sorry, this reference to gnathostomulid spiral cleavage was included in a longer version of the discussion of nomenclature. That discussion was first reduced in length (which was when the mention of gnathostomulid spiral cleavage was dropped) and then finally moved to the supplementary material. We have now re-included mention of this in the discussion in the supplementary information.

      The SI text suggests that the name Lophotrochozoa, as used in its original form by Halanych et al. (1995), was a node-based definition, and that this name is for the sister group of Ecdysozoa. However, in that paper, the name is actually defined "as the last common ancestor of the three traditional lophophorate taxa, the molluscs, and the annelids, and all of the descendants of that common ancestor". This definition would exclude Gnathifera, and depending on the internal relationships of the non-gnathiferan phyla, may be equivalent (or not) to the usage of the name Spiralia adopted in the present paper. The perils of mixing node and apomorphy-based definitions of clades are clear, and the situation is less straightforward than the paper suggests, and (somewhat unhelpfully given the subject of the paper) may only become clearer if the relationships of non-ecdysozoan protostomes are resolved.

      We believe that the community universally understood the definition of Lophotrochozoa following the 1997 paper (by the authors who also provided the original 1995 definition). This 1997 definition included both chaetognaths and rotifers as examples of the Gnathifera. The Spiralia, in contrast, began life not even as a name for a clade but a description of a character shared by some apparently unrelated taxa – similar to a grouping of ‘carnivores’. The introduction of a new name was, we suggest, unhelpful. We hope that by defining our terms up front the meaning in the current paper is clear.

      (2) Introduction

      Line 76. Some references needed regarding claims that there was a polymeric brachiopod ancestor, e.g. Gutman (1978), Temereva and Malakhov (2011), Guo et al. (2023). Likewise for the chaetae of brachiopods, annelids and molluscs, e.g. Schiemann (2017), as it's key to trace where these ideas originated.

      Added

      Figure 1. This is a nice illustration of the uncertainty in the relationships of these groups. However, I kept checking which thumbnail image was which for nemerteans and annelids. A minor suggestion, but perhaps a polychaete instead for the annelid?

      We have replaced the rather poor image of an earthworm with a polychaete and also now include labels. We hope the improved images are more helpful. Good point.

      (3) Results

      Branch length comparison. I understand why the deuterostome stem was chosen as the branch for comparison from the point of view of phylogenetic uncertainty. However, what about the branch leading to Ecdysozoa, or the branch subtending Lophotrochozoa and/or Gnathifera? Given that the short internodes are used as an argument underpinning uncertain relationships, can we be sure that Gnathifera is not nested within the group of interest, especially given that Gnathifera contains many long-branched taxa and the root may be misplaced within the group?

      We have added the Lophotrochozoa and Ecdysozoa median lengths to our plots and now discuss the lophotrochozoan branch in our results.

      Line 249. Given that Spiralia is the group of interest, why were the Gnathiferans also chosen at random?

      The point of the experiment was to see the effect of taxon sampling on the consistency of the resulting topology. Random sampling across the tree seems helpful in this context. We chose Gnathifera as one group to sample from as this ensured they would be present in all trees. This seems appropriate as they are the sister group of the clade of interest and as such their inclusion reflects a choice a typical investigator might make when choosing which species to include. Additionally, as noted in the reviewer’s earlier comment, Gnathifera includes many long-branched taxa and we wanted to ensure our root-placement results were robust to this aspect of taxon sampling.

      (4) Discussion

      Line 448. Our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction that have unusual character combinations have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae, but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense; see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We accept these points (though are clearly not experts on these fossils). We have (slightly tentatively given our lack of expertise) expanded our discussion to include these fossil taxa with their combinations of characters.

    1. eLife Assessment

      This study presents a useful database resource containing protein conformations generated through molecular dynamics simulations, with extensive quality evaluation and benchmarking. While the database is well-constructed and professionally organized, the evidence supporting its claimed representation of protein conformational landscapes is incomplete, as the short simulation times and starting structure bias prevent true Boltzmann sampling of the conformational space.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe a new database that rigorously explores protein conformations.

      Strengths:

      It is extremely well done, using state-of-the-art tools by a group at the top of the field of structural modeling. The evaluation of qualities and the benchmarking of the structures are outstanding, and it is expected that the new database will have a significant impact on the field.

      Weaknesses:

      The authors are using MD simulation to generate some of the structures, and therefore should have access to standard MD energies. I am surprised that no evaluation is provided based on these energies, which can be extended to free energies.

    3. Reviewer #2 (Public review):

      Summary:

      The authors developed a dataset of protein conformations by running molecular dynamics simulations starting from both native and decoy conformations for a large number of proteins. These conformations were put together as a dataset for querying and downloading, along with their energies under different force fields. The authors suggest that such conformations represent the proteins' conformational landscape, so that they will be useful for evaluating methods generating multiple conformations of proteins.

      Strengths:

      The dataset is online and working. It has good documentation for others to use.

      Weaknesses:

      The biggest weakness is that the collected conformations very likely do not represent the true conformational landscape. To represent the conformational landscape, the structures need to be sampled based on the Boltzmann distribution. However, in this study, conformations are generated by running very short (125 ps to 375 ps) MD simulations starting from near-native conformations and decoys. Such short simulations will produce small fluctuations around the starting conformations, so the distribution of conformations is largely dominated by the distribution of the initial conformations, which by no means are Boltzmann distributed. A conformation might be physically plausible, but it might have very small weight in the Boltzmann distribution. On the other hand, conformations with large weights might not be in the dataset.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript describes a web-based tool that allows researchers to compare large numbers of representative ("plausible") conformations of proteins. It also includes energetic analysis from multiple widely used structure-prediction methods.

      Strengths:

      This tool will likely be useful for students who want to learn more about the ensemble properties of proteins. The resource is well organized and it represents a large amount of computing resources.

      Weaknesses:

      It is not entirely clear how the database may be utilized by other groups to advance research. It could be helpful if the authors add a short section that provides example use cases that illustrate how this database can support new strategies for studying protein dynamics.

    1. eLife Assessment

      This is an important study uncovering a new role of the SETD6-PPARγ axis in the regulation of hepatic lipid metabolism. The data convincingly demonstrate that methylation of PPARγ by SETD6 plays a key role in this process, linking lysine methylation to transcriptional control of lipid storage genes.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript from the Levy lab, the authors investigate whether SETD6 regulates hepatic lipid accumulation through direct methylation of PPARγ. They show that SETD6 binds and mono-methylates PPARγ at K170, and provide evidence that this modification enhances PPARγ occupancy at target promoters, promotes expression of lipid metabolism genes, and facilitates lipid droplet accumulation in HepG2 cells. The authors also find a positive feedback loop or circuit in which PPARγ activates SETD6 transcription in a methylation-dependent manner, thereby reinforcing this lipogenic program. Overall, the work presents a novel SETD6-PPARγ regulatory axis linking lysine methylation to transcriptional control of lipid storage genes, with possible relevance to NAFLD-associated biology.

      In all, I find this to be an important paper that describes and advances a new regulatory pathway that has significance to human health and disease. It would also be of interest to a broad audience. That said, there are also some concerns that the authors should address, as outlined below.

      Major concerns (pertains to rigor - highest priority)

      (1) Overall, the work presented is of high quality, and the data nicely support the conclusions; however, a few panels that have missing controls or information should be strengthened:

      a. The co-IP panel in Figure 1B lacks a lane where HA-SETD6 is expressed without PPARγ. This control is needed to verify that the SETD6-HA signal depends on PPARγ.

      b. In Figure 1C, the authors should show that the co-IP works in both directions (include IP for PPARγ/blot for SETD6). I am also a bit confused by the labeling, with IP both on the left and on top of the panel next to the beads label. More importantly, the data would be stronger if the authors took advantage of a deletion line to validate that the co-IP is specific to the presence of both.

      c. The same IP labeling issue exists for Figure 3B (the label is on the side and on top).

      d. Antibody information (e.g., where the pan-methyl Ab comes from and at what dilutions the antibodies are used) is missing.

      Nice to have experiments (medium priority - strongly consider)

      (2) One gap is how K170me1 contributes to DNA binding and gene transcription. One possibility is that methylation enhances the DNA-binding activity of PPARγ. Given that the authors have all of the reagents, it would be possible to perform a gel shift assay (or other approach) with and without SETD6-mediated methylation. Is DNA binding affected/enhanced?

      (3) Along these lines, I wonder if there is another possibility: could SETD6-mediated methylation of PPARγ drive SETD6-PPARγ interaction? In other words, in the K170R mutant, is SETD6 still associated with PPARγ, and is this interaction required for promoter recruitment? Alternatively, would a catalytically dead version of SETD6 fail to associate with PPARγ? Currently, no experiments test the impact of an unmethylatable version of PPARγ or a catalytically dead version of SETD6 on SETD6-PPARγ interaction or SETD6 recruitment to promoters.

      Minor concerns (text and figure display)

      (4) The text has multiple typos and grammatical errors, and there are some issues with the figure display.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors investigated the regulation of the transcription factor PPARγ by the post-translational modification lysine methylation. The data demonstrate that the lysine methyltransferase SETD6 targets PPARγ for methylation using biochemical and cell-based assays. Methylation of PPARγ occurs in its DNA binding domain, and the authors demonstrate that loss of methylation limits PPARγ chromatin binding, particularly to lipid storage and metabolism gene promoters. As a physiological output, the authors demonstrate that deletion of SETD6 and loss of PPARγ methylation also disrupt lipid droplet accumulation in hepatocytes. In addition, the authors uncover a positive feedback loop in which SETD6 methylation of PPARγ also regulates its binding to the SETD6 promoter and expression of the gene.

      Strengths:

      One of the key strengths of this manuscript is the novelty of the findings in terms of identifying a new mode of regulation of PPARγ that modulates its chromatin association in cells and thereby regulates lipid metabolism genes. The authors nicely combine biochemical studies of SETD6 activity with cell-based assays investigating PPARγ and SETD6 function in regulating lipid storage. Data supporting this conclusion is largely convincing, and frequently, multiple assays are used to provide sufficient support to the conclusions. This work therefore expands regulatory modes of PPARγ and identifies a new target for SETD6, an enzyme that targets a number of other transcription factors. Furthermore, the regulatory loop that controls SETD6 expression via PPARγ methylation is likely important for understanding SETD6 function in different cell types that have high levels of lipid accumulation or regulation. The gene expression and lipid accumulation assays are useful for testing the physiological outcome of loss of SETD6 activity or PPARγ methylation directly.

      Weaknesses:

      The data presented in the manuscript are largely convincing in support of the authors' conclusions; however, there are some errors in the presentation of the figures and some issues in the text that would benefit from editing. Furthermore, there are some important questions not fully addressed in the results or discussion. It would be great if the authors could speculate more on the diverse roles of SETD6 in methylating transcription factors and/or provide more context regarding the conditions that are likely to support methylation of PPARγ by SETD6. Also, while a potential cross-talk between methylation and phosphorylation is described in the discussion, it would be great to provide more structural insight into how this might regulate DNA binding of PPARγ and/or discuss whether there are other possibilities given the location of the target lysine in the DNA binding domain.

    1. eLife Assessment

      In this useful manuscript, Yang et al attempt to show that platelet recruitment to the liver via macrophages contributes to APAP-induced liver injury, but there were many areas where the data supporting the conclusions were incomplete. For example, the idea that platelets only affected KC glycolysis, but not the metabolism of other cells, to mediate the phenotype after injury is not adequately supported by the evidence. It is recommended to perform additional experiments to strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yang et al expand on their previous work showing that platelet recruitment to the liver via liver macrophages is important for APAP-induced liver injury. Here, they show that platelets induce a glycolytic switch in liver non-parenchymal cells, including Kupffer cells, and that this is mediated by the protein Aldolase A produced by platelet-derived extracellular vesicles (PEV). They show that targeting Aldolase A may be a valid therapeutic strategy for severe APAP injury.

      Strengths:

      (1) They nicely showed that platelet effects in APAP are mediated by Aldoa via platelet-derived extracellular vesicles.

      (2) Their data show that one of the effects of platelets in APAP liver injury is inducing a metabolic switch to the glycolytic pathway, including in KCs.

      (3) Their data point to the therapeutic potential of targeting ALDOA in severe APAP liver injury.

      Weaknesses:

      (1) They have not shown that the platelet-induced glycolytic switch is only in KCs.

      (2) They also have not shown that the role of KCs in APAP injury is primarily mediated by their interaction with platelets and the subsequent glycolytic switch.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have investigated the role of platelet-derived ALDOA in acetaminophen (APAP)-induced acute liver injury. There are some major flaws in data interpretation, as described below. While a decrease in liver injury due to platelet depletion and lower injury in platelet-specific ALDOA KO mice seems real, the claims related to EVs and platelet-KC crosstalk are not well supported.

      Strengths:

      The core findings are interesting and supported by the data.

      Weaknesses:

      (1) At least two additional timepoints, one at 6 hr and another at 24 hr should be performed in the APAP model to better understand the dynamics of liver injury, especially after platelet depletion.

      (2) Interpretation of the experiments in Figure 2 with clodronate is flawed. 2-DG pretreatment and CLDN administration alone both seem to decrease liver injury substantially, so it is not surprising to see very little injury in the 2-DG+CLDN group.

      (3) Since both 2-DG and CLDN were administered pre-APAP, it is possible that they may interfere with APAP metabolism. This should be checked by looking at GSH depletion at 30 min post APAP treatment. The same question goes for S2 figure data.

      (4) There are no data in any of the studies on specific steps of APAP toxicity, such as GSH depletion, JNK activation, mitochondrial injury, etc., which are all well characterized. Rather, only injury endpoints are measured. It is critical to measure the mechanistic steps. This applies to all studies, but most importantly to the ALDOA-PF-KO mice in Figure 6.

      (5) Interpretation of data in Figure 5F is flawed. Since depletion of platelets also decreases liver injury, it cannot be deduced that the decrease in ALDOA is only in platelets. Many other things are changing.

    4. Reviewer #3 (Public review):

      Summary:

      The authors address the possibility that platelet (PLT)-derived EVs are important mediators of acute liver injury. Furthermore, KCs are important mediators of inflammation and are noted to need to undergo metabolic reprogramming to achieve their effects during injury. They use an APAP-induced liver injury model (AILI). They show that PLTs are recruited and that they interact with KCs in this model system. RNA-seq of KCs showed upregulation of glycolysis and gluconeogenesis. PLT depletion led to reduced liver injury. RNA-seq of KCs showed downregulation of glycolysis. In vitro co-culture of KCs and PLTs recapitulated the glycolysis findings. In vivo, 2-DG inhibited liver injury, but not in the setting of KC depletion. They went on to show that PLT-derived EVs mediate this effect on KCs using a mix of in vitro and in vivo assays, although control EVs were lacking. After doing mass spec on EVs, they find that ALDOA is the critical payload of the PEVs that mediates the pro-glycolytic effect in vivo. They both delete ALDOA from PLTs and use an ALDOA inhibitor to show that injury in AILI requires ALDOA.

      Strengths:

      This is generally an interesting series of observations with an elegant mechanism. Many of the experiments are done in vivo with highly rigorous KO models. However, in many of the EV experiments, there are concerns about a lack of appropriate controls that might limit the rigor of those aspects of the study. 

      Weaknesses:

      (1) There is strong variability in the gene expression between mice in Figure 1B. I worry that the signals may not be statistically significant. The authors should assess the statistical significance.

      (2) In Figure 2B, the necrosis areas that are circled in the image do not seem to resemble the quantitation on the right. For example, I don't see 60% necrosis in the APAP PBS group. Also, I don't see 5-10% necrosis in the CLDN APAP group. More images that are clearer are needed, and circled necrosis areas should be shown.

      (3) In Figure 2D, a higher N should be shown. The number of mice (3) is different from the other experiments, so the exclusion of those mice should be explained.

      (4) In general, control EVs from a non-PLT source should be used for all EV-related experiments. EVs derived from AML12 hepatocytes would seem to be a reasonable control for some of the experiments. Otherwise, it is hard to know if this is a general EV effect or one that is specific to PLT-derived EVs. In Figure 3B, EVs from non-PLTs should be used as a control, since it is possible that all EVs express some level of TSG101 or CD63. In addition, control EVs should be used to test effects on KC metabolism, since the claim is that the effects are specific to PLT-derived EVs. Similarly, Figure 4 needs some kind of EV control that is not from PLTs.

      (5) Figure 5B should include an EV control in the blot. Most of the blots need controls from AML12 EVs or from another in vivo source.

      (6) It is a little difficult to imagine how enough ALDOA protein could be transmitted from PEVs to influence KC glycolysis on the gene expression level. It is possible that ALDOA is required for PLT-induced activation of KCs, or that EVs from PLTs can induce a metabolic shift in KCs. However, it has not been definitively shown that ALDOA from PEVs is directly causing the KC activation. Ultimately, it would be good to obtain PEVs from ALDOA WT and KO mice, then provide these PEVs to AILI mice without PLTs to see if they have differential effects on the AILI model. This would really demonstrate that the ALDOA in the PEVs is mediating the glycolytic, injurious effect.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kim and Parsons present a timely overview of the NTR/prodrug system and its applications in regenerative biology research, with particular emphasis on tissue-specific cell ablation. The system has substantially advanced the field by enabling non-invasive, conditional cell elimination, and has proven especially powerful in zebrafish, though applications in other classical model organisms are also noted. The review covers the historical origins of the NTR system, its use in regeneration studies, small molecule screening, and genetic and CRISPR-based screening, as well as future directions, including the development of the highly efficient NTR2 enzyme variant.

      Strengths:

      This is a useful and well-structured contribution. The manuscript is a valuable resource for the regeneration biology community.

      Weaknesses:

      The impact and scientific value of this paper could be meaningfully enhanced by addressing several points outlined below. The concerns centre on completeness, conceptual precision, and the depth of mechanistic discussion.

      (1) Title: Species specificity.

      Given that the review's primary focus is the zebrafish model, it would be appropriate to include the species name in the title. This would improve discoverability and accurately set the scope of the article for prospective readers.

      Thank you for this suggestion. In revising the review, we have substantially expanded the content to address the reviewers' comments, including adding more detail on the use of NTR in other species. We agree that the majority of published work, and the research we cover, has been conducted in zebrafish, and we have clarified this in the abstract and introduction. However, our aim in writing the review was also to highlight that there is no intrinsic barrier to adopting this technique more broadly in other systems. Notably, NTR was first developed in mice, but with a prodrug that proved difficult to use, and it was not widely pursued. In mouse models, the development of DTR offered an alternative, though that approach carries risks of kidney toxicity and is incompatible with chronic ablation due to immunogenicity. Given this context, we would prefer to retain a title that does not limit the scope exclusively to zebrafish, so as not to discourage readers working in other model systems who might benefit from considering the NTR system.

      (2) Subchapter: Physical injury.

      The subchapter enumerates different types of physical injury models but would benefit from a more substantive comparative discussion. In particular, the authors are encouraged to address the following:

      (2.1) Outcome comparison: Surgical and other invasive approaches cause damage to entire tissue structures comprising multiple cell types, whereas tissue-specific genetic ablation eliminates a defined cell population while leaving the surrounding architecture largely intact. This fundamental distinction has direct implications for the interpretation of regenerative outcomes and should be clearly articulated.

      We appreciate the reviewer raising these important points, as well as those noted in Section 2.2. We addressed the concerns from Sections 2.1 and 2.2 throughout multiple parts of our review, specifically in the following sections:

      • Physical injury – where we highlight the importance of precisely characterizing the nature and extent of tissue damage in order to appropriately interpret subsequent biological responses.

      • Chemogenetic cell-specific ablation – where we expand on this theme by discussing the advantages of selectively eliminating discrete cell populations and how this improves mechanistic interpretation of regeneration.

      • Development of NTR as a suicide gene – where we examine apoptotic pathways and their relevance to nitroreductase-mediated cell ablation.

      • NTR/prodrug systems in regenerative studies – where we compare what is currently known about immune activation and inflammatory responses across different NTR-based ablation paradigms.

      (2.2) Inflammatory response: Invasive injuries typically trigger a robust inflammatory response, which itself can be a potent driver of regeneration. By contrast, genetic cell ablation may elicit a qualitatively different inflammatory reaction. A comparative discussion of this distinction would help readers appreciate a critical limitation of genetic ablation systems relative to models of natural, accidental tissue damage.

      Please see our response to point 2.1 above.

      (3) Subchapter: Cell-specific toxins.

      This subchapter would benefit from several targeted expansions:

      (3.1) Off-target effects: The authors should include evidence that the exemplified drugs have known off-target activities, with a discussion of how these confounded the interpretation of experimental data. At least a few concrete published examples should be cited.

      Thank you very much for the comments. We have strengthened the discussion of off-target effects by adding concrete published examples. We now note that MPTP/MPP⁺ can affect noradrenergic and serotonergic systems in addition to dopaminergic neurons, that aminoglycoside antibiotics can damage support cells and afferent neurons at higher concentrations with compound-specific differences in ototoxicity, and that streptozotocin exhibits hepatotoxicity beyond pancreatic β-cells.

      (3.2) Completeness of the toxin list: The current list appears illustrative rather than comprehensive. A more complete enumeration would be valuable, particularly for neurotoxins and drugs targeting sensory cells, as these are highly relevant to the zebrafish regeneration field.

      We have now consolidated the toxins discussed throughout the review into Table 1, which includes additional entries alongside the previously listed agents. We have explicitly noted that this list is representative rather than exhaustive, as the full range of cell-specific toxins used across species is extensive.

      (3.3) Interspecies differences: It would be informative to specify whether drug specificity differs across species, as this is a practical consideration for researchers working in organisms other than zebrafish.

      We appreciate the reviewer’s question regarding potential interspecies differences in prodrug performance. Early work using NTR in mammals was conducted in mice, and all five published mouse studies relied exclusively on CB1954. No other NTR-activating prodrugs have been reported in mouse models, so direct comparisons are not available. Likewise, all published Xenopus studies used MTZ and thus do not provide internal comparisons across prodrugs. The Nematostella study employed NFP (citing rationale from a zebrafish study) and the approach yielded effective ablation.

      The only non-zebrafish study that directly compared prodrugs is the Drosophila work, which evaluated MTZ, RNZ, and NFP and reported lower activity for MTZ relative to the other compounds. Because it is not clear whether the authors were aware of the batch variability of MTZ or the need for freshly prepared solutions, interpreting this specific comparison is difficult.

      To address the reviewer’s comment, we have expanded the section on non-zebrafish organisms to clearly state which prodrug was successfully used in each species. However, given the limited number of studies, the absence of titration experiments, and the lack of standardized conditions across laboratories, we do not feel that the available evidence supports drawing conclusions about interspecies differences in prodrug performance.

      Consistent with our original discussion and based on the broader biochemical and empirical data available, we continue to recommend RNZ as the starting point for new experiments.

      (4) Subchapter: Optogenetic cell ablation.

      The authors note that optogenetic cell ablation has not yet been applied in conventional regeneration studies. It would strengthen this section to include a discussion of the underlying reasons for this gap, whether technical or biological, so that readers can appreciate the barriers and potential for future adoption.

      We thank the reviewer for this helpful suggestion. As recommended, we have added a concise, explicitly speculative statement discussing potential technical factors that may explain why optogenetic cell ablation has not yet been widely applied in regeneration studies. Specifically, we note that KillerRed-based ablation requires localized light delivery and ROS generation, making it best suited for discrete, optically accessible cells and less practical for targeting large or deep tissues. We also highlight that the dependence on microscopy-based illumination inherently limits throughput. This new text clarifies possible barriers to broader adoption while acknowledging that these points remain speculative.

      (5) Terminology: "Suicide gene".

      The application of the term "suicide gene" to nitroreductase is conceptually imprecise and merits reconsideration. Strictly speaking, a suicide gene is one whose expression alone is sufficient to kill the cell, as in the case of genes encoding direct triggers of apoptosis or the catalytic A subunit of diphtheria toxin (DTA). NTR does not meet this criterion: it requires the exogenous administration of a prodrug (e.g., metronidazole) to produce a cytotoxic metabolite and is therefore only conditionally lethal.

      It is worth noting that nitroreductases evolved in bacteria and fungi as enzymes involved in chemoprotection and detoxification, converting potentially toxic and mutagenic nitroaromatic compounds into less harmful metabolites (PMID: 18355273). This biological context further underscores that NTR is not inherently a lethal protein. The authors are encouraged to replace or qualify the term "suicide gene" and instead adopt terminology that more accurately reflects the conditional, prodrug-dependent nature of the system.

      We appreciate the reviewer’s thoughtful attention to terminology. We agree that, in its most classical and stringent sense, a suicide gene is one whose expression alone is sufficient to induce cell death. We also recognize that NTR does not meet this strict criterion.

      At the same time, we note that the term has broadened in contemporary usage, particularly within applied and translational contexts, to encompass prodrug-dependent systems. For example, the National Cancer Institute Thesaurus defines a suicide gene as “a gene which will cause a cell to kill itself, typically through interaction with a prodrug,” and Taber’s Medical Dictionary likewise states that it is “a gene that causes a cell to kill itself, usually by encoding an enzyme that converts a nontoxic prodrug into a toxic metabolite.” Under these widely used definitions, NTR is included within the scope of suicide gene systems.

      Nevertheless, we appreciate that terminology in this area is not universally standardized. To ensure clarity for all readers, we have added a brief definition in the revised manuscript explicitly noting the conditional, prodrug-dependent nature of NTR-mediated ablation. We are grateful to the reviewer for prompting this clarification.

      (6) NTR/MTZ in regenerative studies: Mechanistic depth.

      While the review catalogues several studies employing the NTR/MTZ system, it lacks mechanistic depth regarding the cellular basis of ablation. The following questions should be addressed, where evidence exists in the literature:

      (6.1) Temporal dynamics of cell death: What is known about the kinetics of NTR/MTZ-induced lethality across different tissue types in larval and adult zebrafish, as well as other organisms? Are there age- and tissue-specific differences in the speed or completeness of ablation?

      Thank you for this important question. We have added text noting that the kinetics and completeness of NTR/prodrug-mediated ablation vary across experimental contexts, including with differences in NTR expression, enzyme/prodrug pairing, dose, cell type, and developmental stage. Published studies illustrate that the time course of ablation can differ substantially between models. Because most studies were designed to optimize ablation within individual tissues rather than for direct side-by-side comparison, the literature does not yet support broad quantitative conclusions about age- or tissue-specific differences across systems.

      (6.2) Mechanism of cell death: What is the cellular basis of NTR/MTZ-induced cytotoxicity in zebrafish? In particular, do the toxic metabolites preferentially cause mitochondrial damage or nuclear DNA damage, and what downstream death pathways are engaged?

      Thank you for the comments. We have added text discussing the mechanism of NTR/MTZ-induced cell death. We now note that NTR-mediated reduction of MTZ generates reactive intermediates that cause DNA damage and oxidative stress, with cell death occurring predominantly through apoptosis. We have also more strongly emphasized that in dopaminergic neurons, mitochondrial damage was identified as the primary cytotoxic mechanism. We acknowledge that the relative contribution of these pathways is likely to vary by cell type and remains an important area for future study.

      (6.3) Proliferative versus post-mitotic cells: Are proliferating and non-proliferating cells equally sensitive to the NTR/MTZ system, or does the proliferative status of a cell influence susceptibility? This is a practically important question for researchers designing ablation experiments in tissues with mixed cell populations.

      We appreciate the reviewer’s insightful question. We have now added a brief clarification to this section explaining that the NTR/MTZ system has been shown to act in a cell-cycle–independent manner, and both proliferating and post-mitotic cells can be ablated effectively.

      (6.4) Ablation of progenitor cells: Are there published examples demonstrating that co-ablation of differentiated functional cells and organ-specific progenitor cells abolishes regenerative capacity? Such examples would be highly informative in illustrating the system's power to dissect the cellular requirements for regeneration.

      To our knowledge, the zebrafish lateral line currently provides the clearest example in which NTR-mediated ablation of progenitor populations results in a loss of regenerative capacity. In this system, targeted ablation of support-cell progenitors severely reduces hair-cell regeneration, illustrating how NTR enables direct testing of cellular requirements for tissue repair.

      Addressing the points above, particularly the comparative discussion of injury models and inflammatory responses, the clarification of terminology, and the mechanistic discussion of NTR/MTZ-induced cell death, would substantially strengthen the review's scientific contribution and utility.

      Reviewer #2 (Public review):

      Summary:

      Kim and Parsons reviewed the nitroreductase (NTR)/prodrug system: when engineered cells expressing the enzyme NTR are treated with prodrug (e.g. metronidazole), NTR converts the prodrug into a cytotoxic compound that kills these cells. The review covers how the system has been developed, spatiotemporal control of targeted cell ablation, and its broad utility to study regenerative mechanisms, model human diseases, and screen chemicals to discover pro-regenerative and protective compounds. They further discussed the newer version of NTR, a more potent prodrug, and experimental design, which not only expands the possible utility of the NTR/prodrug system, but also allows the research community to develop a precise, reproducible and versatile platform.

      Strengths:

      The review summarizes landmark applications of the NTR/prodrug system, as well as recent studies, with a focus on the model organism zebrafish. The review provides a good gateway to understanding the system and considering regenerative studies.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Kim and Parsons presents an overview of the nitroreductase/metronidazole (NTR/MTZ) cell ablation system.

      Strengths:

      This manuscript nicely places the NTR/MTZ system in the context of other cell ablation methods, with a discussion of their respective advantages and disadvantages. This review is particularly useful for highlighting the many ways the NTR/MTZ system has been applied to study the regeneration of multiple cell types and to model different degenerative human diseases. The review concludes with a discussion on recent improvements made to the system and practical considerations and "best practices" for NTR-based experiments. This review could be a helpful resource, especially for researchers new to regeneration or cell ablation studies.

      Weaknesses:

      Although the NTR/MTZ system has been used in other model organisms, this review is primarily focused on its uses in zebrafish. While this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, discussion of the unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review. Additional minor revisions, as suggested below, could also improve readability.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Since the lab mouse is an important mammalian model system, with certain tissues harbouring some regenerative capabilities, including the peripheral nervous system (e.g., sciatic nerve regeneration after crush), and myelin, etc., it would be great if a section could be included to discuss the potential adoption of the NTR/prodrug system in future mouse studies.

      We appreciate the reviewer’s suggestion to discuss the potential future use of the NTR/prodrug system in mouse models. In surveying the literature, we identified only five mouse studies employing NTR, all of which used CB1954. These early studies were conducted primarily as proof-of-principle work in the context of gene-directed enzyme prodrug therapy (GDEPT) for cancer, rather than for regenerative or lineage-specific ablation applications. We added this point to the text.

      Since those reports, we have not found additional examples of NTR use in mice. We do not know the precise reasons for this limited adoption, but it may reflect the availability of alternative ablation systems that are widely established in mouse research, such as the diphtheria toxin receptor (DTR) system.

      We agree that certain mouse tissues exhibit regenerative capacity and that targeted ablation tools can be valuable in such contexts. To address the reviewer’s point, we have added text noting the very limited historical use of NTR/CB1954 in the mouse. We have no explanation for why no one moved on to using NTR/MTZ in the mouse, but we note in two places in the text that DTR is the preferred method for mouse ablation experiments (even though DT does cause kidney damage and is incompatible with chronic studies!).

      Minor:

      (1) Line 174-176, the sentence was repeated.

      (2) Figure 1, for the transgenic line, please be consistent with the line name in italics.

      Reviewer #3 (Recommendations for the authors):

      (1) In the abstract as well as in the main text, the authors note that the NTR/MTZ system has been used in multiple model systems. Yet, most of the review, and especially the practical advice given at the end, is very zebrafish-focused. Although this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, the authors might consider revising the abstract to make it clearer that this review is primarily concerned with the use of the NTR/MTZ system in zebrafish.

      Thanks for the suggestion. We have changed the last half of the first paragraph of the abstract.

      That said, a brief discussion of any unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review.

      Agreed. We have expanded the text in several places to discuss the NTR system in non-zebrafish models in more detail. In particular, we expanded our discussion of NTR in the mouse.

      (2) Line 176: There is a repetition of the sentence, "NTR/MTZ-mediated ablation has also been adapted for other model organisms."

      Found and deleted. Thank you!

      (3) Line 177: To improve clarity, the authors should include species names to prevent confusion. For example, both Xenopus laevis and Xenopus tropicalis are commonly used model organisms. Similarly, multiple Drosophila species are used by researchers.

      Added melanogaster and laevis to text.

      (4) Can the authors address whether alternatives to MTZ (RNZ, etc.) have the same issues with batch-to-batch variability? That might be an important consideration for potential users. It would also be useful to include practical guidance for accounting for batch variability, for example, how to determine optimal prodrug concentrations, whether effective concentrations need to be determined for every batch/replicate/experiment, etc.

      We added text noting that it is not yet known whether RNZ exhibits batch-to-batch variability similar to MTZ, as this has not been systematically reported. Given the potential for variability, it would be prudent for researchers to titrate each new batch of RNZ or, alternatively, adopt a dosing strategy that exceeds the minimum effective concentration to ensure consistent ablation results.

      (5) For the last section ("Experimental design: Practical and technical considerations"), readability would be improved by applying a consistent bullet point format.

      Made the changes as requested.

      (6) Figure 1: Asterisks are not defined.

      The asterisks were meant to link two boxes depicting the same transgene without rewriting the name of the transgene. Clearly, this wasn’t clear, so we have added an explanation to the legend too.

      (7) Figure 3: Given that the schematics specify expression of NTR1 and NTR1.1, I assume this figure is adapted or based on previous published report(s). If so, the reference(s) should be noted in the figure legend or on the figure itself (as done for Figure 1). If the schematic is meant to depict only in general terms how binary expression vectors can be used, a more inclusive "NTR" label might be less confusing.

      Changed the figure legend and the figure.

      (8) Figure 4: To improve readability and accessibility, the authors should consider modifying panels C-N to use a more colorblind-friendly palette (e.g., green/magenta) or to present each channel as separate grayscale images.

    2. eLife Assessment

      This Review Article nicely synthesizes the development, applications, and recent technical advances of the nitroreductase/prodrug system, highlighting how it enables precise spatiotemporal cell ablation and experimental platforms for studying regenerative mechanisms and screening for pro-regenerative or protective compounds. Together, the article provides a conceptual and practical overview that will help researchers adopt and further develop this versatile approach in regenerative biology. It will be of interest to researchers studying regeneration, disease modelling, and targeted cell ablation, particularly those working with zebrafish and other genetic model systems.

    3. Reviewer #1 (Public review):

      Summary:

      Kim and Parsons present a timely overview of the NTR/prodrug system and its applications in regenerative biology research, with particular emphasis on tissue-specific cell ablation. The system has substantially advanced the field by enabling non-invasive, conditional cell elimination, and has proven especially powerful in zebrafish, though applications in other classical model organisms are also noted. The review covers the historical origins of the NTR system, its use in regeneration studies, small-molecule screening, and genetic and CRISPR-based screening, as well as future directions including the development of the highly efficient NTR2 enzyme variant.

      Strengths:

      This is a useful and well-structured contribution. The manuscript is a valuable resource for the regeneration biology community.

      Weaknesses:

      The revised manuscript shows significant improvements; however, two points remain insufficiently addressed and should be resolved in the final version.

      (1) The term 'suicide gene'

      As noted in my first round of revisions, the term 'suicide gene' as applied to bacterial nitroreductase remains unaddressed in the revised manuscript, despite being scientifically inappropriate and a potential source of confusion regarding the NTR/Mtz mechanism.

      'Suicide' implies an intrinsic, cell-autonomous programme of self-destruction. This is incompatible with the NTR/Mtz system, in which cell death is experimentally induced through exogenous administration of metronidazole (Mtz) by the investigator. The 'suicide gene' framing may have utility in the cancer therapy literature, likely to aid communication with non-specialist and clinical audiences. However, it is not standard in the zebrafish field, where NTR is more accurately described as a conditional toxigene. Since this review focuses predominantly on zebrafish models, its terminology should reflect that of the relevant literature.

      A further conceptual problem with the 'suicide gene' framing is that it obscures the pharmacological nature of metronidazole. Mtz is a pharmaceutical agent with intrinsic baseline toxicity: extended exposure or modestly elevated concentrations cause toxic side effects and lethality even in non-transgenic (wild-type) zebrafish (PMID: 24428354). NTR-expressing cells do not self-destruct; rather, they are rendered selectively hypersensitive to Mtz relative to other eukaryotic cells by virtue of expressing the enzyme. This distinction is mechanistically important and should be reflected in the language used throughout the manuscript.

      In summary, the term 'suicide gene' does not accurately capture enzyme-mediated bioactivation of an exogenous prodrug and should be removed from the manuscript.

      (2) Barriers to using the NTR/Mtz system in non-aquatic model organisms

      In response to my suggestion that the title should include "zebrafish" to accurately convey the scope of the review to prospective readers, the authors stated that "there is no intrinsic barrier to adopting this technique more broadly in other systems," citing the example that "NTR was first developed in mice, but with a prodrug that proved difficult to use, and it was not widely pursued." These two statements are, however, contradictory: if the prodrug proved difficult to use, this constitutes precisely the kind of practical barrier the authors claim does not exist. The authors should clarify and reconcile this inconsistency, and provide a more thorough discussion of why the NTR/Mtz system has seen limited adoption in classical model organisms, such as mice and Drosophila.

    4. Reviewer #2 (Public review):

      Summary:

      Kim and Parsons reviewed the nitroreductase (NTR)/prodrug system: when engineered cells expressing the enzyme NTR are treated with a prodrug (e.g. metronidazole), NTR converts the prodrug into a cytotoxic compound that kills these cells. The review covers how the system has been developed, spatiotemporal control of targeted cell ablation, and its broad utility to study regenerative mechanisms, model human diseases, and screen chemicals to discover pro-regenerative and protective compounds. They further discussed the newer version of NTR, a more potent prodrug, and experimental design, which not only expand the possible utility of the NTR/prodrug system, but also allow the research community to develop a precise, reproducible and versatile platform.

      Strengths:

      The review summarizes landmark applications of the NTR/prodrug system and recent studies in model organisms, with a focus on zebrafish. The review provides a good gateway to understanding the system and considering regenerative studies.

      Weaknesses:

      None.

      Comments on revisions:

      The authors have addressed the previous points, and the manuscript has been greatly improved.

    5. Reviewer #3 (Public review):

      Summary:

      This manuscript by Kim and Parsons presents an overview of the nitroreductase/metronidazole (NTR/MTZ) cell ablation system.

      Strengths:

      This manuscript nicely places the NTR/MTZ system in context of other cell ablation methods, with a discussion of their respective advantages and disadvantages. This review is particularly useful for highlighting the many ways the NTR/MTZ system has been applied to study regeneration of multiple cell types and to model different degenerative human diseases. The review concludes with a discussion on recent improvements made to the system and practical considerations and "best practices" for NTR-based experiments. This review could be a helpful resource, especially for researchers new to regeneration or cell ablation studies.

      Comments on revised version:

      I thank the authors for revising the manuscript to expand their discussion of using the prodrug/NTR system in other model organisms while also revising the abstract to make it clear that this review is zebrafish focused. With these revisions, this review provides an informative overview of how the prodrug/NTR system has been an important tool not only for regeneration studies but also for elevating the zebrafish as a regeneration model. That said, including other model organisms could have been a nice addition to the last section on experimental considerations, especially in the context of discussing potential barriers to wider adoption of the NTR system. However, given that the vast majority of studies using the NTR system are in zebrafish, the current scope of this review is understandable.

    1. eLife Assessment

      This study provides valuable contributions to establish canonical Dhh signaling as a primary mediator in the differentiation of Leydig cells and their steroidogenic capacity. Together, the experimental design using their established stem Leydig cell line alongside relevant genetically mutated models, both derived using the relevant Nile tilapia animal system, provided largely convincing evidence to support their conclusions. The work will be of broad interest to developmental biologists interested in differentiation of steroidogenic or hormone producing cells.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression, leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Strengths of the methods and results:

      - The use of Nile tilapia is important as it is an important aquaculture species, it shares the genetic pathway for sex determination of mammalian species, and molecular differentiation pathways are highly conserved

      - The approach is rigorous and incorporates a novel TSL, clonal stem Leydig cell model that they developed that is relatively faithful in following endogenous developmental steps and can produce the appropriate steroid.

      - Tilapia are relatively amenable to CRISPR/Cas9 targeting and, with their accelerated developmental time frame, provide an excellent model system to interrogate specific signaling pathways.

      - The stepwise analysis from dhh-gli-sf1 is thoughtful and well done.

      Achieved Aims: The authors set out to test the hypothesis that the canonical Dhh signaling pathway for Leydig cell differentiation and steroidogenic activity is mediated via ptch2 and gli1 regulation of sf1. The results are strong, but additional steps are needed to verify that redundancy/compensation is not contributing to the outcomes.

      This work is important for better understanding the nuanced commonalities and differences in developmental pathways across species. Specific to Leydig cell differentiation and steroidogenesis, their work with tilapia supports conservation of the canonical Dhh pathway; however, there appear to be some differences in downstream mediators compared to mouse. Specifically, they conclude that ptch2/gli1 stimulates sf1 and steroidogenesis in tilapia, whereas gli1 is dispensable in mouse. Instead, Gli3 has recently been shown to play an important role in mouse to stimulate Sf1 and support the hedgehog pathway.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses of the methods and results:

      - Line 162: need to establish and verify the PKH26-labeled TSL cells were unaffected by the dhh-/- environment. No data to support the claim that they were unaffected.

      We thank the reviewer for this important comment. In dhh<sup>-/-</sup> recipient testes, PKH26-labeled TSL cells were observed within the interstitial compartment (Fig. 3C3). Importantly, these PKH26-positive cells could be induced by SAG treatment to differentiate into Cyp11c1-positive steroidogenic cells (Fig. 3E3), indicating that they remained viable in the dhh<sup>-/-</sup> environment.

      We have revised the Results section (line 171–173) to “These results suggest that SLC differentiation is inhibited, whereas the survival and engraftment of PKH26-labeled TSL cells were not affected in dhh<sup>-/-</sup> XY tilapia testes.”

      - The rescued phenotype caused by the addition of ptch2-/- to the dhh-/- model is compelling. To further define potential ptch1 contributions, it would be helpful to examine the expression level of ptch1 in the context of the ptch2-/- and ptch2-/-;dhh-/- mutant animals. Any compensatory increase in ptch1 in either case, without obvious phenotype changes, would support the dominant role for ptch2.

      We thank the reviewer for this valuable suggestion. We have now performed RT-qPCR analysis of ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. As shown in Fig. S8, no significant differences in ptch1 mRNA levels were detected among these genotypes, indicating that loss of ptch2 does not induce compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. We have revised the Discussion section (line 277–290) to “The specificity for Ptch2 in this context might stem from unique co-receptor interactions or expression patterns within the testicular niche. To preliminarily assess potential compensatory regulation, we examined ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. No significant differences in ptch1 mRNA levels were detected among these genotypes (Fig. S8), suggesting that loss of ptch2 does not trigger compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. Nonetheless, global ptch2 mutation affects multiple tissues, whereas our mechanistic focus is on SLC differentiation within the testicular niche. Moreover, the early embryonic lethality of global ptch1 mutation in tilapia (Liu et al., 2024) precludes direct assessment of its role in postnatal testis development. Therefore, although our findings strongly support a predominant role for Ptch2 in mediating Dhh signaling in SLCs, definitive resolution of receptor specificity will require future Leydig cell-specific conditional knockout models.”

      - Activity of individual gli factors need additional reconciliation. The expression profiles for both alternative gli factors should be quantified in each knockout cell line to establish redundancy and/or compensation.

      We agree that quantifying the expression of alternative gli genes might be informative. In the present study, TSL-gli1<sup>-/-</sup> cells completely lose responsiveness to Dhh stimulation in the 8×GLI luciferase assay, whereas TSL-gli2<sup>-/-</sup> and TSL-gli3<sup>-/-</sup> cells retain normal pathway activation (Fig. 5B), which strongly suggests that Gli1 is the principal transcriptional effector in tilapia SLCs under our experimental conditions. Redundancy and/or compensation among the alternative gli factors will require further genetic dissection in future studies.

      - Figure 5E: An important control is missing that includes evaluation of HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1.

      We do not think that HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1 are an important control in our study. In the dual-luciferase assays, we consider the pcDNA3.1 + pGL3 (empty reporter) and pcDNA3.1 + pGL3-sf1 controls to be sufficient.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation; minor corrections:

      - Include Park paper (Endocrinology 2007) somewhere near line 73. Need to acknowledge this paper as it is one of the first to connect Dhh to Sf1.

      We have now included the citation of Park et al. (Endocrinology 2007) in the Introduction (now line 81).

      - Include Kothandapani paper (PLoS Genetics 2020) somewhere near line 86. Need to acknowledge this paper as it is the only one to reconcile the data showing no difference in Gli1 or Gli2 knockouts, but loss of Leydig cell function due to Gli3 activity.

      We have now included the citation of Kothandapani et al. (PLoS Genetics 2020) in the Introduction (now line 97).

      - Please include sequences of B1 and B2 in sf1 promoter, how conserved are they to the canonical Gli binding sequence?

      We have revised the Results section (line 216–218) to “Functional annotation of its promoter region identified two conserved Gli1-binding motifs, B1 (AACCACCCA) and B2 (GAGCCACCCA)”.
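      The response reports the two motifs but does not quantify their conservation, which is what the reviewer asked about. A minimal sketch of one way to check this follows. It assumes the commonly cited 9-bp Gli consensus GACCACCCA as the reference (this sequence is not stated in the response), uses a hypothetical helper named `best_identity`, and only scores ungapped matches on the forward strand.

```python
# Assumption: commonly cited Gli consensus site; not taken from the author response.
CANONICAL_GLI = "GACCACCCA"


def best_identity(motif, consensus=CANONICAL_GLI):
    """Slide the shorter sequence along the longer one (no gaps) and return
    (matches, compared_length, offset) for the best-scoring window."""
    short, long_ = sorted((motif.upper(), consensus.upper()), key=len)
    best = (-1, len(short), 0)
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        matches = sum(a == b for a, b in zip(short, window))
        if matches > best[0]:
            best = (matches, len(short), offset)
    return best


# Motif sequences as quoted in the revised Results text.
for name, seq in {"B1": "AACCACCCA", "B2": "GAGCCACCCA"}.items():
    matches, length, offset = best_identity(seq)
    print(f"{name}: {matches}/{length} positions match (offset {offset})")
```

      Under these assumptions, B1 matches the reference at 8 of 9 positions and the best window of B2 matches at 7 of 9. A position weight matrix scan on both strands would be a more rigorous test than this simple identity count.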

      - Figure 1 or results text: please clarify that the dhh-/- model used is the delta13bp mutation.

      We have clarified in the Results section (line 133) that the dhh<sup>-/-</sup> model corresponds to the 13-bp (CAGGGATGCGGAC) frameshift deletion.

      - Figure 5E legend: please clarify that HEK293 cells are used

      We have revised the Figure 5E legend to explicitly state that the dual-luciferase reporter assays were performed in HEK293 cells. Revised legend sentence (line 743–746): HEK293 cells were co-transfected with pRL-TK, pGL3, pcDNA3.1, pGL3-sf1, pcDNA3.1-OnGli1, and the indicated cold probe constructs, and luciferase activity was measured 48 hours post-transfection.

      - Figure S5E: * indicates the heteroduplex. It seems that there is a heteroduplex highlighted with the asterisk at ~600 bp; based on the homozygous and mutant bands, it seems the asterisk should be highlighting the duplex near those sized bands. What are the bands up at ~600 bp?

      We thank the reviewer for the careful observation. In Figure S5E, the bands observed at ~600 bp represent heteroduplex products formed during the re-annealing of PCR amplicons derived from heterozygous individuals. During denaturation and re-annealing, WT and mutant strands can pair in different configurations, generating distinct heteroduplex conformations that migrate more slowly than homoduplex products in PAGE. As a result, two heteroduplex bands are visible at ~600 bp, reflecting alternative mismatched duplex structures. The homoduplex WT and mutant bands are indicated separately by arrows.

      - Figure S7F: dhh-/- data are missing

      We thank the reviewer for pointing out this omission. The missing dhh<sup>-/-</sup> dataset has now been added to Figure S7F, and the figure has been updated accordingly.

    1. eLife Assessment

      This important study provides a comprehensive multi-omics characterization of Leishmania donovani stage differentiation, offering insights into the molecular basis of parasite adaptation across host environments. The authors present convincing evidence that stage transitions are not driven by genomic variation but instead rely on coordinated post-transcriptional regulation, including mRNA turnover, translation, and protein degradation. Although experimental validation of these findings and conclusions remains to be completed, the integration of diverse, high-quality datasets establishes a robust resource that will be of broad utility to researchers investigating Leishmania biology and life-cycle progression.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      The authors describe co-regulated gene modules underlying stage differentiation in Leishmania donovani through a system-level analysis of multiple molecular layers. Using amastigotes isolated from infected hamster spleens and corresponding culture-derived promastigotes, they analyzed genomic variation, transcript abundance, protein levels, phosphorylation states, and metabolite profiles. By combining these, the study identified potential regulatory mechanisms associated with parasite differentiation and generated hypotheses regarding how gene expression is coordinated across different levels.

      Strengths:

      A major strength of the study is the breadth of the dataset generated. The integration provides an unusually comprehensive view of molecular changes associated with Leishmania differentiation in vitro. Such multi-layer datasets involving bona fide vertebrate host stages remain relatively rare in parasitology and will likely become a valuable resource for the molecular parasitology community. In addition, the use of amastigotes isolated from infected hamsters rather than relying on axenic models provided a biologically relevant framework for the analyses.

      The revised manuscript improved several aspects of the original. The RNA-seq analysis is described with a clearer pipeline, and several claims regarding causal regulatory feedback associations have been appropriately toned down. Among the observations reported, the association between parasite differentiation and proteasome-mediated protein degradation is particularly remarkable. The combination of quantitative proteomics with pharmacological inhibition of the proteasome with lactacystin provides support for a role for protein turnover in developmental transitions and paves the way for future mechanistic studies.

      Weaknesses:

      Most regulatory interpretations remain largely inferential or indirect. The integration identifies correlations between different levels, but direct functional validation is limited or absent. Many of the descriptions should not be interpreted as validated. As highlighted by the authors in this revised version, the mechanistic studies will be part of future work and are beyond the scope of the current work. Of note, the attempt to confirm lactacystin-induced inhibition of proteasomal activity via anti-polyUb immunoblotting did not demonstrate the expected increase in overall poly-ubiquitylation.

    3. Reviewer #2 (Public review):

      Pescher and colleagues present a revised manuscript detailing the multi-omic characterisation of Leishmania donovani amastigote to promastigote differentiation and integration of this data. The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc. in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses about the intersections of regulatory proteins that are associated with life-cycle progression. The differentiation step studied is from amastigote to promastigote using hamster-derived amastigotes, which is a major strength. The use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. This is therefore a strength of the work, as it reduces the interference from the biological plasticity of Leishmania when it is cultured outside the host for prolonged periods. The multi-omics datasets presented are robust in their acquisition and analysis and will form an excellent resource for researchers studying the molecular events (particularly proteasomal protein degradation, and phosphorylation) during life-stage progression.

      Overall, in the absence of follow up experiments on specific individual examples, some of the claims in the original submission were toned down and reflect a more neutral description of the data now. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

    4. Reviewer #3 (Public review):

      Summary:

      The authors proposed to use a 5-layer systems-level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani. This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view on how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      With the increased sample number for the RNA-seq in the revised version, the authors' assumptions are now adequately supported by their data.

      The use of a proteasomal inhibitor adds an interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Comments on revised version:

      The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.

      Reviewer #2 (Public review):

      General comments on the revisions:

      My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).

      There are two areas where the authors had to make major changes/justifications and where further comment is merited. These were:

      RNA-seq.

      The most significant issue was the originally underpowered RNA-seq, which had only two replicates. This has now been repeated with four replicates. This has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in the response to this was: "Given the robustness of the stage-specific transcriptome, and the legal constrains associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered, and that maximum robustness of the data is obtained from the minimum sample size, is an important part of experimental design for ethical use of animal models. Essentially, the replication here could have been avoided if the original study had used one more animal. However, the new version of the RNA-seq brings appropriate confidence to the interpretation of the data.

      Phosphoproteomics.

      The authors provide a robust justification of their strategy for the phosphoproteomics and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". The way missing values were dealt with is explained: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills in some of the gaps I was missing from the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/systems-based approach such as this one. The authors also edit the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they were presenting and switched to the terminology of "normalised phosphorylation level" - I think this is an appropriate response.
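      The two rules quoted above can be read as simple predicates. The following is only an illustrative sketch, not the authors' pipeline: the class and function names are hypothetical, it assumes that both confidence thresholds must be met within the same replicate, and it checks only whether a condition qualifies for imputation (the MLE imputation itself is not implemented).

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Thresholds as quoted in the review; every name below is hypothetical.
MAX_ID_FDR = 0.01     # identification FDR must be below 1%
MIN_LOC_PROB = 0.75   # localisation probability must exceed 0.75


@dataclass
class SiteObservation:
    id_fdr: float     # identification FDR in one replicate
    loc_prob: float   # localisation probability in one replicate


def passes_inclusion(replicates: Sequence[Optional[SiteObservation]]) -> bool:
    """True if at least one replicate meets both thresholds (assumed reading)."""
    return any(
        obs is not None
        and obs.id_fdr < MAX_ID_FDR
        and obs.loc_prob > MIN_LOC_PROB
        for obs in replicates
    )


def eligible_for_imputation(intensities: Sequence[Optional[float]]) -> bool:
    """True if a condition has missing values and at least one observed value."""
    observed = [v for v in intensities if v is not None]
    return 0 < len(observed) < len(intensities)
```

      Under this reading, a condition in which a phosphosite was never observed is left without imputed values, rather than being filled in from other conditions.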

      Overall, in the absence of follow up experiments on specific individual examples, some of the claims in the original submission were toned down and reflect a more neutral description of the data now. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

      Reviewer #3 (Public review):

      Summary:

      The authors proposed to use a 5-layer systems-level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani. This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view on how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      With the increased sample number for the RNA-seq in the revised version, the authors' assumptions are now adequately supported by their data.

      The use of a proteasomal inhibitor adds an interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

      According to the reviewers’ comments, we made the following minor changes:

      As suggested by reviewer 1, we have extended the discussion of the results related to the analysis of the ubiquitination pattern by Western blot analysis as follows: “Proteasome inhibition blocked amastigote-to-promastigote differentiation, without inducing rapid global accumulation of ubiquitinated proteins (Figure S7C, upper panel) consistent with a quiescent-like state and low basal ubiquitin–proteasome system activity in amastigotes. After 18 h, ubiquitination levels remained similar to untreated cells, indicating that protein turnover and ubiquitin accumulation are primarily driven by developmental remodeling rather than acute proteasome inhibition. In promastigotes, the lack of detectable change (Fig. S7C, lower panel) may also reflect high basal ubiquitination, engagement of compensatory pathways such as autophagy, and/or only partial proteasome inhibition.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      - Supplementary figure 3 is not referenced in the main text.

      - The authors removed the "infinite" sign from figures 3 and 4 to better present the data according to their chosen approach to missing values when LFQ=0. However, the sign is still present in the respective figure legends, please adjust.

      Supplementary Figure 3 (Figure S3) is now referenced in the main text as requested.

      The "infinite" sign has been removed from the legends of Figures 3 and 4 as requested.

    1. eLife Assessment

      This study provides valuable insights into mitochondrial cristae organization in Plasmodium falciparum, particularly in the context of its divergent MICOS composition. The authors present convincing evidence, supported by phenotypic and morphological analyses, that cristae junction maintenance can be uncoupled from de novo cristae formation, reinforcing an emerging model of mitochondrial inner membrane organization. Notably, the absence of Mic10 alongside an enlarged and divergent MICOS complex highlights an intriguing evolutionary adaptation, although further characterization of the complex would strengthen the study's overall significance.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      The manuscript by Tassan-Lugrezin et al. confirms the existence of the MICOS complex in the causative agent of malaria, Plasmodium falciparum. Prior to this study, only one of the two core MICOS subunits, Mic60, was found by homology search to be encoded in the apicomplexan parasite's genome. This study demonstrates the absence of the other core subunit, Mic10. It also identifies another MICOS subunit, Mic19, which co-migrates with Mic60 in a very large molecular weight complex upon blue native polyacrylamide gel electrophoresis. The authors then demonstrate that expression of both Mic60 and Mic19 is considerably upregulated during the differentiation of P. falciparum from the pathogenic asexual blood stage (ABS) to gametocytes, which correlates with the activation of oxidative phosphorylation during this process. While gene deletion of Mic19, Mic60 and both simultaneously does not affect this transition, the cristae are nevertheless deformed. More importantly, crista junctions are significantly reduced, indicating that MICOS serves the same function in apicomplexans as it does in opisthokonts: maintaining crista junctions. Furthermore, the genetic interaction of mic60 and mic19, observed as augmented crista deformation when both are deleted, is further evidence of their biochemical interaction and is consistent with their similar complexome profiles. This study represents an important contribution to our understanding of MICOS evolution. Furthermore, the study shows that proper cristae formation is not essential for Plasmodium life cycle progression under in vitro conditions. Moreover, mutant gametocytes are still able to mate in the mosquito vector, albeit with lower efficiency.

      Strengths:

      The study is the result of a lot of technically challenging work in the model Plasmodium. The technically difficult life cycle progression experiments are well performed as far as I can tell. The electron microscopy is very well done and rigorously analyzed to obtain information about crista parameters. In particular, the authors were able to quantify the occurrence and diameter of crista junctions, which is very challenging in small mitochondria with small cristae. Finally, the authors provide convincing support that Mic60 and the newly discovered Mic19 act to shape crista junctions and that MICOS can apparently carry out this function without the core subunit Mic10.

      Weaknesses:

      In its current form, the manuscript has conceptual weaknesses. The authors focus on the development of cristae from a highly likely acristate state. This is true. But there can be more insight by considering their result in light of discovering the first functioning MICOS complex without one of its two core proteins, Mic10. The surprisingly large size of the complex is also not really considered by the authors. This brings me to the second weakness, in my opinion. While I think the study represents a lot of work utilizing appropriate and crucial experiments, it seems the complexome data was not explored enough. This data revealed Mic19, but what other potential subunits are co-migrating with Mic60 and Mic19 that can explain the large size of Plasmodium MICOS? Also, what is the consequence of the loss of Mic60 and Mic19 on the mitoproteome? Perhaps other MICOS subunits can be identified by their depletion in the knockouts versus the parental cell line.

      Comments on latest version:

      I am reviewing this manuscript again after reviewing it for Review Commons. I appreciate the authors' responses to my comments. The new version is improved but, in my opinion, still needs more work.

      These revisions are changes to text and interpretations, and obtaining more data from existing data or databases. I do still think one experimental control is necessary to substantiate the authors' claim about membrane potential.

    3. Reviewer #2 (Public review):

      This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum - an organism that unusually shows an acristate mitochondrion during the asexual part of its life cycle and then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins and look for timing of expression during the parasite life cycle and attempt (unsuccessfully) to localise them within the parasite - the lack of signal was concluded to reflect very low expression levels. They also genetically delete both genes singly and in parallel and phenotype the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using MitoTracker labelling, the authors observe differences in mitochondrial organisation in gametocytes compared to the transgenic lines. Further investigation at higher resolution using EM techniques provides data supporting their hypothesis that PfMIC60 and PfMIC19 are important for organising the parasite mitochondrion.

      The manuscript is interesting and is an intriguing use of a well-studied organism of medical importance to answer fundamental biological questions. Given the essentiality of mitochondrial respiration for parasite survival in the mosquito, it is surprising that the single and double knock-out transgenics do not give a severe phenotype. However, the authors have been rigorous in characterizing the impact of genetic deletion of both genes throughout the parasite life cycle. Subtle differences in mitochondrial organisation were observed, consistent with their hypothesis that PfMIC60 and PfMIC19 play roles in mitochondrial organisation. Therefore, the data presented give new insights into an organelle that dramatically changes during parasite development and add to our knowledge of mitochondrial biology in a highly unusual organism.

      Comments on revised version:

      I previously reviewed this manuscript for Review Commons. This version is greatly improved and the authors should be commended for addressing all comments raised.

    4. Reviewer #3 (Public review):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin sections and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      Major comments (from the previous round of review):

      (1) The authors should improve how they present their findings in the right context, in particular by:

      (i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      (ii) at the end of the introduction, the motivating hypothesis is formulated ad hoc: "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum, but this is not compared to the expected length or the size in S. cerevisiae.

      (3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins?". The authors should explain which data they are referring to, as some of the data in Figs 3 and 4 look similar and all significance tests relate to the wild type, not between the different mutants, so it is not clear if any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      (6) lines 380-385: "... thus suggesting that membrane invaginations still arise but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      (7) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      Significance:

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle and defects in infection effectivity are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism.

      The limitation of the study stems from what is already known about MICOS and its subunits in other organisms. MICOS subunit knockouts have been characterised in great detail in yeast and humans with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight on MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis.

      Exploring the role of MICOS in an early-divergent organism and human parasite is however important given the divergence found in mitochondrial biology and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging as the morphological defects are diverse and the fitness defects appear moderate/mild.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest for specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

      Comments on revised version:

      The authors have addressed all of my previous comments in the updated manuscript version.

    5. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum - an organism that unusually shows an acristate mitochondrion during the asexual part of its life cycle and then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins and look for timing of expression during the parasite life cycle and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes singly and in parallel and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using EM techniques they show that the morphology of gametocyte mitochondria is abnormal in the knockout lines, although there is great variation.

      Major comments:

      The manuscript is interesting and is an intriguing use of a well studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail in areas around methodology and statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates and the statistical tests and data presentation are not clear enough to conclude the reduction in transmission that is claimed. Perhaps this could be improved with clearer text?

      We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.

      To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.

      Regarding mosquito experiments, while we indeed reported a reduction in transmission and oocyst numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence “with the transmission reduction of [numbers]….” and we included the sentence “The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers“.

      More specific comments to address:

      Line 101/Fig1E (and figure legend) - What is this heatmap showing? It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide information.

      We added the information “high molecular mass gels with lower acrylamide percentage” to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).

      Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)?

      Please clarify.

      We thank the reviewer for pointing this out – this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and mitoTracker in U-ExM. The figure annotation has now been corrected.

      Figure 2C - what is the statistical test being used (the methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution." but what test is this?)?

      The statistical test is now included in the Materials and Methods section with the sentence “The fitted model was used to obtain estimated means and contrasts and were evaluated using Wald Statistics”. The test is now also mentioned in the figure legend.
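      For readers unfamiliar with the term, a Wald statistic for a single contrast is the estimate divided by its standard error, compared against a standard normal distribution. The sketch below is illustrative only and is not the authors' analysis: it assumes the contrast estimate and its standard error have already been obtained from a fitted model, and the function name is hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for one contrast under a normal reference.

    `estimate` and `std_error` are assumed to come from an already fitted
    model (for example, a contrast on the model's link scale).
    """
    z = estimate / std_error
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

      A contrast that is small relative to its standard error gives a z close to zero and a p-value close to one, which is how high between-experiment variability can mask a difference in mean oocyst counts.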

      Also the choice of a log10 scale for oocyst intensity is an unusual choice - how are the mosquitoes with 0 oocysts being represented on this graph? It looks like they are being plotted at 10^-1 (which would be 0.1 oocysts in a mosquito which would be impossible).

      As the data spans three orders of magnitude with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we went with a standard approach to handle 0s in log transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log-scaled y-axis and relabelled the lowest tick as ‘0’. This ensures that mosquitoes with zero oocysts are shown along the x-axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
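      The distinction drawn here, a substitute value for display only and true zeros for statistics, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' plotting code: the function names and the example counts are hypothetical, and only the floor value of 0.001 is taken from the response.

```python
import math
from statistics import mean

DISPLAY_FLOOR = 0.001  # stand-in for zero on the log axis, as stated in the response


def display_value(count):
    """Value used for plotting only: zeros are moved to the display floor."""
    return count if count > 0 else DISPLAY_FLOOR


def log10_display(counts):
    """Log10 positions on the y-axis; the floor would be labelled '0' on the plot."""
    return [math.log10(display_value(c)) for c in counts]


# Hypothetical oocyst counts per midgut, including uninfected mosquitoes.
counts = [0, 3, 0, 120]
plotted = log10_display(counts)  # display positions only
true_mean = mean(counts)         # summary statistics use the untransformed zeros
```

      Keeping the two paths separate means the choice of floor affects only where zero-count mosquitoes are drawn, not any estimate or test.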

      Figure 2D - it is great that the data from all feeding replicates has been shared; however, it is difficult to conclude any meaningful impact on transmission with the knock-out lines when there is so much variation and so few mosquitoes dissected for some datapoints (10 mosquitoes is a very small sample size). For example, Exp1 shows a clear decrease in mic19- transmission, but then Exp2 does not really show as great an effect. Similarly, why does the double knock out have better transmission than the single knockouts? Surely there would be a greater effect?

      We agree with the reviewer and with the new sentence added, as per major point, we hope we clarified the concept. Note that original Figure 2D has been moved to the supplementary information, as per minor comment of another reviewer.

      Figure 3 legend - Please add which statistical test was used and the number of replicates.

      Done

      Figure 4 legend - Please add which statistical test was used and the number of replicates.

      Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.

      Figure 5C - the 3D reconstructions are very nice, but what does the red and yellow coloring show?

      Indeed, the information was missing. We added it to the figure legend.

      Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages."

      How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?

      Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.

      Furthermore, even though we do not discuss this in the article, we are aware of mitochondria-targeting drugs that are known to block mosquito transmission. We want to point out that it is difficult to distinguish the disruption of the ETC, and therefore an impact on energy conversion, from the impact on the essential pathway of pyrimidine synthesis, which is highly relevant in microgamete formation. Still, a recent paper from Sparkes et al. 2024 showed the essentiality of mitochondrial ATP synthesis during gametogenesis, so it is very likely that mitochondrial energy conversion is highly relevant for transmission to the mosquito.

      Reviewer #1 (Significance):

      This manuscript is a novel approach to studying mitochondrial biology and does open a lot of unanswered questions for further research directions. Currently there are limitations in the use of statistical tests and detail of methodology, but these could easily be addressed with a bit more analysis/better explanation in the text.

      This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research.

      My expertise is in Plasmodium cell biology.

      We thank the reviewer for the praise.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Major comments:

      (1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as it is the conformation of the dimers and rows that modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed, and likely dysfunctional (see below). Thus, I do not agree with the title that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.

      We thank the reviewer for taking the time to review our manuscript.

      Based on the reviewers’ interpretation we conclude the title does not come across as intended. We have changed the title to: “The role of MICOS in organizing mitochondrial cristae in malaria parasites”

      The Discussion section starting from line 373 also suffers from overinterpretation as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) compared to the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate on the basis of such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating neither is epistatic.

      We do agree with the reviewer’s notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.

      The following paragraphs (lines 387 to 422) continue with such unnecessary overinterpretation to the point that it is confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript and should be shortened. Yes, maybe there are other putative MICOS subunits that may linger in the KOs that are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.

      We shortened this paragraph.

      (2) While the authors went to impressive lengths to detect any effect on lifecycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration, or membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have and not resorting to complexome profiling (although this would be definitive if it is feasible):

      i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.

      Interesting suggestion. As our staining and imaging conditions are suitable for such analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset that we collected for Figure 3. We did not, however, detect any difference in MitoTracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary figure S6.

      ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.

      While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.

      iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.

      While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.

      To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.

      I will also conclude that complexome/shotgun proteomics may be a useful tool also for identifying other putative MICOS subunits by determining if proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.

      (3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus have concluded that these are truly acristate. This can very well be true, or there can be immature crista forms that evaded detection at the resolution they used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are pretty small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be even smaller still. Minute levels of sampled complex III and IV plus complex V dimers in ABS that were detected previously by the authors by complexome profiling would argue for the presence of minuscule and/or very few cristae.

      I think that the authors should hedge their claim that ABS is acristate by briefly stating that there still is a possibility that minuscule cristae may have been overlooked previously.

      We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer’s point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing ‘fully acristate’ to ‘acristate’.

      This brings me to the claim that Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag; a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed, but below the expression limits of the assay, as the protein exhibits low expression levels when mitochondrial activity is upregulated.

      We agree with the reviewer that the absence of a detectable epitope-tag signal does not definitively exclude low-level expression, and we have therefore replaced the term ‘absent’ with ‘undetectable’ throughout the manuscript. In the context of previous findings of low-level transcripts of the proteins in studies by Lopez-Berragan et al. and Otto et al., we also added the sentence “The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.” to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall below the 25th percentile, suggesting that these low values likely represent background signal rather than biologically meaningful expression. This interpretation is further supported by proteomic datasets in PlasmoDB, which report PfMIC19 and PfMIC60 expression in gametocyte and mosquito stages, but not in asexual blood stages.

      To address this point, the authors should determine whether mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack both transcripts. RT-qPCR using polyT primers can be employed to detect these transcripts. If the levels of these mRNAs in WT ABS are equivalent to those in the dKO, the authors can make a pretty strong case for the absence of cristae in ABS.

      We appreciate the reviewer’s suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.

      They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).

      Searching for the CX9C motifs is a valuable suggestion. In response to the reviewer's suggestion, we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1 F).

      Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.

      In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.

      (5) Statistical significance. Sometimes my eyes see population differences that are considered insignificant by the statistical methods employed by the authors, e.g. Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods, such as Student's t-test, for pairwise comparisons?

      The graphs in figures 3, 4 and 5 have been redrawn as violin plots on a linear scale (also following a suggestion further down in the reviewer’s comments). We believe that this improves interpretability. ANOVA was retained as the statistical test to ensure correction for multiple comparisons, which a standard t-test does not provide. A full overview of the statistics and exact p-values can also be found in the newly added supplementary information 2.
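
      The concern about uncorrected pairwise tests can be illustrated with a short sketch. It assumes four parasite lines (WT, the two single KOs and the dKO), hence six pairwise comparisons, and it uses the Holm step-down adjustment purely as a generic example of a multiple-comparison correction. It is not the post hoc procedure used in the manuscript, and the p-values are hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent uncorrected tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Four lines give 4 * 3 / 2 = 6 pairwise comparisons (assumption stated above).
n_pairs = 4 * 3 // 2
inflated = familywise_error(0.05, n_pairs)

# Hypothetical raw p-values from three pairwise comparisons against WT.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
```

      With six uncorrected tests at a nominal 5% level, the chance of at least one false positive is roughly a quarter, which is the inflation a corrected procedure is meant to prevent.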

      Minor comments:

      Line 33. Anaerobes (e.g. Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria.

      We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.

      Line 56: Unclear what authors mean by "canonical model of mitochondria"

      To clarify we changed this to “yeast or human” model of mitochondria.

      Lines 75-76: This applies to Mic10 only

      We removed the “high degree of conservation in other cristate eukaryotes” statement.

      Line 80: Cite DOI: 10.1016/j.cub.2020.02.053

      Done

      Fig 2D: I find this table difficult to read. If the authors keep the table format, at least get rid of the 'mean' column, as this data is better depicted in 2C. I suggest depicting this data as in 3B, showing the proportion of infected vs uninfected flies in all experiments, and then moving the modified table to the supplement. It is important to point out that experiment 5 appears to be an outlier with reduced infectivity across all cell lines, including WT.

      To clarify: the mean reported in the table indicates the mean per replicate while the mean reported in figure 2C is the overall mean for a given genotype that corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided to not include a graph for infected and non-infected mosquitoes as this information would be partially misleading, highlighting a phenotype we argue to be influenced by the strong variability.

      Fig. 3C-G: I feel like these data repeatedly lead to same conclusions. These are all different ways of showing what is depicted in Fig 2B: mitochondria gross morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to supplement and replaced by the beautiful images.

      Thank you for the nice comment on our images. We have now moved part of the graphs to supplementary figure 6 and only kept the Relative Frequency, Sphericity and total mitochondria volume per cell in the main figure.

      Line 180: Be more specific with which tubulin isoform is used as a male marker and state why this marker was used in supplemental Fig S6.

      We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.

      Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.

      To clarify the biological effect that we can conclude from the measurement, we added an explanation of it in the respective section of the results, and we decided to replace the raw results of the plug-in readout with the deduced relative dispersion.

      Line 222: Report male/female crista measurements

      We added Supplementary information 2, which contains the exact statistical tests and outcomes for all presented quantifications as well as a per-sex statistical analysis of the data from figure 4. Correspondingly, we extended supplementary information 2 with a per-sex colour code for the thin section TEM data.

      Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems like the data spread is wider in the double KO. This would also solve the problem with the standard deviation extending beyond 0%.

      We changed this accordingly.

      Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).

      This has been changed accordingly.

      Line 320: incorrect citation. Related to point 1 above.

      Correct citation is now included in the text.

      Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.

      This has been changed accordingly.

      Line 343-345: The sentence and citation 45 are strange. Regarding the former, it is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. About the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Mitochondrial fragmentation is often a physiological response to stress. I would just rewrite so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least state that this could be one of several possibilities.

      The sentence has been substituted as the reviewer indicated, although we still include the data from human cells, as this has also been shown in Stephens et al. 2020.

      Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.

      Done

      Line 376-377; 'deplete functionality' does not make sense, especially in the context of talking about MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.

      We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.

      Other suggestions for added value

      (1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)

      While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead co-migrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial intermembrane space bridging (MIB) complex are not mentioned in the manuscript, we decided to not include the information in Author response image 1.

      Author response image 1.

      Reviewer #2 (Significance):

      The manuscript by Tassan-Lugrezin is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask the question whether MICOS is essential for this process. They conclude based on their data that the answer is no, which the authors consider unprecedented. But even if their claim is true that ABS is acristate, this supposed advantage does not really bring any meaningful insight into how MICOS works in Plasmodium.

      First the positives of this manuscript. As has been the case with this research team, the manuscript is very sophisticated in the experimental approaches that are made. The highlights are the beautiful and often conclusive microscopy performed by the authors. Only the localization of Mic60 and Mic19 was inconclusive due to their very low expression unfortunately.

      The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Mitochondrial deformation is tolerated by life cycle stage differentiation, with a modest but significant reduction of oocyst production being observed.

      However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which from the title and focus of the manuscript, is the main goal of the authors.

      In its current form, the manuscript reports some potentially important findings:

      (1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.

      (2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.

      (3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation

      (4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of Plasmodium itself than anything about the essentiality of Mic60, i.e. Plasmodium life cycle progression tolerates defects to mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth in the examined conditions does differ from its essentiality in other eukaryotes. But fitness costs were not assayed (e.g. by competition between mutants and WT in infection of mosquitoes).

      (5) Decreased fitness of the mutants is implied by a reduction of oocyst formation.

      While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.

      This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS that sets it apart from other iterations is the apparent absence of the Mic10 subunit. Purification of Plasmodium MICOS via the epitope-tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in Plasmodium.

      Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium.

      This is a salient issue as maybe crista morphology is decoupled from OXPHOS capacity in Plasmodium, which links to the apparent tolerance of mitochondrial morphology in cell growth and differentiation. I suggested in section A experiments to address this deficit.

      Finally, the authors could assay fitness costs of MICOS ablation and associated phenotypes by assaying whether mosquito infectivity is reduced in the mutants when they are directly competing with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation may contribute to Plasmodium's remarkable ability to adapt.

      I realize that the authors put a lot of effort into their study and again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are still better ways to increase the impact of the study aside from overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.

      We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.

      With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for the initial formation of cristae (as opposed to their organization) confirms something that has been assumed by the field without being the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript. We also argue that the finding that MICOS is not involved in the initial formation of cristae is relevant to Plasmodium biology and beyond. As for insights into how MICOS works in Plasmodium, we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin section and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      We thank the reviewer for their time and compliment.

      Major comments:

      (1) The authors should improve how they present their findings in the right context, in particular by:

      i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      We extended the introduction to include this information.

      iii) at the end of the introduction, the motivating hypothesis is formulated ad hoc "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      To clarify we rephrased the sentence to: “Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated.”

      (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum but this is not compared to the expected length or the size in S. cerevisiae.

      To solve the reference issue, we added the UniProt IDs that we compared to establish that the annotated ORF is bigger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast throughout the figure but then refer to human in this specific sentence.

      Regarding whether the true N-terminus is known. Short answer: No, not exactly.

      However, we do know that the Pf version is about double the size of the yeast protein.

      As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.

      To clarify, we now emphasize that we ran the AlphaFold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chapell et al., now cited in the manuscript), and revised the wording to make clear what we are comparing in which sentence.

      (3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins?". The authors should explain which data they are referring to, as some of the data in Fig 3 and 4 look similar and all significance tests relate to the wild type, not between the different mutants, so it is not clear if any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      In reply to this and other comments from the reviewers, we added multiple-comparison testing between all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests used.

      (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      We deleted this statement.

      (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      This sentence has been removed.

      (6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      This sentence has been deleted in the revised version of the manuscript.

      Minor comments:

      (1) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      The title has been changed accordingly.

      - Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.

      Done; the paper is now cited.

      - Page 2, line 58: for a more complete picture the authors should also cite the work of others here which shows that although at very low levels, e.g. complex III (a drug target) and ATP synthase do assemble (Nina et al, 2011, JBC).

      Done

      - Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.

      The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.

      - Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.

      We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).

      - Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked as low/very low. So, its existence is unknown, not just its "function".

      We adapted the domain description to “a stack of two parallel beta-sheets” and replaced the statement on unknown function with the statement “Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown.”

      - Fig 1B: The authors show two alphafold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is however an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible.

      We appreciate the reviewer’s suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full-length proteins, we believe that including fragment-based structures would be less informative in this context.

      - Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods - how well was the antibody working in their hands and how do they interpret the absence of a WB band compared to transcriptomics data?

      The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall below the 25th percentile, suggesting that these signals likely represent background. Nevertheless, we acknowledge that low-level protein expression below the detection limit of western blot analysis cannot be excluded. To reflect these considerations, we added the sentence: ‘The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.’

      - Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.

      Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondria), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases accessibility for the antibodies. However, U-ExM comes at the cost of being prone to dotty background signals, therefore potentially hiding low-abundance, naturally dotty signals such as the signal of MICOS proteins that localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we do not have experience with, nor access to, this particular technique/method.

      - Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).

      The limitations of other methods are described in the respective results section.

      We added a clarifying sentence in the results section of Figure 4:

      “Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae.”

      This statement refers to the length/width measurements of cristae.

      In the context of Figure 4D we mention the following (see preprint lines 229 – 230): “We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19, PfMIC60, or both.”

      For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 – 273): “Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range.”

      - Line 404: perhaps undetected or similar would be a better description than "hidden"?

      The sentence does not exist in the revised manuscript.

      Reviewer #3 (Significance):

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle and defects in infection effectivity are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism. The limitation of the study stems from what is already known about MICOS and its subunits in great detail in yeast and humans with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight on MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is however important given the divergence found in mitochondrial biology and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging as the morphological defects are diverse and the fitness defects appear moderate/mild.

      As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to different model organisms. This study will likely be mainly of interest for specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

    1. eLife Assessment

      This valuable study used genetic and pharmacological manipulations of insulin/IGF signaling to address the role of the insulin/IGF axis in the function of renal glomerular podocytes. Solid data are presented to demonstrate that co-inhibition of insulin/IGF signaling in podocytes led to aberrant splicing of mRNAs, which could contribute to the loss of podocytes in vitro and in vivo in mice. In light of the fact that IR/IGF-1R signaling is critically required for normal development and growth in multiple cells and organs, the lack of an assessment of the developmental phenotype of podocytes in the mouse model limits the interpretation of the data.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the role of the insulin receptor and the insulin growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, proteinuria, and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.

      This is primarily a descriptive study and no technical concerns are raised. The mechanism of how insulin and IGF1 signaling regulates splicing is not directly addressed but potentially implicates the phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Direct measurement of GFR and serial serum creatinines might also enhance our understanding of progression of disease; proteinuria is a strong sign of renal injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful but may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      Significance:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.

      Latest comments:

      The new reviewer raised two major points: whether the KO effect on splicing is specific to IGF1, and whether the interpretation could be developmental rather than due to splicing. The reviewer raises some important issues, but the evidence to suggest that this is specific is data in the literature that IR/IGF signaling is already known to regulate splicing and that splicing defects were not detected in other models that they have analyzed. I agree with the reviewer (and authors) that the incomplete floxing of the genes is a major complication. There could be a developmental defect, with mice being born with fewer podocytes, and perhaps the authors should caveat this point. The facts that the mice are born with normal function, and that renal function can be maintained with up to 80% loss of podocytes, suggest that they are likely born with a good number of podocytes and that the dysfunction that occurs at 6 months is due to a process, induced by the loss of IR/IGF signaling, that is detrimental to the podocyte.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long-read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well designed, with bars and superimposed dot plots.

      Methods are generally well described.

      Comments on previous version:

      Coward and colleagues have done an excellent job of responding to all the reviewer comments.

    4. Reviewer #4 (Public review):

      Summary and background:

      This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al. is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. The insulin/IGF signaling system in mammals evolved as three gene-reduplicated peptides (insulin, IGF-1, and IGF-2) and their two receptors, IR and IGF1R, which cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor and metabolic mTORC1-mediated and other processes including the splicing of RNA for protein synthesis.

      Comments on revised version:

      The second sentence of the Summary reads "This study sought to elucidate the compound role of the insulin/IGF1 axis in podocytes using transgenic mice and cell culture models deficient in both receptors." The study design and rationale for the proteomic analysis described is predicated on the finding that podocyte-specific knockdown of the IR/IGF-1R in mice is associated with development of proteinuria and reduced eGFR by 20 months of life. Since the IR/IGF-1R are critically required for normal development and growth of all cells and organs, the obvious explanation for the observation would be that the model system results in defective podocyte development and deployment (caused by reduced IR/IGF-1) that, in turn, causes subsequent development of proteinuria and glomerulosclerosis (that may be much less dependent on a normal level of IR/IGF-1R expression). Thus, the experimental design does not allow a distinction between podocyte development and steady-state function, which are different biologic processes. The data provided do not examine podocyte status immediately after birth to confirm that podocyte number, size and structure are normal in mice that subsequently develop proteinuria and glomerulosclerosis. The response to the reviewer suggests that since this would require additional mice it has not been undertaken in order to reduce animal usage. This is not a valid argument, particularly when the investigators have not even used state-of-the-art methods to measure podocyte number, size and density in adult mice, key parameters that would be required to interpret their data. Counting podocyte nuclear number in glomerular cross-sections is simply an inadequate method, even if it is used and reported in other journals, and particularly where the examples given to justify its use can hardly be viewed as representing first-rate science.

      In the absence of studies that would answer the above questions, the investigators should add a sentence to the Discussion dealing with study limitations as follows: "The study design does not allow us to determine whether the primary effect of reduced IR/IGF-1R expression on the phenotype is during in utero and post-natal podocyte development and deployment, during periods of rapid growth when IGF-1 levels are highest, in steady state adult podocytes, or under all of the above conditions."

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the role of the insulin receptor and the insulin growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, proteinuria, and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.

      This is primarily a descriptive study and no technical concerns are raised. The mechanism of how insulin and IGF1 signaling regulates splicing is not directly addressed but potentially implicates the phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Direct measurement of GFR and serial serum creatinines might also enhance our understanding of progression of disease; proteinuria is a strong sign of renal injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful but may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      Significance:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.

      Comments on revised version:

      I'm satisfied with the revised manuscript and the responses to my previous concerns.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long-read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well designed, with bars and superimposed dot plots.

      Methods are generally well described.

      Comments on revised version:

      Coward and colleagues have done an excellent job of responding to all the reviewer comments.

      Thank you.

      Reviewer #4 (Public review):

      Summary and background:

      This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al. is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. The insulin/IGF signaling system in mammals evolved as three gene-reduplicated peptides (insulin, IGF-1, and IGF-2) and their two receptors, IR and IGF1R, which cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor and metabolic mTORC1-mediated and other processes including the splicing of RNA for protein synthesis.

      Thank you for this additional review and for assessing our paper with new suggestions (we addressed the previous suggestions to the satisfaction of the other reviewers). Of note, regarding this introduction, the podocyte is a terminally differentiated cell and may have unique responses to insulin / IGF, as it is accepted that it does not generally proliferate (hence we consider understanding the actions of insulin / IGF and their receptors to be of interest). Indeed, we have recently shown a contrasting effect of IGF signalling in the podocyte: partial suppression of the IGF1 receptor is beneficial, in contrast to near-complete suppression, which results in mitochondrial dysfunction (PMID:38706850).

      Mouse IR/IGF1R double knockdown model:

      A double knockdown mouse model was generated by interbreeding mice with different genetic backgrounds carrying floxed sites for IR and IGF-1R to produce mixed background offspring with both floxed IR and IGF-1R genes. These mice were crossed so that the podocin promoter-driven Cre (that comes on at about embryonic day 12 as podocytes are developing) would delete IR and IGF-1R genes. Since podocin is believed to be an absolutely podocyte-specific protein, this podocin promoter is predicted to specifically knock down the IR and IGF1R genes only in podocytes. The weight and growth of double KO offspring were not different from controls, but some proportion of the double knockdown mice subsequently developed proteinuria by 6 months and 20% died, although no specific data is provided to identify the cause of the deaths since eGFR was not decreased. Surviving mice were evaluated at 6 months of age. The efficacy of knockdown was not demonstrated in the mouse model itself, although a temperature-sensitive cell line developed from these double knockdown mice showed that expression of IR and IGF-1R proteins in the Cre-treated cell line were both reduced by about 50% (no statistical analysis of this result provided).

      In the knockout mice, proteinuria was significantly increased by 6 months, but not at earlier time points. Histologic analysis showed proteinaceous casts, glomerulosclerosis and interstitial fibrosis. Podocyte number was stated to be reduced by about 30% in double knockdown mice, although the method by which this was evaluated seems to have been by counting WT1 positive nuclei in glomerular cross-sections, an approach that is well-known not to be a reliable way of assessing true podocyte number. No information is provided about podocyte size, density or glomerular volume.

      Comment: If IR/IGF1R deletion plays a significant role in normal podocyte function sufficient to cause proteinuria and glomerulosclerosis, then the effect of reduced IR and IGF1R protein expression on podocyte function would have been expected to produce a phenotype before 6 months. A more likely scenario to explain the overall result is that deleting the IR and IGF1R genes at about embryonic day 12 impacted podocyte development to a variable extent such that some mice developed fewer podocytes per glomerulus than other mice. As mice grow and their glomeruli and glomerular capillary area increase, those mice with fewer podocytes would not be able to completely cover the filtration surface with foot processes and would develop proteinuria and glomerulosclerosis. If reduced podocyte number per glomerulus is the proximate cause of the observed proteinuria, then modulation of the body and kidney growth rate by calorie restriction to slow growth (lower circulating IGF-1 levels) would be expected to be protective, while a high protein high calorie diet (higher circulating IGF-1 levels) or uni-nephrectomy to increase kidney growth rate would be expected to enhance proteinuria and glomerulosclerosis.

      Thank you for these comments. In response to them:

      (1) WT1 as a marker of podocyte number. We agree this may not be the most accurate way of precisely measuring podocyte number, but it is widely accepted in the field (PMID:33655004 / PMID:38542564) and we think it convincingly shows fewer podocytes at 6 months.

      (2) Podocyte size and density were not measured. This was not the focus of the paper, and the histology obviously showed a significant phenotype in several mice (Figs 1D-F). Of note, we did objectively assess a glomerulosclerosis index (Fig 1D). We took the approach of understanding mechanism through non-biased proteomics and phospho-proteomics of conditionally immortalised podocytes in which we had convincingly knocked down the insulin and IGF1 receptors (Figure 2).

      (3) You did not study the mice earlier to ascertain the developmental phenotype. We concede we did not do this, but there was no significant proteinuria detected early in the mice, so we elected not to increase mouse numbers by studying them then (which we consider good practice for reduction, replacement and refinement). We suspect there would have been subtle changes in those mice that had significantly reduced simultaneous IR and IGF1R knockdown. It was precisely because of this that we generated a conditionally immortalised podocyte cell line with robust simultaneous knock-down of both receptors.

      (4) You did not show significant insulin and IGF1 receptor knockdown in the conditionally immortalised cell line (reviewer states it was 50%). We clearly knocked both receptors down (insulin and IGF1R) in the podocyte line by >80%, which was highly statistically significant (p<0.00001; Figure 2A). We agree this was crucial (and we made the cell line because of the variability in the mouse model).

      The model as used may be more representative of a variable degree of podocyte depletion than an effect of impaired IR/IGF1R signaling. Therefore, although the phenotype may be ultimately attributable to the IR/IGF1R gene deletions, the proteinuria and glomerulosclerotic phenotype itself was probably a consequence of defective podocyte development. Examining podocyte number, size, density and glomerular volume at earlier time points (4 weeks) would help to answer this question. Therefore, a more appropriate title would be "The insulin/IGF axis is critically important (for) normal podocyte development and deployment". In this context the effect of the knockdowns on splicing would make more sense.

      Please see our response (above). We think our final conclusion that in the podocyte the insulin/IGF axis is important for spliceosome activity and control is valid. This is due to our findings (both total and phospho-proteomics results) and to other recent papers showing that this axis can rapidly phosphorylate a variety of spliceosome proteins in different cell types (PMID:39939313 / PMID:32888406), all discussed in detail in the manuscript.

      Cell culture studies. A cell line was generated using a temperature sensitive SV40 system that has been previously reported from this laboratory. A detailed analysis is provided to show that double knockout cells exhibited abnormal spliceosome activity. This forms the basis for the conclusion that "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte". There are several concerns that weaken this conclusion.

      (1) In the double knockdown cell culture system about 30% of cells were "lost" by 3 days and about 70% of cells were "lost" by 5 days. The studies were done at the 3 day time point. It is not clear whether "lost" cells were in the process of dying, undergoing stress-induced detachment, or just growing more slowly than control due to reduced IR and IGF-1R signaling. These processes could have impacted splicing in a non-specific way independent of IR/IGF1R signaling itself.

      (2) Can a single cell line derived from the double floxed mice be relied on to provide an unbiased picture of the effect of deleting IR and IGF-1R? Presumably, the transfection and selection process will select for cells that survive thereby including unknown biases, possibly related to spliceosome function. Is a single cell line adequate? These investigators have extensive experience with this type of analysis, but this question is not addressed in the discussion.

      (3) To determine whether the effect is specific to reduced IR/IGFR signaling the deletion of IR and IGF-1R could be corrected by transfecting full length IR and IGF-1R cDNAs into the cells to restore normal IR/IGF1R signaling. If transfected cells with intact IR and IGF-1R expression and activity return spliceosome activity to normal this would be evidence that the receptors themselves play some role in spliceosome activity, as opposed to the downstream effect of growth limitation/stress on the cells.

      (4) Other ways of testing whether the splicing effect is specifically due to reduced IR/IGF-1R signaling would be to (a) block IR and IGF1R receptors using available inhibitors, (b) remove or reduce insulin, IGF-1 and IGF-2 levels in the culture medium, (c) use low glucose and amino acid culture medium to slow growth rate independent of receptor function, (d) or block intra-cellular signaling via the IR and IGF-1R receptors through mTORC1 inhibition using rapamycin or other signaling targets.

      (5) It would be useful to determine whether cultured cells stressed in other ways (e.g. ischemia, toxins, etc.) also show the same splicing abnormalities.

      Point 1. 70% cell loss was observed at day 7 (not day 5). We found approximately 20% loss at day 3. We opted for this early time point, hypothesising that the key detrimental processes would be clear then. This 3 day time point also ensures there has been enough time to allow for the expression of Cre recombinase, receptor gene excision and degradation of existing endogenous IR/IGF1R following lentiviral transduction. Interestingly, we did not find a major “death or apoptosis” signal in our data then, but agree it should be considered. We think this is a specific pathway, as we have previously examined several other detrimental conditionally immortalised podocyte cell lines using proteomics with a much more severe phenotype of cell death (e.g. podocyte GSK3 alpha/beta knockdown) and we detected NO spliceosome signal (PMID:30679422). Furthermore, there are now other published podocyte proteomics “stress” studies in which there is proteinuria and significant cell loss / death that also do not show spliceosome dysfunction. These include studying the detailed proteosomal signature of podocytes stressed with Doxorubicin and Lipopolysaccharide endotoxin LPS in mice (PMID:32047005) and bradykinin stimulation of rat podocytes (PMID:32518694).

      Point 2. Yes, we think it is valuable and reproducible. We generated a podocyte cell line from insulin receptor and IGF1 receptor homozygous floxed cells. Hence there is no selection bias in the cells when generating the line as both receptors are effectively intact. We then temporally “knocked down” the receptors with extrinsic lentiviral Cre.

      Importantly we validated our cell line findings both back in the cells (with Western blotting) and in our transgenic receptor knockdown mice and found evidence of spliceosomal dysregulation (Figure 3E and 3F). Also as discussed above the spliceosome has been identified in other models in the insulin/IGF pathway.

      Point 3. We don’t think the experiment of knocking down the receptors and then reconstituting them would prove this hypothesis. This is because if the splicing abnormality was due to generalised cell dysfunction (which we do not think is the case in this situation) then putting the receptors back may simply restore cell health and the spliceosomal function (i.e. it does not prove it is via the receptors). Secondly, the process of transduction with multiple lentiviruses may be inherently stressful to the cell and there may be a high level of extrinsic receptor inserted which may also be confounding/detrimental. Finally, as discussed there are now several lines of evidence describing insulin / IGF signalling to spliceosomal proteins which we consider important (discussed in the paper in detail).

      Point 4. We think modulating the receptors using the Cre-lox approach is the cleanest way (with fewer off-target effects) to interrogate the insulin / IGF axis. It allows us to differentiate the cells by thermo-switching (which is crucial for this terminally differentiated cell) and then robustly knock down both receptors simultaneously to investigate mechanism. We agree these supplementary approaches may give some extra information if their limitations (e.g. off-target effects of inhibitors) are also taken into consideration.

      Point 5. They do not. Please see response to point 1 above regarding GSK3, Doxorubicin, LPS and bradykinin challenge.

    1. eLife Assessment

      This study presents a valuable finding relating to how the state of arousal is represented within the superior colliculus (SC), a principal visuo-oculomotor structure. The main conclusion that the SC's neural representation of arousal is segregated from motor related output appears to have solid support by the data. The work will be of interest to sensory, motor and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies which allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.
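
      For readers unfamiliar with the slow-drift analysis summarized above, the following is a minimal sketch of the general idea, not the authors' pipeline. It assumes a trials-by-neurons matrix of delay-period spike counts in trial order, and it swaps in a simple running average over trials for whatever binning the original study used. The function name `slow_drift_axis` and the `window` parameter are hypothetical.

```python
import numpy as np

def slow_drift_axis(spike_counts, window=50):
    """Estimate a slow-drift axis from delay-period spike counts.

    spike_counts: array of shape (n_trials, n_neurons), rows in trial order.
    window: number of consecutive trials averaged together (assumed value).
    Returns (axis, projection): unit-length weights over neurons and the
    smoothed population activity projected onto that axis.
    """
    counts = np.asarray(spike_counts, dtype=float)
    kernel = np.ones(window) / window
    # Running average per neuron keeps slow changes and suppresses
    # trial-to-trial noise.
    smoothed = np.column_stack([
        np.convolve(counts[:, i], kernel, mode="valid")
        for i in range(counts.shape[1])
    ])
    # Remove each neuron's session mean so PCA captures variation over time.
    centered = smoothed - smoothed.mean(axis=0)
    # First principal component via SVD: the direction of largest slow variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    projection = centered @ axis
    return axis, projection
```

      The projection can then be correlated with a similarly smoothed pupil-size trace. Note that the sign of a principal component is arbitrary, so the magnitude of that correlation is the quantity of interest unless the axis is aligned by some additional convention.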

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

    3. Reviewer #2 (Public review):

      Summary:

      Neurons in motor-related areas have increasingly been shown to also carry other, non-motoric signals. This creates the problem of avoiding interference between the motor and non-motor-related signals. This is a significant problem that likely affects many brain areas. The specific example studied here is interference between saccade-related activity and slow-changing arousal signals in the superior colliculus. The authors identify neuronal activity related to saccades and arousal. Identifying saccade-related activity is straightforward, but arousal-related activity is harder to identify. The authors first identify a potential neuronal correlate of arousal by using PCA to identify a component in the population activity corresponding to slow drift over the recording session. Next, they link this component to arousal by showing that the component is present across different brain areas (SC and PFC), and that it is correlated with pupil size, an external marker of arousal. Having identified an arousal-related component in SC, the authors show next that SC neurons with strong motor-related activity are less strongly affected by this arousal component (both SC and PFC). Lastly, they show that SC population activity patterns related to saccades and pupil size form orthogonal subspaces in the SC population.
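
      One simple way to quantify the orthogonality mentioned above is the absolute cosine between two population axes. The sketch below is an illustration under stated assumptions, not the authors' analysis: each axis is assumed to be a list of per-neuron weights, and the function name `axis_alignment` is hypothetical.

```python
import math

def axis_alignment(axis_a, axis_b):
    """Absolute cosine between two population axes (0 = orthogonal, 1 = parallel).

    axis_a, axis_b: sequences of per-neuron weights of equal length,
    for example a pupil-related axis and a saccade-tuning axis (assumed inputs).
    """
    if len(axis_a) != len(axis_b):
        raise ValueError("axes must have one weight per neuron")
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("axes must be non-zero")
    # The sign of a population axis is arbitrary, so report the magnitude.
    return abs(dot) / (norm_a * norm_b)
```

      Because random directions in a high-dimensional space are already close to orthogonal, a small value is only informative when compared with a chance distribution, for example one obtained from shuffled or random axes of the same dimensionality.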

      Strengths:

      A great strength of this research is the clear description of the problem, its relationship with the performed analysis and the interpretation of the results. The paper is very well written and easy to follow.

      An additional strength is the use of fairly sophisticated analysis using population activity.

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time on explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?

      Comments on the first revision:

      My main concern with the paper is really two-fold. First, I think it is only incremental and adds next to no useful information about the SC. That might not be a fair criticism and certainly is purely subjective, but it affects the standards that eLife has on significance thresholds for papers. As such, this is an issue the editors should talk about.

      Second, my main concern with the substance of the paper is that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see some behavioral indicators of arousal, such as RT differences, pupil size (they talk about this), or accuracy. The authors first need to describe the objective behavioral indicators of the level of arousal. Using these indices, they need to establish that there are meaningful differences in the level of arousal across the recording session. Having done so, they can proceed to link changes in SC activity with levels of arousal.

      Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'. I hope it is clear why that is premature. The 'slow-drift' fluctuations are presumed to be related to arousal, but they could be meaningless random fluctuations, or related to some other cognitive process.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

      Comments on the latest version:

      They have constructively responded to my concerns. I think 'incomplete' should be replaced with 'solidly supported'.

    4. Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity like in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor response, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the neurons with lower motor index of the SC. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.
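
      To make the verbal definition of the motor index concrete, the sketch below uses a standard contrast index. This exact formula is an assumption for illustration, since the review does not state the one used in the paper, and the function name `motor_index` and its inputs (mean firing rates in the visual and motor epochs) are hypothetical.

```python
def motor_index(visual_rate, motor_rate):
    """Contrast between motor- and visual-epoch firing rates, in [-1, 1].

    Assumed form: (motor - visual) / (motor + visual).
    Low (negative) values: visual response dominates.
    High (positive) values: motor response dominates.
    """
    if visual_rate < 0 or motor_rate < 0:
        raise ValueError("firing rates must be non-negative")
    total = visual_rate + motor_rate
    if total == 0:
        raise ValueError("neuron has no response in either epoch")
    return (motor_rate - visual_rate) / total
```

      Under this assumed form, a neuron firing at 30 spikes/s in the visual epoch and 10 spikes/s in the motor epoch has an index of -0.5, while the reverse pattern gives 0.5.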

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity.

      Comments on revisions:

      The authors have given due consideration to the possibility that SC signaling of arousal could be at least in part due to changes in pupil size related responses to ambient light. Discussion of this point in the most recent revision helps to mitigate this concern.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies which allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

      No weaknesses to address.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time on explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?

      Comments on revised manuscript:

      I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., w/o other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

      We agree with the reviewer that we may have advanced into discussing arousal-related effects in the previous version of the manuscript without providing a thorough explanation for why we think the slow drift axis is associated with changes in the monkey’s arousal levels. Arousal has been linked to the size of the pupil as well as movements of the eyes in numerous previous studies. We have made the following changes in the revised manuscript to address the reviewer’s concern:

      (1) When first describing how the spiking responses of SC neurons fluctuate over the course of a recording session (Lines 130-132), we have used the phrase "slow fluctuations in the spiking responses" rather than "arousal-related fluctuations in the spiking responses". Then, when describing these effects in more detail (Lines 136-147), we have explained why we think these fluctuations may be related to arousal. The following text has been added in the revised manuscript for clarification:

      “We found that this low-dimensional pattern of activity in the SC was also correlated with pupil size in the present study and with simultaneously recorded data in the prefrontal cortex (PFC), pointing to a link between this brain-wide fluctuation and changes in the monkeys’ arousal levels while performing the task.” (Lines 136-147)

      (2) We have changed the subheading in Line 183 of the revised manuscript from "Arousal-related fluctuations are present in the SC and correlated with pupil size and fluctuations in PFC activity" to "Slow fluctuations in SC spiking activity are correlated with pupil size and PFC activity". Given that we have not yet explained the results linking these fluctuations to arousal at this stage of the manuscript, we believe that this revised title is more accurate and avoids jumping too quickly to arousal-related fluctuations without first explaining the link between SC slow drift, pupil size and PFC activity.

      (3) We have provided additional justification for using pupil size and PFC activity to assess whether SC slow drift is associated with changes in the monkeys’ arousal levels. In a previous study, we computed an identical slow drift axis for spiking responses in visual cortex (V4) and PFC, and investigated how these low-dimensional neural activity patterns, which were themselves strongly correlated, were associated with various eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). Results showed that pupil size was the strongest predictor of slow drift in V4 and PFC. Given that the eye metrics were also strongly correlated with each other, we believe that the observed relationship between SC slow drift, pupil size and PFC activity provides sufficient evidence to suggest that the fluctuations observed in the SC are arousal-related. The following text has been added to the Results section of the revised manuscript:

      “Moreover, previous work in our laboratory computed a similar slow-drift axis using spiking activity in visual cortex (V4) and PFC, and investigated the relationship between these low-dimensional neural activity patterns and different eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). In addition to observing a strong correlation between V4 and PFC slow drift, we found that, relative to the other eye-related metrics, pupil size was the strongest predictor of these fluctuations (Johnston et al., 2022a). Thus, to further confirm the link between the SC slow drift axis and changes in the monkeys’ arousal levels while they performed the MGS task, we next sought to explore if projections onto the SC slow drift axis were associated with pupil size.” (Lines 236-344)

      Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity like in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor response, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the neurons with lower motor index of the SC. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.

      Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.

      I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.

      Comments on revised manuscript:

      The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide more clear details below:

      In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.

      We thank the reviewer for this comment and apologize that the citation to Baumann et al., PNAS, 2023 was missing in the previous version of the manuscript. In addition to including this citation in the revised version, we have provided a much more comprehensive description of all three cited studies and clarified that, in addition to replicating the results of Jagadisan and Gandhi, Baumann et al., PNAS, 2023 showed that the subspaces for the visual and motor epochs are orthogonal to each other. The following lines have been added to the Introduction of the revised manuscript:

      “A similar separation has been observed for visual and motor responses in the SC (Jagadisan and Gandhi, 2022; Ayar et al., 2023; Baumann et al., 2023). For example, Jagadisan and Gandhi (2022) used linear microelectrode arrays to investigate why early eye movements are not triggered when neuronal responses to a visual target, presented before a delayed saccade to that target, cross a threshold. They found that population activity in the SC was less stable during the visual epoch of a delayed saccade task, relative to the saccade epoch. Moreover, saccades could be evoked more easily by patterned microstimulation when the temporal structure of the microstimulation was stable across electrodes, providing a potential explanation for how downstream regions differentiate between visual and motor responses. Similar results were reported by Baumann et al. (2023) who found that the strength of SC motor responses during a saccade to a visual image depends on the features of that image (e.g., contrast, orientation). When dimensionality reduction was applied to the spiking responses of neuronal populations in the SC, the population trajectory during the initial visual response to the image was orthogonal to that during the motor response. These findings replicate the separation in temporal population structure reported by Jagadisan and Gandhi (2022) and support the results of Ayar et al. (2023). They found that, although not completely orthogonal, population activity in the SC is distinct for visual and motor responses during the same oculomotor task and across different tasks, which could further facilitate the decoding of signals related to sensation, action and context by downstream regions.” (Lines 110-127)

      Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye also changes from the gray background. Thus, visually-responsive neurons will have different amount of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the higher effects seen in the more visual neurons.

      We apologize that our analysis did not fully address the reviewer’s concern that the presence of fluctuations in visual neurons and their absence in motor neurons may have arisen indirectly due to changes in the amount of light entering the eye caused by changes in pupil size. As per the reviewer’s suggestion, we have now raised the possibility that visual neurons in the SC may have firing rates that are monotonically related to slow trends in overall luminance induced by pupil size changes, whereas motor neurons do not. Although we believe this to be an unlikely explanation, the paragraph from lines 374-398 has been modified to better describe this possibility, including the following text:

      “Given that slow drift is found in traditionally defined visual areas (e.g., area V4) and in regions that show mixed selectivity for multiple task variables (e.g., PFC) (Cowley et al., 2020), it seems unlikely that slow drift is caused by luminance fluctuations alone and more likely that it reflects global changes in arousal. At the same time, these arousal-related fluctuations covary with changes in pupil size (Johnston et al., 2022a), which could modulate the amount of light entering the eye from the display. This might affect visual neurons but not motor neurons due to their lack of visual sensitivity. Because SC neurons exist on a continuum, with visual responses decreasing and motor responses increasing from the intermediate to deep layers (Massot et al., 2019; Heusser et al., 2022) and no clear categorical boundary for motor-only neurons, any readout strategy would still need to avoid corruption of the motor output by slow drift, even if it were caused by changes in the amount of light entering the eye.” (Lines 387-398)

      The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.

      We thank the reviewer for bringing this to our attention. We believe this issue may have arisen during conversion of the manuscript file for review, as the figures were of sufficient quality and the equations visible in the version that appeared online (https://doi.org/10.7554/eLife.99278.2). In any case, we will ensure that high-resolution figures are submitted with the revised manuscript and apologize that they were low resolution in the previous version.

      I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? i.e. how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.

      We agree that clarification is needed here and thank the reviewer for their comment. The eccentricity of the targets was set to match the endpoints of the evoked saccades, which for some sessions were relatively close to the fovea. The mean eccentricity of the targets across sessions was 4.52° (SD = 2.89°). These values are now reported in the Methods section of the revised manuscript (Line 637). For the neuron shown in Figure 2–figure supplement 2, the eccentricity of the targets was 3°. Previous research has shown that some SC neurons respond during microsaccades as well as slightly larger saccades (see Hafed & Krauzlis, 2012, J. Neurophysiol., Fig. 4B). This likely explains why the neuron shown in Figure 2–figure supplement 2, which had a receptive field at ~3° based on saccades evoked by microstimulation, also responded during microsaccades. We apologize that this was not explained in the previous version and agree that it could have been confusing for the reader. To address this, the legend for this supplementary figure has been edited in the revised version and now reads:

      “(B) PSTH for an SC neuron that responded around the time of a microsaccade. Firing rates were computed in 1 ms bins, averaged across trials and smoothed using a Gaussian function (σ = 5 ms). Note that the targets were set to 3° in this session based on saccades evoked by microstimulation (see Methods). Previous research has shown that some SC neurons respond during microsaccades as well as to slightly larger saccades (Hafed and Krauzlis, 2012). This likely explains why this SC neuron, which had a RF at ~3° based on saccades evoked by microstimulation, also responded around the time of a microsaccade.” (Lines 1026-1031)

    1. eLife Assessment

      This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored but is of interest to visual neuroscientists. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence indicating that attention near the fovea preferentially enhances low spatial frequencies is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the weaknesses noted above, which were raised in the previous round of review.]

      Summary:

      The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to mid-range spatial frequencies (4-8 cycles/degree), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.

      Strengths:

      The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.

      Weaknesses:

      The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.

    3. Reviewer #2 (Public review):

      Summary:

      This study aims to test whether foveal and non-foveal vision share the same mechanisms for endogenous attention. Specifically, they aim to test whether they can replicate at the foveola previous results regarding the effects of exogenous attention for different spatial frequencies.

      Strengths:

      Monitoring the exact place where the gaze is located at this scale requires very precise eye-tracking methods and accurate and stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal vision and non-foveal vision, adding more data supporting this parallel.

      Weaknesses:

      The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.

    4. Reviewer #3 (Public review):

      Summary:

      This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.

      Strengths:

      The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.

      Weaknesses:

      The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target, but additional clarification regarding analyses and implications for vision and oculomotor control would broaden the impact of the study.

      We thank the editors and reviewers for their thorough evaluation of our work. We have carefully revised the manuscript and substantially reworked the Discussion to address all of the points raised, eliminate redundancies, streamline the text, and clarify the implications of our findings for vision and oculomotor control. We have also expanded the documentation of our power analyses and conducted the additional analyses requested by the reviewers. Our point-by-point responses are provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to mid-range spatial frequencies (4-8 cycles/degree), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.

      Strengths:

      The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.

      Weaknesses:

      The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.

      We thank the reviewer for the helpful comments. In the Discussion, we have now considered additional factors that could have contributed to the observed attentional effects. First, the exogenous cue might have functioned as a temporal warning signal. However, the interval between cue and stimulus onset was fixed across trials, meaning that the cue did not provide temporal information beyond what participants could already anticipate. Furthermore, participants completed a large number of trials (≥ 4000), making it highly likely that the temporal relationship between trial onset and target onset was overlearned. These considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions.

      Another possibility is that the 100% validity of the exogenous cue could potentially have promoted endogenous attentional engagement. Yet, several characteristics of our task strongly limited the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to the observed attentional benefits in our task.

      Regarding the points on statistical reporting and participant details, we followed the reviewer’s suggestions by adding post hoc power analyses and providing more comprehensive reporting of the linear model outputs (see Appendices 1 and 2). We also expanded the description of the training procedures conducted with participants prior to formal data collection in the Methods section.

      We thank the reviewer for raising the important question of how our findings may relate to oculomotor control. To address this, we analyzed trials excluded from the manuscript due to saccades. This analysis revealed that saccade latencies were shorter in the valid condition than in the neutral condition (see Figure 2 — Supplementary Figure 2). This earlier saccade onset may reflect exogenously triggered preparatory activity in the oculomotor system in response to the salient cue. Future studies are needed to examine whether this preparatory mechanism serves to efficiently guide microsaccades or saccades toward behaviorally relevant stimuli in everyday vision. We have incorporated this point into the Discussion, highlighting a potential mechanistic link between exogenous attention and oculomotor behavior.

      Reviewer #2 (Public review):

      Summary:

      This study aims to test whether foveal and non-foveal vision share the same mechanisms for endogenous attention. Specifically, they aim to test whether they can replicate at the foveola previous results regarding the effects of exogenous attention for different spatial frequencies.

      Strengths:

      Monitoring the exact place where the gaze is located at this scale requires very precise eye-tracking methods and accurate and stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal vision and non-foveal vision, adding more data supporting this parallel.

      Weaknesses:

      The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.

      We thank the reviewer for raising these important issues. In response, we have expanded the Discussion to link our findings to prior work. First, we included a direct comparison of our effect sizes with those reported in previous studies. This analysis revealed that our effect sizes are highly comparable to those of earlier studies (see Figure 3 — Supplementary Figure 4). Second, we contextualized our findings within the widely used normalization model of attention in the Discussion. We detected a mixture of contrast and response gain effects, consistent with predictions from the normalization framework given our experimental design. Finally, we extended the Discussion to consider potential underlying neural mechanisms. Specifically, we suggested that differences in attentional modulation, particularly the manifestation in response gain vs. contrast gain between the fovea and extrafovea, may reflect distinct characteristics of foveal neurons relative to those in extrafoveal regions.

      Reviewer #3 (Public review):

      Summary:

      This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.

      Strengths:

      The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.

      Weaknesses:

      The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.

      We thank the reviewer for this comment. Our Methods section continues to transparently discuss these limitations, as well as the fact that these limitations are shared with most published studies in psychophysics. Additionally, we now include measures of uncertainty for all key effects (see Appendices 1 and 2), and we have reported effect sizes throughout the Results section. Finally, we have added post hoc power analyses to the Methods. Following previous approaches to power calculation for related experiments, we found that our study was sufficiently powered to detect the main effect of attention and had moderate power to detect the interaction between attention and spatial frequency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manipulation of attention raises some interpretive concerns. Since only valid and neutral cue conditions were included, the results might reflect differences in temporal predictability rather than true spatial reorienting of attention. In other words, the valid cue could act mainly as a temporal warning signal that reduces uncertainty about stimulus onset. Without invalid trials or a non-predictive control cue, it remains difficult to separate spatial and temporal contributions to exogenous attention.

      We thank the reviewer for raising this point. In this regard, we would like to clarify that there was no temporal uncertainty in stimulus onset: across all conditions and trial types, the stimulus was presented at the same time relative to the start of the trial, i.e., 600 ms after the start. Yet, we acknowledge that the closer temporal proximity between the cue and stimulus in valid trials could serve as an additional temporal warning signal, potentially conferring an advantage relative to the neutral condition. While we cannot completely rule out a contribution of such temporal cueing within the constraints of the current experimental design, we believe its impact was limited. Specifically, the fixed cue-stimulus interval reduced the cue’s ability to convey additional temporal information. Furthermore, observers completed a large number of trials (≥4000), and the temporal contingency between trial onset and target onset was likely overlearned. Taken together, these considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions. We now mention this in the revised Discussion (lines 309-318).

      We recognized that the original Figure 2 illustrating the experimental paradigm may have caused confusion regarding the timing structure of the task. We have therefore updated the figure to more explicitly illustrate the trial timeline in both conditions.

      (2) The reported effects seem small, and no power analysis is provided. With only seven participants, the study may not have enough statistical power to confirm that the observed differences are reliable or generalizable. Although the technical precision in gaze and stimulus control is impressive, it cannot offset the limitations of a small sample. The authors should include effect size estimates, confidence intervals, and ideally a post-hoc power analysis.

      The statistical results are reported only as χ² values from model comparisons, which do not show the direction or size of the effects. For clarity and transparency, these tests should be accompanied by fixed-effect estimates with their standard errors and confidence intervals, so readers can better assess both the reliability and perceptual relevance of the findings.

      The reviewer raised several important points regarding the study's statistical rigor.

      In the revised manuscript, we now report effect size estimates (Cohen’s d) in the Results section and Appendices. Effect sizes were in the medium-to-large range, including the effect of attention on contrast sensitivity at 4 and 8 CPD, and the difference in attentional benefit on contrast sensitivity between 4 and 12 CPD and between 8 and 12 CPD. We have also included the full model outputs, including standard errors and confidence intervals, in the Appendices.

      The sample size for the current study was determined based on the magnitude of the attentional effects observed in our previous work (Guzhang et al., 2021). The experimental design and dependent measures were highly similar across the two studies, and the prior study revealed a robust effect, which accounted for a substantial proportion of within-observer variance in a tightly controlled repeated-measures design.

      We have revised the manuscript, adding bootstrap-based power estimates, following the procedure described by Jigo and Carrasco (2020), using data from Guzhang et al. (2021). Assuming the effect size in our current study would be comparable to the prior one, 2 to 12 observers were randomly sampled with replacement, and a one-way repeated-measures ANOVA with attention as the main factor was used. This procedure was repeated 10,000 times, and power was estimated as the proportion of iterations yielding a significant main effect for each sample size. The results of this analysis indicate that a sample size of five observers would have been sufficient to achieve approximately 80% power to detect the main effect of attention in the prior study. Based on these estimates, the sample size used in the current study (seven observers) provides adequate power.
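
      The resampling logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes only two attention levels (valid and neutral), in which case the one-way repeated-measures ANOVA is equivalent to a paired t-test (F = t²). The function names and the example numbers are hypothetical and are not data from either study.

```python
import math
import random
import statistics

# Two-sided critical t values at alpha = 0.05, keyed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def paired_test_significant(diffs):
    """Paired t-test on per-observer (valid - neutral) differences.

    Assumption: with two attention levels, the repeated-measures ANOVA
    F statistic equals t**2, so this test stands in for the ANOVA.
    """
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Degenerate resample (all values identical): t is undefined,
        # so it is conservatively counted as not significant.
        return False
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return abs(t) > T_CRIT_05[n - 1]


def bootstrap_power(diffs, n_observers, n_iter=10_000, seed=0):
    """Proportion of bootstrap resamples with a significant attention effect."""
    if not 2 <= n_observers <= 12:
        raise ValueError("n_observers must be between 2 and 12")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # Draw observers with replacement from the prior data set.
        sample = rng.choices(diffs, k=n_observers)
        if paired_test_significant(sample):
            hits += 1
    return hits / n_iter


if __name__ == "__main__":
    # Hypothetical per-observer sensitivity differences (valid - neutral).
    prior_diffs = [0.12, 0.08, 0.15, 0.05, 0.10, 0.09, 0.14]
    for n in range(2, 13):
        print(n, bootstrap_power(prior_diffs, n, n_iter=2000))
```

      Reading off the smallest sample size at which the estimated proportion reaches 0.8 gives the kind of sample-size statement made in the response.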

      We also conducted a post hoc power analysis to evaluate the power of our design to detect the main effects and their interaction. It was performed using the R package simr, which estimates statistical power for mixed-effects models through model-based simulation. Specifically, simr generated datasets based on the fixed- and random-effect structure of the fitted model, preserving the observed effect sizes and variance components. For each simulated dataset, the model was refit, and the effect of interest was tested. By repeating this procedure 501 times across different sample sizes, power was estimated as the proportion of simulations in which the effect was statistically significant. Based on these post hoc simulations, we estimated that our study had high power (>95%) to detect the main effects and moderate power (>65%) to detect the interaction. Although the estimated power for the interaction was lower than for the main effects, the observed effect size was substantial (as indexed by Cohen’s d), indicating that the interaction was not trivially small.

      We now describe these analyses in lines 501-532 in the Methods section.

      (3) The task seems quite demanding, requiring fine spatial discrimination, very small stimuli, and head stabilization with a bite bar. It is not clear whether participants were naïve or experienced observers. If they had prior psychophysical training, practice effects could have influenced the results, particularly given the lack of invalid trials. The manuscript would benefit from clarifying participants' experience level and describing any training or familiarization procedures.

      We appreciate the reviewer’s concern regarding potential training effects. All observers had prior experience with similar tasks, but were naïve to the scope of this study. Each participant underwent an initial familiarization phase of approximately 50 trials with the experimental setup of this study. They then completed an additional ~50 trials to estimate their individual contrast thresholds per spatial frequency level before we proceeded with data collection at the five predefined contrast levels.

      Based on our experience, we have found that, for experiments similar to the one described here, observers quickly adapt to the setup and are generally able to maintain reliable fixation and stable performance, even during the initial training phase. In addition, each participant completed approximately 400 trials before the data collection started. Even observers who began the session with no prior experience would have become practiced with the setup by the time the actual data-collection phase started, during which ~4000 trials were collected per observer. Therefore, whether an observer participated in previous experiments is unlikely to meaningfully affect the results, as the large number of trials ensures comparable levels of task familiarity across individuals.

      Crucially, valid and neutral trials were interleaved throughout the session. Any general learning or practice would therefore influence both conditions equally. Despite this, we still observed clear performance improvements in the valid condition relative to the neutral condition, indicating that the observed benefits cannot be attributed solely to practice and reflect an attentional enhancement. We have added elaboration on the training procedures in Methods (lines 411-429).

      Finally, we recognize that the lack of invalid trials may raise concerns given our 100% spatially predictive cue, as noted in Reviewer 3’s first comment. We refer the reader to our response to that point for a more detailed discussion of cue validity and the distinction between exogenous and endogenous influences in our paradigm.

      (4) The study would benefit from a clearer connection between the behavioral results and possible underlying neural mechanisms. How might the observed changes in contrast sensitivity relate to known physiological processes at the retinal, thalamic, or cortical level? The discussion could be strengthened by framing the findings within established models of attentional modulation or by referring to known effects of attention in the early visual cortex.

      This is an important point, and we agree that framing the findings within established models of attentional modulation can strengthen the discussion. We believe that the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) offers a useful framework for interpreting our behavioral findings, especially the attention-related changes in contrast sensitivity and asymptotic performance observed at the foveal scale. We have now added a more detailed discussion linking our results to this model and considering, explicitly as speculation, how known physiological processes at different stages may contribute to the observed effects (Discussion, lines 264-307).

      (5) The ecological relevance of the results is not fully developed. The authors propose that the observed effects may resemble natural attentional shifts triggered by salient events, yet the brief, highly localized flashes used here are somewhat artificial. A more likely interpretation is that these mechanisms relate to oculomotor control within the fovea, perhaps reflecting preparatory activity for microsaccades or fine fixation adjustments. Considering this view could broaden the impact of the findings and link them to current discussions on the relationship between attention and oculomotor control.

      We thank the reviewer for raising this important point regarding the ecological relevance of our findings, which we did not sufficiently address in the original manuscript. Although we briefly motivated scenarios that engage exogenous attention at high spatial resolution, such as detecting road signs or traffic lights at a distance while driving, we did not fully elaborate on how such attentional processes may link to downstream visual and oculomotor functions.

      In our experiment, observers maintained fixation and avoided saccades throughout the trial. Nevertheless, in a subset of trials (on average 17% ± 3%), observers made saccades after stimuli disappeared and prior to providing a response. Typically, these movements were microsaccades with amplitudes smaller than 0.5°, directed toward the target location, in both valid and neutral trials. These saccades were discarded prior to the analyses performed in the manuscript. Inspired by the reviewer’s feedback, we decided to examine the saccade latency in these trials relative to the onset of the response cue to assess whether exogenous cueing influenced oculomotor timing. Notably, we observed an earlier onset of microsaccades in valid compared to neutral trials (71 ms ± 50 ms faster, P < 0.01). We have now added this observation as Figure 2 — Supplementary Figure 2 in the manuscript. Because the presence of an exogenous pre-cue was the only difference between the two trial types, the earlier microsaccade onset likely reflects exogenously triggered preparatory activity in the oculomotor system in response to the salient pre-cue. Such fine-grained attention may prime potential eye movements toward behaviorally relevant stimuli for further examination. This interpretation is consistent with the reviewer’s suggestion and supports a mechanistic link between exogenous attention and oculomotor behavior, extending the ecological relevance of our findings. This point has been added to the Discussion on lines 329 to 340.

      We also conducted an analysis to examine ocular drift behavior following the response cue. Although trials included in the manuscript analyses were constrained such that fixation during target presentation remained within a small window (10’ radius) around the fixation marker, we did not assess whether gaze subsequently drifted closer to the target location after the response cue. One possibility is that exogenous attention might bias ocular drift, shifting the preferred locus of fixation closer to the target. To address this, we computed the average Euclidean distance between gaze position and the target location following response cue onset for valid and neutral trials. However, we found no significant difference in gaze-target distance between valid and neutral trials (p = 0.57).
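
      The distance measure described above can be illustrated with a short sketch. The function name, the coordinate convention (x, y in arcminutes relative to the fixation marker), and the example values are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
import math


def mean_gaze_target_distance(gaze_samples, target):
    """Mean Euclidean distance between gaze samples and the target location.

    gaze_samples: iterable of (x, y) gaze positions, assumed to be in arcmin.
    target: (x, y) target position in the same units.
    """
    samples = list(gaze_samples)
    if not samples:
        raise ValueError("at least one gaze sample is required")
    tx, ty = target
    total = sum(math.hypot(x - tx, y - ty) for x, y in samples)
    return total / len(samples)


if __name__ == "__main__":
    # Hypothetical gaze trace after response-cue onset, target at the origin.
    trace = [(3.0, 4.0), (0.0, 0.0)]
    print(mean_gaze_target_distance(trace, (0.0, 0.0)))
```

      Computing this value per trial and then comparing valid and neutral trials gives the kind of gaze-target distance comparison reported in the response.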

      Although the spatial cueing approach has long been used to probe exogenous attention in a controlled manner in psychophysical experiments, we fully recognize the importance of understanding attention under more naturalistic viewing conditions that allow observers to freely move their eyes. Developing paradigms that incorporate more naturalistic, salient stimuli would be an important direction for future work, enabling investigation of exogenous attention in ecologically valid settings and its influence on sequential actions and processes, including oculomotor behavior.

      (6) There is no statement about the availability of the data and code used for the experiment.

      We have now added the data and code for the analysis pipeline to the Open Science Framework (OSF).

      Reviewer #2 (Recommendations for the authors):

      (1) The study could discuss the strength of the effect and how it relates to previous studies.

      We thank the reviewer for raising this point. To facilitate direct comparison with the study by Jigo and Carrasco (2020), we computed attentional benefit as the ratio of contrast sensitivity between the valid and neutral conditions (now shown in Figure 3 — Supplementary Figure 4). In their data, the attentional benefit at 0° eccentricity peaked just below 4 CPD, with a ratio of approximately 1.2, corresponding to a ~20% increase in contrast sensitivity. This magnitude closely matches the benefit we observed for fine-grained attentional shifts within the foveola at spatial frequencies between 4 and 8 CPD (17% ± 12% and 16% ± 14% for 4 and 8 CPD, respectively). We have added this comparison to the Discussion (lines 246-262).
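
      The benefit measure used in this comparison is a simple ratio, sketched below. The function names and the sensitivity values are hypothetical; only the definition (valid divided by neutral contrast sensitivity) and the reading of a ratio of 1.2 as a roughly 20% increase come from the response.

```python
def attentional_benefit(cs_valid, cs_neutral):
    """Ratio of contrast sensitivity in the valid vs. the neutral condition."""
    if cs_neutral <= 0:
        raise ValueError("neutral contrast sensitivity must be positive")
    return cs_valid / cs_neutral


def percent_increase(ratio):
    """Convert a benefit ratio into a percentage change in sensitivity."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical sensitivities (1 / contrast threshold) for one observer.
    ratio = attentional_benefit(60.0, 50.0)
    print(ratio, percent_increase(ratio))
```

      A ratio above 1 indicates higher sensitivity with the valid cue, which makes benefits comparable across studies that differ in absolute sensitivity.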

      In addition, we acknowledge that prior studies have reported heterogeneous attentional effects, including pure contrast gain, pure response gain, or a mixture of the two. We now explicitly reference these findings in the Discussion and use the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) to explain how differences in stimulus configuration, attention field size, and eccentricity may account for discrepancies between our findings and prior studies examining attention in the extrafovea or when broadly distributed across the fovea (lines 264-307).

      (2) Minor details:

      (a) The abstract mentions gaze-contingent-display, but if I understand correctly, the stimulus was not presented in a gaze-contingent manner.

      That’s correct. Although stimuli were not presented gaze-contingently, we used a gaze-contingent calibration procedure (see Methods, lines 386-389) to achieve higher precision in localizing the line of sight. This increased accuracy was essential for selecting trials in which stimuli remained at the intended eccentricity relative to the preferred locus of fixation. To avoid potential confusion, however, we have removed this detail from the abstract.

      (b) Line 361: What is the manual calibration the authors are referring to? It does not appear to be described.

      The text has been updated to explain more explicitly what auto and manual calibrations are.

      (c) Line 402: There may be a typo towards the end of the line "t0" should be "to"?

      Text has been updated. Thank you.

      (d) Line 405. What are the units of 30?

      It’s in arcminutes. Text has been updated.

      Reviewer #3 (Recommendations for the authors):

      I found this paper very interesting, with a solid methodological approach and excellent data analyses. The authors present a well-designed psychophysical study that contributes valuable insights into the mechanisms of attention in the foveola. The methodology is rigorous, and the analyses are thoughtfully conducted and clearly presented.

      That said, I would like to offer a few comments and suggestions for clarification and further consideration:

      (1) Exogenous attention:

      If a 100% spatially predictive cue is compared to a neutral cue, the observed attentional effect should not be described as (purely) exogenous, since the cue fully predicts where the post-cue will request a response. This situation represents a case in which attention is exogenously driven but endogenously maintained (see e.g., Chica et al., 2013, Behavioural Brain Research). I recommend clarifying this distinction in the manuscript (and title) to avoid conceptual ambiguity.

      We thank the reviewer for raising this important conceptual point. We agree that because the pre-cue was 100% spatially predictive, the resulting attentional allocation cannot be considered purely exogenous. Although the abrupt, salient onset of the cue obligatorily triggers an exogenous shift of attention, its validity could also promote endogenous maintenance of attention at the cued location. Yet, several characteristics of our task strongly limit the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to perceptual encoding in our task.

      We also considered the possibility that our response cue (a retro-cue indicating the target location) might recruit endogenous attention to the internal perceptual representation. Importantly, however, this retro-cue was equally informative in valid and neutral conditions. Any enhancement driven by the retro-cue should therefore benefit both trial types to the same extent. The fact that we still observe a robust advantage in valid trials supports the conclusion that the performance improvements predominantly reflect fast, spatially specific exogenous facilitation rather than slower endogenous processes.

      We have revised the manuscript to clarify that although the cue obligatorily triggers an exogenous attentional shift, its 100% validity could allow for endogenous maintenance of attention, as shown by Chica et al. (2013). We also added to the Discussion (lines 319-327) an explanation of why such endogenous contributions are unlikely to drive our main results, given the rapid cue-target timing in our task. Finally, to further prevent ambiguity, we updated the manuscript title to refer to “exogenously triggered attention,” rather than simply “exogenous attention.”

      (2) Interpretation of statistical effects:

      The statement "Therefore, asymptotic performance showed only independent, additive effects of frequency and attention, without a systematic influence of spatial frequency on the attentional benefit" seems not to be supported by the data, as the main effect of frequency was not significant.

      We thank the reviewer for this helpful observation. We agree that the original phrasing did not accurately reflect the results, as the main effect of spatial frequency was not significant (p = .0545). We have revised the sentence to “Therefore, asymptotic performance reflected an effect of attention alone, with no detectable contribution of spatial frequency or of the interaction between spatial frequency and attention” to avoid implying such an effect (lines 210-211).

      If data from two participants were missing in one condition, the authors should consider replacing this data with new participants.

      We agree with the reviewer that having two observers with missing data in one condition is not ideal. However, the 20 cpd condition was deliberately positioned near the resolution limit at the tested eccentricity and was therefore extremely demanding. Observers also had to monitor two stimulus locations simultaneously, further increasing task difficulty. This condition was challenging for all observers and, despite testing up to the highest contrast, two of seven observers were unable to perform above chance, indicating that for a non-trivial fraction of observers, this condition was effectively unmeasurable with our paradigm. As noted in the manuscript, the 20 cpd condition also has a statistical limitation: thresholds clustered near the upper bound (approaching 100% contrast), compressing the dynamic range and markedly reducing variance relative to lower spatial frequencies, which violates the homoscedasticity assumption of linear models. For these reasons, we did not pursue additional data collection in this condition. Nevertheless, we report the data that were successfully obtained, as they remain informative about performance near the resolution limit.

      We finally note that even when setting aside the 20 CPD condition, our data support this conclusion: comparisons between 4 and 12 CPD, as well as between 8 and 12 CPD, revealed large differences in the magnitude of the attentional benefit (d = 0.65, 95% CI [0.11, 1.18] and d = 0.62, 95% CI [0.08, 1.14], respectively). To further quantify these effects, we now report Cohen’s d as the effect size for these spatial-frequency comparisons throughout the Results text and in the Appendix tables.

      (3) Sample size:

      As this is a psychophysical experiment with many trials and few participants, I am curious about how the authors determined the appropriate sample size and the number of trials required to detect the expected effects. Given that many effects were found to be significant, it seems that statistical power was adequate; however, it would be helpful if the authors could explain how this issue was addressed a priori during experimental planning.

      We appreciate that the reviewer raised this point. Please see the reply to the second point from Reviewer 1, who raised a related question about statistical power.

      (4) Figure 2 clarification:

      In Figure 2B, I do not fully understand the "Valid" and "Neutral" representation. Both conditions include a post-cue indicating the right position; however, in the neutral condition, there is a central fixation square, whereas in the valid condition, there is not. Please clarify this aspect of the figure. I think I understood the paradigm, but this part of the figure is misleading.

      The pre-cue exists only in the valid condition. However, panel B contained a mistake: the fixation marker was missing in the valid condition.

      We thank the reviewer for pointing this out. We have updated Figure 2 to explicitly show the sequence of valid vs. neutral trials. The fixation mark remained on the screen throughout the trial in both the valid and neutral conditions. After a 500 ms fixation period, an exogenous cue was presented for 30 ms in valid trials, followed by a 70 ms interval before stimulus onset. In neutral trials, no cue was presented, and the screen remained blank for 100 ms before the stimuli appeared. In both conditions, a response cue appeared 50 ms after stimulus offset.
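The trial sequence described above can be laid out as a simple timeline. This is a sketch under stated assumptions: the durations are those given in the response (the 50 ms stimulus duration is taken from the earlier reply on exogenous attention), the response cue is assumed to follow the same timing in valid and neutral trials, and the event names and function are illustrative rather than taken from the authors' experiment code.

```python
# Durations in ms as described in the response; event names are illustrative.
FIXATION_MS = 500
CUE_MS = 30
CUE_TO_STIMULUS_MS = 70
STIMULUS_MS = 50            # target duration stated earlier in the response
RESPONSE_CUE_DELAY_MS = 50  # delay between stimulus offset and response cue

def trial_timeline(valid):
    """Return a list of (event, onset_ms) pairs for a valid or neutral trial."""
    events = [("fixation", 0)]
    t = FIXATION_MS
    if valid:
        events.append(("cue", t))
    # Neutral trials show no cue, but the same 100 ms elapses before the stimulus.
    t += CUE_MS + CUE_TO_STIMULUS_MS
    events.append(("stimulus", t))
    t += STIMULUS_MS
    events.append(("stimulus_offset", t))
    t += RESPONSE_CUE_DELAY_MS
    events.append(("response_cue", t))
    return events

for name, onset in trial_timeline(valid=True):
    print(f"{onset:4d} ms  {name}")
```

Written this way, the cue-to-stimulus onset asynchrony in valid trials is 30 + 70 = 100 ms, and stimulus onset falls at the same time in both trial types.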

    1. eLife Assessment

      This is a valuable report describing tracheal terminal cells (TTCs) in Drosophila as an immune privileged organ. The authors demonstrated that TTCs lack expression of the membrane-associated peptidoglycan recognition receptor PGRP-LC, which protects these cells from immune pathway activation and JNK-mediated cell death to maintain TTC homeostasis. While the genetic experiments using RNAi and overexpression are convincing and solid, the broader biological significance of this phenomenon requires further investigation. This work will be of interest to researchers in innate immunity across various model systems.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript entitled "Terminal tracheal cells of Drosophila are immune privileged to maintain their Foxo-dependent structural plasticity", Bossen and colleagues determine that the terminal cells of the tracheal system differ from other larval tracheal cells in that they do not typically show an Imd-dependent immune response to fungal and viral infections. The authors reach this conclusion based on the expression of a reporter line, Drs-GFP. The authors speculate that this difference may reflect differential expression of an immune pathway component, as tracheal terminal cells (TTCs) do not respond to forced expression of PGRP-LE. The authors then go on to show that, unlike the other cells of the tracheal system, terminal cells do not express PGRP-LC as reported by a GAL4 enhancer trap. Forced expression of PGRP-LC in terminal cells resulted in reduced branching, cell damage, and features of the cell death program. These effects could be suppressed by depletion of AP-1 or Foxo transcription factors. The authors show that Foxo plays a negative role in the branching of TTCs, with ectopic branching occurring upon RNAi (or under hypoxic conditions). The authors speculate that the immune privilege of the TTCs may have evolved to permit Foxo regulation of TTC branching.

      Strengths:

      The authors provide compelling genetic data that support their overall conclusions.

      Weaknesses:

      Fusion cells (FCs) do not appear to express the Drs reporter in Figure 1 or elsewhere, raising the question of whether fusion cells are also immune privileged.

      In Fig. 5, TRE-RFP expression is convincing in wild-type TTCs, but not in TTCs overexpressing PGRP-LCx.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Bossen et al. looked at the immune status of the tracheal terminal cells (TTCs) in Drosophila larvae. The authors propose that these cells do not show PGRP-LCx expression and, hence, lack immune function. Artificial overexpression of PGRP-LCx in the TTCs causes these cells to undergo apoptosis.

      Strengths:

      Only a few groups have tried to look at the immune status of the trachea, though we know that AMPs are expressed there after infection. This exciting study attempts to understand the differences in the tracheal cells that do not produce AMPs upon infection.

      Weaknesses:

      The reason why the TTCs have some immune privilege is still not completely clear. Whether the phenotype is cell autonomous or contributes to the cellular immune system is not evaluated. As we know, crystal cells also maintain oxygen levels in larvae; whether the crystal cells have any role in the absence of a terminal trachea is not explored.

      My particular comments on the figures are as follows:

      (1) In Figure 2, the PGRP-LCx signal should be quantified as done for Drosomycin GFP, as shown in Figure 1.
      - The authors have now done this.

      (2) In Fig 2F and G are the larvae infected? If not, what happens to PGRP-LCx expression post Ecc15 infection?
      - The authors have answered this question, saying infection has no effect on TTCs' Drs-GFP expression.

      (3) Is the effect of overexpression of LCx exaggerated post-infection? In particular, when it comes to the escape phenotype.
      - This was not done; the infection experiment was done with PGRP-LE overexpression.

      (4) Does overexpression of anti-apoptotic genes in TTC and PGRP-LCx rescue the TTC branching?
      - This was not done.

      (5) Have the authors tried to rescue the larvae with shallow food?
      - This was not done.

      (6) Is there any effect on the circulating hemocytes or lymph gland in the PGRP-LCx overexpressing animals?
      - This was not done.

    4. Reviewer #3 (Public review):

      Summary:

      The authors report that tracheal terminal cells (TTCs) in Drosophila do not activate innate immunity following bacterial infection, and attribute this to the absence of PGRP-LCx expression in these cells. Forced activation of the Imd pathway in TTCs leads to JNK-mediated cell death and reduced tracheal branching. The authors propose that this immune-privileged status preserves Foxo-dependent structural plasticity, which is essential for TTCs to respond to changing environmental conditions such as hypoxia.

      Strengths:

      The revised manuscript represents a meaningful improvement over the initial submission. The addition of multiple antimicrobial peptide reporters substantially strengthens the key observation that TTCs do not mount a humoral immune response upon infection, moving beyond reliance on the Drs-GFP reporter alone. The mechanistic dissection of the cell death pathway - demonstrating roles for JNK, AP-1, and Foxo downstream of ectopic PGRP-LCx activation - is well-executed and provides solid mechanistic insight. The inclusion of a second, independent UAS-PGRP-LCx line with a milder phenotype adds useful calibration. The hypoxia sensitivity assays provide physiological context, and the discussion of the gradient hypothesis, while based on qualitative observation, is logically reasoned and addresses a legitimate alternative interpretation.

      Weaknesses:

      The primary remaining concern is that the absence of PGRP-LCx expression in TTCs is supported by a single GAL4 enhancer trap line, without independent validation by complementary methods such as in situ hybridization, antibody staining, or reanalysis of publicly available single-cell transcriptomic data. The authors acknowledge this limitation transparently. While the convergent evidence from infection experiments - in which neither the Drs-GFP reporter nor the PGRP-LCx-Gal4 line shows TTC activation - lends indirect support, orthogonal confirmation would more definitively establish this mechanistic claim.

      Additionally, the finding that Dcp-1 cleavage occurs in non-TTC tracheal cells as well suggests that Imd-mediated apoptotic signaling is not uniquely restricted to TTCs, and the Discussion could more explicitly address what distinguishes the TTC response in terms of degree or cellular context.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript entitled "Terminal tracheal cells of Drosophila are immune privileged to maintain their Foxo-dependent structural plasticity", Bossen and colleagues determine that the terminal cells of the tracheal system differ from other larval tracheal cells in that they do not typically show an Imd-dependent immune response to fungal and viral infections. The authors reach this conclusion based on the expression of a reporter line, Drs-GFP. The authors speculate that this difference may reflect differential expression of an immune pathway component, as tracheal terminal cells (TTCs) do not respond to forced expression of PGRP-LE. The authors then go on to show that, unlike the other cells of the tracheal system, terminal cells do not express PGRP-LC as reported by a GAL4 enhancer trap. Forced expression of PGRP-LC in terminal cells resulted in reduced branching, cell damage, and features of the cell death program. These effects could be suppressed by the depletion of AP-1 or Foxo transcription factors. The authors show that Foxo plays a negative role in the branching of TTCs, with ectopic branching occurring upon RNAi (or under hypoxic conditions). The authors speculate that the immune privilege of the TTCs may have evolved to permit Foxo regulation of TTC branching.

      Strengths:

      The authors provide compelling genetic data.

      Weaknesses:

      (1) The authors state that after infection 34% of larvae were not GFP+ as defined by the detection of Drs-GFP in dorsal branches. The authors should clarify if these larvae are completely without response to infection, with no Drs-GFP in dorsal trunks and/or other tracheal branches. If these larvae are entirely unresponsive, could the authors indicate why this might be? Also, at this point in the manuscript, the authors are somewhat misleading regarding TTC expression of Drs-GFP: they should state here that some TTCs do express Drs-GFP, and should also address their prior study of Drs-GFP induction, which does not claim exclusion of TTC Drs-GFP expression.

      GFP– indicates the absence of detectable fluorescence in regions proximal to the TTCs (dorsal branch and fusion cells). Our analysis specifically focused on these regions and did not assess fluorescence in other parts of the tracheal system. Therefore, the reported 34% of larvae classified as GFP– does not imply a complete absence of response in these animals; rather, no fluorescence was detected within our defined region of interest. To clarify how fluorescence in TTCs was quantified, we have added a schematic (new Fig. 1F). In addition, new Fig. S1 illustrates that AMP reporter activation frequently occurs in other tissues.

      Our observations are consistent with earlier reports. In the original description of the AMP reporter lines, Tzou et al. (2000; https://doi.org/10.1016/S1074-7613(00)00072-8) reported that “only a fraction of the flies or larvae exhibited fluorescence in surface epithelia, and the proportion of GFP-expressing animals was variable from one culture vial to the next. In addition, fluorescence was rarely distributed throughout the whole tissue and was limited to restricted areas of the epithelium,” suggesting that AMP reporter activation can occur locally rather than uniformly across tissues.

      In a previous study (https://doi.org/10.1186/1471-2164-9-446), we reported that airway epithelial cells, including the finest tracheal endings on target organs, can activate drosomycin transcription following infection. However, that study focused specifically on infected larvae. Importantly, it did not quantify the frequency of reporter activation or analyze TTC-specific phenotypes. As such, those statements should not be interpreted as implying uniform or ubiquitous reporter activation across all tracheal cells.

      (2) The authors describe the terminal cell phenotype as "shrunken" but this implies loss of size or pruning, however, it is not clear whether the defects could equally be due to lack of growth or slower growth.

      We omitted the term “shrunken” in the present manuscript to avoid potential misinterpretation.

      (3) Figure 1 suggests that GFP+ dorsal branches are not uniform in their expression of Drs-GFP, it seems more patchy. The authors should define the fraction of dorsal branch cells that are Drs-GFP positive. Also, are fusion cells Drs-GFP positive?

      We included a schematic illustrating our quantification approach (new Fig. 1F). We also revised the wording to clarify that GFP+ animals include fluorescence not only in the dorsal branch (DB) but also in fusion cells (FCs), i.e., structures located between the dorsal trunks and the terminal tracheal cells (TTCs). Any structure in proximity to the TTCs that shows GFP expression was scored as GFP+. In most cases, GFP expression was observed in the dorsal fusion cells.

      (4) Drs-GFP expression is largely absent from terminal cells; however, a still significant number of terminal cells show expression (8%). The authors argue that PGRP-LC expression is absent based on a GAL4 transgenic line. If this line reflects endogenous PGRP-LC expression, should there not be 8% positive TTCs? Or is the 8% Drs-GFP expression independent of the IMD receptor?

      We detected PGRP-LE expression in approximately 3% of epithelial tracheal cells that expressed Drs after infection (Fig. 3F,G). This observation suggests that Drs activation can occur through a mechanism independent of PGRP-LCx. We have incorporated this finding into both the Results and Discussion sections.

      (5) Figure 2: the authors state that TTCs are negative even with induced PGRP-LE expression - should there not be at least 8% that are positive?

      We additionally infected larvae overexpressing PGRP-LE and observed Drs-GFP expression in 3% of cases, which we did not see without infection.

      (6) The authors compare PGRP-LC expression to induction of cell death by expression of reaper and hid. Reaper and Hid had stronger effects and eliminated TTCs. See cleavage of caspase Dcp-1 in PGRP-LC-expressing cells. Is caspase cleavage always diagnostic of apoptosis, or could the weaker-than-rpr/hid phenotype imply a different function?

      We have included the potential non-apoptotic functions of Dcp-1 in the Discussion. The weaker phenotype observed could therefore be explained by a non-apoptotic role of Dcp-1.

      (7) Drs-GFP expression is said to be "completely" absent from tracheal terminal cells when the entire tracheal system is expressing PGRP-LE.

      We have revised the wording accordingly.

      (8) Figure 5, TRE_RFP expression, is not convincing that it is higher or in terminal cells. https://doi.org/10.7554/eLife.102369.1.sa2

      We have revised the wording in line 230.

      Reviewer #2 (Public review):

      Summary:

      In this study, Bossen et al. looked at the immune status of the tracheal terminal cells (TTCs) in Drosophila larvae. The authors propose that these cells do not show PGRP-LCx expression and, hence, lack immune function. Artificial overexpression of PGRP-LCx in the TTCs causes these cells to undergo apoptosis.

      Strengths:

      Only a few groups have tried to look at the immune status of the trachea, though we know that AMPs are expressed there after infection. This exciting study attempts to understand the differences in the tracheal cells that do not produce AMPs upon infection.

      Weaknesses:

      The reason why the TTCs have some immune privilege is still not completely clear. Whether the phenotype is cell autonomous or contributes to the cellular immune system is not evaluated. As we know, crystal cells also maintain oxygen levels in larvae; whether the crystal cells have any role in the absence of terminal trachea is not explored. https://doi.org/10.7554/eLife.102369.1.sa1

      In addition to the Drs-GFP reporter line, we performed new infection experiments using additional antimicrobial peptide reporters to further support our observations. While these experiments confirm the humoral immune response, they do not address the mechanisms underlying the apparent immune privilege. Our analysis therefore focuses specifically on the humoral immune response and does not allow conclusions regarding potential contributions of the cellular immune system, including crystal cells, to maintaining oxygen levels in animals with impaired TTCs. Notably, complete loss of TTCs is lethal, as demonstrated by TTC ablation using hid;rpr expression (Fig. 4F).

      Reviewer #3 (Public review):

      Summary:

      The authors report that tracheal terminal cells (TTCs) in Drosophila do not activate innate immunity following bacterial infection. They attribute this to the lack of expression of PGRP-LCx in these cells. Forced activation of the Imd pathway in TTCs leads to cell death and a reduction in tracheal branching. The authors propose a mechanism for cell death induction via pathways involving JNK, AP-1, and foxo. They suggest that the suppression of innate immunity in TTCs may serve to maintain their plasticity, preparing them for responses to hypoxic conditions.

      Strengths:

      (1) The study addresses the understudied area of immune privilege in innate immunity, providing a potentially important example in Drosophila TTCs.

      (2) The molecular characterization of the cell death pathway induced by forced Imd activation is well-executed and provides solid mechanistic insights.

      (3) The authors draw interesting parallels between Drosophila TTCs and mammalian endothelial cells, suggesting broader implications for their findings.

      Weaknesses:

      (1) The core premise of the study - that TTCs do not activate innate immunity following bacterial infection - relies heavily on a single readout (Drs reporter). Additional markers of immune activation would strengthen this crucial claim.

      We included new experiments using additional antimicrobial peptide reporter genes that show results similar to those obtained with the Drs-GFP reporter (new Fig. 1).

      (2) The evidence for the lack of PGRP-LCx expression in TTCs is based on a single GAL4 reporter line. Given the importance of this observation to the authors' model, validation using alternative methods would be beneficial.

      Although we were not able to include alternative methods to further confirm our hypothesis, we performed additional infection experiments. Upon bacterial infection, we observed a strong increase in GFP fluorescence throughout the animal and in many other tissues, while still detecting no response in the TTCs. These results further support our hypothesis.

      (3) The phenotypes observed upon forced activation of the Imd pathway in TTCs, while intriguing, may be influenced by non-physiological levels of pathway activation. The authors should address this potential caveat and consider examining the effects of more moderate pathway activation. https://doi.org/10.7554/eLife.102369.1.sa0

      We used two independent UAS-PGRP-LCx lines located on different chromosomes. One line (III) produced a stronger phenotype than the other (II). We clarified this point in the Results section (Fig. 4C,D) and added supplementary data (new Fig. S2) showing that both lines produce comparable phenotypes when expressed using an alternative tracheal driver. The epithelial thickening observed follows the same pattern as the phenotype detected in TTCs, indicating that even moderate pathway activation leads to similar effects. However, we acknowledge that this represents ectopic pathway activation and therefore likely reflects a non-physiological level of signaling.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      My particular comments on the figures are as follows:

      (1) In Figure 2, the PGRP-LCx signal should be quantified as done for Drosomycin GFP, as shown in Figure 1.

      We agree and have added a quantification.

      (2) In Figure 2F and G are the larvae infected? If not, what happens to PGRP-LCx expression post Ecc15 infection?

      We also included infected larvae to test whether infection induces GFP expression in TTCs. However, GFP expression was never observed in TTCs, although overall fluorescence increased in other tissues.

      (3) Is the effect of overexpression of LCx exaggerated post-infection? In particular when it comes to the escape phenotype.

      We induced mild Imd pathway activation by expressing PGRP-LE using a tracheal driver active in all tracheal cells, including TTCs, for 24 hours. In addition, these larvae were infected and their sensitivity to hypoxia was assessed. Animals expressing PGRP-LE in the trachea showed increased sensitivity to hypoxia, which was further enhanced following infection.

      (4) Does overexpression of anti-apoptotic genes in TTC and PGRP-LCx rescue the TTC branching?

      This point was not addressed.

      (5) Have the authors tried to rescue the larvae with shallow food?

      This point was not addressed.

      (6) Is there any effect on the circulating hemocytes or lymph glands in the PGRP-LCx overexpressing animals?

      This point was not addressed.

      Reviewer #3 (Recommendations for the authors):

      The authors present an intriguing model of immune privilege in Drosophila tracheal terminal cells (TTCs). This model is built upon three key pillars: (1) the absence of innate immune activation in TTCs, (2) the lack of PGRP-LCx expression in TTCs, and (3) the induction of cell death when innate immunity is activated in TTCs. However, the experimental evidence supporting each of these critical points requires substantial strengthening. The reviewer recommends the following improvements and additional experiments to address these core issues:

      (1) Innate immune activation in TTCs:

      Evaluate the expression of additional antimicrobial peptide reporters to provide a more comprehensive assessment of innate immune activation in TTCs.

      In addition to the Drs-GFP reporter line, we performed new infection experiments using other antimicrobial peptide reporters to confirm our results.

      (2) PGRP-LCx expression in TTCs:

      Validate the PGRP-LCx-GAL4 line used in the study to ensure it accurately reflects endogenous PGRP-LCx expression.

      Employ complementary techniques such as in situ hybridization and antibody staining to corroborate the absence of PGRP-LCx in TTCs.

      We also included infection experiments using PGRP-LCx-Gal4 larvae. Infection did not trigger GFP expression in TTCs. However, the overall PGRP-LCx expression pattern observed in other larval tissues supports that the results reflect endogenous PGRP-LCx expression.

      (3) Cell death induction upon immune activation in TTCs:

      Address the possibility that the observed cell death is an artifact of strong, forced Imd pathway activation. To do that,

      perform control experiments activating the Imd pathway in non-TTC tracheal cells to determine if cell death is specific to TTCs.

      Use broader tracheal drivers (e.g., ppk4-GAL4 or btl-GAL4) to activate the Imd pathway and verify if cell death is indeed restricted to TTCs.

      We included results from PGRP-LCx overexpression using the tracheal driver ppk4-Gal4 and stained for the apoptosis marker Dcp-1 (new Fig. S3). We observed increased Dcp-1 signal in dorsal trunk cells, indicating that PGRP-LCx-mediated Dcp-1 cleavage is not restricted to TTCs.

      Ideally, generate a transgenic line expressing physiological levels of PGRP-LCx in TTCs and demonstrate that bacterial infection induces cell death specifically in TTCs through the proposed pathway. The reviewer acknowledges the complexity of this experiment but believes it would significantly strengthen the authors' conclusions.

      We did not generate a new transgenic line but instead used an alternative UAS-PGRP-LCx line (II), which exhibits a milder phenotype. This has now been clarified more prominently in the Results section (Fig. 4C,D). Additionally, we performed further experiments showing an epithelial thickening phenotype whose severity depends on the UAS-PGRP-LCx line used (new Fig. S2).

      In addition to the above major points:

      (4) Quantitative data presentation:

      Provide quantitative analyses for the results presented in Figures 2 and 3J-K to allow for a more rigorous evaluation of the data.

      We included a quantitative analysis of the results shown in Fig. 2 (now presented in new Fig. 3). In addition, we added quantification of fluorescence in the TTCs of infected larvae.

      (5) Alternative hypothesis:

      Consider and address an alternative explanation for the lack of innate immune activation in TTCs: the potential gradient of bacterial ligands from proximal trachea to distal TTCs. If this hypothesis is correct, one might expect to see a gradient of Drs expression correlating with the distance from the proximal trachea. Addressing this possibility would strengthen the authors' proposed model.

      We have now included the following paragraph in the Discussion section.

      “An alternative explanation for the observed lack of an immune response in TTCs could be their maximal distance from the spiracles. In this scenario, a gradient of bacterial inducers along the tracheal system might be expected, resulting in a gradual decrease in immune activation from the spiracles toward the TTCs. However, this is not what we observed. In tracheae that displayed an immune response, the response was largely homogeneous along the entire length of the tracheal system, from the spiracles to the TTCs. Only at the transition to the TTCs did the immune response drop abruptly. This observation argues against the gradient hypothesis and suggests that TTCs are specifically excluded from the immune response.”

    1. eLife Assessment

      By screening an FDA-approved small-molecule library against a leucine-dependent M. tuberculosis strain, this study identifies semapimod as an inhibitor of Mtb growth that functions by impairing leucine import. The work is useful in linking leucine uptake to cell wall lipid biology in Mtb. However, the mechanistic understanding remains incomplete. Additional experimental evidence is required to clarify how PDIM contributes to or regulates leucine uptake.

    2. Reviewer #3 (Public review):

      Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine.

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      A mechanistic gap still exists for the model of semapimod antitubercular activity. The basis for semapimod activity is that the leucine auxotroph strain cannot acquire leucine from its environment, and thus the bug ceases to grow. Under normal growth conditions, the leucine auxotroph strain produces PDIM and acquires exogenous leucine through some mechanism (either through a transporter or through PDIM). Semapimod binding to PpsB causes the cell to alter its PDIM profile (experimental evidence for this is lacking), and now with the altered PDIM profile the cell cannot acquire enough exogenous leucine to sustain growth (either because the altered PDIM profile interferes with the leucine transporter activity or through PDIM uptake). Acquiring a mutation in ppsB results in cells that are unable to produce PDIM (some evidence supports this) but can now acquire enough exogenous leucine to sustain growth. I cannot find the connection between cells that have normal PDIM with normal leucine uptake and cells that are missing PDIM with normal leucine uptake.

      (1) The manuscript would benefit from adding additional antibiotic controls to experiments. With the current experimental approaches, it is unclear if these signatures are the result of semapimod specifically or the effect of an antimicrobial agent. Adding additional strains to the 2D TLC experiments could provide more confidence in the absence or modifications of the PDIM band.

      (2) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or modified PDIM profiles, testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. Cells might recover growth in the presence of semapimod treatment if enough leucine is provided in the media and some fraction is able to enter the cell through the impaired PDIM barrier.

    3. Reviewer #4 (Public review):

      Summary:

      In this study, the authors screened an FDA-approved repurposed library of small-molecule inhibitors against the auxotrophic strain Mtb mc2 6206 and found that semapimod exclusively inhibited its growth. Further studies showed that it inhibits L-leucine uptake by interacting with PpsB, although the exact mechanism remains unknown. Interestingly, semapimod showed antibacterial activity against H37Rv only in vivo, not in vitro, suggesting a dependence on host-derived exogenous leucine during intracellular growth. This work therefore suggests that uptake of host-derived leucine can be targeted as an effective strategy to reduce intracellular survival of Mtb.

      Strengths:

      The authors have used different approaches to understand the mechanism of L-leucine uptake in Mtb. To start, they conducted an in vitro screen using an FDA-approved library, followed by transcriptomic and metabolic analyses of different Mtb mutants. Through whole-genome sequencing, they identified mutations conferring resistance to semapimod to gain further mechanistic understanding. This led to the analysis of semapimod-PpsB interaction by BLI-Octet and analysis of cell-wall apolar lipid, which explained how PDIM loss resulted in sensitivity to vancomycin. Finally, infection experiments in mice surprisingly showed that semapimod was effective against intracellular Mtb in vivo but not in vitro.

      Weakness:

      The major weakness of this study is that it is unclear what role PpsB plays in L-leucine uptake. It is also not clear why intracellular Mtb relies on exogenous leucine rather than endogenous leucine. Does intracellular Mtb lose its ability to synthesize leucine, which is why semapimod is active in vivo but not in vitro? Or does semapimod have another effect on host immunity that has not been explored? I have a few minor comments, which are as follows:

      (1) Authors state that "The colony forming unit (CFU) estimation further shows a bactericidal activity of this molecule which causes 88% reduction of bacterial viability on day 2 and >99% reduction after 5 days of incubation" (Fig. 1d). However, this is only true when compared to the untreated control. Compared to the Day 0 control, treated bacteria appear to have undergone little or no change, suggesting that the compound is bacteriostatic, not bactericidal. The drug concentration used for Fig 1d is not mentioned. For Fig. 1e, there is no day 0 control, and the comparison is with the untreated control at Day 6, which again does not suggest bactericidal action of Semapimod.

      (2) The authors report that "Notably, no cytotoxic effect was observed at this concentration against THP1, thus ruling out the possibility of cell lysis by semapimod," but the data are not shown. Similarly, the authors state that "As a control, interaction of semapimod was also analyzed with the purified Ppe60, which fails to exhibit any binding," but the data are not shown.

      (3) Line 235: change "promote" to "promoter".
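
      The distinction drawn in comment (1) can be made concrete with a small calculation. The sketch below uses hypothetical CFU/ml values (not taken from the manuscript) and assumes the commonly used convention that "bactericidal" means at least a 3-log10 (99.9%) reduction relative to the starting inoculum. The function names and the threshold parameter are illustrative assumptions only.

```python
import math


def percent_reduction(reference_cfu, treated_cfu):
    """Percent drop in CFU of the treated sample relative to a reference count."""
    return 100.0 * (1.0 - treated_cfu / reference_cfu)


def log10_reduction(reference_cfu, treated_cfu):
    """Log10 drop in CFU relative to a reference count (assumes nonzero counts)."""
    return math.log10(reference_cfu / treated_cfu)


def classify(day0_cfu, treated_cfu, threshold_log10=3.0):
    """Label activity against the Day 0 inoculum.

    threshold_log10 = 3.0 is an assumed convention, not a value from the manuscript.
    """
    if log10_reduction(day0_cfu, treated_cfu) >= threshold_log10:
        return "bactericidal"
    return "bacteriostatic"


# Hypothetical CFU/ml values chosen only for illustration.
day0 = 1.0e6            # inoculum at the start of treatment
untreated_day5 = 1.0e8  # untreated culture keeps growing
treated_day5 = 8.0e5    # treated culture stays close to the inoculum

# Against the untreated control the drop looks dramatic (about 99.2%) ...
print(percent_reduction(untreated_day5, treated_day5))
# ... but against Day 0 it is modest (about 20%), i.e. growth is mostly arrested.
print(percent_reduction(day0, treated_day5))
print(classify(day0, treated_day5))
```

      With these illustrative numbers, a ">99% reduction" relative to the untreated control coexists with a near-unchanged count relative to Day 0, which is why the choice of reference matters for the bactericidal versus bacteriostatic call.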

    4. Reviewer #5 (Public review):

      Summary:

      The authors have extensively characterized the response of the leucine and pantothenate auxotroph Mtb strain H37Rv mc2 6206 to an FDA-approved compound library and identified semapimod, which is, at best, bacteriostatic in its action against the pathogen. The authors have used transcriptional profiling, metabolite quantification and a screening of genetically resistant mutants to identify changes in leucine uptake under semapimod exposure. Based on these data, the authors attribute changes in antibiotic susceptibility to differences in environmental leucine availability and bacterial PDIM architecture. While the work presents an interesting avenue of investigation of metabolite uptake and utilization in a comparative fashion between fully virulent and auxotroph Mtb strains, it lacks clear and direct evidence to link the observations with a mechanistic explanation.

      Strengths:

      The authors used a well-designed screening strategy for FDA-approved compounds against a metabolically defined strain and follow-up characterization of semapimod exposure through RNA-seq and pathway analysis, metabolomics and time-course analysis of drug effects. The data have been interestingly interpreted to identify a phenotypic connection between PDIM and altered drug susceptibility.

      Weaknesses:

      The major gap in the study is the speculative nature of the mechanism underpinning the connection between PDIM architecture and changes in leucine uptake under various bacterial growth conditions.

      (1) Despite claims of identifying a "novel leucine uptake mechanism", the authors only provide endpoint metabolite measurements rather than kinetic leucine transport studies.

      (2) A clear explanation for the differences in susceptibility between auxotroph and fully virulent Mtb strains through changes in "PDIM architecture" is not supported by any direct evidence such as structural analysis, lipidomics, or direct measurement of PDIM architectural changes.

      (3) Figures 1D (lines 110-112, "kills bacteria") and 7c (lines 283-285) are used to infer a bactericidal role of semapimod, which may be a mischaracterization of drug activity. The trend in CFUs in both cases seems to be one of no bacterial growth rather than a CFU reduction, and is therefore interpreted as "bacteriostatic" at best. These observations would in fact align with the general antibiotic/stress response signature identified by RNA-seq, where leucine transport-related genes only happen to be a small subset of many dysregulated genes. How do the authors disentangle these generic signatures from the leucine transport evidence, other than endpoint metabolite quantification?

      (4) Furthermore, the studies with supplementation of leuCD (and not panCD) in rescuing from semapimod susceptibility are not supported by a clear mechanistic link. The complementation of leuCD does not completely rescue growth; does this indicate differences in uptake and metabolism? The authors should test this by monitoring the growth of the strains in minimal medium in the presence and absence of exogenous leucine.

      (5) It remains unclear if the authors attribute leucine uptake differences to a loss of PDIM or changes in PDIM amount and architecture. No direct evidence is provided for differences in PDIM production in the WT H37Rv strain and the auxotroph mc2 6206 strains used in this study. Mulholland et al (2024) report similar PDIM levels for WT and auxotrophic Mtb (mc2 6206) in their stocks passaged to maintain PDIM. This could change for stocks maintained differently. Since the presence of PDIM has classically been used to explain a penetration barrier for small molecules and the schematic provided by the authors at the end of the manuscript (figure 8c) suggests free leucine penetration in the absence of PDIM, how do the authors explain the increased leucine uptake and sensitivity of a PDIM-positive auxotroph to semapimod through direct experimental evidence? Further on the point of PDIM production, the WT auxotroph strain seems to produce limited amounts of PDIM as evidenced by the TLC data in Figure 6b. To solidify this point, the authors should test other point mutants for PDIM production (not attenuated for growth) through TLC and quantify these differences. These data should be compared with PDIM production in the WT Mtb H37Rv strain (used by the authors) under in vitro growth conditions. A comparative lipidomics of cell envelope components might be insightful in explaining these differences. I believe answering this query is crucial and within the scope of the work whose central claim is the identification of a novel leucine uptake mechanism. It would be interesting, in fact, to identify a novel transporter associated with the PDIM layer on the cell envelope.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      (1a) The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug to PpsB. However, it is unclear why only the leucine uptake pathway was affected.

      We observe interference with the uptake of L-leucine, but not of pantothenate, in the mc2 6206 strain upon semapimod treatment. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the underlying mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      (1b) We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      By BLI-Octet we do not find any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      (1c) Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture which becomes restrictive to L-leucine. However, at present the exact mechanism is not clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and alterations in PDIM by mass spectrometry.

      (1d) Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      According to the published report by Mulholland et al, and the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.

      (2) The authors show an interesting result where they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. What do the authors think is the basis of this observation? It is possible semapimod has an immunomodulatory effect on the host since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      (3) The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by including validations using PDIM mutants, including del-ppsB Mtb and mutants of other genes of the PDIM locus, and by testing whether these mutants would be more susceptible (or resistant) to semapimod treatment in vivo.

      PDIM is a virulence factor, and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      (4) Prolonged subculturing can introduce mutations in PDIM, which can be overcome by supplementing with propionate (Mulholland et al, Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would result in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Considering the fact that extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial screening of drugs, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review): 

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy. 

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      (1) The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small-molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc2 6206), but not the WT Mtb. To understand the underlying mechanism of differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoration of leuCD and panCD expression on susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of leuCD genes, the mc2 6206 strain becomes resistant to killing by semapimod. In contrast, no effect of panCD expression was observed on semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which exhibits differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, these results thus provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insights into the effect of semapimod on leucine uptake in Mtb, we generated the semapimod-resistant strain, which carries point mutations in four genes including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in ppsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      (2) Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibitory effect on production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which helps the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review): 

      (1) Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine. 

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in ppsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM that make it restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      (2) Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the level of other amino acids including BCAA by mass spectrometry.

      (3) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

      Recommendations for the authors:

      (1A) Intracellular leucine can decrease from:

      inhibition of transport/uptake via semapimod as the authors claim or

      decreased uptake/requirement of many metabolites due to cells entering static growth arrest from challenge by semapimod

      To rule out the growth-inhibitory effect of semapimod on L-leucine uptake, we estimated intracellular L-leucine in Mtb after a brief exposure of 24 hours to 50 ng/ml semapimod (kindly refer to Materials and Methods). We confirmed that 24 hours of treatment with 50 ng/ml semapimod does not cause cells to enter static growth arrest.

      (1B) increased consumption/utilization of leucine for some programmed response to semapimod challenge

      Our results show reduced expression of genes involved in leucine catabolism such as accD1, bkdA and bkdB in semapimod-treated cells, and thus the above hypothesis seems unlikely.

      (1C) Additional metabolites should be measured to determine the specificity of the semapimod challenge.

      As mentioned below, we measured intracellular valine in the semapimod-treated Mtb 6206 by LC-MS/MS, which shows no change in its level. These observations thus corroborate a specific effect of semapimod on L-leucine level in the cell.

      (2) The effect of Semapimod on L-leucine uptake is largely based on indirect evidence, without showing reduced transport of the amino acid. Gene expression data is not enough to prove that the amino acid transport is blocked. More compelling evidence is required to confirm this mechanism.

      The authors could perform leucine uptake assays to directly confirm the functioning of Semapimod, inhibiting L-leucine transport. Another possibility would be to try out measuring intra-bacterial leucine levels for drug-treated versus untreated M. tuberculosis strains.

      Data presented in Fig. 3b show less intracellular L-leucine upon semapimod treatment; in contrast, the Sem<sup>R</sup> strain exhibits ~3-fold more intracellular L-leucine, as estimated by mass spectrometry (kindly refer to our response to comment #6 below). Together, these observations indicate an inhibitory effect of semapimod on L-leucine uptake by the auxotroph.

      (3) The authors show that the overexpression of leuC-leuD restores Semapimod resistance in the auxotroph (Figs. 3C-3E). Is it possible to examine Semapimod resistance of WT-H37Rv or the complemented mutant grown in leucine-limiting conditions? This sort of evidence will be more direct on the specific drug-target beyond the auxotroph (mc<sup>2</sup> 6206).

      Because the endogenous L-leucine synthesis pathway is functional in WT-H37Rv, as well as in the complemented auxotrophic strain, leucine-limiting conditions are not expected to have any effect on susceptibility to semapimod.

      Author response image 1.

      (4) Biolayer Interferometry (BLI) shows Semapimod binds to PpsB (Fig. 6); however, there is no clear evidence that it disrupts PDIM synthesis. More direct evidence would be to study the effect of Semapimod on a ppsB mutant (may be a knock-down). This would prove the specificity of Semapimod for PpsB. Likewise, it would be worth looking into the effect of Semapimod using mutant M. tuberculosis defective for PDIM synthesis.

      As recommended by the peer reviewer, we created the ppsB knockdown strain in Mtb mc2 6206 by CRISPRi and examined its vulnerability to semapimod treatment. As can be seen in Author response image 1, the ppsB KD strain shows lower susceptibility to semapimod when compared with the pDcas9-control strain, which exhibits significant growth inhibition on the 7H11-OADS-PL agar plate containing 200 nM semapimod.

      (5) Metabolomics experiments would benefit from including other control BCAAs like isoleucine and valine to determine if decreased intracellular levels of leucine are specific to semapimod or a general consequence of growth arrest from an antimicrobial agent.

      As suggested by the reviewer, we measured intracellular valine as well as proline levels in the semapimod-treated Mtb 6206 by LC-MS/MS; data presented in Supplementary Figure 5 clearly show no change in their levels upon semapimod treatment.

      (5) Figure 3c, pyrazinamide susceptibility assay could be included on the panCD strain to ensure complementation leads to functional panCD. Parent strain would be resistant to PZA, complement strain would be susceptible. (doi: 10.1038/s41467-019-14238-3).

      The wild-type Mtb 6206 is unable to grow in the absence of pantothenate. We verified resumption of growth of Mtb 6206 in 7H9-OADS-L-leucine medium lacking pantothenate upon PanCD overexpression, which provides more direct evidence of the expression of functional copies of panCD genes.

      (6) Does the Sem-R mutant have increased levels of leucine?

      As can be seen in Supplementary Figure 7, the Sem<sup>R</sup> strain shows a ~3.0-fold increase in the intracellular L-leucine level when compared with the WT strain. In contrast, a comparable level of another BCAA, valine, is observed in both strains.

    1. eLife Assessment

      This study presents valuable findings on the differential effects of RNA on the phase separation, aggregation dynamics, and bioactivity of PSMα3 and LL-37. The authors provide solid evidence from complementary biophysical and cell-based experiments that RNA influences peptide assembly and associated in vitro activities. The study is of interest for understanding interactions between amyloidogenic peptides and nucleic acids, although the physiological significance and some aspects of the mechanistic interpretation would benefit from further clarification.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to elucidate the role of RNA as a context-dependent modulator of liquid-liquid phase separation (LLPS), aggregation, and bioactivity of the amyloidogenic peptides PSMα3 and LL-37, motivated by their structural and functional similarities.

      Strengths:

      The authors combine extensive biophysical characterization with cell-based assays to investigate how RNA differentially regulates peptide aggregation states and associated cytotoxic and antimicrobial functions.

      Weaknesses:

      While the study addresses an interesting and timely question with potentially broad implications for host-pathogen interactions and amyloid biology, some aspects of the experimental design and data analysis require further clarification and strengthening.

    3. Reviewer #2 (Public review):

      In this paper, Rayan et al. report that RNA influences cytotoxic activity of the staphylococcal secreted peptide cytolysin PSMalpha3 versus human cells and E. coli by impacting its aggregation. The authors used sophisticated methods of structural analysis and describe the associated liquid-liquid phase separation. They also compare to the influence of RNA on aggregation and activity of LL-37, which shows differences to that on PSMalpha3.

      That RNA impacts PSM cytotoxicity when co-incubated in vitro becomes clear. However, I have two major problems with this study:

      (1) The premise, as stated in the introduction and elsewhere, that PSMalpha3 amyloids are biologically functional, is highly debatable and has never been conclusively substantiated. The property that matters most for the present study, cytotoxicity, is generally attributed to PSM monomers, not amyloids. The likely erroneous notion that PSM amyloids are the predominant cytotoxic form is derived from an earlier study by the authors that has described a specific amyloid structure of aggregated PSMalpha3. Other authors have later produced evidence that, quite unsurprisingly, indicated that aggregation into amyloids decreases, rather than increases, PSM cytotoxicity. Unfortunately, yet other groups have in the meantime published in-vitro studies on "functional amyloids" by PSMs without critically challenging the concept of PSM amyloid "functionality". Of note, the authors' own data in the present study that show strongly decreased cytotoxicity of PSMalpha3 after prolonged incubation are in agreement with monomer-associated cytotoxicity as they can be easily explained by the removal of biologically active monomers from the solution.

      In their revision and in the rebuttal, the authors have further described their concept regarding what they call "functionality" of PSMalpha3 amyloids. They now admit that monomers are the active cytolytic form, like other researchers have stressed, whereas amyloids are not. This represents a considerable difference to earlier papers in which they ascribed functionality, i.e. cytolytic capacity, to PSMalpha3 amyloids, a claim that has raised considerable controversy. Now, they use the term "functional" to describe that PSMalpha3 amyloids, while not cytolytic, can be reversed to a cytolytic monomeric state, calling them a "dynamic reservoir". There is no evidence that such a reservoir is necessary for the cytolytic activity of the monomers to be established; also, there is no evidence that in a biological system, such an amyloid reservoir exists. To continue calling PSMalpha3 amyloids "functional" based on this - considerably changed - concept of the authors appears inappropriate, given the finally admitted absence of cytolytic activity of the PSM amyloids in addition to the continuing complete lack of evidence of any biological relevance of PSM amyloid formation.

      (2) That RNA may interfere with PSM aggregation and influence activity is not very surprising, given that PSM attachment to nucleic acids - while not studied in as much detail as here - has been described. Importantly, it does not become clear whether this effect has biologically significant consequences beyond influencing, again not surprisingly, cytotoxicity in vitro. The authors do show in nice microscopic analyses that labeled PSMalpha3 attaches to nuclei when incubated with HeLa cells. However, given that the cells are killed rapidly by membrane perturbation by the applied PSM concentrations, it remains unclear and untested whether the attachment to nucleic acids in dying cells makes any contribution to PSM-induced cell death or has any other biological significance.

      Overall, the findings can be explained in a much more straightforward way with the common concept of cytotoxicity being due to monomeric PSMs, and the impact of nucleic acids on cytotoxicity being due to lowering of the concentration of that active form by RNA attachment. Further limiting the significance of the findings, whether this interaction has any biological significance on the physiology or infectivity of the PSM producer remains largely unexplored.

      Further remarks:

      • Circumstantial evidence based on the "amyloid inhibitor", EGCG: The results with EGCG, which has been shown to have a moderate amyloid-reducing effect on PSMalpha1 and PSMalpha4, should not be taken as evidence for amyloid-based cytotoxicity. While increased concentrations of EGCG reduced the cytotoxic effect of PSMalpha3, it is not convincingly shown that this is due to a lower concentration of amyloid vs. monomeric PSM.

      • It is appreciated that the authors refrain from presenting the unsubstantiated concept of "functional" PSM amyloids in the discussion. However, wording in that direction must also be removed from other parts of the manuscript (e.g. "bioactive fibrillar polymorphs", "The formation of cross-alpha amyloids has been correlated with toxic activity", etc.), generally refraining from uncritically implying that amyloid formation underlies PSM biological activity, and rather discussing that the much more likely explanation of the findings is a lowering of cytolytically active, monomeric PSM concentration.

      • Discussion: "PSM alpha3 interaction with nucleic acids within human cells ...supports a comparable mechanism...". Delete. Unsubstantiated.

      • The authors should cite papers that have argued against their hypothesis and not only their own manuscripts.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to elucidate the role of RNA as a context-dependent modulator of liquid-liquid phase separation (LLPS), aggregation, and bioactivity of the amyloidogenic peptides PSMα3 and LL-37, motivated by their structural and functional similarities.

      Strengths:

      The authors combine extensive biophysical characterization with cell-based assays to investigate how RNA differentially regulates peptide aggregation states and associated cytotoxic and antimicrobial functions.

      Weaknesses:

      While the study addresses an interesting and timely question with potentially broad implications for host-pathogen interactions and amyloid biology, several aspects of the experimental design and data analysis require further clarification and strengthening.

      Major Comments:

      (1) In Figure 1A, the author showed "stronger binding affinity" based on shifts at lower peptide concentrations, but no quantitative binding parameters (e.g., apparent Kd, fraction bound, or densitometric analysis) are presented. This claim would be better supported by including: (i) A binding curve with quantification of free vs bound RNA band intensities (ii) Replicates and error estimates (mean ± SD).

      We thank the reviewer for this suggestion. To quantitatively support the binding differences observed in Figure 1A, we have now performed densitometric analysis of the EMSA data and included the results in Figure S1. The analysis showed that the Kd values for PSMα3 binding to polyAU and polyA RNA are of the same order of magnitude but lower for polyAU, indicating stronger binding. A description was added to the results in lines 137-145 of the revised version.
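
      For readers unfamiliar with this type of quantification, the following is a minimal sketch of how an apparent Kd could be estimated from EMSA band intensities. It is not the authors' analysis: it assumes a simple one-site binding model with peptide in excess over RNA, and the function names (`fraction_bound`, `one_site`, `fit_kd`) and the search bounds are hypothetical choices made for illustration.

```python
import math

def fraction_bound(free, bound):
    """Fraction of RNA in the shifted (bound) band, from two band intensities."""
    return bound / (free + bound)

def one_site(conc, kd):
    """One-site binding isotherm (assumption: peptide in excess over RNA)."""
    return conc / (kd + conc)

def fit_kd(concs, fractions, lo=1e-3, hi=1e3, rounds=4, points=200):
    """Least-squares estimate of an apparent Kd.

    Uses a repeated log-spaced grid search between lo and hi (same units as
    concs), narrowing the bracket around the best grid point in each round.
    """
    def sse(kd):
        return sum((one_site(c, kd) - f) ** 2 for c, f in zip(concs, fractions))

    best = lo
    for _ in range(rounds):
        step = (math.log10(hi) - math.log10(lo)) / (points - 1)
        grid = [10 ** (math.log10(lo) + i * step) for i in range(points)]
        best = min(grid, key=sse)
        # Narrow the search to the two grid intervals around the current best.
        lo, hi = best / 10 ** step, best * 10 ** step
    return best
```

      Given peptide concentrations and the fraction bound in each lane, `fit_kd(concs, fractions)` returns the Kd that minimizes the squared residuals. Replicate gels would be fitted separately to obtain a mean ± SD, and a cooperative (Hill) model may be more appropriate if the binding curve is sigmoidal.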

      (2) The authors report droplet formation at low RNA (50 ng/µL) but protein aggregation at high RNA (400 ng/µL) through fluorescence microscopy. However, no intermediate RNA concentrations (e.g., 100-300 ng/µL) are tested or discussed, leaving a critical gap in understanding the full phase diagram and transition mechanisms.

      Our initial choice of 50 ng/µL (low RNA) and 400 ng/µL (high RNA) was guided by a broader RNA titration performed by turbidity measurements across 0, 10, 20, 50, 100, 200, and 400 ng/µL (Figure S2 in the revised version). In this screen, turbidity increased up to 50 ng/µL and then decreased dose-dependently from 100–400 ng/µL. We interpret this non-monotonic behavior as consistent with a transition from a droplet-rich regime (maximal light scattering at intermediate dense-phase volume) toward conditions where assemblies become larger and/or more compact and sediment out of the optical path. This is described in lines 158-161 of the revised version.

      Of note, additional intermediate RNA conditions (100 and 200 ng/µL) are included in Figure S14 (of the revised version). While these experiments were performed under the heat-shock perturbation, they nevertheless support the central point that RNA tunes assembly state across intermediate concentrations rather than producing a binary low/high outcome.

      Importantly, we agree with the reviewer that a full phase diagram would be the most rigorous way to define the transition mechanism. However, establishing csat and constructing a complete phase diagram would require systematic measurements of dilute-phase concentrations (e.g., centrifugation/quantification or fluorescence calibration), controlled ionic strength titrations, and time-resolved mapping, which is beyond the scope of the present study. We have therefore revised the text to avoid implying that we provide a complete phase diagram. Instead, we frame our results as a qualitative, multi-assay characterization showing that RNA concentration drives a shift from liquid-like condensates (at low RNA) toward solid-like assemblies (at high RNA), with an intermediate regime suggested by the turbidity transition and supported by additional imaging under stress. Finally, to address the “critical gap” concern directly, we added a sentence (lines 239-241) stating that: “Future work will be required to quantitatively define the phase boundaries and delineate the dominant mechanisms, such as sedimentation, dissolution, or coarsening/aging, across intermediate RNA concentrations”.

      (3) Additionally, the behaviour of PSMα3 in the absence of RNA under LLPS conditions is not shown. Without protein-only data, it is difficult to assess if droplets are RNA-induced or if protein has a weak baseline LLPS that RNA tunes. The saturation concentration (csat) for PSMα3 phase separation, either in the absence or presence of RNA, should be reported.

      In response to the reviewer’s request, we have added Figure 2F, which shows PSMα3 alone in the absence of RNA under the same conditions. PSMα3 does not form droplets in this condition, indicating that condensate formation is RNA-dependent in the tested conditions. This is referred to in the text in lines 190-193 of the revised version. Please see our response about determining the csat in the response to the previous comment.

      (4) For a convincing LLPS claim, it is important to show: Quantitative FRAP curves (mobile fraction and half-time of recovery) rather than only microscopy images and qualitative statements.

      We have included quantitative FRAP analysis in Figure S4 of the revised version, showing normalized recovery curves along with extracted mobile fractions and half-times of recovery (t₁/₂). These quantitative measurements support the dynamic nature of the PSMα3–RNA assemblies. This is referred to in the text in lines 179-184 of the revised version.
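
      As an illustration of the two quantities named above, the sketch below computes a mobile fraction and a half-time of recovery from a single FRAP trace. It is a simplified example under stated assumptions, not the authors' pipeline: it uses a plain two-point normalization (pre-bleach and immediate post-bleach intensity), takes the plateau as the mean of the last few frames, and finds t₁/₂ by linear interpolation rather than by fitting an exponential. Background subtraction and correction for acquisition photobleaching are omitted, and all function names are hypothetical.

```python
def normalize_frap(intensities, pre_bleach, post_bleach):
    """Scale a post-bleach trace so that 0 = just after bleaching, 1 = pre-bleach."""
    span = pre_bleach - post_bleach
    return [(value - post_bleach) / span for value in intensities]

def mobile_fraction(norm, tail=3):
    """Plateau of the normalized curve, taken as the mean of the last `tail` points."""
    return sum(norm[-tail:]) / tail

def half_time(times, norm, tail=3):
    """Time at which recovery reaches half of the plateau (linear interpolation).

    Returns None if the trace never crosses the half-plateau level.
    """
    target = mobile_fraction(norm, tail) / 2
    points = list(zip(times, norm))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return t0 + (target - y0) * (t1 - t0) / (y1 - y0)
    return None
```

      A liquid-like condensate would typically give a high mobile fraction and a short half-time, whereas a solid-like assembly would give a low mobile fraction with little or no recovery over the acquisition window.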

      (5) The manuscript highly relies on fluorescence microscopy to show colocalization. However, the colocalization is presented in a qualitative manner only. The manuscript would benefit from the inclusion of quantitative metrics (e.g., Pearson's correlation coefficient, Manders' overlap coefficients, or intensity correlation analysis).

      In response, we have added quantitative colocalization analysis to the revised manuscript. Specifically, we now report Pearson’s correlation coefficients and Manders’ overlap coefficients for the dual-channel fluorescence microscopy datasets in Figure S5 of the revised version. These metrics provide an objective measure of co-distribution and complement the qualitative imaging.

      The analysis supports that at low RNA concentrations (droplet/condensate conditions), PSMα3 and RNA show strong colocalization, consistent with RNA being incorporated within, or closely associated with, the peptide-rich phase. In contrast, at high RNA concentrations, where the assemblies are more solid-like/amyloid-positive, the quantitative coefficients decrease, consistent with reduced overlap and an apparent spatial demixing in which RNA becomes partially excluded from the peptide-rich structures. This is referred to in the text in lines 194-203 of the revised version.
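
      For reference, the two colocalization metrics mentioned above can be written compactly. The sketch below is illustrative only and is not taken from the manuscript: it assumes each channel has already been background-subtracted and flattened into a list of pixel intensities of equal length, and the threshold parameters (`thr1`, `thr2`) and function names are hypothetical.

```python
import math

def pearson(ch1, ch2):
    """Pearson's correlation coefficient between two channels (pixel lists)."""
    n = len(ch1)
    mean1, mean2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(ch1, ch2))
    var1 = sum((a - mean1) ** 2 for a in ch1)
    var2 = sum((b - mean2) ** 2 for b in ch2)
    return cov / math.sqrt(var1 * var2)

def manders(ch1, ch2, thr1=0, thr2=0):
    """Manders' coefficients (M1, M2).

    M1: fraction of total channel-1 intensity found in pixels where
        channel 2 exceeds thr2.
    M2: fraction of total channel-2 intensity found in pixels where
        channel 1 exceeds thr1.
    """
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / sum(ch1)
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / sum(ch2)
    return m1, m2
```

      Pearson's coefficient reports how linearly the two intensities co-vary, while Manders' coefficients report what fraction of each signal overlaps the other, so a drop in both at high RNA would be consistent with the partial exclusion of RNA described above. Manders' values are sensitive to the thresholds chosen, which should therefore be reported alongside the coefficients.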

      (6) In Figures 3 B and 3C, the contrast between "no AT630 at 30 min, strong at 2 h" (50 ng/μL) and "strong at 30 min" (400 ng/μL) is compelling, but a simple quantification (e.g., mean fluorescence intensity per area) would greatly increase rigor.

      We have included quantitative analysis of AmyTracker630 fluorescence intensity in Figure S6 of the revised version, reporting the mean fluorescence intensity per area for the indicated conditions and time points. This quantification supports the qualitative differences observed in Figures 3B and 3C. This is now referred to in the text in lines 233-236 of the revised version.

      (7) In Figure S3 ssCD data, if possible, indicate whether the α-helical signal increases with RNA concentration or shows a non-linear dependence, which might link to the LLPS vs solid aggregate regimes.

      The ssCD spectra displayed in Figure S7 in the revised version (corresponding to Figure S3 in the original submission) show that the α-helical signature of PSMα3 is markedly enhanced in the presence of RNA compared to peptide alone, as evidenced by increased signal intensity, deeper minima, and more pronounced spectral features characteristic of α-helical structure. Importantly, this enhancement is more pronounced at 400 ng/µL Poly(AU) RNA than at 50 ng/µL, particularly after 2 hours of coincubation, indicating that RNA concentration influences the stabilization of α-helical assemblies. This is now more specifically detailed in the text in lines 258-263 of the revised version.

      We note that solid-state CD does not allow direct quantitative deconvolution of secondary structure content (e.g., % helix) in the same manner as solution CD, due to sample anisotropy, scattering, and orientation effects inherent to dried or aggregated films. Consequently, our interpretation is qualitative rather than strictly quantitative. The ssCD data therefore suggest a non-linear dependence on RNA concentration, rather than a simple linear dose–response. This is also expected considering that phase transition, suggested by the other findings, is intrinsically non-linear.

      (8) In Figure 5B, FRAP recovery in dying cells may reflect artifactual mobility rather than biological relevance. Additionally, the absence of quantification data limits interpretation; providing recovery curves would clarify relevance.

      We added quantitative FRAP analysis of the effect on PSMα3 within HeLa cells, shown in Figure S8 of the revised version. Compared to PSMα3 assemblies in vitro, nucleolar PSMα3 exhibits slower fluorescence recovery and a reduced mobile fraction. The nucleolus represents a highly crowded, RNA-rich cellular environment, which is expected to impose additional constraints on molecular mobility and likely contributes to the slower recovery kinetics observed in cells. This is now more specifically detailed in the text in lines 324-333 and discussed in lines 597-607 of the revised version.

      (9) The narrative conflates cytotoxicity endpoints (membrane damage, PI staining, aggregates) with localization data (nucleolar foci), creating ambiguity about whether nucleolar targeting drives toxicity or is a consequence of cell death. Separating toxicity assessment from localization analysis, or clearly demonstrating that nucleolar accumulation precedes cytotoxicity, would resolve this ambiguity.

      We thank the reviewer for raising this important point. We agree that, in the current dataset, cytotoxicity readouts (membrane damage, PI staining, aggregate formation) and subcellular localization (nucleolar accumulation) are observed in close temporal proximity, which limits our ability to unambiguously assign causality. In the experiments presented here, PSMα3 was applied at concentrations known to induce rapid membrane disruption and cytotoxicity in HeLa cells. Under these conditions, PSMα3 accumulates on cellular membranes and penetrates into the cell and nucleus on very short timescales (seconds to minutes), likely preceding the temporal resolution accessible by standard live-cell fluorescence microscopy. As a result, nucleolar accumulation and cytotoxic endpoints are detected essentially concurrently, precluding a definitive determination of whether nucleolar association actively drives toxicity or occurs as a downstream consequence of membrane permeabilization and cell damage.

      We therefore emphasize that, in this study, nucleolar localization is presented as a phenomenological observation consistent with RNA-rich compartment association, rather than as a demonstrated causal mechanism of cytotoxicity. We have revised the Discussion (lines 597-607) to clarify this distinction and to avoid implying that nucleolar targeting is the primary driver of cell death.

      We agree that resolving this ambiguity would require systematic time-resolved and concentration-dependent experiments, including analysis at sub-toxic PSMα3 concentrations below the membrane-disruptive threshold, combined with orthogonal imaging approaches. Such experiments are planned for future work but are beyond the scope of the present study.

      (10) In Figure 8, to strengthen the LLPS assignment for LL-37, additional evidence, such as FRAP analysis or observation of droplet fusion events, would be valuable. This is particularly relevant given that the heat shock conditions (65 °C for 15 minutes) could potentially induce partial denaturation or nonspecific coacervation.

      In response to this comment, we have added FRAP analysis of LL-37 assemblies in the revised manuscript (Figure S12), including representative images and corresponding fluorescence recovery curves. The FRAP measurements show minimal fluorescence recovery over the acquisition window, indicating that the LL-37–RNA assemblies formed under these conditions are largely immobile and solid-like, rather than liquid-like droplets. This is now referred to in the text in lines 458-462 of the revised version.

      Reviewer #2 (Public review):

      In this paper, Rayan et al. report that RNA influences cytotoxic activity of the staphylococcal secreted peptide cytolysin PSMalpha3 versus human cells and E. coli by impacting its aggregation. The authors used sophisticated methods of structural analysis and described the associated liquid-liquid phase separation. They also compare the influence of RNA on the aggregation and activity of LL-37, which shows differences from that on PSMalpha3. 

      Strengths:

      That RNA impacts PSM cytotoxicity when co-incubated in vitro becomes clear. 

      Weaknesses:

      I have two major and fundamental problems with this study:

      (1) The premise, as stated in the introduction and elsewhere, that PSMalpha3 amyloids are biologically functional, is highly debatable and has never been conclusively substantiated. The property that matters most for the present study, cytotoxicity, is generally attributed to PSM monomers, not amyloids. The likely erroneous notion that PSM amyloids are the predominant cytotoxic form is derived from an earlier study by the authors that has described a specific amyloid structure of aggregated PSMalpha3. Other authors have later produced evidence that, quite unsurprisingly, indicated that aggregation into amyloids decreases, rather than increases, PSM cytotoxicity. Unfortunately, yet other groups have, in the meantime, published in-vitro studies on "functional amyloids" by PSMs without critically challenging the concept of PSM amyloid "functionality". Of note, the authors' own data in the present study, which show strongly decreased cytotoxicity of PSMalpha3 after prolonged incubation, are in agreement with monomer-associated cytotoxicity as they can be easily explained by the removal of biologically active monomers from the solution.

      We thank the reviewer for this important critique and agree that direct cytotoxicity is most plausibly mediated by soluble PSM species, while extensive fibrillation generally reduces toxicity by depleting these forms, a conclusion supported by our data and by other studies (e.g., Zheng et al 2018 and Yao et al 2019). We do not propose mature amyloid fibrils as the primary toxic entities. Rather, we use the term functional amyloid in a regulatory sense, consistent with other biological amyloids whose fibrillar states modulate activity (e.g., hormone storage amyloids or RNA-binding proteins).

      In line with emerging findings, we interpret PSMα3 toxicity as arising from a dynamic assembly process rather than from a single static molecular species. We previously showed that PSMα3 forms cross-α fibrils that are thermodynamically and mechanically less stable than cross-β amyloids and readily disassemble upon heat stress, fully restoring cytotoxic activity (Rayan et al., 2023). This behavior contrasts with PSMα1, which forms highly stable cross-β fibrils that do not recover activity after heat shock, suggesting that the limited thermostability of PSMα3 is an evolved feature enabling reversible switching between inactive (stored) and active states.

      Consistent with this view, both PSMα1 and PSMα3 are cytotoxic in their soluble states, yet mutants unable to fibrillate lose activity, indicating that fibrillation is required but not itself the toxic end state (Tayeb-Fligelman et al., 2017, 2020; Malishev et al., 2018). Our other studies further show that cytotoxicity toward human cells correlates with inherent or lipid-induced α-helical assemblies, rather than with inert β-sheet amyloids (Ragonis-Bachar et al., 2022, 2026; Salinas 2020, Bücker 2022). Together, these findings support a model in which membrane-associated, dynamic α-helical assembly, which requires continuous exchange between soluble species and growing fibrils, drives membrane disruption, potentially through lipid recruitment or extraction, analogous to mechanisms proposed for human amyloids such as islet amyloid polypeptide (Sparr et al., 2004).

      In the present study, we further show that RNA reshapes this dynamic landscape: while PSMα3 alone progressively loses activity upon incubation, co-incubation with RNA preserves cytotoxicity by stabilizing bioactive polymorphs and condensate-like states, whereas high RNA concentrations promote solid aggregation but nevertheless preserve activity. Thus, aggregation is neither inherently functional nor toxic, but context-dependent and environmentally regulated. Taken together, our data support a model in which PSMα3 amyloids act as a dynamic reservoir, enabling S. aureus to tune virulence by reversibly shifting between dormant and active states in response to environmental cues such as heat or RNA.

      This is now discussed in lines 56-76 and 523-553 of the revised version.

      (2) That RNA may interfere with PSM aggregation and influence activity is not very surprising, given that PSM attachment to nucleic acids - while not studied in as much detail as here - has been described. Importantly, it does not become clear whether this effect has biologically significant consequences beyond influencing, again not surprisingly, cytotoxicity in vitro. The authors do show in nice microscopic analyses that labeled PSMalpha3 attaches to nuclei when incubated with HeLa cells. However, given that the cells are killed rapidly by membrane perturbation by the applied PSM concentrations, it remains unclear and untested whether the attachment to nucleic acids in dying cells makes any contribution to PSM-induced cell death or has any other biological significance.

      We thank the reviewer for this important point and agree that PSM–nucleic acid interactions are not unexpected and that our data do not support a direct intracellular role for RNA binding in mediating cytotoxicity. Accordingly, we do not propose nucleolar or nuclear association of PSMα3 as a causal mechanism of cell death. At the concentrations used, PSMα3 induces rapid membrane disruption, and nucleic acid association is observed along with membrane attachment, precluding conclusions about intracellular function. This limitation is now explicitly clarified in the revised manuscript. The biological significance of our findings lies instead in extracellular and environmental contexts, where PSMα3 encounters abundant nucleic acids, such as RNA or DNA released from damaged host cells or present in biofilms, as now addressed in lines 622-631. Our data show that RNA modulates PSMα3 aggregation trajectories, shifting the balance between liquid-like condensates and solid aggregates, and thereby regulates the persistence and timing of cytotoxic activity. In this framework, RNA acts as a context-dependent regulator of virulence, rather than as an intracellular cytotoxic cofactor, an aspect that will be studied in depth in future work. This is now addressed in the text in lines 597-607 of the revised version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to investigate the role of RNA in modulating both virulent amyloid and host-defense peptides, with the objective of understanding their self-assembly mechanisms, morphological features, and aggregation pathways. 

      Strengths:

      The overall content is well-structured with a logical flow of ideas that effectively conveys the research objectives.

      Weaknesses:

      (1) Figure 2 displays representative FRAP images demonstrating fluorescence recovery within seconds. To gain a more comprehensive understanding of how recovery after photobleaching varies under different conditions, it is recommended to supplement these images with corresponding quantitative fluorescence recovery curves for analysis.

      In response to this comment, we have supplemented the representative FRAP images with quantitative fluorescence recovery curves, reporting normalized recovery kinetics for the indicated conditions. These data are now provided in Figure S4 of the revised manuscript, allowing direct comparison of recovery behavior across conditions (shown by microscopy in Figure 2). In addition, we have included quantitative FRAP analyses for the cellular imaging shown in Figure 5 (presented in Figure S8) and for LL-37 assemblies formed under heat-shock conditions (Figure S12). Together, these additions provide a quantitative framework for interpreting the FRAP results and strengthen the distinction between liquid-like and solid-like assembly states.

      (2) Ostwald ripening typically leads to the shrinkage or even disappearance of smaller droplets, accompanied by the further growth of large droplets. However, the droplet size in Figure 2D decreases significantly after 2 h of incubation. This observation prompts the question, what is the driving force underlying RNA-regulated phase separation and phase transition?

      We thank the reviewer for this observation. Across multiple samples, we consistently observe a coexistence of small droplets and larger aggregates, rather than systematic growth of larger droplets at the expense of smaller ones or a uniform decrease in droplet size. In addition, the timescales examined do not allow us to reliably assess whether diffusion-driven droplet coalescence is fast enough to draw firm conclusions about droplet size evolution. This is now addressed in the text in lines 181-184 of the revised version.

      A decrease in droplet size over time is nevertheless observed in some instances and is more consistent with a time-dependent conversion of initially liquid-like condensates into more solid-like assemblies, which would reduce molecular mobility and suppress droplet coalescence. In parallel, progressive fibril formation may act as a sink for soluble peptide, leading to partial dissolution or shrinkage of less mature condensates. Together, these observations are consistent with a non-equilibrium aging process, in which RNA-regulated assemblies evolve from dynamic condensates toward more solid structures rather than following equilibrium Ostwald ripening.

      (3) The manuscript aims to study the role of RNA in modulating PSMα3 aggregation by using solution-state NMR to obtain residue-specific structural information. The current NMR data, as described in the method and figure captions, were recorded in the absence of RNA. Does RNA binding induce conformational changes in PSMα3, and if so, how do these changes alter the NMR spectra? Also, the sequential NOE walk between neighboring residues can be annotated on the spectrum for clarity.

      The solution-state NMR experiments were performed specifically to characterize the potential binding of EGCG to PSMα3. Due to the strong tendency of PSMα3 to undergo rapid aggregation and line broadening upon RNA addition, solution-state NMR spectra in the presence of RNA could not be obtained at sufficient quality for residue-specific analysis. As suggested, we have updated and annotated the sequential NOE walk between neighboring residues on the relevant NOESY spectra to improve clarity.

      (4) The authors claim that LL-37 shares functional, sequence, and structural similarities with PSMα3. However, no droplet formation by LL-37 was observed in the presence of RNA alone. The authors then applied thermal stress to induce phase separation of LL-37. What are the main factors contributing to the different phase behaviors exhibited by LL-37 and PSMα3? What are the differences in the conformation of amyloid aggregates and the kinetics of aggregation between the condensation-induced aggregation in the presence of RNA and the conventional nucleation-elongation process in the absence of RNA for these two proteins?

      We appreciate this important question and have clarified both the basis of the comparison and the origin of the divergent phase behaviors of LL-37 and PSMα3. While PSMα3 and LL-37 share key properties as short, cationic, amphipathic α-helical peptides that self-assemble and interact with nucleic acids, they differ fundamentally in their assembly architectures. PSMα3 is an amyloidogenic peptide that forms cross-α amyloid fibrils, in which α-helices stack perpendicular to the fibril axis. In contrast, LL-37 can form fibrillar or sheet-like assemblies (observed in cryo grids), but these lack canonical amyloid features, showing no clear cross-α or cross-β amyloid order in the crystal structures determined so far. This is now clarified in different parts of the text of the revised version. Thus, the comparison between the two peptides is functional and physicochemical rather than implying identical amyloid mechanisms. These structural differences likely underlie their distinct phase behaviors.

      Because LL-37 does not follow a classical amyloid nucleation–elongation pathway, and high-resolution structural information (e.g., cryo-EM) is currently lacking, partly due to its sheet-like, non-twisted morphology (unpublished results), it is not possible to directly compare aggregation kinetics or nucleation mechanisms between LL-37 and PSMα3. It is possible that amyloidogenic systems such as PSMα3 exhibit greater flexibility in prefibrillar and fibrillar polymorphism, enabling RNA-regulated phase behavior, whereas non-amyloid assemblies such as LL-37 are more prone to stress-induced solid aggregation. We note that this interpretation is necessarily tentative and does not imply a general rule, but rather reflects differences evident in the present system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) In the abstract, replacing the word "overriding" with "counteracting" may provide a scientifically neutral tone.

      In the course of revision, the abstract was substantially rewritten to more precisely convey the mechanistic framework and key conclusions of the study. As part of this rewrite, the term "overriding" was removed and the language throughout was revised to adopt a more scientifically neutral tone, consistent with the reviewer's suggestion.

      (2) In the abstract, the final sentence is ambitious but heavy. It may benefit from being split into two shorter sentences, for example:

      "These findings establish RNA as a potent, context-dependent modulator of both virulent amyloids and host-defense peptides. They further reveal phase transitions as tunable regulators of peptide activity and potential therapeutic targets across infectious and neurodegenerative diseases."

      As part of the broader abstract revision, the final sentence was restructured and the abstract as a whole was rewritten to improve clarity and readability, in the spirit of the reviewer's recommendation.

      (3) In the Introduction section,

      "The phenol-soluble modulins (PSMs) produced by Staphylococci contain amyloid-forming short peptides which play multiple functional roles...", consider "Staphylococcal phenol-soluble modulins (PSMs) are short, amyloidogenic peptides that perform multiple roles central to pathogenesis...."

      In accordance with the suggestion, the sentence has been revised.

      (4) To improve narrative flow in the final paragraph of the Introduction, a short bridging sentence could be added, such as:

      "Given these nucleic acid interactions, we next examined whether RNA can drive phase separation or structural reorganization of these amyloidogenic peptides."

      We thank the reviewer for this helpful suggestion. It provided an opportunity to clarify an important distinction between the two peptides studied. While LL-37 can self-assemble into higher-order α-helical structures, it is not amyloidogenic, in contrast to PSMα3. We therefore revised the bridging sentence in the final paragraph of the Introduction to read: “Given their shared cationic, amphipathic α-helical character, but distinct amyloidogenic properties, we sought to examine whether RNA differentially influences the assembly landscapes and bioactivity of PSMα3 and LL-37.”

      (5) The rationale for selecting Poly(A) and Poly(AU) would benefit from further clarification. It would be helpful to specify whether these RNAs are intended to model particular host or bacterial RNA species, such as AU-rich elements, rRNA-like sequences, or mRNA-like contexts.

      Poly(A) and Poly(AU) RNAs were selected as simplified, well-defined model RNAs to probe general peptide–RNA interactions in an unbiased manner, as no prior information was available regarding whether such interactions occur or which specific RNA species might be involved. This rationale is now clarified in the revised text (lines 128–131).

      These RNAs are not intended to represent a single biological transcript, but rather generic RNA features relevant to both host and bacterial contexts, including single-stranded homopolymeric regions and AU-rich elements commonly found in mRNAs and stress-related RNAs. The use of such reductionist RNA models to study RNA–protein interactions, phase behavior, and RNA-modulated aggregation is well established. We nevertheless agree that RNA sequence and structure may influence peptide assembly and activity, and future studies will address sequence-specific and biologically derived RNAs.

      (6) In Figure 1A, essential EMSA controls (RNA alone, peptide alone, and a nonspecific peptide or PSMα3) should be included to distinguish specific complexes from artifacts, even if presented in the supplementary information. In addition, a competition assay using unlabeled RNA would help confirm binding specificity and rule out predominantly nonspecific electrostatic interactions; these data could also be reported in the supplementary figures.

      An RNA-alone control is already included in Figure 1A of the revised version. The first lane (“0 µM”) shows free Poly(A) or Poly(AU) RNA in the absence of peptide and serves as the negative control against which PSMα3-induced mobility shifts are evaluated. A peptide-alone EMSA cannot be performed, as PSMα3 is highly cationic and does not migrate into the gel in the absence of RNA; moreover, EMSA in this format reports on RNA mobility rather than peptide migration.

      With respect to binding specificity, we compared Poly(A) and Poly(AU) RNAs and observed distinct binding behaviors, which would not be expected for purely nonspecific electrostatic interactions. In addition, the extracted Hill coefficients (>1) are consistent with cooperative binding, further arguing against simple charge-driven association. Finally, the RNA-dependent association of PSMα3 is independently supported by fluorescence microscopy and quantitative colocalization analyses, which corroborate the EMSA results. Together, these orthogonal approaches support the relevance of the observed peptide–RNA interactions.
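      The Hill coefficient argument above can be illustrated with a minimal sketch of the standard Hill binding model. The response does not state the exact equation or fitting procedure used for the EMSA data, so the function name, parameter names, and numeric values below are illustrative assumptions, not the authors' analysis.

```python
def hill_fraction_bound(conc, k_half, n):
    """Fraction of RNA bound at peptide concentration `conc` (standard Hill form).

    conc   : peptide concentration (hypothetical units, e.g. µM)
    k_half : concentration giving half-maximal binding (same units as conc)
    n      : Hill coefficient; n > 1 is the signature of positive cooperativity
    """
    if conc < 0 or k_half <= 0 or n <= 0:
        raise ValueError("conc must be >= 0; k_half and n must be > 0")
    return conc**n / (k_half**n + conc**n)


if __name__ == "__main__":
    # Hypothetical titration: compare a non-cooperative (n = 1) curve
    # with a cooperative (n = 3) curve sharing the same midpoint.
    for c in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(c, hill_fraction_bound(c, 2.0, 1.0), hill_fraction_bound(c, 2.0, 3.0))
```

      With n > 1 the curve is shallower below the midpoint and steeper above it than the n = 1 curve, which is the switch-like behavior usually read as cooperative binding.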

      (7) In Figure 1B, there is a time mismatch between EMSA (30 minutes) and TEM (2 hours). If aggregation progresses over time, the EMSA pattern at 2 hours may differ. This point could be acknowledged or experimentally addressed, as RNA-peptide assemblies may evolve from liquid-like condensates to more solid aggregates.

      The EMSA and TEM experiments were intentionally performed at different time points to capture distinct stages of the PSMα3–RNA assembly process. The EMSA assay (30 minutes) was designed to probe early RNA–peptide complex formation and binding interactions, before extensive higher-order aggregation occurs. At this stage, we aim to detect mobility shifts reflecting complex formation rather than mature assemblies. In contrast, TEM was performed after 2 hours to visualize later-stage structural outcomes, including fibrillation and morphological reorganization. As aggregation progresses over time, the assemblies evolve from early RNA–peptide complexes into more ordered fibrillar structures, which are best assessed by electron microscopy at later time points. To improve clarity and avoid potential confusion, we have streamlined Figure 1 to focus on the EMSA data, which specifically addresses early binding events. The TEM data were removed from Figure 1 and are now presented in Figure 3, where later-stage structural transitions and fibrillation are shown more comprehensively and in the appropriate mechanistic context.

      (8) In Figure 1B, if feasible, complementing TEM with a confirmatory fibril assay (e.g., ThT kinetics) under the same conditions would strengthen the conclusion that the morphology difference is robust, but it is not mandatory.

      We attempted to perform ThT fibrillation kinetics under the same RNA-containing conditions; however, these assays were not informative for this system. PSMα3 aggregates extremely rapidly, producing an immediate and steep increase in ThT fluorescence (Fig. S9 in the revised version), which prevents reliable resolution of RNA-dependent differences in aggregation kinetics or lag phases. In addition, Poly(AU) RNA interferes with ThT readout through electrostatic interactions between the negatively charged RNA and the cationic dye, as well as through RNA-induced changes in fibril morphology, both of which complicate quantitative interpretation of fluorescence kinetics. Based on these technical constraints and prior experience with RNA–amyloid systems, ThT kinetics under identical RNA conditions would not provide a robust or interpretable confirmation of the morphological differences observed by TEM.

      (9) In Figure 1B, PSMα3 alone control is missing in TEM images.

      A TEM image of PSMα3 alone is included in Figure 3, where we systematically present fibrillation outcomes across different RNA concentrations alongside the peptide-only control. Figure 1 was streamlined to focus on early RNA–peptide interactions assessed by EMSA, whereas Figure 3 provides a comprehensive TEM analysis of later-stage structural outcomes. This organization was chosen to clearly separate early binding events from subsequent assembly transitions and to avoid redundant presentation of TEM images under similar conditions.

      (10) Although it is experimentally practical to focus on Poly(AU), the justification is very one-sided. The Poly(A) condition, which yields amorphous aggregates, may be equally informative for understanding toxicity, LLPS, or nonfibrillar states and could be discussed more explicitly.

      We agree that Poly(A)-induced amorphous aggregation is informative for understanding nonfibrillar assembly states. However, the primary aim of this study was to dissect RNA-dependent regulation of fibrillar assembly and phase behavior, which is most clearly captured using Poly(AU). Poly(A) was therefore included as a comparative condition rather than as a focus for detailed mechanistic analysis. A more systematic comparison of different RNA classes and their effects on nonfibrillar states and toxicity is an important direction for future work but is beyond the scope of the present study.

      (11) To improve readability of the manuscript, the main text should follow the order of the figure panels (e.g., A, B, C, D, and E) and numbers (Figure 1, 2...) sequentially, so that readers can easily align with the corresponding images.

      We have revised the manuscript to improve alignment between the main text and the figures, adjusting panel ordering and numbering where appropriate so that the text now follows the figure panels and figure numbers more sequentially. These changes were made to enhance readability while maintaining a logical visual flow within the figures.

      (12) In the result section of Figure 2, the analogy to Ddx4-like systems is a helpful concept, but should be clearly framed as an analogy, not evidence. It would be more accurate to say that the behavior is "conceptually similar to" those systems, while noting that the molecular context is significantly different.

      We have revised the text to explicitly frame the comparison to Ddx4-like systems as a conceptual analogy rather than evidence: lines 158-161 in the revised version.

      (13) In Figure 4, inclusion of positive and negative controls to validate assay performance (e.g., untreated bacteria or HeLa cells, lysis buffer, media alone) would strengthen confidence in the bioactivity measurements.

      We wish to clarify that appropriate positive and negative controls were included in all bioactivity assays and were used to normalize the data presented in Figure 4. For the HeLa cytotoxicity assay (LDH), untreated cells were used to determine spontaneous LDH release (negative control), and cells treated with the manufacturer-supplied lysis buffer were used to determine maximum LDH release (positive control). The percent cytotoxicity shown in Figure 4B was calculated relative to these internal controls, as described in the Methods. For the antibacterial assay (PrestoBlue), wells containing E. coli without peptide served as the positive control for 100% viability, while wells containing sterile LB medium alone were used as blanks. Viability values in Figure 4A were normalized to these controls. We have ensured that the Methods section explicitly describes these controls to reinforce confidence in the bioactivity measurements.
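      The control-based normalization described above can be sketched as follows. This is a minimal illustration assuming the conventional two-point normalization for LDH-release and viability assays; the exact formulas are those in the authors' Methods, which are not reproduced here, and the function names and readings are hypothetical.

```python
def percent_cytotoxicity(sample, spontaneous, maximum):
    """LDH-based cytotoxicity, scaled between the two controls.

    spontaneous : signal from untreated cells (negative control, 0%)
    maximum     : signal from lysis-buffer-treated cells (positive control, 100%)
    """
    span = maximum - spontaneous
    if span <= 0:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (sample - spontaneous) / span


def percent_viability(sample, blank, untreated):
    """Viability, scaled between the medium-only blank (0%) and
    bacteria grown without peptide (100%)."""
    span = untreated - blank
    if span <= 0:
        raise ValueError("untreated signal must exceed the blank")
    return 100.0 * (sample - blank) / span


if __name__ == "__main__":
    # Hypothetical plate-reader values in arbitrary units.
    print(percent_cytotoxicity(sample=1.5, spontaneous=0.5, maximum=2.5))
    print(percent_viability(sample=1.0, blank=0.5, untreated=2.5))
```

      Scaling each reading between its own plate's controls is what makes values comparable across plates and days, which is presumably why the reviewer asked for the controls to be stated explicitly.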

      (14) To enhance clarity, consider presenting the RNA concentration and time-dependent effects on PSMα3 bioactivity in a comparison table within the main text or as a supplementary figure.

      We appreciate this suggestion and carefully considered presenting the data in tabular form. However, we found that graphical representation more effectively conveys the trends, transitions, and comparative patterns between conditions. A table would not adequately capture these relationships.

      Reviewer #2 (Recommendations for the authors):

      Further remarks:

      (1) Circumstantial evidence based on the "amyloid inhibitor", EGCG: The results with EGCG, which has been shown to have a moderate amyloid-reducing effect on PSMα1 and PSMα4, should not be taken as evidence for amyloid-based cytotoxicity. While increased concentrations of EGCG reduced the cytotoxic effect of PSMα3, it is not convincingly shown that this is due to a lower concentration of amyloid vs. monomeric PSM.

      We agree that the effects of EGCG should not be interpreted as evidence for amyloid fibrils being the cytotoxic species. Our data instead support a mechanism in which EGCG primarily targets soluble PSMα3, thereby redirecting its assembly pathway and depleting bioactive species. Specifically, solution-state NMR (Fig. 7) shows that EGCG binds defined residues of monomeric PSMα3, consistent with sequestration of soluble peptide rather than selective inhibition of fibrils. Complementary light and electron microscopy, together with kinetic measurements, indicate that EGCG does not simply stabilize monomers but instead diverts PSMα3 into amorphous, non-functional aggregates, as visualized by TEM (Fig. 6B) and reflected in altered ThT responses (Fig. S9). Importantly, these EGCG-induced aggregates are non-cytotoxic (Fig. 6A/C) and fail to associate with membranes or cells, in contrast to untreated PSMα3, which forms membrane-associated assemblies and induces disruption (newly added Movies S1-S2). Thus, EGCG potentially reduces cytotoxicity by remodeling the aggregation landscape and depleting active soluble species, rather than by selectively inhibiting specific fibril formation. This clarification is now added to the Discussion in lines 554-564 of the revised version.

      (2) It is appreciated that the authors refrain from presenting the unsubstantiated concept of "functional" PSM amyloids in the discussion. However, wording in that direction must also be removed from other parts of the manuscript (e.g., "bioactive fibrillar polymorphs", "The formation of cross-alpha amyloids has been correlated with toxic activity", etc.), generally refraining from uncritically implying that amyloid formation underlies PSM biological activity, and rather discussing that the much more likely explanation of the findings is a lowering of cytolytically active, monomeric PSM concentration.

      As detailed in our response to Major Comment #1, we agree that uncritical language implying that amyloid fibrils themselves are the cytotoxic species should be avoided. Accordingly, we have revised the manuscript to consistently frame amyloid formation in regulatory terms. Aggregation, depending on context, modulates activity by altering the availability, persistence, and assembly pathways of these species. Distinct aggregation states are therefore presented as correlated with, but not equivalent to, cytotoxic activity, and as components of a dynamic assembly landscape rather than as direct toxic entities.

      (3) Discussion: "PSM alpha3 interaction with nucleic acids within human cells ...supports a comparable mechanism...". Please delete this as it is unsubstantiated.

      We agree that the original phrasing overstated the evidence. The sentence was removed and the Discussion was revised to clearly frame nucleolar accumulation as a phenomenological observation reflecting PSMα3's intrinsic nucleic acid–binding capacity, rather than as evidence for a comparable intracellular mechanism. Specifically, the revised Discussion (lines 597–607) states that nucleolar localization is "unlikely to represent a distinct intracellular toxic mechanism" and instead "reflects binding competence within RNA-rich compartments following cellular entry." The biological relevance of this interaction, particularly at sub-cytotoxic concentrations, is noted as an open question requiring further investigation.

      (4) The authors should also cite papers that have argued against their central hypothesis of "functional" PSM amyloids.

      We thank the reviewer for this suggestion. Accordingly, we have revised the manuscript to explicitly cite and discuss studies that argue against amyloid fibrils as the primary cytotoxic species, and that instead attribute PSM cytotoxicity to soluble or membrane-associated forms. These perspectives are now incorporated in the Discussion to provide a balanced view of the field and to clarify how our findings align with, and differ from, existing models of PSM activity.

    1. eLife Assessment

      This important work advances our understanding of the development of the visual system. The data presented is compelling and provides a detailed single-cell atlas of post-natal anterior chamber development in mice, highlighting the trabecular meshwork and Schlemm's canal.

    2. Reviewer #2 (Public review):

      Summary:

      This study presents a detailed single-cell transcriptomic analysis of the post-natal development of mouse anterior chamber tissues. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adult. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).

      Comments on revisions:

      My critiques have been adequately addressed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a comprehensive single-cell atlas of mouse anterior segment development, focusing on the trabecular meshwork and Schlemm's canal. The authors profiled ~130,000 cells across seven postnatal stages, providing detailed and solid characterization of cell types, developmental trajectories, and molecular programs.

      Strengths:

      The manuscript is well-written, with a clear structure and thorough introduction of previous literature, providing a strong context for the study. The characterization of cell types is detailed and robust, supported by both established and novel marker genes as well as experimental validation. The developmental model proposed is intriguing and well supported by the evidence. The study will serve as a valuable reference for researchers investigating anterior segment developmental mechanisms. Additionally, the discussion effectively situates the findings within the broader field, emphasizing their significance and potential impact for developmental biologists studying the visual system.

      Weaknesses:

      The weaknesses of the study are minor and addressable. As the study focuses on the mouse anterior segment, a brief discussion of potential human relevance would strengthen the work by relating the findings to human anterior segment cell types, developmental mechanisms, and possible implications for human eye disease. Data availability is currently limited, which restricts immediate use by the community. Similarly, the analysis code is not yet accessible, limiting the ability to reproduce and validate the computational analyses presented in the study.

      In the revised version, we have added a paragraph to the discussion section highlighting the human relevance of our work. Additionally, the data are now public on the Single Cell Portal and GEO, and the accession numbers have been updated. Code is available on GitHub (https://github.com/revathi-balasubramanian/Anterior-segment-development-single-cell-data-analysis).

      Reviewer #2 (Public review):

      Summary:

      This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).

      Strengths:

      This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma.

      Weaknesses:

      (1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (see Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) into BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types; however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.

      (2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage?

      (3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features?

      (4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study.

      (5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.

      (1) We agree that inferring biological relationships from the spatial arrangement of UMAP clusters has limitations, and we have qualified our interpretation accordingly in the text. We have also added clarifying language to the trajectory analysis in Figure 5. The intended developmental trajectory is PEC → VP → BEC and SEC; however, the cluster labels in Figure 5 were applied incorrectly. Specifically, the "VP, BECs" cluster was mislabeled as "BECs", which led to the confusion. This cluster contains VPs that transition into BECs as well as VPs that are precursors to SECs.

      (2) We recently published the P60 dataset separately (Tolman, Li, Balasubramanian et al., eLife 2025); these data consist of integrated single-nucleus multiome profiles that were subjected to in-depth analysis. Additionally, we found that integrating the P60 dataset with the developmental datasets obscured sub-clustering of mature cell types. In future manuscripts, we will pursue a more detailed analysis of TM development and perform time point–specific clustering, similar to the approach we used for endothelial cells (Figure 4e).

      Comparisons of cell proportions across ages, as the eye grows, need to be made cautiously. Notwithstanding these limitations, the proportions of TM1, TM2, and TM3 clusters are expected to be similar between P14 and P21, as the proportions at P14 and P60 are similar when compared with the separately analyzed P60 data. Importantly, our dissection strategy changed with age: from P2 to P14, we removed approximately one-third of the cornea, whereas at P21 and P60 we removed most of the cornea to help maximize representation of limbal cells as the eyes grew. This change in dissection likely contributed to the reduced number of TM3 cells observed at P21. TM3 cells are enriched anteriorly (at least in adults) and so are located closer to the corneal cut during dissection of the P21 eyes (which, despite being larger than at younger ages, are still small and more delicate to dissect accurately than at P60) and are therefore more likely to be lost. Additional details are provided in the Methods section, and the caveats surrounding our dissection method have now been included.

      (3) For Figure 3a and b, we have now pseudo-colored the spaces and provided a quantification of how both TM volume and intratrabecular spaces change with developing age (Figure 3c).

      (4) We have now included a supplemental table of markers for developing and mature TM and SC cell types (Table S3).

      (5) We have highlighted DG genes in rectangular boxes in Figure S20.

    1. eLife Assessment

      This study provides a useful demonstration that, at least for the systems examined, aspects of the entropic contribution to protein-ligand binding can be inferred directly from crystallographic data. In doing so, it strengthens a view of crystal structures as heterogeneous ensembles that are amenable to statistical-mechanical analysis rather than purely static models. The analytical approaches are carefully developed and transparently discussed, with thoughtful consideration of both successful and less effective methods, lending solid support to the central conclusions. However, because the analysis is based on a relatively small and narrowly sampled set of protein-ligand complexes, the generality of these findings remains speculative and will require broader validation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that if they generate a weighted multi-conformer ensemble of structural models to fit crystallographic electron density data, the application of statistical mechanical methodologies to that ensemble can provide reasonable estimates of configurational entropy terms related to protein-ligand binding.

      Strengths:

      A fair range of proteins (12) and ligands (70) is included in the study. The analytical methodologies are well described. Both successful and less successful analytical approaches are discussed, and the latter are frequently as insightful as the former.

      Weaknesses:

      Compared to the universe of protein-ligand complexes, this dataset is inevitably very limited, so the generality of the observations made here remains speculative. Though a fair range of proteins is studied, the dynamic range in the binding affinity data is limited. The practical utility of the approach is never really commented on.

    3. Reviewer #2 (Public review):

      The manuscript by Miller and Wankowicz (M&W) develops a crystallographic approach to predict the contribution of protein conformational entropy to the total binding entropy using multi-conformer ensemble models. The approach loosely follows the path developed by Wand using NMR relaxation methods. Their approach is to generate local crystallographic order parameters (analogous to NMR order parameters) to estimate protein conformational entropy and then combine this with statements about water entropy. The static view of the ensemble is perhaps easier to grasp, with respect to entropy, than the NMR-based dynamical view. This approach is potentially ground-breaking and of great importance given the ease, relative to NMR, with which the source data can be obtained. However, the approach has several deficiencies, only some of which are noted by the authors.

      Like the initial Wand approach (Frederick et al Nature, 2007), M&W develop a simple counting relationship between members of the ensemble and a statement about conformational entropy. For reasons that are not clear, M&W utilize "per residue" scaling, which was initially introduced by Wand but later discarded for the more physically meaningful "per torsion angle" scaling. As noted in the Nature 2007 paper, this assumes uncorrelated occupancy. The current Wand approach (Caro et al PNAS, 2017) subsumes correlated occupancy and potentially incomplete sampling of the ensemble into an empirically determined scaling parameter (sd). This is likely a major contributor to the mysterious 1/4 scaling factor that is introduced. It is not clear to me how discrete conformational states are counted from the qFit models. Using the B-factor, as opposed to a thermal factor, to account for motion in a rotamer well seems suspect. With some irony, M&W only look at chi-1 rotamers in distinct contrast to the NMR approach, which looks at the end of the side chain, which captures the entire disorder. On the other hand, the crystallographic approach "sees" all side chains, whereas the NMR approach, as currently rendered, looks only at methyl-bearing side chains and requires coupling to neighbors to report on all side chains (see Kasinath JACS 2013 and Wand & Sharp ARB 2018).

      Nevertheless, as noted in Nature 2007, the fact that a linear relationship is seen between the apparent conformational entropy and total binding entropy suggests that the former is a major component of the latter. It also reinforces the idea that dSrt is constant for higher affinity complexes, i.e., residual rigid-body motion of protein relative to ligand is limited (a conclusion reached in PNAS 2017), but this is not mentioned. This is an important result.

      The classic hydrophobic effect is potentially a significant component of total binding entropy. Here, the manuscript falls flat by focusing on crystallographically resolved waters. As shown in site-resolved detail (Nucci et al, NSMB 2011 and others), hydration water has a range of residual motion (entropy) that will modulate contributions to water entropy upon displacement from an interface. A very clear example of the potential for large contributions was demonstrated in the wet interface of a barnase-DNA complex (PNAS 2017). The fact that the classic dASA treatment failed, I think, points to problems elsewhere in the approach.

      I note that the range of ligand types explored by M&W is quite limited as compared to PNAS 2017, making generalization somewhat difficult (see Wand, Curr. Opin. Struct. Biol., 2013 for why this is important). Finally, it is disappointing that the authors chose not to examine systems common to PNAS 2017, making direct comparison to the NMR method impossible.

      In summary, this manuscript sets the field in a new direction. It is a first serious look at conformational entropy using crystallographic approaches. If fully validated, this approach would permit an explosion of insight since the crystallography is now straightforward, very fast and capable of approaching larger systems, relative to the NMR approach. However, there are missing quantitative elements represented by a formal relationship that is fitted by the data. I do not think this is a fatal flaw for this manuscript, however. If the supplementary material is improved for clarity and completeness (e.g., include tables of thermodynamic data; conformer analysis; B-factors) such that all figures could be independently reproduced and therefore analyzed in different ways, and the comments made above are addressed, if not resolved, then I think this manuscript could become a keystone for this new direction.

    1. eLife Assessment

      This study provides valuable insights into how cells maintain sphingolipid homeostasis through transcriptional control and regulated protein degradation in response to changes in sphingolipid levels. The evidence supporting the conclusions is convincing overall, with solid genetic and biochemical approaches, while some mechanistic aspects remain to be clarified. This work will be of interest to researchers studying lipid metabolism and membrane biology.

    2. Reviewer #1 (Public review):

      Matsumoto et al. identify Com2, a C2H2-type zinc finger transcription factor not previously linked to sphingolipid metabolism, as a regulator of this pathway in budding yeast. They show that depletion of sphingolipids by myriocin, an inhibitor of serine palmitoyl transferase, increases Com2 expression. This, in turn, promotes the expression of the protein kinase Ypk1 and enhances TORC2-dependent phosphorylation of Ypk1. The authors identify a Com2-binding site in the YPK1 promoter and provide evidence that Com2 functions upstream of Ypk1 to regulate its expression. They further report that Com2 abundance is controlled by the ubiquitin-proteasome system: degradation of Com2 is inhibited by myriocin treatment and enhanced by phytosphingosine. Mutational analyses of putative phosphorylation and ubiquitination sites support a role for these modifications in regulating Com2 stability. Based on these findings, the authors propose that Com2 acts as a transcriptional regulator of sphingolipid metabolism that responds to sphingolipid levels and promotes Ypk1 expression.

      Strengths:

      This study provides a valuable finding on the regulation of sphingolipid synthesis by the transcription factor Com2 in budding yeast. The evidence supporting the authors' claims is solid, although additional evidence clarifying the mechanisms and biological significance of ubiquitin-proteasome-mediated degradation of Com2 would strengthen the work. This work will be of interest to microbiologists studying budding yeast.

      Weaknesses:

      The biological significance of Com2 degradation is not sufficiently clear, which represents an important limitation of the study. It would also be important to determine whether Com2 is actively degraded under normal growth conditions, such as during logarithmic growth in the absence of drug treatment.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Matsumoto and co-workers use budding yeast as a model organism to identify and characterize transcriptional mechanisms that homeostatically regulate sphingolipid metabolism. Through a genetic suppressor screen and a series of genetic, molecular, and biochemical analyses, they identify the transcription factor Com2 as a key regulator that responds to sphingolipid levels and regulates the expression of genes such as YPK1, which in turn controls the activity of several enzymes in the yeast sphingolipid biosynthetic pathway.

      Com2 itself is further regulated by the ubiquitin proteasome system in response to sphingolipid levels. High sphingolipid levels promote proteasomal degradation of Com2, whereas low sphingolipid levels stabilize Com2. These findings suggest that Com2 is a central component of a feedback system that helps maintain sphingolipid homeostasis.

      Strengths:

      The identification of Com2 as an upstream regulator of the TORC2-Ypk1 pathway is supported by multiple orthogonal lines of evidence. The authors also provide mechanistic insight into how Com2 protein levels are dynamically controlled through phosphorylation and ubiquitin-mediated degradation. Stabilization of Com2 in response to sphingolipid depletion appears to be required for the transcriptional upregulation of YPK1 expression.

      Weaknesses:

      Although several important questions remain unresolved, such as which kinases function upstream of Com2 and which ubiquitin ligase(s) target Com2, this work is nevertheless likely to have a meaningful impact on the field of sphingolipid metabolism. The identification of a regulated transcription factor that responds to sphingolipid levels may also be of broader interest to researchers studying membrane homeostasis.

    4. Reviewer #3 (Public review):

      This paper extends the authors' 2022 studies of how the synthesis of membrane sphingolipids is regulated in budding yeast. Here, they hypothesized that overexpression of a protein involved in sphingolipid (SL) biosynthesis would confer resistance of lip1-1 cells, which are Dox-inducibly defective in expression of a ceramide synthase regulatory subunit, to myriocin (Myr), a serine palmitoyltransferase inhibitor that inhibits SL synthesis. To test this idea, they transformed lip1-1 cells with a multi-copy genomic library, selecting for Myr resistance. Apart from LIP1 itself and YPK1, which encodes a protein kinase downstream of TORC2, COM2, which encodes the Com2 C2H2-type zinc finger transcription factor, was the most frequent hit in the screen. They went on to show that com2Δ cells exhibited Myr sensitivity, and that Com2 protein expression was induced under conditions that reduced complex sphingolipid synthesis, such as Myr treatment. Using ypk1-as ypk2Δ cells and 3-MB-PP1, a selective inhibitor of the Ypk1-as kinase, they showed that Com2 phosphorylation was independent of Ypk1 activity, suggesting that Ypk1 lies downstream of Com2. Consistently, Myr treatment, which reduces SL synthesis, resulted in an increase in both Com2 and Ypk1 proteins. By generating a Ptet-off-GFP-COM2 strain they showed that when Dox was removed to induce GFP-Com2 overexpression, Myr resistance was increased. They went on to show that Com2 binds to a Com2 response element in the YPK1 promoter and drives expression of Ypk1. This was confirmed by showing that expression of a YPK1-driven lacZ reporter gene was also elevated when GFP-Com2 overexpression was induced. CRISPR deletion of the putative Com2-binding site (CBS) from the endogenous YPK1 promoter was used to generate PYPK1-ΔCBS cells, which showed a significant reduction in Ypk1 expression and exhibited intermediate Myr sensitivity, suggesting that Com2 is important for but not the only regulator of Ypk1 expression.
Analysis of SL levels showed that they largely paralleled the levels of Ypk1 protein and active pT662 Ypk1. Using deletion analysis of the COM2 gene, they showed that residues 2-190 and the C-terminal DNA binding domain of Com2 were essential for Com2 function in the SL synthesis pathway. Deletion of ≥40 amino acids from the N-terminus increased expression of Com2 protein irrespective of Myr treatment, suggesting that Com2 protein levels are regulated by protein stability. Consistently, they found the high level of Com2 protein induced by Myr was rapidly reversed by treatment with phytosphingosine (PHS), a ceramide precursor that bypasses the Myr-blocked step and restores SL synthesis. The reduction in Com2 protein plus PHS was prevented by MG132 proteasome inhibitor treatment and led to the accumulation of polyUb-Com2 species, consistent with Com2 being negatively regulated by SL-induced UPS-mediated degradation. Based on the use of selective inhibitors of different steps in SL synthesis, they showed that SL biosynthesis up to the level of MIPC (mannosyldiinositol phosphorylceramide) is required for the SL-mediated degradation response. Based on individual and combined K to R mutagenesis of the three Lys in Com2 1-49, they showed that K23, K35 and K51 in combination are needed for PHS-induced Com2 degradation, and therefore are likely to be the main Com2 Ub sites. Finally, they observed that PHS induced an increase in K3R Com2 phosphorylation, finding that an S/T10A mutant was only weakly phosphorylated and was resistant to PHS-induced degradation, suggesting that phosphorylation of Com2 is required for PHS-dependent degradation.

      The paper is clearly written, and the data in Figures 1-6 show convincingly that the Com2 zinc finger protein, by inducing the expression of a set of genes, including YPK1 and LCB1, plays an important role in sphingolipid (SL) homeostasis in yeast under conditions when sphingolipid levels are low. However, the data in Figures 7 and 8, where the authors provide evidence that the Com2 protein was rapidly degraded in a proteasome-dependent manner in response to phytosphingosine (PHS) treatment, dependent on the N-terminal 40 residues of Com2 and a combination of three Lys residues in this region, are intriguing but incomplete. There are a number of issues, including the identity of the Com2 ubiquitylation sites. They showed that the K23/35/51R Com2 mutant was stabilized, but did they provide direct evidence that these three Lys are in fact ubiquitylated (e.g. GG-K peptide enrichment based MS analysis of Ub-Com2 from PHS-treated, MG132-treated cells)? They showed that PHS treatment increased Myc13-tagged Com2 ubiquitylation in the presence of MG132, but did not show that the K3R Com2 mutant (or the S/T10A phosphorylation site Com2 mutant) failed to be ubiquitylated. They also found that the WT Com2 and particularly the K3R Com2 mutant protein exhibited hyperphosphorylation in response to PHS treatment, and that mutation of 10 potential pSer sites to Ala abolished this effect and stabilized the Com2 protein. However, it is unclear whether the K3R mutation led to increased Com2 hyperphosphorylation per se following PHS treatment, or whether this is because there is more K3R protein, as they suggest might be the case. It is also not clear what protein kinase is responsible or how it might be activated when SL levels are high.
In addition, the E3 Ub ligase needed for Com2 degradation was not identified, and it is not clear whether Com2 phosphorylation is directly involved in its recognition by a phosphodependent E3 Ub ligase, as they propose in the model shown in Figure 9. Finally, and perhaps most importantly, it is unclear how elevated levels of phytosphingosine or any sphingolipid are sensed by the Com2 pathway in order to switch on the degradation response as a negative feedback event. The model depicted in Figure 9 exposes all of these unknowns. The paper would be significantly strengthened by additional experiments defining how complex SL levels are sensed, how Com2 is phosphorylated in response to SL sensor signals, and how (phospho)Com2 is recognized for ubiquitylation and degradation.

      In summary, the finding that the Com2 zinc finger transcription factor is an upstream regulator of the sphingolipid biosynthesis pathway in budding yeast, acting as part of an SL sensor system to maintain sphingolipid homeostasis, is new and potentially important. However, more mechanistic work needs to be done to address the unanswered questions raised by the data in Figures 7 and 8.

    1. eLife Assessment

      This study presents important findings on the molecular mechanisms governing how the natural killer cell receptor KIR2DL4 interacts with HLA-G and undergoes internalization. The authors provide solid evidence for an allosteric disulfide-bond switch that regulates receptor activity, using a multifaceted approach that includes mutagenesis, mass spectrometry, and imaging. The work would be further strengthened by validating these mechanisms in primary immune cells and providing direct structural evidence for the proposed ligand-binding interface.

    2. Reviewer #1 (Public review):

      Summary:

      This paper asks how the NK cell receptor KIR2DL4 binds HLA-G and undergoes endocytosis. The authors propose that an allosteric disulfide-bond switch controls whether the receptor is in a ligand-binding or non-binding state, and they support this model using mutagenesis, imaging, mass spectrometry, and structural prediction.

      Strengths:

      A major strength is the use of diverse, complementary approaches to validate the central claim. The authors combined unbiased random mutagenesis to identify key residues, confocal microscopy to track cellular localization, and mass spectrometry to quantify the redox states of specific disulfide bonds. These methods consistently support a single model: an allosteric disulfide switch. The transition between a Cys10-Cys28 bond and a Cys28-Cys74 bond serves as a functional switch that controls whether the receptor resides at the plasma membrane to bind ligand or remains inactive in endosomes.

      Weaknesses:

      The core model is interesting, but some of the strongest mechanistic claims still rely heavily on structure prediction rather than direct structural evidence, especially the proposed HLA-G contact surface in Figure 6.

      The paper supports an effect of the disulfide state on trafficking and uptake, but the case for direct KIR2DL4-HLA-G binding still feels somewhat indirect. The manuscript itself notes that direct binding had not been previously shown, and the current explanation partly depends on inference about which disulfide state is present.

      Most of the main experiments are done in transfected 293T cells, so it is still not fully clear how strongly this mechanism carries over to the more relevant NK-cell setting discussed in the paper.

      The cellular evidence for the PDI story is not specific, since it depends a lot on inhibitor and blocking experiments that could affect the broader extracellular redox environment.

    3. Reviewer #2 (Public review):

      Summary:

      Rajagopalan et al show how extracellular domain features regulate KIR2DL4 internalization. The trafficking phenotypes of cysteine mutants are logically organized, and well-summarized in a Table. The disulfide mapping and differential alkylation strategy are appropriate and provide strong support for alternative disulfide configurations in D0. The higher accessibility or more selective reduction of Cys10-Cys28 as compared to Cys28-Cys74 by PDI is a key mechanistic anchor.

      Strengths:

      The identification of a conformational switch in KIR2DL4 is conceptually novel. The experiments are elegant and detailed, and the manuscript is well written.

      Weaknesses:

      Most of the mechanistic work was performed in HEK293 cells. The authors should demonstrate the relevance of these findings in primary NK cells.

    1. eLife Assessment

      This study shows that Znhit1, a regulator of chromatin and of the histone variant H2A.Z, is required for progression through meiotic prophase. It is an important observation that describes the role of epigenetics and gene expression during meiosis. The analysis is based on complementary approaches at the cytological, single-cell, and genomic levels that provide solid evidence for the role of Znhit1 in the control of gene expression and in the loading of H2A.Z in mouse spermatocytes.

    2. Reviewer #1 (Public review):

      Summary:

      Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 affects the transcriptional activation of pachytene. Znhit1 is a subunit of the SRCAP chromatin remodeling complex and a depositor of H2AZ, and in cKO spermatocytes, H2AZ is not deposited into the gene region. The authors claim that this is why the PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during the meiotic prophase.

      Strengths:

      The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.

      Comments on revisions:

      Sun et al. have responded to each comment with great care and sincerity, and substantial improvements are evident.

      In particular, the addition of scRNA-seq data from P35 samples appears to play an important role in supporting the authors' claims.

      However, there is still room for improvement in the reanalysis of the data and in the Discussion section.

      From the data perspective, for example, the authors state in line 347 of the revised manuscript that "We found that Znhit1-deficient spermatocytes phenocopied abnormal meiotic phenotypes observed in A-MYB mutants." However, the corresponding descriptions in the main text and figure legends are not sufficiently detailed, and therefore do not fully support or substantiate this interpretation. Incorporating a statistical comparison between DEGs in Znhit1-sKO and A-myb KO would likely strengthen this point.
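
      One way the suggested statistical comparison of two DEG lists could be set up is a one-sided hypergeometric (Fisher-type) overlap test. The following is only a minimal sketch under stated assumptions: the function names, the gene identifiers, and the toy gene sets are hypothetical and are not taken from the manuscript, and the choice of background gene universe would need to be justified for the real data.

```python
from math import comb


def hypergeom_overlap_pvalue(universe_size, set_a_size, set_b_size, overlap):
    """One-sided P(X >= overlap) for the overlap of two gene sets.

    Models drawing set_b_size genes without replacement from a universe of
    universe_size genes, of which set_a_size belong to the first DEG list.
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def overlap_test(universe, degs_a, degs_b):
    """Restrict both DEG lists to the shared universe, then test the overlap."""
    a = degs_a & universe
    b = degs_b & universe
    shared = a & b
    p_value = hypergeom_overlap_pvalue(len(universe), len(a), len(b), len(shared))
    return len(shared), p_value


# Hypothetical toy data: 20 expressed genes and two 6-gene DEG lists.
universe = {f"gene{i}" for i in range(20)}
degs_first_mutant = {f"gene{i}" for i in range(0, 6)}
degs_second_mutant = {f"gene{i}" for i in range(3, 9)}

n_shared, p_value = overlap_test(universe, degs_first_mutant, degs_second_mutant)
print(n_shared, p_value)
```

      The universe here is assumed to be the set of genes detectable in both datasets; a test run against all annotated genes would overstate the significance of the overlap.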

      Regarding the overall structure of the Discussion, the connections among delayed DSB repair, MSCI, and PGA regulation via H2A.Z remain somewhat descriptive and difficult to follow. This may reflect a lack of direct evidence linking these processes; however, a more logically structured and clearly articulated Discussion would improve clarity.

    3. Reviewer #2 (Public review):

      Summary:

      The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.

      Strengths:

      The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.

      Comments on revisions:

      In the revision, the authors have addressed most of my comments. The only incomplete one is comment 1, where I asked them to define the stage of germ cell arrest by histology. I requested this because the stage of arrest they identified is so unique. They didn't do it, and instead used the scRNAseq to show a depletion at the late pachytene stage onwards. I guess it supports their main findings, but it's a bit disappointing.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 affects the transcriptional activation of pachytene. Znhit1 is a subunit of the SRCAP chromatin remodeling complex and a depositor of H2AZ, and in cKO spermatocytes, H2AZ is not deposited into the gene region. The authors claim that this is why the PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during the meiotic prophase.

      Strengths:

      The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.

      Weaknesses:

      Overall, the data is inconsistent with the authors' claims and does not support their final conclusions. In addition, the sample used may not be the most suitable for the analysis, but a more suitable sample would dramatically improve the overall quality of the paper.

      Thank you for your comprehensive summary of our study and your thoughtful insights into its strengths and weaknesses. We greatly appreciate this valuable feedback, which helps us further improve our work. Below, we provide a detailed response addressing each of the points you raised.

      Reviewer #1 (Recommendations For The Authors):

      Major revisions:

      Surprisingly, many genes were upregulated in the scRNA-seq results. How many XY genes are included? Discuss why many genes are up-regulated in Fig. 5E whereas bulk RNA-seq showed only 70 genes were down-regulated. Since apoptosis-related factors are up-regulated in Fig. 5E, could these up-regulated genes be due to the high content of the transcriptome of dead cells? As you know, once cell death starts, it randomly and violently disrupts the transcriptome, so we think it is not desirable to analyze the transcriptome with dead cells in the mix. Describe this point appropriately in the text or generate new data without dead cells.

      We sincerely appreciate the reviewer’s critical points. Below, we address each point sequentially:

      (1) To address the question about XY-linked genes, we utilized scRNA-seq data to identify differentially expressed sex chromosome genes in spermatocytes at different stages. Our analysis revealed an aberrant activation of XY-linked genes relative to controls. Specifically, 120 XY-linked genes were aberrantly activated in zygotene-stage spermatocytes, and 119 XY-linked genes showed aberrant activation in pachytene-stage spermatocytes (revised Fig. 4F). This observation directly indicates that Znhit1 knockout impairs Meiotic Sex Chromosome Inactivation (MSCI), a finding that aligns with our prior characterization of XY chromosome synapsis defects in Znhit1-deficient spermatocytes.

      (2) Two key reasons explain the discrepancy between scRNA-seq and bulk RNA-seq results:

      First, scRNA-seq employs a more permissive threshold for identifying DEGs (log2 fold change [log2FC] = 0.25), thereby enhancing sensitivity to subtle expression changes and enabling the detection of more upregulated genes. In contrast, bulk RNA-seq uses a stricter threshold (log2FC = 1), which filters out these subtly upregulated transcripts, resulting in fewer DEGs overall.

      Second, scRNA-seq can capture cell subset-specific differential expression. In contrast, bulk RNA-seq averages signals across mixed cells, masking such subset-specific expression changes.

      These clarifications have been included in the Data Analysis section of the revised manuscript.

      (3) We fully agree with the reviewer’s concern that dead cells could confound transcriptomic analyses. Before downstream analysis, we excluded non-viable cells via stringent QC: cells with mitochondrial RNA (mtRNA) content exceeding 15% were removed, as high mtRNA content is a well-established marker of cell death or compromised viability. To further validate that upregulated genes were not driven by dead cell contamination, we analyzed the correlation between the expression of apoptosis-related genes and mtRNA fractions in our data. This analysis revealed no significant correlation (Pearson correlation coefficient, r = -0.02; please see Author response image 1). These results collectively rule out dead cell transcriptome contamination as the primary cause of the observed gene upregulation.

      Author response image 1.

      Scatter plot showing the Pearson correlation between apoptosis-related genes and mitochondrial RNA fractions in scRNA-seq data.
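
      The quality-control logic described in point (3), removing cells above a 15% mitochondrial RNA cutoff and then correlating an apoptosis-related expression score with the mitochondrial fraction, can be sketched as follows. This is an illustration only, not the authors' pipeline: the cell records, the field names `mito_fraction` and `apoptosis_score`, and the toy values are all hypothetical, and how the apoptosis score is computed per cell is left as an assumption.

```python
import math


def filter_cells(cells, max_mito_fraction=0.15):
    """Keep cells whose mitochondrial read fraction does not exceed the cutoff."""
    return [c for c in cells if c["mito_fraction"] <= max_mito_fraction]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell summaries (values invented for illustration).
cells = [
    {"id": "c1", "mito_fraction": 0.03, "apoptosis_score": 0.40},
    {"id": "c2", "mito_fraction": 0.08, "apoptosis_score": 0.10},
    {"id": "c3", "mito_fraction": 0.12, "apoptosis_score": 0.35},
    {"id": "c4", "mito_fraction": 0.22, "apoptosis_score": 0.90},  # above cutoff
    {"id": "c5", "mito_fraction": 0.05, "apoptosis_score": 0.20},
]

kept = filter_cells(cells)
r = pearson_r(
    [c["mito_fraction"] for c in kept],
    [c["apoptosis_score"] for c in kept],
)
print(len(kept), r)
```

      A coefficient near zero in the retained cells, as the authors report, would argue against residual dying cells driving the apoptosis signature, although it would not by itself exclude non-linear relationships.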

      Line 280-286: The data in Figures 7I and J are confusing: as shown by KAS-seq, it is natural that ssDNA is not formed in the promoter region in the Znhit1-cKO sample because transcription does not proceed, but why is ssDNA formed in the enhancer region in the first place in control and then lost in the Znhit1-cKO sample? Generally, it is said that in the enhancer region, including the super-enhancer region, double-stranded DNA is not dissociated, thus not forming ssDNA. Discuss why the loss of ssDNA in the enhancer region affects transcription with appropriate citations. Also, show whether genes downstream of the missing ssDNA in the promoter region have abnormal transcriptional activity, along with the RNA-seq data. Furthermore, in the region shown in Figure 7I, why is the chromatin even more open in Znhit1-cKO, as shown by ATAC-seq? Discuss whether this is related to transcriptional progression or aberrant substitution with H2A. If the function of ZNHIT1 is to replace H2A with H2AZ for PGA, it is not necessary to show the H2A level in Znhit1-cKO.

      We appreciate the reviewer’s constructive comments.

      (1) ssDNA dynamics in enhancer regions: Emerging evidence demonstrates that active enhancers undergo transient DNA unwinding to form ssDNA, a process critical for transcriptional regulation by transcribing enhancer RNAs (eRNA). KAS‑seq is sufficiently sensitive to detect ssDNA in enhancer regions (Kim et al., 2010; Wu et al., 2020). It has been shown that H2A.Z (deposited by the ZNHIT1-SRCAP complex) is required for maintaining enhancer accessibility and dynamic unwinding (Sporrij et al., 2023). In this study, we found that Znhit1 deletion and defective H2A.Z incorporation impaired enhancer ssDNA formation, indicating that ZNHIT1-H2A.Z plays an important role in the activity of both promoters and enhancers.

      (2) Impact of ssDNA loss on transcription: To address how missing ssDNA affects transcriptional activity, we further analyzed changes in KAS‑seq signals following Znhit1 knockout. Overall, KAS‑seq signals were significantly reduced upon Znhit1 depletion, confirming that Znhit1 is essential for ssDNA formation. Further examination of KAS‑seq signals at promoters of downregulated genes also revealed reduced signals (revised manuscript, Fig. S8). In contrast, KAS-seq signals of upregulated genes remained relatively low and showed no changes in either the control or the knockout group, and their upregulation probably results from indirect regulation. These results underscore the importance of ZNHIT1-mediated chromatin states in regulating ssDNA formation and gene expression.

      (3) Aberrant chromatin openness in Znhit1-cKO (ATAC-seq): The increased chromatin accessibility detected by ATAC-seq likely represents a disorganized, nonfunctional state rather than productive transcriptional openness. H2A.Z normally constrains chromatin dynamics to facilitate ordered transcriptional regulation (Cole et al., 2021); its absence in Znhit1-cKO leads to higher ATAC-seq signals, suggesting that this aberrant openness fails to support proper assembly of the transcriptional machinery.

      Minor revisions:

      Line 106. The text says that they looked for chromatin factors, but the legend says that they looked for epigenetic factors. The text must be consistent.

      We have corrected it in the revised manuscript (line 801).

      Line 107. Although it is stated that the transcriptional data published here were used, it appears from the cited references that they are scRNA-seq data. A clear explanation is required in the text or legend.

      We have revised this data as scRNA-seq data (line 107).

      Line 141-143: Using TUNEL analysis in Figure 4F, the authors show that Znhit1-cKO testis cells contain many dead cells. Describe the type or stage of the apoptotic cells.

      We appreciate the reviewer’s suggestion. Specifically, we performed TUNEL staining on testes isolated from P14 mice, a critical time point for pachytene development (revised Fig. 2D). Consistent with this, apoptosis-related genes were significantly upregulated in pachytene-stage spermatocytes in the scRNA-seq data (revised Fig. 4D). To further validate this observation, we performed scRNA-seq on P35 testis samples. The results revealed a significant reduction in late pachytene-stage spermatocytes in Znhit1-cKO samples (revised Fig. 2F), consistent with apoptotic loss of pachytene cells. Collectively, these data confirm that Znhit1 knockout impairs pachytene-stage spermatocyte development.

      The authors claimed that the loss of Znhit1 lowers the transcription of a group of genes involved in homologous recombination, including Rnf212, causing a delay in homologous recombination; however, if the process of homologous recombination is delayed, homologous chromosome pairing and synapsis are affected unless DSB repair is completed. Provide a satisfactory explanation for the fact that DNA damage remains on autosomes despite complete synapsis, as shown in Figure 3C, which is likely not solely due to delayed homologous recombination.

      Thank you for this insightful comment. We fully agree that persistent autosomal DNA damage cannot be explained solely by delayed homologous recombination. To resolve this question, we further analyzed autosomal synapsis through SYCP1 and SYCP3 staining. While autosomal synapsis appeared morphologically complete, we identified subtle but significant synapsis defects in autosomal terminal regions (revised Fig. 3A). This suggests that Znhit1 knockout also results in autosomal synapsis defects. We speculate that these synapsis defects are associated with the unresolved autosomal DNA damage we observed.

      Lines 150-163. With regard to XY unpairing in Znhit1-cKO pachytene spermatocytes, there is insufficient discussion as to whether this is due to transcriptional aberrations.

      Thank you for highlighting the need to link transcriptional aberrations to XY unpairing in Znhit1-cKO pachytene spermatocytes. To address this, we analyzed sex chromosome transcription using scRNA-seq data. Relative to controls, 120 XY-linked genes were aberrantly activated at zygotene, and 119 were upregulated at pachytene in Znhit1-cKO spermatocytes (revised Fig. 4F), directly demonstrating that Znhit1 knockout disrupts Meiotic Sex Chromosome Inactivation (MSCI). Given that intact MSCI is required to stabilize XY synapsis in pachytene spermatocytes, we conclude that the observed XY unpairing is likely a direct consequence of these sex chromosome transcriptional abnormalities. We have added this information to the revised manuscript (lines 221-226).

      Line 187-194. Analysis of the scRNA-seq data is shown in Figure 4, but it lists several genes as stage-specific markers, some of which do not have well-understood meiotic functions. Please cite a reference paper that provides sufficient evidence to qualify this stage.

      In response to this comment, we have refined the presentation of marker genes used for cell annotation (revised Fig. S4B). We have incorporated relevant references supporting their utility as stage-specific markers for the meiotic stages (line 187).

      Line 225-233: If Znhit1 is important for H2AZ deposition and regulates PGA through it, how does it regulate HR-related genes that are expressed earlier through H2AZ deposition during the pachytene stage? For example, Rnf212 is not specifically expressed during the pachytene stage but is one of the targets of MEIOSIN, so it is expressed at an earlier stage.

      Thank you for this insightful comment. We fully acknowledge the reviewer’s key observation that HR-related genes such as Rnf212 are MEIOSIN targets that initiate transcription at earlier meiotic stages, before the pachytene stage. Our stage-resolved scRNA-seq data further showed that the expression of Ccnb1ip1 and Rnf212 was significantly upregulated from zygotene to pachytene, following their initial transcriptional onset. We next showed that the loss of H2A.Z deposition induced by Znhit1 deletion specifically impaired this pachytene-specific secondary transcriptional activation, rather than the early MEIOSIN-driven expression onset (please see Author response image 2).

      Author response image 2.

      Plots showing the expression level of indicated genes in scRNA-seq data.

      Line 245-251: As shown in Figure 6E, more than 14,000 genes have H2AZ peaks. In contrast, only approximately 60% of the genes downregulated by Znhit1-cKO appeared to be directly affected by H2AZ. Are the remaining 40% of genes regulated in a different way that is not mediated by H2AZ? Also, only a few percent of the genes with H2AZ peaks are affected, but why are only genes with A-MYB involvement affected, as shown in Figure 7?

      Thank you for these insightful and constructive comments. For the ~40% of downregulated genes not directly linked to H2A.Z, they were likely regulated through indirect mechanisms. H2A.Z deposition mediated by ZNHIT1 may influence upstream transcriptional regulators (e.g., transcription factors or coactivators), whose dysregulation in turn affects these genes.

      The selective effect of H2A.Z loss on A-MYB target genes is explained by the strict context-dependent function of H2A.Z, which requires stage-specific partner transcription factors to exert its regulatory activity. During the zygotene-to-pachytene transition, A-MYB acts as the master regulator of pachytene gene activation and forms a functional collaborative complex with H2A.Z to drive target gene transcription. Disrupted H2A.Z deposition upon Znhit1 deletion specifically impairs the activity of this A-MYB-H2A.Z complex, leading to selective downregulation of A-MYB targets. Other H2A.Z peak-associated genes may rely on alternative cofactors and compensatory mechanisms.

      Line 245-256: Figures 6 and F show that the localization of H2AZ is reduced in Znhit1-cKO mice, which means that no substitution with H2A occurs. If so, show it in the data because the localization of H2A should be increased compared to that in the control.

      To clarify the status of H2A, we have now performed immunofluorescence staining for H2A. While H2A.Z deposition was clearly impaired following Znhit1 deletion, the global level of H2A did not change significantly (Author response image 3). We speculate that this observed absence of a compensatory increase in H2A is likely due to the intrinsically low abundance of the histone variant H2A.Z relative to canonical histone H2A under physiological conditions.

      Author response image 3.

      Immunostaining of SYCP3 and H2A in spermatocyte testis sections of control and Znhit1-sKO mice. Scale bar, 40 μm.

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.

      Strengths:

      The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.

      Weaknesses:

      (1) Current literature demonstrates that meiotic mutants arrest at one of two stages: midpachytene (stage IV of the seminiferous cycle) or metaphase I (stage XII of the seminiferous cycle). This study documents that in the Znhit1 KO the midpachytene marker H1t appears normally, but that cells arrest before diplotene. If this is true, then arrest must occur during late pachytene, which based on my knowledge has never been documented for a meiotic KO. To resolve this, the authors should present stronger histological substaging evidence to support their claim.

      Thank you for this insightful and constructive comment. To achieve high-resolution tracking of cell lineage progression, we performed scRNA-seq analysis using P35 testes in this revised manuscript. The scRNA-seq data showed that germ cells progressed normally through all meiotic stages and successfully gave rise to spermatids in the control group. By contrast, in the Znhit1 knockout group, late pachytene spermatocytes decreased significantly, and only very few subsequent germ cell types were observable (revised Fig. 2F, G). Although a few diplotene spermatocytes and meiotic metaphase I cells were detectable in the scRNA-seq data, these cells still appeared abnormal, as evidenced by their extremely low Pou5f2 expression. We have revised our description of the meiotic arrest stage in the manuscript.

      (2) The authors overlooked the possible effects of Znhit1 deletion on MSCI. Defective MSCI is a well-established cause of pachytene arrest. Actually, the fact that they see X-Y pairing failure should alert them even more strongly to this possibility because MSCI failure is often associated with defective X-Y pairing. This could be easily addressed by examination of their RNAseq data.

      To address the concern that Znhit1 deletion may impact Meiotic Sex Chromosome Inactivation (MSCI), we analyzed XY-linked gene expression using scRNA-seq data from spermatocytes at distinct stages. Our analysis revealed aberrant activation of XY-linked genes in Znhit1-cKO spermatocytes relative to controls. Specifically, 120 XY-linked genes were activated at zygotene, and 119 XY-linked genes were upregulated at pachytene (revised Fig. 4F). This observation directly demonstrates that Znhit1-cKO impairs MSCI, which aligns with our prior characterization of defective X-Y chromosome synapsis in Znhit1-deficient spermatocytes. To explicitly resolve this concern, we have integrated these MSCI-focused RNA-seq analyses into the revised Results section (lines 221-226).
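
      The stage-resolved count described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout, the helper name `count_xy_upregulated`, the placeholder gene names, the toy values, and the cutoffs (log2 fold change of at least 0.25, adjusted p-value below 0.05) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (gene, chromosome, log2FC of cKO vs control, adjusted p-value)
Record = Tuple[str, str, float, float]

def count_xy_upregulated(
    results: Dict[str, List[Record]],
    lfc_min: float = 0.25,   # assumed cutoff, for illustration only
    padj_max: float = 0.05,  # assumed cutoff, for illustration only
) -> Dict[str, int]:
    """Count X- or Y-linked genes that pass both cutoffs at each stage."""
    counts = {}
    for stage, records in results.items():
        counts[stage] = sum(
            1
            for _gene, chrom, lfc, padj in records
            if chrom in ("X", "Y") and lfc >= lfc_min and padj < padj_max
        )
    return counts

# Toy input with placeholder gene names; these are not values from the study.
toy = {
    "zygotene": [
        ("g1", "X", 0.9, 0.001),
        ("g2", "Y", 0.4, 0.01),
        ("g3", "3", 1.2, 0.001),  # autosomal, ignored
        ("g4", "X", 0.1, 0.5),    # fails both cutoffs
    ],
    "pachytene": [
        ("g1", "X", 1.5, 0.0001),
        ("g5", "X", 0.3, 0.2),    # fails the p-value cutoff
    ],
}

counts = count_xy_upregulated(toy)
print(counts)
```

      Applied to real per-stage differential expression tables, a count of this kind would correspond to the per-stage numbers of activated XY-linked genes reported in the response.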

      (3) The recombination assays need attention.

      In the text the authors state that they studied RPA2 and DMC1, but the figures show RPA2 and RAD51.

      The RPA counts are not quantitated.

      The conclusion that crossover formation fails (based on MLH1 staining) is not justified. This marker does not appear in wt males until late pachytene, so if cells in this mutant are dying before that stage, MLH1 cannot be assessed.

      The authors state that gH2AX persists in the KO, but I'm not convinced that they are comparing equivalent stages in the wt and KO. In Figure 3C, the pachytene cell is late, whereas in the mutant the pachytene cell is early or mid (when residual gH2AX is expected, even in wt males).

      Previous work (PMID: 23824539) has shown that antibodies reportedly detecting pATM in the sex body are non-specific. I therefore advise caution with the data shown in Figure 3D.

      We appreciate the reviewer’s detailed feedback on our recombination assays and have addressed each concern as follows:

      (1) Discrepancy between text and figures (RPA2/DMC1 vs. RPA2/RAD51): We have corrected this in the revised manuscript.

      (2) Quantitation of RPA2 foci: We have supplemented quantitative analysis of RPA2 foci (revised Fig. S3).

      (3) Conclusion on crossover failure: Single-cell RNA sequencing data from P35 testes definitively confirmed that Znhit1 knockout spermatocytes successfully progressed to the late pachytene stage, ruling out the possibility that our MLH1 staining results are confounded by cell death or arrest before this critical stage. In addition, analysis of transcriptome datasets revealed significant downregulation of important genes required for homologous recombination and crossover formation, including Ccnb1ip1 and Rnf212. Reduced expression of these essential factors may impair the assembly of MLH1 crossover foci. These data demonstrate that ZNHIT1 is essential for proper homologous recombination and crossover formation during male meiosis. We have revised the text to emphasize this context.

      (4) γH2AX persistence and stage matching: We have replaced the images with more representative, stage‑matched pachytene spermatocytes from wild‑type and Znhit1‑KO mice (revised Fig. 2C). Furthermore, prompted by the insightful comment from Reviewer 1, we carefully re‑examined autosomal synapsis and identified abnormal synapsis specifically at the terminal regions of autosomes in Znhit1‑deficient spermatocytes (revised Fig. 3A). These data together confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) pATM staining issue: Following the reviewer’s advice, we carefully reviewed the relevant literature (PMID: 23824539) and confirmed that the anti‑pATM antibody may exhibit non‑specific staining on the XY chromosomes. Accordingly, we have removed the pATM staining data presented in Figure 3D from the revised manuscript to ensure the accuracy and rigor of our results.

      (4) RNAseq data. The authors show convincingly that Znhit1 activates genes that are normally upregulated at the zyg-pachytene transition. They should repeat the analysis for genes normally upregulated at the prelep-lep and lep-zyg transitions to show that this effect is really pachytene-gene specific.

      We appreciate this suggestion. To clarify the stage specificity of ZNHIT1’s regulatory role, we analyzed genes upregulated at the prelep-lep and lep-zyg transitions. Our results showed that Znhit1 knockout had little impact on the overall expression levels of these genes (revised Fig. 4B). In contrast, as we previously reported, genes upregulated at the zygotene-pachytene transition were markedly downregulated in Znhit1-cKO. These findings further confirm the specificity of ZNHIT1 in regulating pachytene gene expression.

      (5) I am puzzled that the title and overall gist of the study focuses on H2A.Z, when it is Znhit1 that has been deleted.

      We appreciate the reviewer’s observation and have revised the study title as suggested. Specifically, the title is now updated to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis.”

      Reviewer #3 (Public Review):

      Summary:

      Sun et al. present a manuscript detailing the phenotypic characterization of loss of Znhit1 in male germ cells. Znhit1 is a subunit of the chromatin regulating complex SRCAP that functions to deposit the histone variant H2A.Z. Given that meiosis, and specifically meiotic recombination, occurs in the context of the dynamic condensing of chromosomes, the role of chromatin regulators in general, and histone variants specifically, in mammalian meiosis is an active area of research. Previous work has shown that H2A.Z is found at the locations of recombination in plants, although H2A.Z was previously not found at recombination sites in mammalian meiosis. Here the authors use a conditional approach to ablate Znhit1 in spermatocytes and characterize a block in meiosis in prophase I in the transition from pachytene to diplotene stage.

      Strengths:

      The authors combine current methods in immunohistochemistry and functional genomics to provide strong evidence of meiotic block upon the loss of Znhit1. They find that loss of Znhit1 leads to reduced incorporation of the histone variant H2A.Z, specifically at promoters and enhancers. Further, RNA sequencing found more genes are down-regulated upon loss of Znhit1 compared to upregulated, suggesting that incorporation of H2A.Z is critical for the expression of genes necessary for successful meiotic progression.

      A strength of the manuscript is tying the locations of changes in H2A.Z deposition with binding of the transcription factor A-MYB, providing a mechanism that can potentially combine the changes in chromatin regulation with variable binding of a transcription factor in gene expression in pachytene stage spermatocytes.

      Weaknesses:

      A weakness is the single-cell RNA experiment using cells from 16-day-old male mice. The authors suggest that the rationale for the experiment was to determine where the Znhit1-sKO mutant showed an arrest in meiosis, and claim that this is the pachytene stage. However, in the 'first wave' of meiosis, 16-day-old mice are just beginning to enter pachytene, so cells from later meiotic stages will be largely absent in these tubules. This is clear from the UMAP showing a similar pattern of cell distributions between wild-type and mutant mice. Using older mice would have better demonstrated where the mutant and wild-type mice differ in cell-type composition.

      We appreciate the reviewer’s constructive comment. To resolve this issue, we have added new scRNA‑seq data from testes of P35 mice, which harbor a full spectrum of meiotic stages, including late pachytene, diplotene, and metaphase I spermatocytes, as well as post-meiotic spermatids. Compared with wild-type controls, Znhit1-sKO testes exhibited a marked reduction in late pachytene spermatocytes and a near-complete loss of post-pachytene cell types, directly validating the pachytene-stage meiotic arrest (revised Fig. 2F, G). All updated analyses have been integrated into the manuscript to strengthen our conclusions.

      The authors use the term pachytene genome activation (PGA) in the manuscript to suggest a novel process by which genes are specifically increased in expression in the pachytene stage of meiotic prophase I, without reference to literature that establishes the term. If the authors are putting forward a new concept defined by this term, it would strengthen the manuscript to describe it further, delineate which genes are activated, and discuss potential mechanisms.

      We appreciate the reviewer’s valuable feedback on our use of the term "pachytene genome activation (PGA)".

      To address this, we have revised the text to explicitly frame PGA as a stage-specific transcriptional program observed in our data, defined by the coordinated upregulation of a distinct set of genes during the pachytene stage of meiotic prophase I.

      (1) Definition and Gene Set: Using the scRNA-seq dataset, we formally defined PGA as the transcriptional wave characterized by genes with increased expression in pachytene vs. zygotene spermatocytes (n = 1,560 genes). Functional enrichment analysis shows these genes are primarily involved in DNA repair, cilium organization, and spermatid development (Table S3), consistent with the biological process of germ cell development.

      (2) Relationship to existing literature: While PGA as a term is not widely established, our data align with prior observations of pachytene-specific transcriptional upregulation (Alexander et al., 2023; Ernst et al., 2019; Turner, 2015). Importantly, Alexander et al. reveal that in late meiotic stages, starting from pachynema, chromatin has a ~3-fold increase in transcription. We have added these citations to clearly illustrate the relevant advances in the field (lines 68-71).

      (3) Regulation of pachytene-stage gene expression: We further delineate that PGA is regulated by ZNHIT1-dependent H2A.Z deposition. Znhit1 deletion resulted in significant downregulation of 70.1% (1,094 out of 1,560) of these genes. This links PGA to chromatin-based regulation, where ZNHIT1-dependent H2A.Z deposition enables pachytene-specific transcription.
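
      As a quick arithmetic check of the proportion quoted in point (3), the sketch below recomputes it from the two gene counts given above. The function name `fraction_downregulated` is a hypothetical helper introduced here for illustration; it is not part of the authors' analysis.

```python
def fraction_downregulated(n_down: int, n_total: int) -> float:
    """Return the percentage of a gene set that is downregulated."""
    if n_total <= 0:
        raise ValueError("gene set must be non-empty")
    return 100.0 * n_down / n_total

# Counts taken from the response: 1,094 of 1,560 PGA genes are downregulated.
pga_pct = fraction_downregulated(1094, 1560)
print(f"{pga_pct:.1f}%")  # prints "70.1%"
```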

      Generally speaking, the authors present solid evidence for a pachytene block in male germ cell development in mice lacking Znhit1 in spermatocytes. The evidence for a change in gene expression during pachytene, the finding that more genes are downregulated than upregulated in the mutant, and the changes in histone modification dynamics and placement of H2A.Z all support a role in alterations in meiotic gene regulation. However, the claim that changes in H2A.Z impact meiotic recombination (as suggested in the manuscript title) is less well supported than a general cell arrest in the pachytene stage leading to cell death. The conclusions around the role of Znhit1 in directly influencing meiotic recombination could use further justification or a mechanistic hypothesis.

      We acknowledge the reviewer’s comments. Indeed, existing data support the presence of a pachytene block in spermatocytes of Znhit1-deficient mice, along with aberrant pachytene gene expression and impaired H2A.Z deposition.

      In response, we made the following revisions: (1) we adjusted the manuscript title and conclusion to reduce emphasis on a direct H2A.Z-recombination link and to focus instead on ZNHIT1/H2A.Z in pachytene gene regulation and meiotic progression; (2) we now state that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery (lines 314-319).

      Reviewer #3 (Recommendations For The Authors):

      Quality of the images for meiotic spreads - images have low contrast and are tiny. It is difficult to see the SYCP3 results even when the images are magnified on the computer screen.

      We have provided new images with high resolution to ensure a clear visualization of SYCP3 signals.

      Line 165 - indicates the results for DMC1, although the figure suggests the results are for RAD51 foci.

      We have corrected this mistake.

      Line 306 - this manuscript 'confirms' that H2AZ is not found at mammalian recombination sites, a result already in the literature.

      We have corrected this mistake (lines 309-312).

      Reviewing Editor Comments:

      Major points and revisions highlighted by the reviewers:

      (1) Meiotic prophase in Znhit1KO: The main questions to clarify are the stage and status of progression, the analysis of apoptosis, and the consequences of gene expression on the X and Y. Additional analysis of DSB repair foci and gH2AX is also required. These analyses are needed to answer reviewer 2. Even if H2AZ was not detected at recombination hotspots, it may play a role in DSB repair at a level too low for detection. This should be discussed, as H2AZ was shown to be involved in DNA repair.

      We sincerely appreciate the reviewing editor’s constructive comments.

      (1) Stage and progression of meiotic prophase: We added scRNA-seq data from P35 testes. The results confirmed that Znhit1-KO spermatocytes arrest at late pachytene and that post-pachytene stages (diplotene, metaphase I) were nearly absent (revised Fig. 2F, G).

      (2) Apoptosis analysis: We addressed this by showing that apoptosis-related genes were upregulated in pachytene spermatocytes at the single-cell level (revised Fig. 4D). To further validate this finding, we performed scRNA-seq analysis on P35 testis samples. Our results revealed a marked reduction in late pachytene spermatocytes in Znhit1-cKO testes (revised Fig. 2F, G), consistent with apoptotic depletion of pachytene-stage cells. Together, these data confirm that Znhit1 ablation impairs pachytene-stage spermatocyte development.

      (3) X/Y gene expression consequences: To address this key point, we performed stage-resolved analysis of XY-linked gene expression using scRNA-seq data from different-stage spermatocytes. Compared with controls, we detected aberrant ectopic activation of XY-linked genes in Znhit1-KO spermatocytes: 120 XY-linked genes were inappropriately activated at zygotene, and 119 remained abnormally upregulated at pachytene (revised Fig. 4F). These results provide direct evidence that Znhit1 deletion impairs Meiotic Sex Chromosome Inactivation (MSCI).

      (4) DSB repair issue: We have replaced the images with more representative, stage‑matched pachytene spermatocytes (revised Fig. 3C). The revised images show consistently increased γH2AX signals in Znhit1-KO spermatocytes. Prompted by Reviewer 1’s comment, we identified abnormal synapsis at autosomal terminal regions in mutant cells. Together, these results confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) Potential role of H2A.Z in DSB repair: Though H2A.Z was nearly undetectable at recombination hotspots, we discuss two possibilities: (1) ZNHIT1-H2A.Z depletion dysregulated DSB repair-related genes; (2) Current ChIP-seq sensitivity may miss low-abundance H2A.Z at hotspots, which could support repair via chromatin remodeling. Future high-resolution assays (super-resolution imaging, DSB-targeted ChIP-seq) are proposed to validate this. We agree that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery.

      (2) Gene expression analysis. The first consequence of H2AZ depletion is gene expression downregulation. However, it may not be surprising that some genes are downregulated and others upregulated. There are likely secondary and indirect effects, including the upregulation of some genes. The authors should explain and discuss this point so as to answer the questions raised by reviewers 1 and 2.

      The primary consequence of H2A.Z depletion in pachytene spermatocytes is indeed widespread downregulation of genes. We explain the coexistence of upregulated genes with three key points.

      (1) Technical differences between scRNA-seq and bulk RNA-seq (addressing Reviewer 1): scRNA-seq captures cell-type-specific differentially expressed genes that bulk RNA-seq masks (bulk averages signals across mixed cells, hiding changes in rare subsets). Additionally, scRNA-seq uses a lower log2(fold change) threshold (0.25 vs. 1 in bulk RNA-seq), detecting subtle upregulations missed by bulk analysis.

      (2) No dead cell contamination (addressing Reviewer 1): Stringent quality control excluded cells with >15% mitochondrial RNA. Apoptosis-related genes showed no significant correlation with mitochondrial RNA fractions (Pearson correlation coefficient, r = -0.02; please see Author response image 1), ruling out dead cell transcriptome interference.

      (3) Secondary/indirect effects (addressing Reviewers 1 & 2): Upregulated genes likely result from indirect regulatory cascades. H2A.Z depletion may disrupt upstream transcription factors, leading to compensatory upregulation of their downstream genes or cell stress responses to meiotic arrest. Notably, Znhit1 knockout specifically impacts genes upregulated at the zygotene-pachytene transition, while genes upregulated at preleptotene-leptotene or leptotene-zygotene transitions remain largely unaffected (revised Fig. 4B), confirming the specificity of H2A.Z’s direct regulatory role and framing upregulation as non-targeted indirect effects.
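
      Points (1) and (2) can be illustrated with a small, self-contained sketch. It is not the authors' code: the toy cells, the toy fold changes, the placeholder gene names, and the helper `pearson_r` are invented for illustration, and treating each cutoff as inclusive is an assumption. Only the 15% mitochondrial cutoff and the log2 fold change thresholds (0.25 and 1) come from the response.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)

# Point (2): drop cells with more than 15% mitochondrial RNA, then correlate
# the mitochondrial fraction with an apoptosis score in the remaining cells.
MITO_MAX = 0.15
# Toy cells (invented): (mitochondrial fraction, apoptosis score)
cells = [(0.02, 1.0), (0.05, 3.0), (0.08, 2.0), (0.11, 4.0), (0.14, 0.0), (0.30, 9.0)]
kept = [c for c in cells if c[0] <= MITO_MAX]
r = pearson_r([c[0] for c in kept], [c[1] for c in kept])
print(len(kept), round(r, 2))

# Point (1): the same toy log2 fold changes pass different cutoffs.
log2fc = {"geneA": 0.3, "geneB": 0.8, "geneC": 1.4, "geneD": -0.1}
up_sc = sorted(g for g, v in log2fc.items() if v >= 0.25)   # scRNA-seq cutoff
up_bulk = sorted(g for g, v in log2fc.items() if v >= 1.0)  # bulk RNA-seq cutoff
print(up_sc, up_bulk)
```

      A correlation coefficient near zero among the retained cells, as reported in the response (r = -0.02), is what one would expect if apoptosis-related expression is not driven by residual low-quality cells.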

      (3) The authors should also test the effect of Znhit1KO on the 1196 genes (up PreL/L) and 1325 (up L/Z) as shown in Figure 5D for the PGA. Also in Figure 5B, there is no evaluation of the statistical significance of the variation, this should be revised. X and Y genes should be analysed. KAS-Seq should be correlated with gene expression analysis, and several points as mentioned in the reviews below should be better explained and discussed.

      (1) Effect of Znhit1-KO on PreL/L- and L/Z-upregulated genes: we analyzed the 1196 genes upregulated at the PreL/L transition and 1325 genes upregulated at the L/Z transition. Znhit1 knockout had minimal effect on the expression of these early meiotic gene sets (revised Fig. 4B), whereas genes activated at the zygotene‑pachytene transition were strongly downregulated in Znhit1-KO spermatocytes. These results confirm the specific role of ZNHIT1 in regulating pachytene‑stage gene expression. We have also added a statistical evaluation for the variation shown in Fig. 4B.

      (2) X/Y-linked gene analysis: Analysis of stage‑resolved scRNA‑seq revealed aberrant ectopic activation of 120 XY‑linked genes at zygotene and 119 at pachytene in Znhit1-KO spermatocytes (revised Fig. 4F), demonstrating impaired Meiotic Sex Chromosome Inactivation (MSCI).

      (3) KAS-seq correlation with gene expression: We analyzed the link between KAS‑seq signals and gene expression, and we found that Znhit1 depletion caused a global reduction in KAS‑seq signals, especially at promoters of downregulated genes (revised Fig. S8). Genes with increased expression showed low KAS‑seq signals in both control and mutant groups, likely reflecting indirect regulation. These results highlight the essential role of ZNHIT1 in transcriptional regulation.

      (4) The title should refer to Znhit1, and the effect on meiotic recombination activities may be an indirect consequence of prophase progression arrest, even if some recombination genes are downregulated. This point is important as noted by reviewer 3.

      We fully acknowledge Reviewer 3’s key point and have revised the manuscript title to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis” to reduce emphasis on a direct H2A.Z-recombination link.

      Regarding meiotic recombination activities: The downregulation of recombination-related genes (e.g., Ccnb1ip1, Rnf212) stems from impaired pachytene-stage transcriptional programs caused by ZNHIT1-dependent H2A.Z deposition defects, which in turn lead to prophase progression arrest. Thus, the observed recombination abnormalities may be a secondary consequence of the meiotic prophase arrest, rather than a direct regulatory effect of ZNHIT1 on recombination machinery. This clarification has been integrated into the Discussion section (lines 314-318).

      (5) The recent structural analysis of SRCAP should be cited: Yu et al. Cell Discovery (2024) 10:15 https://doi.org/10.1038/s41421-023-00640-1.

      We have cited this reference in this revised manuscript (lines 234-236).

      (6) The authors should read and answer the specific revisions asked for by the reviewers.

      We have thoroughly read and systematically addressed all specific revisions requested by Reviewers 1, 2, and 3, as detailed in the revised manuscript and supplementary data.

      References

      Alexander, A.K., Rice, E.J., Lujic, J., Simon, L.E., Tanis, S., Barshad, G., Zhu, L., Lama, J., Cohen, P.E., and Danko, C.G. (2023). A-MYB and BRDT-dependent RNA Polymerase II pause release orchestrates transcriptional regulation in mammalian meiosis. Nature communications 14.

      Cole, L., Kurscheid, S., Nekrasov, M., Domaschenz, R., Vera, D.L., Dennis, J.H., and Tremethick, D.J. (2021). Multiple roles of H2A.Z in regulating promoter chromatin architecture in human cells. Nature communications 12, 2524.

      Ernst, C., Eling, N., Martinez-Jimenez, C.P., Marioni, J.C., and Odom, D.T. (2019). Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nature communications 10, 1251.

      Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187.

      Sporrij, A., Choudhuri, A., Prasad, M., Muhire, B., Fast, E.M., Manning, M.E., Weiss, J.D., Koh, M., Yang, S., Kingston, R.E., et al. (2023). PGE(2) alters chromatin through H2A.Z-variant enhancer nucleosome modification to promote hematopoietic stem cell fate. Proceedings of the National Academy of Sciences of the United States of America 120, e2220613120.

      Turner, J.M. (2015). Meiotic Silencing in Mammals. Annu Rev Genet 49, 395-412.

      Wu, T., Lyu, R., You, Q., and He, C. (2020). Kethoxal-assisted single-stranded DNA sequencing captures global transcription dynamics and enhancer activity in situ. Nature methods 17, 515-523.

    1. eLife Assessment

      This valuable study provides quantitative data and analysis to reveal that variations in Dorsal (Dl) nuclear dynamics along the dorso-ventral axis in the early Drosophila embryo are governed by Dl-Cactus nuclear interactions. The solid evidence partially supports a mechanism where nuclear localized Cactus contributes to the fraction of Dl that binds to DNA, but additional work will be necessary to confirm the claims and the biological significance of these findings.