  1. Apr 2026
    1. eLife Assessment

      This valuable study provides evidence that the integration of the nuclear envelope into the endoplasmic reticulum provides a mechanism for mechanical integration across this continuous membrane system. This work opens up new avenues for studying organelle membrane tension homeostasis. The evidence was found to be convincing and carefully quantified, with minor limitations that we expect to be further explored in future work.

    2. Reviewer #1 (Public review):

      Summary:

      Zare‑Eelanjegh et al. investigate how the endoplasmic reticulum, the nucleus, and the cell periphery are mechanically linked by indenting intact cells with specially shaped atomic‑force probes that double as drug injection devices. Fluorescence‑lifetime imaging of the membrane tension reporter Flipper‑TR reveals that these three compartments are mechanically linked and that the actin cytoskeleton, microtubules, and lamins modulate this coupling in complex ways.

      Strengths:

      * The study makes an important advance by applying FluidFM to probe organelle mechanics in living cells, a technically demanding but powerful approach.

      * Experimental design is quantitative, the data are clearly presented, and the conclusions are broadly consistent with the measurements.

      Weaknesses:

      * Calcium‑dependent effects: Indentation can evoke cytoplasmic Ca²⁺ elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation; PMIDs 9200614, 32179693), possibly confounding the Flipper-TR responses; without simultaneous/matching Ca²⁺ imaging, cell viability assays (e.g., Sytox), and intracellular Ca²⁺ sequestration or myosin inhibition experiments, a more complex mechanochemical coupling cannot be excluded, weakening the conclusions.

      * Baseline measurements: Flipper‑TR lifetime images acquired without indentation do not exclude potential light‑induced or time‑dependent changes, which weakens the conclusions.

      * Indentation depth versus nuclear stiffness/tension: Because lamin‑A/C depletion softens nuclei, a given force may produce a deeper pit and thus greater membrane stretch. It is unclear how the cytoskeletal perturbations affect indentation depth, which weakens the conclusions.

      Comments on revisions:

      With their responses, the authors have relieved my initial concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This valuable study combines atomic force microscopy with genetic manipulations of the lamin meshwork and microinjection of cytoskeletal depolymerizing drugs to probe the mechanical responses of intracellular organelles to combinations of cytoskeletal perturbations. This study demonstrates both local and distal responses of intracellular organelles to mechanical forces, and shows that these responses are affected by disruption of the actin, microtubule, and lamin cytoskeletal systems.

      Strengths:

      This study uses a sensitive micromanipulation system to apply and visualize the effects of force on intracellular organelles.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zare-Eelanjegh et al. investigate how the endoplasmic reticulum, the nucleus, and the cell periphery are mechanically linked by indenting intact cells with specially shaped atomic force probes that double as drug injection devices. Fluorescence-lifetime imaging of the membrane tension reporter Flipper-TR reveals that these three compartments are mechanically linked and that the actin cytoskeleton, microtubules, and lamins modulate this coupling in complex ways.

      Strengths:

      (1) The study makes an important advance by applying FluidFM to probe organelle mechanics in living cells, a technically demanding but powerful approach.

      (2) Experimental design is quantitative, the data are clearly presented, and the conclusions are broadly consistent with the measurements.

      Weaknesses:

      (1) Calcium-dependent effects: Indentation can evoke cytoplasmic Ca²⁺ elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation; PMIDs 9200614, 32179693), possibly confounding the Flipper-TR responses; without simultaneous/matching Ca²⁺ imaging, cell viability assays (e.g., Sytox), and intracellular Ca²⁺ sequestration or myosin inhibition experiments, a more complex mechanochemical coupling cannot be excluded, weakening the conclusions.

      (2) Baseline measurements: Flipper-TR lifetime images acquired without indentation do not exclude potential light-induced or time-dependent changes, which weakens the conclusions.

      (3) Indentation depth versus nuclear stiffness/tension: Because lamin-A/C depletion softens nuclei, a given force may produce a deeper pit and thus greater membrane stretch. It is unclear how the cytoskeletal perturbations affect indentation depth, which weakens the conclusions.

      Reviewer #2 (Public review):

      Summary:

      This useful study combines atomic force microscopy with genetic manipulations of the lamin meshwork and microinjection of cytoskeletal depolymerizing drugs to probe the mechanical responses of intracellular organelles to combinations of cytoskeletal perturbations. This study demonstrates both local and distal responses of intracellular organelles to mechanical forces and shows that these responses are affected by disruption of the actin, microtubule, and lamin cytoskeletal systems. Interpretation of these effects is limited by the absence of key data determining whether acute microinjection of cytoskeleton-depolymerizing drugs has complete or partial effects on the targeted cytoskeletal networks.

      Strengths:

      This study uses a sensitive micromanipulation system to apply and visualize the effects of force on intracellular organelles.

      Weaknesses:

      The choice to deliver cytoskeleton-depolymerizing drugs by local microinjection is unusual, and it is unclear to what extent actin and microtubule filaments are actually depolymerized immediately after microinjection and on the minutes-length timescale being evaluated in this study. This omission limits the interpretation of these data.

      Reviewer #3 (Public review):

      Summary:

      Using an approach developed by the authors (FluidFM) combined with FLIM, they discover that a mechanical force applied over the cell nucleus triggers mechanical responses dependent on the Lamina composition.

      Strengths:

      The authors present a new approach to study mechano-transduction in living cells, with which they uncover lamin-dependent properties of the nucleus.

      Weaknesses:

      (1) The transfer of the mechanical response from the Lamina to the ER is not fully covered.

      (2) In Figure 4D, WT dots are the same for each compartment. Why do the authors not make one graph for each compartment with WT, A-KO, B-KD, and A-KO/B-KD together?

      (3) In Figure 1E, the authors showed well how the probe deforms the nucleus. It is not indicated in the material and methods section or in the figure legend, where, in Z, the acquisition of FLIM images was made or if it is a maximum projection. I assume it was made at a plane in the middle of the nucleus to see the nuclear envelope border and the ER at the same time. Did the authors look at the nuclear membrane facing upward, where most of the deformation should occur? Are there more lifetime changes? In Figure D, before injection of CytoD, we can clearly see a difference at the pyramidal indentation site with two different lifetime colors.

      (4) A great result of this article regards the importance of Lamins, A and B, in triggering the response to a mechanical force applied to the nucleus. Could 3D imaging for LaminA and LaminB be performed at the different time points of indentation to see how the lamins meshworks are deformed and how they return to basal state? This could be correlated with the FLIM results described in the article.

      (5) Lamins form a meshwork underneath the nuclear membrane. They are connected to the cytoskeletons mainly by the LINC complex. Results presented here show that the cytoskeletons are implicated in transferring the stimulus from the nuclear envelope to the ER. Could the author perform the same experiments using Nesprin-2 or/and Nesprin-1 or/and SUN1/2 knockdowns to determine if this transmission is occurring through the LINC complex or rather in a passive way by modifying the nuclear close surroundings?

      (6) The authors used cytoskeleton drugs, CytoD and Nocodazole, with their FluidFM probe, but did not show whether the drugs actually worked, and to what extent, by performing actin or microtubule stainings. In the original paper describing FluidFM, 15 s were enough to obtain a full FITC-positive cell after injection. Here, the experiments are around 5 minutes long. I therefore question the rationale for injecting the drugs rather than incubating the cells directly, other than affecting only the cell currently under indentation.

      We thank the reviewers for their constructive criticisms and suggestions. Accordingly, we amended the manuscript and the figures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Calcium-dependent effects: Indentation can evoke cytoplasmic Ca²⁺ elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation; PMIDs 9200614, 32179693) that may affect Flipper-TR signals independently of membrane tension; without simultaneous Ca²⁺ imaging, cell viability assays (e.g., Sytox costaining), intracellular Ca²⁺ sequestration or myosin inhibition, a more complex mechanochemical coupling cannot be excluded. Tracking ER morphology during the experiments with luminal and membrane markers would further clarify this point.

      The goal of our article is to demonstrate and quantify tension propagation and tension homeostasis across the organelles that govern the mechanosensitivity, and thus the mechanoresponse, of the cell. To this end, the test cells (drug-injected cells) were compared with a control group of non-drug-injected cells (Fig. 2 and Fig. 3), so that any general response of the cells to indentation, e.g. potential changes in Ca²⁺ sequestration, is covered by the control group.

      Interestingly, when only cylindrical probes were used for CytoD injection during indentation, tension at the NE was higher than in the control group of non-drug-injected cells. This indicates that F-actin disruption has a larger effect than the indentation process itself, at least when cells are stimulated with cylindrical probes. For this reason, the subsequent steps of this study, which varied the indentation site from the nucleus to the ER or cell periphery and compared WT cells with cells of altered lamina composition, used only cylindrical probes, which minimize the indentation effect on the NE and the ER.

      Lastly, to examine the response to tension changes and calcium dynamics simultaneously, we have since extended our study and analyzed cells treated with different cytoskeleton-disturbing drugs (e.g., CytoD), subjected to viscoelasticity measurements using AFM indentation (i.e., cell relaxation studies following indentation), and injected with drugs perturbing the regulation of Ca²⁺ homeostasis (i.e., Thapsigargin), combined with simultaneous Ca²⁺ imaging. Another manuscript on this work is in preparation.

      (2) Baseline measurements: Flipper-TR lifetime images acquired without indentation, collected with identical timing and illumination, are needed as controls to gauge potential light-induced or time-dependent changes.

      For every cell, a baseline corresponding to its tension in the relaxed state (without indentation) was quantified from a Flipper-TR image taken before the indentation and injection processes (“before”). As explained in the manuscript (lines 180-184), this baseline tension value was then subtracted from the tension measured by time-lapse Flipper-TR imaging over the 3-4 min of stimulation (indentation + injection), as well as immediately or 5 min post-stimulus. The control group (i.e., non-drug-injected cells where the effect of F-actin depolymerization was studied, or WT cells where the effect of lamina composition was studied) was always treated in the same manner as the test group. As such, tension increases due to light-induced or time-dependent changes, or to indentation alone, were excluded.
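      The baseline correction described in this response can be illustrated with a minimal sketch. The lifetime values, variable names, and the helper `baseline_corrected` are hypothetical and are not taken from the manuscript; the sketch only shows how subtracting each cell's own "before" value, and then comparing test and control cells handled identically, removes contributions shared by both groups.

```python
def baseline_corrected(before_ns, timelapse_ns):
    """Subtract a cell's pre-stimulus ("before") lifetime from each later time point."""
    return [t - before_ns for t in timelapse_ns]

# Hypothetical Flipper-TR lifetimes in nanoseconds (illustrative values only).
test_cell = baseline_corrected(4.50, [4.50, 4.75, 5.00])     # drug-injected cell
control_cell = baseline_corrected(4.25, [4.25, 4.25, 4.50])  # non-drug-injected cell

# Any drift common to both groups (illumination, time, indentation alone)
# appears in both traces and cancels in the test-minus-control difference.
net_effect = test_cell[-1] - control_cell[-1]
print(test_cell, control_cell, net_effect)
```

      Note that this cancellation only holds if the shared contributions are the same size in both groups, which is the assumption the reviewer's comment asks the authors to support.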

      (3) Indentation depth versus nuclear stiffness/tension: Because lamin-A/C depletion softens nuclei, a given force may arguably produce a deeper pit and thus greater (not less) membrane stretch. Demonstrating that pit geometry depends only on applied force - and not on genetic or pharmacological perturbations - is necessary to rule out alternative interpretations.

      We thank the reviewer for raising this important point regarding the relationship between indentation depth and nuclear stiffness. To address whether pit geometry depends on applied force rather than genetic perturbations, we analyzed the piezo movement required to reach the 150 nN force setpoint across all experimental conditions (WT, LMNA KO, LMNB KD, and LMNA KO/LMNB KD cells).

      Our results (Fig. S6) demonstrate that there is no statistically significant difference in the piezo displacement from the contact point to the 150 nN setpoint between any of the experimental groups (Kruskal-Wallis H-test: H = 1.744, p = 0.627). This indicates that for a constant applied force of 150 nN, the indentation depth is equivalent across all conditions despite differences in nuclear stiffness.

      Therefore, the observed differences in tension response, and perhaps in membrane stretch, cannot be attributed to variations in indentation depth but rather reflect intrinsic differences in the molecular mechanical response to equivalent mechanical stimuli.

      This has been added in the manuscript in lines 282-286.
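      As a consistency check on the statistic quoted above, the p-value can be recomputed from H alone. This is a sketch that assumes the standard large-sample approximation, in which the Kruskal-Wallis H statistic follows a chi-square distribution with (number of groups - 1) degrees of freedom; the helper `chi2_sf_df3` is an illustrative name and uses the closed-form upper tail for 3 degrees of freedom. It does not re-analyze the underlying piezo displacement data, which are not reproduced here.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

groups = ["WT", "LMNA KO", "LMNB KD", "LMNA KO/LMNB KD"]
df = len(groups) - 1   # four groups give 3 degrees of freedom
h_reported = 1.744     # H statistic quoted in the response
p_value = chi2_sf_df3(h_reported)
print(f"df = {df}, p = {p_value:.3f}")
```

      Under this approximation the recomputed value should agree with the reported p = 0.627 to rounding, which suggests the quoted H and p are mutually consistent for a four-group comparison.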

      Reviewer #2 (Recommendations for the authors):

      (1) Please clarify the distinctions between the pyramidal and cylindrical probes. The manuscript alludes to sharpening the cylindrical probe to facilitate membrane rupture. Do both probes rupture the plasma membrane upon force application? If so, at what applied force does this occur? It seems that PM rupture would also affect tension on intracellular membranes during and especially after force application.

      Yes, both cylindrical and pyramidal probes rupture the PM as well as the nuclear membrane when targeting the nucleus. In the HeLa cells used for this study, pyramidal probes puncture the membrane at a higher force of 100 nN, compared with rupture forces between 10 nN and 50 nN for the sharpened cylindrical probes used here. This is explained in the manuscript in lines 112-115 for cylindrical probes and has been revised for pyramidal probes in lines 115-119.

      (2) Also re: probes: it is clear from Figure 1 that the total volume displacement induced by the pyramidal probe is far greater than the cylindrical probe. This greater displaced volume seems to be a very reasonable explanation for the increased membrane tension detected with the pyramidal probe, but this interpretation is not discussed.

      That is a good point, thank you! This has been added in lines 138-140.

      (3) Both cytochalasin D and nocodazole work by preventing new polymerization of monomers, which acutely affects new assembly and, over time, leads to loss of polymerized filaments. On the timescale of the experiments shown, it seems possible that acute effects on new filament assembly may be occurring, but that pre-assembled filaments may remain stable. It may thus be a misinterpretation to describe these conditions as "without actin fibers" or "without MTs". Further complicating matters, it is possible that the kinetics of filament disassembly may be altered by combinatorial treatment and/or in lamin knockout conditions versus wild-type cells. For instance, it has been shown that microtubule depolymerization increases actin contractility (see PMID 33089509). For these reasons, control experiments showing the extent of actin and/or microtubule disassembly in each condition tested are essential to interpret the data shown.

      Thank you for raising this valid point. The descriptions have been corrected to "less actin fibers" and "less MTs". Regarding the timescale on which the drugs (e.g., CytoD and Nocodazole) affect filament assembly, a higher concentration of 50 µM of each of CytoD and Nocodazole was used for intracellular injection, leading to a final concentration of 0.5 µM. This final, physiologically relevant concentration was expected to act as fast as 12 min for CytoD and 1-5 min for Nocodazole when delivered directly inside the cell, excluding the time required to cross the plasma membrane. In particular, our study of the dynamic response of cells and the change in tension focuses on the early effects of the drugs and the deviation from the control groups, rather than on the steady state reached at longer time points. The time estimates rely on values reported in the literature. For instance, a recent comprehensive study quantified actin dynamics and its interaction with CytoD using high-resolution images of single actin filaments acquired by total internal reflection fluorescence (TIRF) microscopy, and reported a value of approximately 150 s (read from the graphs presented in Fig. 2D and 2F) as the starting point of inhibition of actin filament polymerization after introducing a flow of 5 nM CytoD into a chamber containing actin filaments (ref. 1). In another study, a half-time of 40 s for the complete disassembly of microtubules was reported for monocytes incubated with 1 µM Nocodazole (ref. 2). This explanation has also been included in the SI file, section “Mechanochemical stimulation”.

      (4) The presentation of some of the data could be clarified. For instance, it is unclear how some time course experiments can be non-significant but the endpoint analysis can be significant (for instance, Figure 3C vs. Figure 3D.)

      We agree that some instances require clearer interpretation. Indenting the cell nucleus with cylindrical probes induced higher tension in CytoD-injected cells than in control cells at both the ER and the NE, during and after the stimulus (Fig. 2E-F and Fig. 3C-D). Time-lapse tension analysis of these cells showed close-to-significant differences between test and control groups at the ER and significant differences at the NE. At the ER, p-values of 0.087 (Fig. 2E, bottom row) and 0.042 (Fig. 3C, top row) were obtained for the last time point during the stimulus. For the “after stimulus” condition, significant differences between CytoD-injected and control cells were found at both the ER and the NE. The ER’s complex morphology consists of many curved structures of lumens and disks, which can deform when subjected to external mechanical perturbation, making the ER prone to absorbing stress and strain when directly targeted. That could explain the similar tension levels in CytoD-injected and control cells during ER indentation. Notably, unlike nucleus-targeted cells, ER-targeted cells show increased tension at the ER and NE in CytoD-injected cells compared with control cells only after stimulation. This suggests fundamental differences in the mechanical coupling of the nucleus and the ER to the cytoskeleton. While the nucleus maintains direct, structural actin connections through the nuclear lamina and LINC complexes (ref. 3), making it immediately sensitive to actin disruption, the ER relies on indirect, signaling-mediated cytoskeletal interactions (refs. 4, 5). Thus, the ER functions as a dynamic tension buffer that engages cytoskeletal support primarily during active repair processes following mechanical perturbation. This explains why nuclear probing reveals immediate tension differences in actin-disrupted cells, while ER probing only shows post-retraction effects.

      Consequently, in ER-targeted experiments, statistical analysis detects significant differences between test and control groups after probe removal, but not during probe contact. This is now explained more clearly in the manuscript in line 236.

      References

      (1) Mitani, T. et al. Microscopic and structural observations of actin filament capping and severing by Cytochalasin D. bioRxiv, 2025.01.28.635382 (2025).

      (2) Cassimeris, L. U., Wadsworth, P. & Salmon, E. D. Dynamics of microtubule depolymerization in monocytes. J Cell Biol 102, 2023-2032 (1986).

      (3) Maurer, M. & Lammerding, J. The Driving Force: Nuclear Mechanotransduction in Cellular Function, Fate, and Disease. Annu Rev Biomed Eng 21, 443-468 (2019).

      (4) Shi, X. et al. Actin nucleator formins regulate the tension-buffering function of caveolin-1. J Mol Cell Biol 13, 876-888 (2022).

      (5) van Vliet, A. R. & Agostinis, P. PERK and filamin A in actin cytoskeleton remodeling at ER-plasma membrane contact sites. Molecular & Cellular Oncology 4, e1340105 (2017).

    1. eLife Assessment

      This valuable manuscript describes ATP5I, a subunit of F1Fo-ATP synthase, as a key target of medicinal biguanides. The knockout of ATP5I in pancreatic cancer cells mimics biguanide treatment, inducing a metabolic switch from OXPHOS to glycolysis due to a compromised expression of the Complex I protein NDUFB8. This results in a markedly decreased NAD/NADH ratio and decreased cell proliferation. These solid findings point out ATP5I as a promising mitochondrial target for cancer therapies and contribute to our understanding of metformin's mechanism of action since many of its molecular mechanisms remain poorly understood.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript entitled 'The Role of ATP Synthase Subunit e (ATP5I) in Mediating the Metabolic and Antiproliferative Effects of Biguanides', Lefrancois G et al. identify ATP5I, a subunit of F1Fo-ATP synthase, as a key target of medicinal biguanides. ATP5I stabilizes F1Fo-ATP synthase dimers, essential for cristae morphology, but its role in cancer metabolism is understudied. The research shows ATP5I interacts with a biguanide analogue, and its knockout in pancreatic cancer cells mimics biguanide treatment effects, including altered mitochondria, reduced OXPHOS, and increased glycolysis. ATP5I knockout cells resist biguanide-induced antiproliferative effects, but reintroducing ATP5I restores the effects of metformin and phenformin. These findings highlight ATP5I as a promising mitochondrial target for cancer therapies. The manuscript is well written.

      Strengths:

      The experiments were performed using systematic and well-accepted methods.

      Weaknesses:

      Significance of the target molecule and mechanisms may help in understanding the molecular mechanisms of metformin.

      Comments on revisions:

      In the revised manuscript, the authors addressed all the queries.

    3. Reviewer #2 (Public review):

      Summary:

      The mechanism(s) by which the therapeutic drug metformin lowers blood glucose in type 2 diabetes and inhibits cell proliferation at higher concentrations remain contentious. Inhibition of complex 1 of the mitochondrial respiratory chain with consequent changes in cellular metabolites which favour allosteric activation of phosphofructokinase-1, allosteric inhibition of fructose bisphosphatase-1 and cAMP signalling and activation of AMPK which phosphorylates transcription factors are candidate mechanisms. The current manuscript proposes the e-subunit of ATP-synthase as a putative binding protein of biguanides and demonstrates that it regulates the expressivity of the Complex 1 protein NDUFB8.

      Strengths:

      (1) The metformin conjugate and metformin show comparable efficacy on inhibition of cell proliferation in the millimolar range.

      (2) Demonstration of compromised expression of the Complex I protein NDUFB8 by the ATP5I knock out and its reversal by ATP5I expression is an important strength of the study. This shows that the decreased "sensitivity" to metformin in the ATP5I knock out cells could be due to various proteins.

      (3) Demonstration of converse effects of ATP5I KO and re-expression of ATP5I on the NAD/NADH ratio.

      Weaknesses:

      (1) The interpretation of the cellular co-localization of the biotin-biguanide conjugate with TOMM20 (Figure 1-D) as mitochondrial "accumulation" of the conjugate is overstated because it cannot exclude binding of the conjugate to the mitochondrial membrane. It would have been more convincing if additional incubations with the biotin-biguanide conjugate in combination with metformin had shown that metformin is competitive with the biotin-conjugate.

      (2) The manuscript reports the identification of 69 proteins by mass spectrometry of the pull-down assay of which 31 proteins were eluted by metformin. However, no Mass Spectrometry data is presented of the peptides identified. The methodology does not state the minimum number of peptides (1, 2?) that were used for the identification of the 31/69 proteins.

      (3) The validation of ATP5I was based on the use of recombinant protein (which was 90% pure) for the SPR and use of a single antibody to ATP5I. The validity of the immunoblotting rests on the assumption that there is no "non-specific" immunoactivity in the relevant mol wt range. Information on the validation of the antibody would be helpful.

      (4) Knock-out of ATP5I markedly compromised the NAD/NADH ratio (Fig.3A) and cell proliferation (Fig.3D). These effects may be associated with decreased mitochondrial membrane potential which could explain the low efficacy for metformin (and most of the data in Figs 3-5). This possibility should be discussed. Effects of [metformin] on the NAD/NADH ratio in control cells and ATP5I-KO would have been helpful because the metformin data on cell growth is normalized as fold change relative to control, whereas the NAD/NADH ratio would represent a direct absolute measurement enabling comparison of the absolute effect in control cells with ATP5I KO.

      (5) Figure 6: the CRISPR/Cas9 KO screen was performed at 16 mM metformin, in comparison with 70 nM rotenone and 2 µM oligomycin (in serum-containing medium). The rationale for using such a high concentration of metformin has not been explained. In liver cells, metformin concentrations above 1 mM cause severe ATP depletion, whereas therapeutic (micromolar) concentrations have minimal effects on cellular ATP status. The 16 mM concentration is ~2 orders of magnitude higher than therapeutic concentrations and likely linked to compromised energy status. The stronger inhibition of cell proliferation by 16 mM metformin compared with rotenone or oligomycin raises the issue of whether the changes in gene expression may be linked to the greater inhibition of mitochondrial metabolism. Validation of the cellular ATP status and NAD/NADH with metformin as compared with the two inhibitors could help the interpretation of these data.

      Comments on revisions:

      No further comments.

    4. Reviewer #3 (Public review):

      Most of the data are based on measurements of the oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) measured by the Seahorse analyser in control and ATP5I KO cells. However, these measurements are conducted with a single injection of a biguanide, followed over time and presented as fold change. By doing so, the individual information on the effects of metformin and its derivatives on control and KO cells is lost. In addition, the usual measurement of OCR is coupled with certain inhibitors and uncouplers, such as oligomycin, FCCP and Antimycin A/rotenone, to understand the contribution of individual complexes to respiration. Since biguanides and ATP5I KO affect protein levels of components of complex I and IV, it would be informative to measure their individual contributions/effects in the Seahorse. To further strengthen the data, it would be helpful to obtain measurements of actual ATP levels in these cells, as this would explain the activation of AMPK.
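      The concern about fold-change presentation can be made concrete with a short sketch. All OCR values below are hypothetical and chosen only for illustration; they are not measurements from the manuscript. The point is that two cell lines with very different absolute respiration can show the same fold change after a drug injection.

```python
def fold_change(baseline, treated):
    """OCR after treatment expressed relative to the pre-injection baseline."""
    return treated / baseline

# Hypothetical OCR values in pmol O2/min (illustrative only).
control = {"baseline": 200.0, "treated": 100.0}
knockout = {"baseline": 80.0, "treated": 40.0}

fc_control = fold_change(**control)
fc_knockout = fold_change(**knockout)
drop_control = control["baseline"] - control["treated"]
drop_knockout = knockout["baseline"] - knockout["treated"]

print(fc_control, fc_knockout)      # identical fold changes
print(drop_control, drop_knockout)  # different absolute decreases
```

      In this example both lines report a fold change of 0.5, while the absolute decrease differs by more than twofold, which is the information the reviewer notes is lost when only normalized traces are shown.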

      The authors report alterations in mitochondrial morphology upon ATP5I KO, which are measured by subjective quantification of filamentous versus punctate structures. Fiji offers great tools to quantify the mitochondrial network in an unbiased manner and with more accuracy, using deconvolution and skeletonization of the mitochondria, which provides the opportunity to measure length, shape and number quantitatively. This will help to understand better whether mitochondria are really fragmented upon ATP5I KO and rescued by its re-introduction.
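      To illustrate what an objective fragmentation readout could look like, the sketch below counts connected objects in a binary mask and reports their sizes. This is a deliberately simplified stand-in, not the Fiji deconvolution and skeletonization workflow the reviewer refers to; the function name `fragment_sizes` and the two toy masks are hypothetical, and real images would first need thresholding and segmentation.

```python
from collections import deque

def fragment_sizes(mask):
    """Return the pixel count of each 8-connected foreground object in a 0/1 mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                seen[r][c] = True
                queue = deque([(r, c)])
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                sizes.append(size)
    return sizes

# Toy masks: one continuous tubule versus isolated puncta (illustrative only).
filamentous = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
punctate = [
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
]
print(fragment_sizes(filamentous), fragment_sizes(punctate))
```

      A network that fragments shifts from few large objects to many small ones, so object count and size distribution give a number that can be compared across control, KO, and rescue conditions without manual scoring.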

      Finally, the authors report in the last part of the paper a genetic CRISPR/Cas9 KO screen in NALM-6 cells cultured with high amounts of metformin to identify potential new mediators of metformin action. It is difficult to connect this to the rest of the paper, because a) different concentrations of metformin are used and b) the metabolic effects on energy consumption are not defined. They argue for the molecular function of the obtained hits based on the literature and on a comparison of the pattern of genetic alterations with that obtained with known inhibitors such as oligomycin and rotenone. However, a direct connection is not provided; thus the interpretation at the end of the results that "the OMA1-DELE1-HRI pathway mediates the antiproliferative activity of both biguanides and the F1ATPase inhibitor oligomycin" while increasing glycolysis needs to be toned down. This is an interesting observation, but no causality is provided. In general, this part stands alone and needs to be better connected to the rest of the paper.

      Comments on revisions:

      Thanks to the authors for addressing the concerns raised during the review of the original manuscript. The data now include proper measurements of OCR and quantifications of the mitochondrial network. The screening data are better connected to the rest of the paper and provide compelling evidence for mitochondria, and in particular the ATP synthase, as potential targets of metformin.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript entitled 'The Role of ATP Synthase Subunit e (ATP5I) in Mediating the Metabolic and Antiproliferative Effects of Biguanides', Lefrancois G et al. identify ATP5I, a subunit of F1Fo-ATP synthase, as a key target of medicinal biguanides. ATP5I stabilizes F1Fo-ATP synthase dimers, essential for cristae morphology, but its role in cancer metabolism is understudied. The research shows ATP5I interacts with a biguanide analogue, and its knockout in pancreatic cancer cells mimics biguanide treatment effects, including altered mitochondria, reduced OXPHOS, and increased glycolysis. ATP5I knockout cells resist biguanide-induced antiproliferative effects, but reintroducing ATP5I restores the effects of metformin and phenformin. These findings highlight ATP5I as a promising mitochondrial target for cancer therapies. The manuscript is well written.

      Strengths:

      The experiments were performed using systematic and well-accepted methods.

      Weaknesses:

      The significance of the target molecule and mechanisms may help in understanding the molecular mechanisms of metformin.

      We greatly appreciate the reviewer’s insightful comment regarding the importance of the target molecule and its mechanisms in elucidating metformin’s molecular actions. ATP5I plays a key role in the dimerization and assembly of the F1F0-ATP synthase complex. To address this, we performed Blue Native-PAGE followed by western blotting using an antibody against the β-subunit of the F1 domain. Our results show that metformin affects the oligomeric state of the F1F0-ATP synthase in a way that partially reproduces the effect of the KO of ATP5I (Fig 2G). This provides direct evidence that metformin acts on-target through ATP5I.

      Reviewer #2 (Public review):

      Summary:

      The mechanism(s) by which the therapeutic drug metformin lowers blood glucose in type 2 diabetes and inhibits cell proliferation at higher concentrations remain contentious. Inhibition of complex 1 of the mitochondrial respiratory chain with consequent changes in cellular metabolites which favour allosteric activation of phosphofructokinase-1, allosteric inhibition of fructose bisphosphatase-1 and cAMP signalling and activation of AMPK which phosphorylates transcription factors are candidate mechanisms. The current manuscript proposes the e-subunit of ATP-synthase as a putative binding protein of biguanides and demonstrates that it regulates the expressivity of the Complex 1 protein NDUFB8.

      Strengths:

      (1) The metformin conjugate and metformin show comparable efficacy on inhibition of cell proliferation in the millimolar range.

      (2) Demonstration of compromised expression of the Complex I protein NDUFB8 by the ATP5I knockout and its reversal by ATP5I expression is an important strength of the study. This shows that the decreased "sensitivity" to metformin in the ATP5I knock-out cells could be due to various proteins.

      (3) Demonstration of converse effects of ATP5I KO and re-expression ATP5I on the NAD/NADH ratio.

      Weaknesses:

      (1) The interpretation of the cellular co-localization of the biotin-biguanide conjugate with TOMM20 (Figure 1-D) as mitochondrial "accumulation" of the conjugate is overstated because it cannot exclude binding of the conjugate to the mitochondrial membrane. It would have been more convincing if additional incubations with the biotin-biguanide conjugate in combination with metformin had shown that metformin is competitive with the biotin-conjugate.

      We appreciate the reviewer’s comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the inner mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we revised the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      (2) The manuscript reports the identification of 69 proteins by mass spectrometry of the pull-down assay of which 30 proteins were eluted by metformin. However, no Mass Spectrometry data is presented of the peptides identified. The methodology does not state the minimum number of peptides (1, 2?) that were used for the identification of the 31/69 proteins.

      We added a comprehensive table summarizing these findings (Figure 1- figure supplement 2). We considered all peptides and decided to perform stringent validation tests for those chosen to be further studied.

      (3) The validation of ATP5I was based on the use of recombinant protein (which was 90% pure) for the SPR and the use of a single antibody to ATP5I. The validity of the immunoblotting rests on the assumption that there is no "non-specific" immunoactivity in the relevant mol wt range. Information on the validation of the antibody would be helpful.

      Regarding the recombinant protein used for SPR, its purity was evaluated using a Coomassie-stained gel. For the antibody used in immunoblotting, its specificity was validated through knockout cell lines (Figure 2A), ensuring minimal concerns about non-specific immunoactivity within the relevant molecular weight range. Unfortunately, the KO data comes in the paper after the first immunoblots are presented. We outlined this validation in the methods section.

      (4) Knock-out of ATP5I markedly compromised the NAD/NADH ratio (Fig.3A) and cell proliferation (Figure 3D). These effects may be associated with decreased mitochondrial membrane potential which could explain the low efficacy of metformin (and most of the data in Figures 3-5). This possibility should be discussed. Effects of [metformin] on the NAD/NADH ratio in control cells and ATP5I-KO would have been helpful because the metformin data on cell growth is normalized as fold change relative to control, whereas the NAD/NADH ratio would represent a direct absolute measurement enabling comparison of the absolute effect in control cells with ATP5I KO.

      The mitochondrial membrane potential depends on a functional electron transport chain which drives proton pumping from the matrix to the intermembrane space. Metformin can decrease the mitochondrial membrane potential and this is usually explained as a consequence of complex I inhibition [1]. It has been published that metformin requires this membrane potential to accumulate in mitochondria so the actions of metformin are self-limiting due to this requirement. The reviewer is right that ATP5I KO cells could be resistant to metformin because they may have a lower membrane potential. We do not believe this to be the case because the response to phenformin, another biguanide that can enter mitochondria through the membrane without the need of the OCT transporters [2], is also affected in ATP5I KO cells. Of note, compensatory mechanisms such as enhanced glycolysis, as observed in ATP5I KO cells (elevated ECAR and increased sensitivity to 2-D-deoxyglucose), and the ATPase activity of F<sub>1</sub>F<sub>0</sub>-ATP synthase could potentially help maintain membrane potential suggesting that this might not be an issue in the ATP5I KO cells. Chandel and colleagues already proposed that reversal of the F<sub>1</sub>F<sub>0</sub>-ATPase keeps this membrane potential in metformin-treated cells [3].

      Nevertheless, to experimentally address this point, we measured the mitochondrial membrane potential using tetramethylrhodamine methyl ester (TMRE) and ATP levels using luciferase-based assays (CellTiter-Glo) in ATP5I KO cells. We now show that ATP levels are not significantly reduced in ATP5I KO cells, likely because of compensatory glycolysis (Figure 5D), while the mitochondrial membrane potential remains close to normal (Figure 6D and E).

      We did not measure the NAD<sup>+</sup>/NADH ratio in control and KO cells treated with metformin because we now provide a more direct measurement of metformin acting on ATP5I: the state of oligomerization of the F<sub>1</sub>F<sub>0</sub>-ATPase (Figure 2G) as well as a Seahorse Bioenergetic Stress test (Figure 6A-C). Both figures provide results consistent with targeting ATP5I by biguanides. We also discuss that targeting ATP5I can result in complex I inhibition due to the well-known role of F<sub>1</sub>F<sub>0</sub>-ATPases in cristae formation and the assembly of the respiratory complexes. We do not believe ATP5I is the only target of metformin, and in the paper we acknowledged and discussed other proposed targets in the introduction, results section page 8 and the discussion.

      (5) Figure-6 CRISPR/Cas9 KO at 16mM metformin in comparison with 70nM rotenone and 2 micromolar oligomycin (in serum-containing medium). The rationale for the use of such a high concentration of metformin has not been explained. In liver cells metformin concentrations above 1mM cause severe ATP depletion, whereas therapeutic (micromolar) concentrations have minimal effects on cellular ATP status. The 16mM concentration is ~2 orders of magnitude higher than therapeutic concentrations and likely linked to compromised energy status. The stronger inhibition of cell proliferation by 16mM metformin compared with rotenone or oligomycin raises the issue of whether the changes in gene expression may be linked to the greater inhibition of mitochondrial metabolism. Validation of the cellular ATP status and NAD/NADH with metformin as compared with the two inhibitors could help the interpretation of this data.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I (DepMap score: -0.47) [4]. The concentration of 16 mM metformin was chosen based on the IC<sub>50</sub> for this cell line. Both ATP status and NAD<sup>+</sup>/NADH ratios will depend on the extent of the compensatory glycolysis. On the other hand, our genetic screening evaluates cell proliferation as an integration of all metabolic activities required for the process. This unbiased screening revealed a common pathway affected by metformin and oligomycin that differs from the pathway affected by rotenone, which is consistent with the finding that metformin acts on the F<sub>1</sub>F<sub>0</sub>-ATPase. Our new Seahorse data demonstrate that oligomycin has a markedly reduced effect in metformin-treated cells, supporting a shared mechanism of action. Notably, uncouplers restore respiration in both metformin-treated and ATP5I knockout cells, which aligns with the mechanism we propose (please see our new section on the Seahorse Mito Stress test and the new discussion). In the discussion, we acknowledged, based on existing literature, that the cellular context may play a significant role in determining the response to this drug.

      Reviewer #3 (Public review):

      Most of the data are based on measurements of the oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) measured by the Seahorse analyser in control and ATP5l KO cells. However, these measurements are conducted by a single injection of a biguanide, followed over time and presented as fold change. By doing so, the individual information on the effect of metformin and derivate on control and KO cells are lost. In addition, the usual measurement of OCR is coupled with certain inhibitors and uncouplers, such as oligomycin, FCCP, and Antimycin A/rotenone, to understand the contribution of individual complexes to respiration. Since biguanides and ATP5l KO affect protein levels of components of complex I and IV, it would be informative to measure their individual contributions/effects in the Seahorse. To further strengthen the data, it would be helpful to obtain measurements of actual ATP levels in these cells, as this would explain the activation of AMPK.

      Thank you for this valuable comment. We have now performed the suggested analysis, which is presented in the new Figure 6. The data are consistent with our proposition that biguanides target ATP5I, but they also suggest the possibility of additional targets, such as Complex I, as proposed by other groups. Please see our new section on the Seahorse Mito Stress test and the new discussion. We also measured ATP (Figure 5D) and the mitochondrial membrane potential (Figure 6D and E). These measurements reflect the powerful compensation provided by glycolysis.

      The authors report on alterations in mitochondrial morphology upon ATP5l KO, which is measured by subjective quantifications of filamentous versus puncta structures. Fiji offers great tools to quantify the mitochondrial network unbiasedly and with more accuracy using deconvolution and skeletonization of the mitochondria, providing the opportunity to measure length, shape, and number quantitatively. This will help to understand better, whether mitochondria are really fragmented upon ATP5l KO and rescued by its re-introduction.

      Thanks for the suggestion. We used the Mitochondrial analyzer plugin from ImageJ/Fiji to redo Figures 2 and 4, and we quantified details of the mitochondrial network, reporting differences in branch number, length, endpoints and diameter.

      Finally, the authors report in the last part of the paper a genetic CRISPR/Cas9 KO screen in NALM-6 cells cultured with high amounts of metformin to identify potential new mediators of metformin action. It is difficult to connect that to the rest of the paper because a) different concentrations of metformin are used and b) the metabolic effects on energy consumption are not defined. They argue about the molecular function of the obtained hits based on literature and on a comparison of the pattern of genetic alterations based on treatments with known inhibitors such as oligomycin and rotenone. However, a direct connection is not provided, thus the interpretation at the end of the results that "the OMA1-DEL1-HRI pathway mediates the antiproliferative activity of both biguanides and the F1ATPase inhibitor oligomycin" while increasing glycolysis, needs to be toned down. This is an interesting observation, but no causality is provided. In general, this part stands alone and needs to be better connected to the rest of the paper.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I [4], forcing us to use higher concentrations of metformin to inhibit their growth. Recent results show that metformin targets PEN2 in the cytosol to increase AMPK activity, controlling both the glucose lowering and the life span extension abilities of metformin [5]. This work raises the question whether the antiproliferative and anticancer effects of metformin are due to a mitochondrial activity or are controlled by this new pathway of AMPK activation. Hence, the genetic screening was performed to find, in an unbiased way, how metformin works. The results provide compelling evidence for mitochondria and in particular the ATP synthase as potential targets of metformin and a foundation for future studies. We added the following text to the beginning of this section: “Several candidate targets have been reported for biguanides and our results presented so far suggest a new one. Clues about drug mechanism of action can be obtained in unbiased manner using genetic perturbation [6]. To obtain an unbiased observation of biological processes affected by metformin, we performed a genome-wide pooled CRISPR/Cas9 KO screen in NALM-6 cells cultured in the presence of metformin at a concentration affecting growth (16 mM).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1B, the total ACC antibody is missing, and the total AMPK should be replaced, especially since they claim pAMPK increases with metformin and BFB treatment. Additionally, the streptavidin pull-down image in Figure 1F needs to be resized to show the fully cropped section.

      We repeated this experiment three times and added the new figures to the supplemental data. We corrected the main figure in the manuscript with a representative blot for total ACC (Fig 1B).

      (2) Clarify whether ATP5I alone activates mitochondrial respiratory activity or if it functions in a complex with other proteins. Also, explain how metformin affects ATP5I: is it phosphorylated directly or through an upstream target?

      ATP5I interacts directly with ATP5L and both proteins form part of the peripheral stack of the F<sub>1</sub>F<sub>0</sub>-ATP synthase. ATP5I and ATP5L play demonstrated roles in the dimerization of the F<sub>1</sub>F<sub>0</sub>-ATP synthase. We discussed that they may affect other functions of the enzyme as part of the peripheral stack which interact with the OSCP (oligomycin sensitivity conferring protein) located in the F1 portion of the enzyme. Further work is needed to understand how ATP5I may affect the interactions between the F0 and F1 parts of the enzyme. We did not investigate whether metformin affects the phosphorylation of ATP5I, but this remains an important question for future studies. The PhosphoSitePlus database indicates that ATP5I undergoes phosphorylation and acetylation at multiple sites, suggesting potential regulatory mechanisms worth exploring.

      (3) Ensure that all immunofluorescence (IF) images include a scale bar.

      Done

      Reviewer #2 (Recommendations for the authors):

      (1) Details of the mass spectrometry analysis and the number of peptides for the proteins identified would increase the merit of the study.

      We added a comprehensive table summarizing these findings (Figure 1- figure supplement 2). We considered all peptides and decided to perform stringent validation tests for those chosen to be further studied.

      (2) The lower NAD/NADH ratios in the ATP5I KO cell lines and the higher ratios with ATP5I expression are convincing data of the cellular redox state of these cells (with variable NDUFB8). Other data sets (e.g. OCR and ECAR and Relative growth, %) are normalized to the respective control and therefore do not show the relative effect of metformin (in control cells) to the ATP5I knock-out. The effects of metformin concentration on the NAD/NADH ratio would provide a direct measure of the extent to which metformin mimics ATP5I KO. This data would be clearer to interpret than Figure 3GHKL; Figures 5EF; S1; S2).

      We did not measure the NAD<sup>+</sup>/NADH ratio in control and KO cells treated with metformin because we now provide a more direct measurement of metformin acting on ATP5I: the oligomerization state of the F<sub>1</sub>F<sub>0</sub>-ATPase and its vestigial assembly intermediates (Figure 2G) as well as a Seahorse Bioenergetic Stress test (Figure 6A-C). Both figures provide results consistent with targeting ATP5I by biguanides. We also discuss that targeting ATP5I can result in complex I inhibition due to the well-known role of F<sub>1</sub>F<sub>0</sub>-ATPase oligomerization in cristae formation and the assembly of the respiratory complexes.

      (3) Figure 6: NAD/NADH data for metformin (16 mM) and rotenone (70 nM)/oligomycin (2 µM) would establish whether the concentrations are "matched" to allow a comparison of their gene signatures.

      We chose those concentrations because they have similar effects on cell growth, since the NAD/NADH ratio depends on the extent of glycolytic compensation induced by blocking respiration.

      (4) Intramitochondrial accumulation of the biotin conjugate could be demonstrated in Figure 1D from competition between metformin and the biotin-conjugate.

      We appreciate the reviewer’s comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the inner mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we revised the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      Reviewer #3 (Recommendations for the authors):

      In addition to my comments for the public review, the manuscript would be strengthened by the following points:

      (1) The abstract needs to be streamlined to communicate more clearly what the paper is about. The last part of the results is not mentioned and is completely disconnected from the ATP5I KO story.

      We have significantly modified our abstract to include both the genetic screening significance and our new findings on the F<sub>1</sub>F<sub>0</sub>-ATP synthase oligomerization.

      (2) Quantifications of the western blots (Figure 1B) are missing. Seems like AMPK total protein levels go down with BFB.

      We quantified the blots.

      (3) How often was the pull-down repeated (Figure 1F)? It would be also important to show this in other cell types, such as pancreatic cancer cells.

      The pull-down was an initial large-scale discovery experiment performed once. However, the findings were subsequently validated in KP-4 pancreatic cancer cells in three independent experiments. As a direct readout of metformin’s impact on ATP5I, we assessed the oligomerization state of the F1ATPase and compared the effects of metformin with those of ATP5I knockout. We show that metformin partially phenocopies the ATP5I KO phenotype, and we reproduced this effect in a second cell line, U2OS osteosarcoma cells.

      (4) Does the KO of ATP5l affect other subunits of the v-ATP5a?

      Yes—we added an immunoblot to document this in Fig. 2A. Notably, ATP5I knockout also reduces ATP5L and OSCP levels.

      (5) Do metformin and BFB themselves affect mitochondrial morphology and respiration?

      To evaluate the activity of BFB in comparison with metformin, we performed immunoblot analyses of the AMPK pathway, growth assays, and microscopy-based assessment of mitochondrial morphology. These data are shown in Fig. 1B–D. A more comprehensive analysis of metformin’s effects on mitochondrial respiration has now been added as Fig. 6, using Seahorse measurements and multiple respiratory inhibitors.

      (6) Since there is a strong increase in ECAR, does this correspond to an increase in glucose uptake? Are the proteins or genes involved altered or how to explain the increased flux through glycolysis in ATP5l KO cells?

      This is a very interesting idea, as our CRISPR screen identified several genes that could potentially enhance glycolysis as a vulnerability in metformin-treated cells. In future work, we will explore this biology in greater depth.

      (7) Line 242, for easier understanding, state clearly that metformin reduces growth by x-percent.

      Yes, it is a 65-fold change. We added it to the text.

      (8) The conclusion at the end of the result section is not supported by the data or not well explained. I guess oligomycin will stop the action of metformin on vATP5l, or how to explain this?

      We clarified the conclusion.

      (1) Xian, H., Liu, Y., Rundberg Nilsson, A., Gatchalian, R., Crother, T. R., Tourtellotte, W. G., Zhang, Y., Aleman-Muench, G. R., Lewis, G., Chen, W., Kang, S., Luevanos, M., Trudler, D., Lipton, S. A., Soroosh, P., Teijaro, J., de la Torre, J. C., Arditi, M., Karin, M. & Sanchez-Lopez, E. Metformin inhibition of mitochondrial ATP and DNA synthesis abrogates NLRP3 inflammasome activation and pulmonary inflammation. Immunity 54, 1463-1477 e1411, (2021).

      (2) Hawley, S. A., Ross, F. A., Chevtzoff, C., Green, K. A., Evans, A., Fogarty, S., Towler, M. C., Brown, L. J., Ogunbayo, O. A., Evans, A. M. & Hardie, D. G. Use of cells expressing gamma subunit variants to identify diverse mechanisms of AMPK activation. Cell metabolism 11, 554-565, (2010).

      (3) Wheaton, W. W., Weinberg, S. E., Hamanaka, R. B., Soberanes, S., Sullivan, L. B., Anso, E., Glasauer, A., Dufour, E., Mutlu, G. M., Budigner, G. S. & Chandel, N. S. Metformin inhibits mitochondrial complex I of cancer cells to reduce tumorigenesis. eLife 3, e02242, (2014).

      (4) Hlozkova, K., Pecinova, A., Alquezar-Artieda, N., Pajuelo-Reguera, D., Simcikova, M., Hovorkova, L., Rejlova, K., Zaliova, M., Mracek, T., Kolenova, A., Stary, J., Trka, J. & Starkova, J. Metabolic profile of leukemia cells influences treatment efficacy of L-asparaginase. BMC Cancer 20, 526, (2020).

      (5) Ma, T., Tian, X., Zhang, B., Li, M., Wang, Y., Yang, C., Wu, J., Wei, X., Qu, Q., Yu, Y., Long, S., Feng, J. W., Li, C., Zhang, C., Xie, C., Wu, Y., Xu, Z., Chen, J., Yu, Y., Huang, X., He, Y., Yao, L., Zhang, L., Zhu, M., Wang, W., Wang, Z. C., Zhang, M., Bao, Y., Jia, W., Lin, S. Y., Ye, Z., Piao, H. L., Deng, X., Zhang, C. S. & Lin, S. C. Low-dose metformin targets the lysosomal AMPK pathway through PEN2. Nature 603, 159-165, (2022).

      (6) Bruno, P. M., Liu, Y., Park, G. Y., Murai, J., Koch, C. E., Eisen, T. J., Pritchard, J. R., Pommier, Y., Lippard, S. J. & Hemann, M. T. A subset of platinum-containing chemotherapeutic agents kills cells by inducing ribosome biogenesis stress. Nat Med 23, 461-471, (2017).

    1. eLife Assessment

      This important study introduces an experimental approach for studying Drosophila oviposition rhythms and identifies the subset of circadian clock neurons that mediate the circadian control of oviposition. The authors resolve an inherently noisy rhythm to provide convincing evidence by using statistical averaging techniques, which help reduce this noise but at the cost of variation across individual rhythms. This paper will be of interest to anyone interested in insect ovarian physiology, circadian biology, and reproductive fitness.

    2. Joint Public Review:

      Summary

      Riva et al. introduce a semi-automatic setup for measuring Drosophila melanogaster oviposition rhythms and use it to map the timekeeping function underlying egg laying rhythms to a subset of clock cells. Using a combination of neurogenetic manipulations and referencing the publicly available female hemi-brain connectome dataset, they narrow the critical circuit down to two of the three CRYPTOCHROME expressing lateral-dorsal neurons (CRY[+] LNds). Their findings suggest that different overlapping sets of clock neurons may control different behavioral rhythms in D. melanogaster.

      This work will be of interest to researchers interested in the circadian regulation of oviposition in D. melanogaster (and possibly other insects), a phenomenon which has been left relatively under-explored. The construction of a semi-automated setup which can be made relatively cheaply using available motors and 3D printed molds provides a useful model for obtaining longer records of oviposition activity.

      Strengths

      The authors use a semi-automated monitoring system to detect circadian egg laying rhythms in spite of inherently noisy data. Using this approach they use a variety of different genetic tools to show that CRY+ LNds play a role in generating the circadian rhythm of oviposition, that PDF-expressing neurons seem to be important for maintaining the circadian period of egg laying, and that period locus function is required for the circadian rhythmicity of oviposition. The authors also point to some potentially interesting connectome data that suggest hypotheses regarding the neuronal circuit linking daily timekeeping to oviposition, which will require further validation in future studies.

      Weaknesses:

      The major weaknesses of this work result from the noisy nature of the data, and the need to average the individual records of many animals in order to extract significant rhythmicity values. The predicted neural output pathways will require validation in future studies.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      (1) Problems associated with averaging: The authors intended to focus on the oviposition clock in individual females, however due to the inherent noise in the oviposition rhythm they had to resort to averaging across Lomb-Scargle periodograms generated from individual time-series. They then tested whether the averaged periodogram contains a significant frequency. However, this reduction in noise also reduces the ability to compare differences in power of the rhythm across individuals. Furthermore, this method makes it especially difficult to distinguish the contribution of subsets of the circuit on the proportion of rhythmic flies and the power of the rhythm. In this revised version the authors use two manipulations to disrupt the molecular clock, which could have different success rates based on the type and number of cells targeted. Unfortunately, the type of averaging used prevents the detection of any such effects. It is to be noted that, indeed, individual-level differences in period between the PdfDicerGal4 > perRNAi and UAS-perRNAi lines help the authors to establish that there is a significant reduction in period length when the molecular clock is abolished in PDF cells. These individual measurements are now very helpful in discerning the effect of manipulations carried out on different circadian neural subsets, some of which could have been missed if only averages were considered.

      First, it is important to emphasize that we are certainly not "averaging across Lomb-Scargle periodograms". As explained in the paper (and at length in the Supplementary Material), what we do is first detrend each individual time series, then average _all_ the resulting time series (and not only those of rhythmic individuals), and finally take the Lomb-Scargle periodogram of this average series. Nevertheless, we agree with the reviewer that the use of averages reduces our ability to understand what happens at the individual level. The problem is that in most cases the presence of noise has made it difficult to draw any meaningful conclusions. One fortunate exception is the one mentioned by the reviewer. Averaging, on the other hand, has allowed us to extract some useful information in those cases.
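
      The three steps described above (detrend each record, average all records, then compute one periodogram) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the linear detrending, the evenly binned records of equal length, the classical (unnormalized) Lomb-Scargle formula, and all function names are assumptions made for the example.

```python
import math

def detrend(series):
    """Subtract a least-squares straight line from an evenly binned record.
    Assumption: linear detrending; the paper's exact procedure may differ.
    Requires at least two points."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (y - y_mean) for i, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (i - t_mean)) for i, y in enumerate(series)]

def population_average(all_series):
    """Detrend every record (rhythmic or not), then average bin by bin.
    Assumes all records have the same length and time base."""
    detrended = [detrend(s) for s in all_series]
    n = len(detrended)
    return [sum(vals) / n for vals in zip(*detrended)]

def lomb_scargle(times, values, periods):
    """Classical Lomb-Scargle power of one series at each trial period."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    power = []
    for p in periods:
        w = 2 * math.pi / p
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, c))
        ys = sum(a * b for a, b in zip(y, s))
        power.append(0.5 * (yc * yc / sum(a * a for a in c)
                            + ys * ys / sum(a * a for a in s)))
    return power

def peak_period(times, values, periods):
    """Trial period with the highest power in the periodogram."""
    power = lomb_scargle(times, values, periods)
    return periods[power.index(max(power))]
```

      Calling `peak_period(times, population_average(records), periods)` gives the period of the averaged series. Computing `lomb_scargle` on each detrended record and averaging the resulting powers would instead be the "averaging across periodograms" procedure that the response says was not used.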

      (2) Sensitivity to sample size: Averaging reduces the effect of random background noise but noise reduction is dependent upon sample size. Comparing genotypes with different sample sizes in addition to varying signal to noise ratios (which might also change with neural manipulations) makes it difficult to estimate how much of the rhythm structure is contributed by a given neuronal subset; thus, whenever possible comparisons should be made between groups that include similar number of flies. This problem is compounded when the averaged periodogram is composed of both rhythmic and weakly rhythmic individuals. For instance, in the main text the reported value of period length of pdfDicerGal4 > perRNAi is 20.74h (see also Fig 2J) but in the Supplementary figure 2S1 this is close to 22h, while the values reported for the control are largely similar (24.35h in Fig 2H versus ~24h in Fig 2S1). A difference of 3.6h between control and experimental flies is much greater than 2h. Which estimate (average versus individual) is more reliable in predicting the behavior of these flies is difficult to determine without further experiments.

      In most of the experiments analyzed for this paper the numbers of flies for control and experimental genotypes are very similar. In the remaining ones, the number of flies for experimental genotypes is roughly twice the number of flies for control genotypes. As mentioned, noise reduction depends on sample size. This implies that, when a genotype is assessed as rhythmic, the sample size used is evidently large enough. On the other hand, when a genotype is assessed as arrhythmic it is important to know if the sample size is large enough. It is for this reason that we have used many more flies for arrhythmic genotypes vs. their control genotypes.
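
      The dependence of noise reduction on sample size can be made explicit with a short worked relation. It is a sketch under assumptions that are not stated in the response: every fly $i$ shares the same underlying rhythm $s(t)$, and the noise terms $\varepsilon_i(t)$ are independent with zero mean and common variance $\sigma^2$.

```latex
\bar{y}(t) = \frac{1}{N}\sum_{i=1}^{N}\bigl(s(t) + \varepsilon_i(t)\bigr)
           = s(t) + \frac{1}{N}\sum_{i=1}^{N}\varepsilon_i(t),
\qquad
\operatorname{Var}\bigl[\bar{y}(t)\bigr] = \frac{\sigma^{2}}{N}.
```

      Under these assumptions the noise amplitude of the average falls as $1/\sqrt{N}$, so doubling the number of flies improves the signal-to-noise ratio by a factor of $\sqrt{2} \approx 1.41$. If flies differ in phase or period, the signal term is also attenuated by averaging and the gain is smaller.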

      Regarding the period difference between the average of rhythmic individuals and the denoised population average, notice first that they are not necessarily exactly the same thing, since our population average uses all flies, and the denoising might introduce some variations over the underlying periods (which would be undetectable without the denoising). Also, and more importantly, Fig. 2S1 shows that for the average of the individual periods the error bars are large, and thus, statistically, the reported value for the population average falls within the confidence interval for the individual average.

      (3) Based on the newly provided data for individual fly periodograms the reader can visually evaluate the rhythmicity associated with each genotype. Such visual inspection did not reveal any clear difference between the proportion of rhythmic individuals between experimental and parental GAL4 and/or UAS controls, except for experiments using per01 mutant animals. This is surprising since if these circuits are controlling the oviposition rhythm, perturbing them should affect most individuals in a similar way.

      The problem here is that, given the amount of noise present in this behavior, it is difficult to obtain any reliable information from individual records, since, by its random nature, in a given experiment noise might be disturbing the expected behavior of individuals in very different ways. That is the reason why we have resorted to population averages.

      Other comments

      Disrupting the clock in the 5th sLNv and 3 Cry+ LNds (and weakly in a small subset of DN1) affected egg-laying. Although the work emphasizes the importance of the LNd, the role of the 5th sLNv should be discussed.

      As mentioned in the paper, what the experiments show is that the 3 Cry+ LNds and 5th sLNv (usually called E cells) are candidates to be the main drivers of the oviposition rhythm, but the connectomics show that only 2 Cry+ LNds are connected to the oviposition circuit. In order to be more accurate, throughout the corresponding section (now called "The molecular clock in E neurons is necessary for rhythmic egg-laying") of the corrected manuscript we have always referred to the cells marked by the driver as E-cells. In the Discussion, we have added a line commenting that, in the connectome, the 5th sLNv is not connected to any cells of the oviposition circuit.

      Minor corrections:

      In subsection "Two Cry+ LNd neurons directly oviIN", there was a mistake in the use of "E1" and "E2" (their meanings were interchanged). We have corrected this section, giving the correct definitions. We have also corrected some minor English typos.

      Joint Recommendations for the authors:

      (1) Line 234 'to disrupt the molecular clock in (those) neurons', Please clearly describe the cell types in which MB122B driver works.

      We have clarified the cell types in which the MB122B driver is expressed (line 236).

      (2) Line 235 gen cycle, should be gen'e' cycle

      The typo has been corrected.

      (3) The authors should provide the raw data in repositories as per journal policy of eLife.

      The data are now available at the following links:

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_+> UAS-perRNAi.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_M 122Bsplit-Gal4>+.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_MB122Bsplit-Gal4>UAS-perRNAi.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Figures1

    1. eLife Assessment

      This study presents valuable computational findings, obtained with recurrent neural networks, on the neural basis of learning new motor memories and of savings. The evidence supporting the claims of the authors is solid, but it would benefit from more detailed discussion of the specific conditions under which savings emerges from purely implicit mechanisms. This work will be of interest to computational and experimental neuroscientists working in motor learning.

    2. Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Main weakness:

      The introduction details the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies (e.g., Morehead et al. 2015, Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). Furthermore, there have been multiple reports that implicit adaptation exhibits attenuation upon relearning (Avraham et al., 2021, Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). In the discussion, the authors acknowledge that their goal was not to model a complete explicit-implicit system, but rather to probe how savings may emerge from a purely implicit mechanism. Given the central debate introduced by the authors, the manuscript would benefit from a more detailed discussion explaining how their findings elucidate the specific conditions under which savings emerge from purely implicit mechanisms versus when cognitive strategies predominate.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shahbazi et al used a recurrent neural network model trained to control a musculoskeletal model of the arm to investigate how neural populations accommodate activity patterns underpinning savings. The paper draws upon the recent finding of a "uniform shift" in preparatory activity in monkey motor cortex associated with savings, and leverages full access to a computational model to establish causality.

      Strengths:

      The paper is well written, and the figures are clearly presented. The key finding that the uniform shift first reported based on neural recordings by Sun et al. emerges in artificial neural networks performing a similar task is interesting and well-backed by their analyses. Manipulating this uniform shift to show that it drives behavioural savings is an important causal confirmation of the proposal by Sun et al.

      Weaknesses / Comments:

      As mentioned earlier, the core results are well backed by the analyses. Most of my comments relate to adding more controls and additional questions that could be explored with the model to strengthen the paper.

      (1) Savings are quantified as more rapid relearning of the FF upon re-exposure (e.g., Figure 3). This finding is based on backpropagation through time, but would this hold when using a different optimiser, e.g., FORCE?

      This is an interesting question, and indeed, there are an increasing number of studies addressing how different neural network learning rules may affect the kinds of representations that arise after learning (Codol et al., 2024). However, the focus of the present paper is not on which neural network approaches or which specific optimisers produce savings; rather, it is on the basis and neural geometry of savings when it emerges.

      We have added a short paragraph to the Discussion section [lines 349-355] to address this:

      “The present results are based on RNNs trained in an error-based approach using backpropagation through time (Werbos, 1990) using the Adam optimizer (Kingma and Ba, 2014). Other techniques for training RNNs have been proposed including the FORCE algorithm (Sussillo and Abbott, 2009). In addition, several recent reports have demonstrated success using reinforcement learning approaches to train neural networks in the context of sensorimotor control tasks (Lillicrap et al., 2015; Codol et al., 2024a). An interesting avenue for future work is to determine how the present results may or may not generalize to different neural network architectures and learning rules.”

      (2) The authors should include a "null model" showing that training on a different reaching task following NF, as opposed to FF2, won't show something akin to a uniform shift during preparation due to the adoption of TDR and having similar targets.

      This is a critical point. Training on a different reaching task other than FF2 (e.g. a different force field) will indeed result in a uniform shift, but critically, a shift in a different direction in neural state space than the uniform shift associated with FF2. The central focus of the present paper is to show that when there remains a non-zero projection of preparatory neural activity along the direction of the uniform shift associated with a given learning task, this residual projection underlies savings when networks are subsequently re-exposed to the same task.

      In the Results section we had included a short paragraph to describe control simulations that we performed that address this concept. We have expanded this text and added a Figure and the results of statistical tests to better describe this control [lines 179-187]:

      “As an additional control we trained networks after the growing up phase on an opposing force field (CCW) and then as above, exposed the networks to a NF washout phase, and then to a CW force field. In this case no savings was observed in the CW force field, either for initial lateral deviation, or for learning rate (Figure 3). In fact, we observed that initial lateral deviation is larger for the novel force field (t(39)=-4.918, p=1.6e-5). This observation is in line with the finding that learning opposing force fields sequentially results in interference (Sun et al., 2022). The results of these control simulations underscore that the savings effect observed in our main study was learning-specific—it was due to prior learning of the CCW force field, and not a general effect of learning any novel dynamics.”
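      The residual projection discussed in this response can be sketched in a few lines. This is an illustrative toy calculation, not the authors' analysis code: it assumes that the uniform shift is estimated as the difference between target-averaged preparatory states after learning and at baseline, and that the quantity of interest is the projection of the post-washout state onto that unit direction. The function names and the three-dimensional toy states are hypothetical.

```python
import math


def mean_state(states):
    """Element-wise average of preparatory state vectors (one per target)."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def residual_projection(baseline, learned, washout):
    """Projection of the post-washout state onto the uniform-shift direction.

    The shift direction is taken as (mean learned state - mean baseline
    state), normalised to unit length. A value near zero means washout
    returned the preparatory state to baseline along that direction.
    """
    base = mean_state(baseline)
    shift = unit([l - b for l, b in zip(mean_state(learned), base)])
    residual = [w - b for w, b in zip(mean_state(washout), base)]
    return sum(r * s for r, s in zip(residual, shift))


# Toy data: two targets, three hidden units. Learning shifts every target
# by +2 along the third unit; washout leaves a residue of +0.5.
baseline = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
learned = [[1.0, 0.0, 2.0], [-1.0, 0.0, 2.0]]
washout = [[1.0, 0.0, 0.5], [-1.0, 0.0, 0.5]]
projection = residual_projection(baseline, learned, washout)
```

      In this toy case the shift direction for an opposing field would point the other way along the third unit, so the same residue would project negatively onto it, which is one way to picture why prior learning of the opposite field does not produce savings.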

      (3) The analyses of network activity during movement preparation (Figure 4) nicely replicate the key finding in Sun et al, but I think the authors could leverage the full access to their network and go further, e.g., by examining changes (or the lack of) during execution in FF2 with respect to FF (and perhaps in a future NF2 with respect to NF), including whether execution activity also lives in parallel hyperplanes, etc.

      We agree that a visualization of the neural activity during movement would be beneficial to the reader. To address this we have added a new Figure (Fig. 6) and associated text [lines 210-219]. The Figure shows the neural trajectories when the RNNs are first exposed to FF1 and when they are first exposed to FF2 (after NF2 washout). Trajectories are plotted in 3D, corresponding to the first 3 principal components, starting at the go cue and ending 200 ms into the movement, for each of the 8 movement targets.

      “The neural trajectories for preparation and for movement can be visualized in principal component space. Figure 6 shows trajectories during planning and early execution for initial FF1 and FF2 exposure. Hidden unit activity was subjected to a principal components analysis, and neural trajectories within the first three PCs are shown for movements to each of the eight movement targets. Filled circles indicate neural state 200 ms prior to the go cue. During the preparatory period trajectories travel along PC1 and then disperse across PC2 and PC3 into the circular pattern indicated by the filled stars, which indicate time of the go cue (also see Figure 5A). After the go cue neural trajectories shift back along PC1 and rotate along oscillatory patterns characteristic of populations of motor cortical neurons in non-human primates during movement (Churchland and Shenoy, 2024).”

      (4) Related to the above, while the results are interesting and the paper is well done, I kept wishing that the authors had done "more" with their model. This could be one or two final sections on "predictions" that would nicely complement their "validation" of the uniform shift, and that, in my opinion, would greatly increase the impact of the paper. In particular:

      (a) What would be the effect of learning more "tasks"? For example, is there a limit on how many fields can be learned? (You show something related by manipulating network size, but this is slightly different.)

      These are interesting questions and to some extent they are already addressed in the paper. Of course, the number of tasks that a network is able to learn will be related to how much those tasks overlap in a control space. Indeed, this idea goes back to early theoretical accounts of connectionist models such as Hopfield nets and their capacity for representing information (Hopfield, 1982; Hopfield et al., 1983). The control simulations that we described in the paper [lines 179-187 and Figure 4] are a test of one extreme version of this, in which two tasks are in direct opposition to each other (opposite force fields), and in this situation no savings emerges. We believe it is an interesting question, but a comprehensive exploration of the nature of task overlap in upper limb reaching learning tasks is beyond the scope of the present paper.

      (b) Figure 5 is a nice causal demonstration that the uniform shift is related to savings. However, and related to comment #3, it'd be interesting to see more details about how the behaviour and the network activity change as preparatory activity shifts along this axis, in particular regarding how moving the preparatory states affects the organisation and dynamics of upcoming execution activity; these are the kind of intuitions that modelling studies like this one can provide.

      This has been addressed above by the changes we made to address the reviewer’s comment #3.

      (c) The authors focus on a task design that spans baseline, FF, NF, FF2 to replicate the original study by Sun et al. However, it would be interesting if they generated predictions for neural changes to other types of tasks that have been studied behaviourally. These could include, for example: (i) modelling a visuomotor rotation or a mirror reversal task; (ii) having to adapt to a FF in the opposite direction; (iii) investigating the role of adding an explicit context and having the networks learn multiple FF; and (iv) trying to learn FF fields in opposite directions, perhaps restricted to specific targets. As the authors know, all these questions and more have been studied with similar behavioural paradigms, and it would be nice to see what neural predictions are generated by this model.

      See responses above, e.g. to comment 4. We have clarified the text and provided a new Figure to illustrate our opposite FF control simulations. The other suggestions, about visuomotor rotations and contextual cues, are interesting and potentially important questions that we are working on, but we believe they are beyond the scope of the current paper, which is focused specifically on the question of savings in FF learning.

      (5) On the Discussion: When extrapolating from neural network results to animals, the fact that your networks can learn implicitly doesn't mean that animals do learn implicitly. Indeed, I think the consensus view is that different perturbations may lead to the expression of different types of savings (e.g., FF vs VR, which seems to be more explicit). Besides, these different mechanisms may be primarily implemented by brain regions less directly tied to motor control (e.g., cerebellum, parietal cortex?), which are not directly implemented in the authors' model.

      Of course the reviewer is correct that our simulations are not evidence that savings in motor tasks learned by animals is only implicit, and we do not make any such claims in the paper. The model we describe in the present paper is not meant to be a comprehensive model of motor learning in humans/animals. Indeed, the pure “context free” type of learning that we implement in our simulations basically cannot occur in animals, because there is always some information that provides contextual information. Indeed there are computational models of motor learning that include these effects, e.g. the COIN model (Heald et al., 2021). Our model however provides a useful window into what the context-free component of savings may look like. The approach we describe in the present paper is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is highly unrealistic, as some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Weaknesses:

      (1) To be transparent, savings in motor adaptation have been a primary focus of my own research. Some core findings presented in this paper are at odds with the ideas I and others have previously put forward. While I don't want to impose my agenda on the authors of this paper, I do think the authors should address these issues.

      (a) The authors acknowledge the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies. For example, when people are asked to report their strategy, they recall a strategy that was useful during the first learning block (Morehead et al. 2015). Furthermore, savings are abolished under experimental manipulations designed to eliminate strategic contributions (e.g., Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). The authors briefly state that their findings support the hypothesis that a neural basis of memory retention underlying savings can be independent of cognitive or strategic learning components, and that savings can be characterized as implicit. While these statements may be true, it is not clear how this work substantiates these claims.

      We have addressed a similar point raised by Reviewer 1, see point #5 above. Our work represents an example of how savings can occur from implicit mechanisms in the absence of explicit contextual cues. Our goal is not to resolve the debate about how this occurs in humans/animals. Rather, our model provides a useful window into what the context-free component of savings may look like. Our approach is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is not meant to be a full model of biological learning, as in biological systems some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      (b) Our research has also demonstrated that if implicit adaptation is completely washed out after the initial learning block, it not only fails to exhibit savings but is actually attenuated relative to the first learning block (Avraham et al., 2021). This phenomenon of attenuation upon relearning can also be seen in other studies of visuomotor adaptation (e.g., Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). More recently, we have shown that this attenuation is due to anterograde interference arising from the experience with the washout block (Avraham and Ivry, 2025). We illustrated that the implicit system is highly susceptible to interference; it doesn't require exposure to salient opposite errors and can occur even following prolonged exposure to veridical feedback. The central thesis of this paper, namely that implicit savings can emerge through RNNs, is at odds with these empirical results. The authors should address this discrepancy.

      These empirical results are interesting and intriguing, and we agree that they are relevant in the context of the debate about the relative contributions and interactions between explicit and implicit learning systems and savings. Importantly, contextual interference is impossible in our model, since there are no contextual cues about which force field is present or absent. Interactions between an explicit system and an implicit learning system are also impossible in our model, since there is no possibility of context-driven explicit learning or memory. The approach we have taken in the present paper is not to model a full explicit plus implicit learning system but rather to probe how savings may emerge from a purely implicit learning mechanism alone and to compare the neural geometry underlying this implicitly driven savings to the neural recording results from monkey electrophysiology studies. Nevertheless, we have added some text to the Discussion [lines 380-391] to situate our findings in the context of the studies mentioned above by the reviewer.

      “Recent empirical work suggests that relearning after washout of implicit adaptation can be attenuated rather than facilitated, a phenomenon attributed to anterograde interference from the washout phase (Avraham et al., 2021; Hadjiosif et al., 2023; Hamel et al., 2022, 2021; Leow et al., 2020; Wang and Ivry, 2025; Yin and Wei, 2020). The savings observed in our simulations differs from these behavioral findings. Crucially, our model excludes both contextual interference (since no cues signal which force field is present) and explicit-implicit interactions (since context-driven explicit learning is absent). Our goal was not to model a complete explicit-implicit system, but rather to probe how savings may emerge from a purely implicit mechanism and to compare the underlying neural geometry to monkey electrophysiology data. Our results suggest that high-dimensional neural circuits possess an intrinsic capacity for savings via persistent preparatory traces. How and when this capacity may be masked by interference or explicit-implicit interactions in biological systems remains an open question for future work.”

      (2) This brings me to the question about neural correlates: The results are linked to activity in the primary motor cortex. How does that align with the well-established role of the cerebellum in implicit motor adaptation? And with the studies showing that savings are due to explicit strategies, which are generally associated with prefrontal regions?

      The modeling approach we use in the present paper is area agnostic, and we do not include different neural modules to represent specific brain areas such as cerebellum or prefrontal regions. In the current approach we specifically exclude explicit strategies, as a way to specifically probe implicit mechanisms alone. Also see response to reviewer 1 comment 5 above.

      (3) The analysis of the complexity of the neural network (i.e., the number of hidden units) and its relationship to savings is very interesting. It makes sense to me that more complex networks would show more savings. I'm not sure I follow the authors' explanation, but my understanding is that increased network complexity makes it more difficult to override the formed memory through interference (e.g., from the experience with NF2). Also, the results indicate that a network with 32 units led to a less-than-chance level of networks exhibiting savings (Figure 3b). What behavioral output does this configuration produce? Could this behavior manifest as attenuation upon relearning? Furthermore, if one were to examine an even smaller, simpler network (perhaps one more closely reflecting cerebellar circuits), would such a model predict attenuation rather than savings?

      These are interesting and potentially important questions for future work to explore. Our interpretation of the results for smaller networks is that these small RNNs fail to show savings presumably because the learned FF behavior is 'erased' during washout, owing to their limited capacity to retain the FF learning in a distinct neighborhood of neural state space. Our paper is focused specifically on the relationship between savings, implicit learning, and neural capacity via network size, in the context of the monkey electrophysiology results in motor cortex. It would be interesting in future work to explore a cerebellar-like modeling approach.

      (4) The authors emphasize that their network did not receive any explicit contextual signals related to the presence or absence of the force field (FF), thus operating in a 'context-free' manner. From my understanding, some existing models of context's role in motor memories (e.g., Oh and Schweighofer, 2019; Heald et al., 2021) propose that memory-related changes can be observed even without explicit contextual information, as contextual changes can be inferred from sudden or significant environmental shifts (e.g., the introduction or removal of perturbations). Given this, could the observed savings in the current simulation be explained by some form of contextual retrieval, inferred by the network from the re-presentation of the perturbation in FF2?

      It is important to note that this is not possible in the context of the modeling approach described in the present paper. For example, in trial 1 of FF2, because the network has no contextual cue signaling the FF’s presence, the network has no information before movement begins that a FF will be present during movement (recall that the FF is velocity-dependent, and so is zero before movement begins). Once the network encounters the FF during movement, some component of its response could arguably be described as contextual inference derived from effector state (similar to the account described in the COIN model), but strictly speaking the model is only responding to what it encounters in the moment. Any change in behaviour due to prior learning (e.g. savings) is due to the interaction between the residual learning-related neural state (e.g. the uniform shift), the effector state in the moment, and the errors encountered during movement. We don’t interpret this as “inference” in the traditional sense of an explicit learning system.

      (5) If there is residual hidden unit activity related to the FF at the end of the NF2 phase, how does the simulated movement revert back to baseline? Are there any differences in the movement trajectory, beyond just lateral deviation, between NF1 and NF2? The authors state that "changes in the preparatory hidden unit activity did not result in substantive changes in the motor commands (Figure 5b), which emphasizes that the uniform shift resides in the null space of motor output." However, Figure 5b appears to show visible changes in hidden unit activity. Don't these changes reflect a pattern of muscle activity that is the basis for behavior? These changes are indeed small, but it seems that so is the effect size for savings (Figure 3a). Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?

      This is precisely the point of the paper, i.e. to show that neural activity during the preparatory period before movement onset is different, even though the behaviour during the preparatory period is the same (i.e. no muscle activity and no movement). This recapitulates the empirical findings from the neural data reported in the Sun et al. (2022) paper.

      The reviewer asks “Don't these changes reflect a pattern of muscle activity that is the basis for behavior?” Yes indeed they do, but not during the NF and not during the preparatory activity prior to movement onset.

      The reviewer asks “Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?” We addressed this in the paper (Results/Washout) by comparing kinematics after washout to those prior to FF learning; e.g. any differences in lateral deviation of the hand path over the entire reach trajectory were in the range of 0.1 mm, which is less than 0.25% of the lateral deviation encountered in the FF and only 0.1% of the reach distance (10 cm).
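      The percentages quoted above can be checked with a short calculation. The sketch uses only the numbers stated in the response (0.1 mm residual difference, 10 cm reach distance, and the 0.25% bound). The implied lower bound on the FF lateral deviation is an inference from those numbers, not a value reported in the response, and the variable names are illustrative.

```python
def percent_of(part_mm, whole_mm):
    """Express part_mm as a percentage of whole_mm."""
    return 100.0 * part_mm / whole_mm


residual_mm = 0.1   # residual lateral deviation after washout (from the response)
reach_mm = 100.0    # reach distance of 10 cm, converted to mm

# Residual difference relative to the reach distance.
share_of_reach = percent_of(residual_mm, reach_mm)

# If 0.1 mm is "less than 0.25%" of the FF lateral deviation, that deviation
# must exceed residual / 0.0025 (an inferred bound, not a reported value).
implied_min_ff_deviation_mm = residual_mm / 0.0025

print(share_of_reach, implied_min_ff_deviation_mm)
```

      On these numbers the residual is 0.1% of the reach distance, as stated, and the FF lateral deviation would have to be larger than about 40 mm for the 0.25% bound to hold.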

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1c, lower panel: Is this from the early or late stage of FF1?

      This is an example movement after learning in a null field (NF). We have clarified this in the Figure caption.

      (2) Please clarify what the two panels in Figure 1e represent.

      We have clarified in the Figure caption that these panels show activity from two example hidden units.

      (3) If Figure 2c is intended to illustrate the changes in motor commands for individual muscles, consider reorganizing the plots by muscle to more clearly show the change for each muscle from NF1 to FF1.

      The point here is not to make fine-grained comparisons between specific muscles, but rather to show a general example of how muscle activity differs. For the sake of visual simplicity in a Figure that already has many components, we have decided to keep Figure 2c the same.

      (4) The text mentions that no savings were observed when the network was trained on CCW followed by CW perturbations. However, no data or statistical analysis is presented to support this claim. I wonder if the authors would expect attenuated learning when exposed to the CW perturbation, given a memory of the opposite perturbation.

      We have added a Figure to provide data for the FF opposite control.

      (5) The relevance of the discussion on choking under pressure to the paper wasn't clear.

      We have modified the relevant text in the Discussion section [lines 356-363] to clarify the relevance of the present work to other recent work on how complex features of motor behaviour can arise due to the dynamics of preparatory neural activity in motor cortex.

      References

      Avraham G, Morehead JR, Kim HE, Ivry RB. 2021. Reexposure to a sensorimotor perturbation produces opposite effects on explicit and implicit learning processes. PLoS Biol 19:e3001147. doi:10.1371/journal.pbio.3001147

      Codol O, Krishna NH, Lajoie G, Perich MG. 2024. Brain-like neural dynamics for behavioral control develop through reinforcement learning. bioRxiv. doi:10.1101/2024.10.04.616712

      Hadjiosif AM, Morehead JR, Smith MA. 2023. A double dissociation between savings and long-term memory in motor learning. PLoS Biol 21:e3001799. doi:10.1371/journal.pbio.3001799

      Hamel R, Dallaire-Jean L, De La Fontaine É, Lepage JF, Bernier PM. 2021. Learning the same motor task twice impairs its retention in a time- and dose-dependent manner. Proc Biol Sci 288:20202556. doi:10.1098/rspb.2020.2556

      Hamel R, Lepage J-F, Bernier P-M. 2022. Anterograde interference emerges along a gradient as a function of task similarity: A behavioural study. Eur J Neurosci 55:49–66. doi:10.1111/ejn.15561

      Heald JB, Lengyel M, Wolpert DM. 2021. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600:489–493. doi:10.1038/s41586-021-04129-3

      Hopfield JJ. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A 79:2554–2558. doi:10.1073/pnas.79.8.2554

      Hopfield JJ, Feinstein DI, Palmer RG. 1983. “Unlearning” has a stabilizing effect in collective memories. Nature 304:158–159. doi:10.1038/304158a0

      Leow L-A, Marinovic W, de Rugy A, Carroll TJ. 2020. Task errors drive memories that improve sensorimotor adaptation. J Neurosci 40:3075–3088. doi:10.1523/JNEUROSCI.1506-19.2020

      Wang T, Ivry RB. 2025. Contextual effects during sensorimotor adaptation are an emergent property of population coding in a cerebellar-inspired model. Sci Adv 11:eadr4540. doi:10.1126/sciadv.adr4540

      Yin C, Wei K. 2020. Savings in sensorimotor adaptation without an explicit strategy. J Neurophysiol 123:1180–1192. doi:10.1152/jn.00524.2019

    1. eLife Assessment

      This study provides compelling evidence that action potential (AP) broadening is not a universal feature of homeostatic plasticity in response to chronic activity deprivation. By leveraging state-of-the-art methods across multiple brain regions and laboratories, the authors demonstrate that AP half-width remains largely stable, challenging previous assumptions in the field. These important findings help resolve longstanding inconsistencies in the literature and significantly advance our understanding of neuronal network homeostasis. The authors have clarified methodological differences with prior work and expanded the discussion of potential mechanisms, strengthening the interpretation of the findings without altering the central conclusions.

    2. Reviewer #1 (Public review):

      [Editors' note: The Reviewing Editor has assessed the revised manuscript without seeking further input from the original reviewers. The authors have addressed the main points raised during peer review, including clarifying methodological differences with prior work, providing additional analysis, and expanding the discussion of potential mechanisms. These revisions strengthen the interpretation and presentation of the findings, and the conclusions remain supported by the data.]

      Summary:

      Ritzau-Jost et al. investigate the potential contribution of AP broadening in homeostatic upregulation of neuronal network activity with a specific focus on dissociated neuronal cultures. In cultures obtained from a few brain regions from mice or rats using different culture conditions and examined by different laboratories, AP half-width remained stable despite chronic activity block with TTX. The finding suggests that AP width is not significantly modulated by changes in sodium channel activity.

      Strengths:

      The collaborative nature of the study amongst the neuronal culture experts and the rigorous electrophysiological assessments provides for a compelling support of the main conclusion.

    3. Reviewer #2 (Public review):

      Summary:

      This study reexamined the idea that action potential broadening serves as a homeostatic mechanism to compensate for changes in network activity. The key finding was that, while action potential broadening does occur in certain neurons, such as CA3 pyramidal cells, it is far from a universal response. This is important because it helps resolve longstanding discrepancies in the field, thereby contributing to a better understanding of network dynamics. The replication of these findings across multiple laboratories further strengthened the study's rigor.

      Strengths:

      Mechanisms of network homeostasis are essential to understand network dynamics.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript "Unreliable homeostatic action potential broadening in cultured dissociated neurons" by Ritzau-Jost et al. investigates action potential (AP) broadening as a mechanism underlying homeostatic synaptic plasticity. Given the existing variability in the literature concerning AP broadening, the authors address an important and timely research question of considerable interest to the field.

      The study systematically demonstrates cell-type- and model-specific AP broadening in hippocampal neurons after chronic treatment with either tetrodotoxin (TTX) or glutamatergic transmission blockers. The findings indicate AP broadening in CA3 pyramidal neurons in organotypic cultures after TTX treatment, but notably not in dissociated hippocampal neurons under identical conditions. However, blocking glutamatergic neurotransmission caused AP broadening in dissociated hippocampal neurons. Moreover, extensive evaluations in neocortical dissociated cultures robustly challenge previous findings by revealing a lack of AP broadening following TTX treatment. Additionally, the proposed role of BK-type potassium channels in mediating AP broadening is convincingly questioned through complementary electrophysiological and voltage-imaging experiments.

      Strengths:

      The manuscript exhibits an outstanding experimental design, employing state-of-the-art techniques and a rigorous multi-lab validation approach that greatly enhances scientific reliability. The experimental results are meticulously illustrated, and the conclusions drawn are justified and supported by the presented data. Furthermore, the manuscript is comprehensively and clearly written.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ritzau-Jost et al. investigate the potential contribution of AP broadening in homeostatic upregulation of neuronal network activity with a specific focus on dissociated neuronal cultures. In cultures obtained from a few brain regions from mice or rats using different culture conditions and examined by different laboratories, AP half-width remained stable despite chronic activity block with TTX. The finding suggests that AP width is not significantly modulated by changes in sodium channel activity.

      Strengths:

      The collaborative nature of the study amongst the neuronal culture experts and the rigorous electrophysiological assessments provides for a compelling support of the main conclusion.

      Weaknesses:

      Given the negative nature of the results, a couple of remaining issues (such as the cell density of cultures and the presentation of imaging experiments with a voltage sensor) warrant further consideration. In addition, a discussion of the reasons for the seeming stability of AP half-width to sodium channel modulation might help extend the scope of the study beyond the presentation of a negative conclusion.

      We would like to thank the reviewer for positively evaluating our manuscript. Please find below our detailed point-to-point response to the reviewer’s comments.

      Reviewer #2 (Public review):

      Summary:

      This study reexamined the idea that action potential broadening serves as a homeostatic mechanism to compensate for changes in network activity. The key finding was that, while action potential broadening does occur in certain neurons, such as CA3 pyramidal cells, it is far from a universal response. This is important because it helps resolve longstanding discrepancies in the field, thereby contributing to a better understanding of network dynamics. The replication of these findings across multiple laboratories further strengthened the study's rigor.

      Strengths:

      Mechanisms of network homeostasis are essential to understand network dynamics.

      Weaknesses:

      No weaknesses were noted by this reviewer.

      We would like to thank the reviewer for the positive evaluation of our manuscript. Please find below our detailed point-to-point response to the reviewer’s comments.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "Unreliable homeostatic action potential broadening in cultured dissociated neurons" by Ritzau-Jost et al. investigates action potential (AP) broadening as a mechanism underlying homeostatic synaptic plasticity. Given the existing variability in the literature concerning AP broadening, the authors address an important and timely research question of considerable interest to the field.

      The study systematically demonstrates cell-type- and model-specific AP broadening in hippocampal neurons after chronic treatment with either tetrodotoxin (TTX) or glutamatergic transmission blockers. The findings indicate AP broadening in CA3 pyramidal neurons in organotypic cultures after TTX treatment, but notably not in dissociated hippocampal neurons under identical conditions. However, blocking glutamatergic neurotransmission caused AP broadening in dissociated hippocampal neurons. Moreover, extensive evaluations in neocortical dissociated cultures robustly challenge previous findings by revealing a lack of AP broadening following TTX treatment. Additionally, the proposed role of BK-type potassium channels in mediating AP broadening is convincingly questioned through complementary electrophysiological and voltage-imaging experiments.

      Strengths:

      The manuscript exhibits an outstanding experimental design, employing state-of-the-art techniques and a rigorous multi-lab validation approach that greatly enhances scientific reliability. The experimental results are meticulously illustrated, and the conclusions drawn are justified and supported by the presented data. Furthermore, the manuscript is comprehensively and clearly written.

      Weaknesses:

      Concerning the statistical analyses employed, it is advisable to consider the Kruskal-Wallis test with corrections for multiple comparisons when evaluating more than two experimental groups.

      We would like to thank the reviewer for the positive evaluation of our manuscript. In the following, we first address the comment regarding the statistical tests used. Please also find below the detailed response to the reviewer’s further comments. Indeed, we did not apply a correction for multiple comparisons in Figure 2. This seems justified because in this exceptional case we are more concerned about type II errors (false negatives). The Kruskal-Wallis test does not seem appropriate for this type of data, for which only the comparison between the control and the respective TTX data is relevant. Instead, we followed the reviewer’s suggestion by applying corrections for false discovery rate (FDR). We thank the reviewer for pointing out this statistical issue and have addressed it in the revised manuscript (lines 121–128):

      “Even though AP durations varied up to 2-fold between conditions, statistically significant homeostatic AP broadening was not detectable in any of the tested conditions (Fig. 2B). To minimize type II errors (false negative) we intentionally did not apply a correction for multiple comparisons. The only significance was observed in condition III but in an opposite direction (i.e. AP narrowing with TTX, P=0.026; Fig. 2B). However, this is likely a false positive because application of corrections for false discovery rate results in P=0.268 for both Benjamini–Hochberg and Bonferroni correction.”
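
      As an aside on why the two corrections can return the same adjusted value for the smallest P-value, the following is a minimal sketch of the standard Bonferroni and Benjamini–Hochberg adjustments. The P-values in `example_pvals` are hypothetical and are not the values underlying Fig. 2B; the function names are likewise illustrative. When only one P-value in a family is small, the Benjamini–Hochberg adjusted value for that test reduces to the number of tests times the raw P-value, which is exactly the Bonferroni value.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (step-up FDR procedure)."""
    m = len(pvals)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical family of four tests with a single small P-value.
example_pvals = [0.02, 0.40, 0.55, 0.90]
bonf = bonferroni(example_pvals)
bh = benjamini_hochberg(example_pvals)
print("Bonferroni:", bonf)
print("Benjamini-Hochberg:", bh)
```

      In this hypothetical family the smallest P-value receives the same adjusted value under both procedures, whereas the larger P-values are adjusted less severely by Benjamini–Hochberg than by Bonferroni.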

      Recommendations for the authors:

      Reviewing Editor Comments:

      The main and most important observation of the study is that the AP does not change in most cases examined. A discussion of the mechanisms of the changes in CA3 neurons would significantly strengthen the compelling evidence presented. The individual reviews are also provided, in case the authors find them useful to include other aspects suggested by the reviewers.

      We would like to thank the Reviewing Editor for handling our manuscript and for the positive evaluation of our work. The main focus of our study was the analysis of homeostatic plasticity in cultured neurons of the neocortex. We agree that the findings in CA3 neurons are interesting. As explained in more detail below, we have carefully discussed the mechanisms of the changes in CA3 neurons in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) AP widths measured in the present study under basal conditions are generally larger than the value reported in previous work by Li et al. 2020 (~1.5 ms). In particular, rat cortical cultures prepared using the same conditions show that the mean AP half-width in controls of the present study (~2.5 ms) is closer to the mean AP half-width in TTX-treated neurons in Li et al. (~2.0 ms).

      We thank the reviewer for the detailed and positive feedback as well as for the thoughtful questions. The discrepancy between the action potential half-durations reported in our data and in Li et al. is partially due to differences in the way the half-duration was measured. In Li et al. the exact method is unfortunately not defined, but from a personal communication with the authors we know that they measured half-duration based on the AP amplitude between AP peak and AP voltage threshold. In contrast, we measured half-duration based on the AP amplitude between AP peak and the resting membrane potential preceding current injections. When we instead measure AP half-duration from voltage threshold, the average half-duration is 1.97 ms (compared to 2.64 ms from baseline; n = 106 cells; average across conditions I–IV, control and TTX merged). Thus, the discrepancy in half-duration is in large part due to methodological differences in the way the half-duration was measured.
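
      To illustrate why the choice of reference level matters, the following is a minimal sketch that measures the half-duration of the same waveform twice: once with the half-amplitude taken between peak and resting potential, and once between peak and voltage threshold. The waveform is a synthetic Gaussian-shaped spike, and the resting potential (-70 mV), threshold (-45 mV), and function name `half_width` are assumptions made for illustration only; this is not the analysis code used in either study.

```python
import math


def half_width(times, volts, reference):
    """Width of the spike at half amplitude between its peak and `reference`.

    Crossing times are found by linear interpolation between samples.
    """
    peak_idx = max(range(len(volts)), key=lambda i: volts[i])
    half = reference + 0.5 * (volts[peak_idx] - reference)

    def crossing(i, j):
        # Time at which the trace crosses `half` between samples i and j.
        t0, t1, v0, v1 = times[i], times[j], volts[i], volts[j]
        return t0 + (half - v0) * (t1 - t0) / (v1 - v0)

    # Walk backwards from the peak to the last sample above `half` on the rise.
    i = peak_idx
    while i > 0 and volts[i - 1] > half:
        i -= 1
    # Walk forwards from the peak to the last sample above `half` on the decay.
    j = peak_idx
    while j < len(volts) - 1 and volts[j + 1] > half:
        j += 1
    if i == 0 or j == len(volts) - 1:
        raise ValueError("trace does not cross the half-amplitude level on both sides")
    return crossing(j, j + 1) - crossing(i - 1, i)


# Synthetic spike: 10 ms sampled at 0.01 ms, rest -70 mV, peak +40 mV (all hypothetical).
times = [k * 0.01 for k in range(1001)]
volts = [-70.0 + 110.0 * math.exp(-((t - 5.0) ** 2) / 2.0) for t in times]

width_from_rest = half_width(times, volts, reference=-70.0)
width_from_threshold = half_width(times, volts, reference=-45.0)
print(f"half-duration from rest:      {width_from_rest:.2f} ms")
print(f"half-duration from threshold: {width_from_threshold:.2f} ms")
```

      Because the threshold lies above the resting potential, the half-amplitude level is higher when measured from threshold, and the same spike therefore yields a shorter half-duration.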

      One parameter that is not stated in either study is cell plating density, which can potentially bias the neuronal network activity levels of cultures. Could the authors comment on the possible contribution of neuronal culture density to AP half-width under basal recording conditions and its sensitivity to chronic TTX treatment? Are there any data available? For example, cultures used by Li et al. may have been plated at a high density and experienced high activity levels during culturing, which could have contributed to the enhanced sensitivity to chronic activity suppression by TTX.

      We agree that neuronal culture density is an important factor influencing neuronal activity and hence potentially also the sensitivity to chronic activity suppression. In our experiments, the number of plated cells per cover slip varied between conditions about 3-fold: 30–50k cells for conditions I and II, 25–30k cells for conditions III, VII, XI, 50k cells for condition IV, 65k for conditions V, VI and VIII, and 70k cells for conditions IX and X. Li et al. do not provide the cell density or the number of plated cells. Despite the difference in the number of plated cells in our dataset across various laboratories, we did not observe a systematic effect of cell number on baseline AP half-duration. Furthermore, we observed strongly different baseline activity across our various experimental conditions (Fig. 3A), which did not correlate with cell density. Also, we did not notice an impact of baseline activity on the sensitivity to chronic activity suppression with TTX (cf. Fig. 3A and 2B). We have now added the number of plated cells per condition to the methods section as well as the following paragraph to the discussion section (lines 256–262):

      “The sensitivity to chronic TTX treatment might depend on baseline neuronal activity, which is in part related to neuronal culture density[37]. However, TTX did not induce AP broadening despite different baseline activities (Fig. 3A) and a nearly threefold variation in the number of plated cells per cover slip between conditions (25k – 70k cells per coverslip).”

      In addition, a discussion of the reasons for the seeming stability of AP half-width to sodium channel modulation might help extend the scope of the study beyond the presentation of a negative conclusion.

      We thank the reviewer for this suggestion and have added a paragraph to the end of the discussion emphasizing potential advantages of cell-type specific AP broadening (lines 353–362):

      “Despite the lack of homeostatic, TTX-induced AP broadening in dissociated cultures, AP duration was broadened upon Kyn-treatment in dissociated cultures and using TTX in CA3 neurons in organotypic cultures. Because BK-channels control AP duration in CA3 neurons of organotypic cultures[79], homeostatic BK-channel downregulation as proposed by Li et al. may be involved in AP broadening in this specific cell type. While the reasons for the variable occurrence of homeostatic AP broadening remain unknown, this may render neuronal circuitries more robust to perturbations. The regulation of AP duration therefore might represent one element in the repertoire of neuronal plasticity that is, similar to other plasticity mechanisms, not generally shared, but specifically expressed in some cell types and neuronal compartments.”

      (2) In this study, CA3 neurons in organotypic cultures were the only cells that showed AP broadening with TTX treatment. Notably, CA3 neurons show strong recurrent activity in general and would be expected to have experienced high levels of activity in culture. For CA3 neurons in organotypic cultures, does IbTx increase basal AP half-width?

      We thank the reviewer for this interesting idea. Even though, to our knowledge, there is no study investigating the effect of IbTx on AP width in CA3 neurons of organotypic cultures, Raffaelli et al. (DOI 10.1113/jphysiol.2004.062661) reported ~15% AP broadening using the BK-channel blocker paxilline. Therefore, TTX-induced broadening in CA3 neurons might be related to BK-channel-dependent AP repolarisation, consistent with the model proposed by Li et al. Because organotypic cultures show increased activity for longer cultivation periods and higher connectivity compared to acute slices (De Simoni et al., DOI 10.1113/jphysiol.2003.039099), the effect of TTX may be aggravated in organotypic cultures compared to acute slices or in vivo. However, the lack of a TTX-effect was not dependent on background neuronal activity or culture density in our recordings (see above as well as lines 306–310 of the revised manuscript).

      (3) Figures 4E-G. In experiments to test the efficacy of IbTx with GEVI, larger fields of view of neuron(s) used for recordings should be included. As shown, it is difficult to discern the quality of the preparation and does not provide a representative indication of the type of signals measured.

      We thank the reviewer for this suggestion and have included an image of a representative neuron expressing the GEVI in Fig. 4E.

      Minor points

      (1) Lines 222-228. With respect to cell-type specificity of TTX-induced AP broadening, the observed lack of effect of TTX in dissociated hippocampal cultures might suggest that the cultures are predominantly DG granule cells and CA1 neurons, with few CA3 neurons surviving. Could the authors comment?

      We thank the reviewer for this interesting hypothesis and have discussed it in the manuscript as a potential explanation for our different findings in the hippocampus (lines 263–270):

      “Although we mainly focus on neocortical cultured neurons (condition I to VIII, Fig. 2) because Li et al. used neocortical neurons, the absence of AP broadening in hippocampal neurons (group IX to XI) could in principle be explained by the selective loss of CA3 neurons, which show AP broadening in organotypic cultured neurons (Fig. 1A and B). However, CA3 neurons were shown to survive in dissociated cultures following region-specific microdissection[40], and CA1 neurons are generally more stress-sensitive to excitotoxicity with glutamate or NMDA than CA3 and DG neurons[42], arguing against a general selective loss of CA3 neuron in dissociated cultures.”

      (2) Figures 3D, E. To what extent is the observed increase in sEPSC amplitude due to an increase in sEPSC frequency? Is quantal amplitude increased following TTX treatment, a postsynaptic strength parameter that one would not expect to be affected by a change in AP width, but that is known to undergo up-scaling with chronic TTX treatment?

      We would like to thank the reviewer for the question. We cannot rule out an interplay between sEPSC amplitude and frequency. We did not measure quantal amplitude in the presence of TTX. Our experiments were designed to test whether TTX successfully induced homeostatic plasticity, but not to attribute the observed effect to pre- and postsynaptic mechanisms. We have added the following statement to the revised manuscript, to highlight the possible interaction of sEPSC amplitude and frequency (lines 176–178):

      “These changes in sEPSC amplitude and frequency are not specific for somatic, pre- or postsynaptic adaptations. However, the results show that blocking AP firing with TTX successfully induced homeostatic plasticity under our experimental conditions.”

      (3) Line 132. Could the authors explain the rationale for using AP amplitude as a measure of neuronal "viability"?

      In a response to Cell, Li et al. suggested that the lack of a TTX effect was due to recordings from unhealthy neurons and that small AP amplitudes could indicate impaired cell viability. Indeed, we also believe that cells that appear morphologically less healthy tend to have small and slow APs. A mechanistic rationale could be a change in resting membrane potential or changes in the expression of voltage-gated sodium and potassium channels. However, AP amplitudes were not affected following TTX treatment in any of the eleven recording conditions (Fig. 2D) or in a cross-conditional comparison (Fig. 2E). In the revised manuscript, we have now added a possible rationale (lines 134–137):

      “Because unhealthy neurons tend to have small and slow APs, possibly due to changes in resting membrane potential or expression of voltage-gated sodium and potassium channels, we first analyzed AP amplitude as a measure of neuronal viability.”

      Reviewer #3 (Recommendations for the authors):

      I propose addressing the following questions, either through additional experiments (recommended) or a deeper theoretical discussion:

      (1) Since the authors demonstrate that blocking glutamatergic neurotransmission in dissociated hippocampal neurons causes AP broadening, do similar phenomena occur in organotypic cultures and dissociated neocortical neurons?

      We thank the reviewer for the interesting question. In dissociated hippocampal cultures, we show that AP duration is maintained following treatment with TTX and NBQX, while Kyn-treatment leads to AP broadening (Figure 1C). To our knowledge, the effect of Kyn on AP duration has not been studied in neocortical dissociated cultured neurons. However, Kyn induced AP broadening in CA3 neurons of hippocampal organotypic cultures (Zbili et al., DOI 10.1073/pnas.2110601118) while CNQX did not induce such broadening in CA1 neurons (Karmarkar and Buonomano, DOI 10.1111/j.1460-9568.2006.04692.x). Both findings are in accord with our recordings from dissociated hippocampal cultures. These data, however, do not allow inference as to whether AP broadening is a cell-type specific or blocker-specific mechanism in hippocampal organotypic cultures. Because the main focus of our study is the absence of AP broadening in neocortical cultured neurons as described by Li et al., we adjusted the corresponding discussion section (lines 299–322):

      “In contrast, APs were not significantly broader following synaptic block by NBQX (Fig. 1C, D), in accord with recordings from CA1 neurons in organotypic cultures using CNQX. TTX-induced broadening may therefore be cell-type specific or due to a differential effect of the glutamate receptor blockers on NMDA receptors which are blocked by Kyn but not NBQX/CNQX or TTX and which have recently been demonstrated to be important for the induction of synaptic homeostatic plasticity[41].”

      (2) Are BK channels involved in AP broadening observed in CA3 pyramidal neurons in organotypic cultures?

      We thank the reviewer for the question. BK channels control spike duration in CA3 neurons of organotypic cultures (~15% broadening upon block by paxilline; Raffaelli et al., DOI 10.1113/jphysiol.2004.062661). Even though there is no available data on the contribution of BK channels to homeostatic spike broadening in this cell type, CA3 neurons in organotypic cultures thereby fulfil the two necessary preconditions of the model proposed by Li et al. (namely, the control of the resting AP width by BK-channels and TTX-induced AP broadening). We include this possibility in the discussion (lines 355–357):

      “Because BK-channels control AP duration in CA3 neurons of organotypic cultures[79], homeostatic BK-channel downregulation as proposed by Li et al. may be involved in AP broadening in this specific cell type.”

      (3) AP broadening consistently occurs in CA3 neurons within organotypic cultures; what molecular or cellular mechanisms underpin this phenomenon, and is there a potential contribution from glial cells?

      We thank the reviewer for this interesting question. Across various studies, CA3 neurons show AP broadening upon chronic inactivity, which has not been observed in CA1 or DG neurons. Recordings from CA3 neurons served as a positive example for TTX-induced AP broadening in our study, in contrast to a lack of broadening in dissociated (neocortical and hippocampal) cultured neurons. The discrepancy between the results in dissociated and organotypic cultured neurons could indeed be due to interactions with glial cells. We have added this possibility to the discussion in the revised version of the manuscript (lines 270–273):

      “Altered cell-cell interactions with glia and neurons in organotypic and dissociated neuronal cultures could instead contribute to the different findings in various hippocampal preparations.”

    1. eLife Assessment

      This valuable study demonstrates that the E3 ligase ITCH regulates several steps of the SARS-CoV-2 replication cycle by enhancing ubiquitination of viral envelope and membrane proteins. The phenotypic data are based on solid evidence showing a role for ITCH in distinct phases of viral replication and host processes. The findings lay the groundwork for future studies to decipher detailed molecular mechanisms that explain how ITCH regulates SARS-CoV-2.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the role of an E3 ubiquitin ligase ITCH in regulating the viral life cycle of SARS-CoV-2. The authors showed that ITCH mediates ubiquitination of the membrane (M) and envelope (E) proteins of SARS-CoV-2. Ubiquitination of E and M results in enhanced interactions between the structural proteins and redistribution of the structural proteins into autophagosomes. The authors claim that the enhanced interactions between structural proteins and trafficking of the structural proteins into autophagosomes contribute to SARS-CoV-2 replication and egress, prompting ITCH as a potential antiviral target. ITCH also alters the cellular distribution of host proteases important for spike cleavage which protect and stabilize spike with cleavage. The authors also demonstrated that SARS-CoV-2 replication is augmented by ITCH in which virus replication is significantly impaired in cells lacking ITCH expression.

      Strengths:

      The authors provided high-quality data with appropriate experimental controls to justify their claims and conclusions. The mechanistic analyses are excellent and presented in a logical manner. The investigation of the role of ubiquitination in coronavirus assembly and egress is novel as most previous studies focused on its role in mediating innate immune responses.

      Comments on revisions:

      The authors have addressed my previous concerns.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the role of an E3 ubiquitin ligase ITCH in regulating the viral life cycle of SARS-CoV-2. The authors showed that ITCH mediates ubiquitination of the membrane (M) and envelope (E) proteins of SARS-CoV-2. Ubiquitination of E and M results in enhanced interactions between the structural proteins and redistribution of the structural proteins into autophagosomes. The authors claim that the enhanced interactions between structural proteins and trafficking of the structural proteins into autophagosomes contribute to SARS-CoV-2 replication and egress, prompting ITCH as a potential antiviral target. ITCH also alters the cellular distribution of host proteases important for spike cleavage which protect and stabilize spike with cleavage. The authors also demonstrated that SARS-CoV-2 replication is augmented by ITCH in which virus replication is significantly impaired in cells lacking ITCH expression.

      Strengths:

      The authors provided high-quality data with appropriate experimental controls to justify their claims and conclusions. The mechanistic analyses are excellent and presented in a logical manner. The investigation of the role of ubiquitination in coronavirus assembly and egress is novel as most previous studies focused on its role in mediating innate immune responses.

      Weaknesses:

      Although the authors showed that ITCH ubiquitinates E and M proteins, the claim that such ubiquitination promotes virion assembly and egress is circumstantial. The enhanced interaction between the structural proteins and targeting of ubiquitinated structural proteins into autophagosomes do not necessarily result in increased virion production and release as suggested by the authors. There is a disconnect between the ubiquitination of structural proteins and the role of ITCH in augmenting virus replication as shown in Fig. 6A and B. In addition, the authors showed that the catalytic activity of ITCH is important for the localization and maturation of host proteases. However, the underlying mechanism is unknown. Also, it is unclear how protection of spike from cleavage conferred by ITCH explains its role in promoting replication, as a lack of spike cleavage would inevitably compromise entry. The major weakness of the manuscript is the lack of experimental data that explains the molecular role of ITCH in relation to its phenotype observed during SARS-CoV-2 infection.

      We sincerely thank the reviewer for the positive evaluation of the quality, rigor, and novelty of our study. We particularly appreciate the thoughtful comments regarding the mechanistic link between ITCH-mediated ubiquitination and viral assembly/egress, as well as the broader implications for SARS-CoV-2 replication.

      Our data support a model in which ITCH-mediated ubiquitination of the structural proteins M and E enhances their interactions and promotes their trafficking into autophagosomal compartments, ultimately contributing to increased virion production and release. The phenotypic outcomes observed in Fig. 6A-B (replaced by re-measured viral infectious titer and genomic copy number in the culture medium of vT2-WT and vT2-KO cells) are consistent with our earlier findings in Figs. 1-5, which demonstrate that ITCH promotes SARS-CoV-2 replication. Thus, the replication defect observed in ITCH-deficient cells aligns with the mechanistic effects of ITCH on structural protein ubiquitination and trafficking.

      We agree with the reviewer that directly linking ubiquitination of structural proteins to virion production would further strengthen the mechanistic connection. However, direct detection of ubiquitinated virions in vitro, particularly by electron microscopy (EM), remains technically challenging. Our laboratory has not yet established an EM-based platform optimized for high-resolution SARS-CoV-2 virion analysis. Furthermore, it is possible that ubiquitin chains conjugated to structural proteins are cleaved during or after virion egress, which would complicate their detection in released particles. These technical and biological considerations currently limit direct visualization of ubiquitinated virions.

      Regarding the role of ITCH in regulating the localization and maturation of host proteases, our recent studies [1, 2] have demonstrated that ITCH is involved in Golgi fragmentation, leading to altered furin distribution and impaired cathepsin L maturation. These findings provide mechanistic insight into how ITCH catalytic activity may influence host protease processing. We have incorporated this discussion into the revised manuscript (last paragraph of the Discussion section) to better contextualize our observations.

      With respect to spike cleavage, although S1/S2 processing is required for SARS-CoV-2 entry, accumulating evidence suggests that excessive intracellular cleavage may be detrimental to virion stability. For example, in Vero cells lacking TMPRSS2, virions containing cleaved S1 and S2 are less stable [3]. Additionally, the D614G substitution renders the spike protein more resistant to cleavage, reduces S1 shedding, and enhances incorporation of intact spike into virions, thereby increasing infectivity and stability [4-6]. These findings suggest that maintaining intact spike during intracellular assembly may be advantageous for the viral life cycle. In this context, ITCH-mediated modulation of host protease distribution and spike processing may help preserve spike integrity within assembling virions.

      Taken together, the ability of ITCH to (i) enhance structural protein interactions, (ii) facilitate trafficking through autophagosomal pathways, and (iii) promote incorporation of intact spike into virions provides a coherent mechanistic framework explaining how ITCH enhances virion production and release. While additional studies will be required to further dissect the precise molecular details, our data collectively support a functional link between ITCH ubiquitin ligase activity and SARS-CoV-2 assembly and egress.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript Qiwang Xiang et al. investigated the role of the E3 ubiquitin ligase ITCH in the life cycle of SARS-CoV-2. They claim the following:

      (i) ITCH promotes virion assembly by interacting with E and M proteins and enhancing their K63-linked ubiquitination

      (ii) ITCH-mediated ubiquitination promotes autophagosome-dependent secretion of viral particles.

      (iii) ITCH stabilizes the viral spike protein by impairing its processing by furin and cathepsin L proteases.

      The manuscript provides an interesting exploration of ITCH's role in the SARS-CoV-2 life cycle but requires additional work to strengthen key claims and address potential confounding factors.

      Strengths:

      The experiments are sufficiently clear in documenting that ITCH activity is critical for efficient SARS-CoV-2 replication and for K63-linked ubiquitination of the M and E proteins.

      Weaknesses:

      The manuscript does not convincingly demonstrate how ITCH-mediated ubiquitination of E and M impacts virus assembly and release. Identifying the specific lysine residues in M and E targeted by ITCH, and generating mutant VLPs or recombinant viruses, would strengthen the conclusions.

      Most of the conclusions rely on ITCH overexpression data, which may have off-target effects on Golgi integrity and vesicular trafficking. For instance, Figure 4F provides evidence of altered Golgi morphology and TGN46 fragmentation, raising concerns that ITCH overexpression could indirectly mislocalize furin, affecting S1/S2 cleavage of the spike protein. In addition, inhibition of furin activity may also lead to off-target effects, given its role in processing numerous host proteins.

      Similarly, ITCH overexpression is likely to indirectly affect cathepsin-L maturation. In addition, the manuscript does not clarify how impaired cathepsin L activity would influence virus assembly or release.

      A major concern is also the lack of quantification and statistical analysis of immunofluorescence images throughout the manuscript, which undermines the reliability of these observations.

      We sincerely thank the reviewer for recognizing the importance of ITCH in SARS-CoV-2 replication and for the constructive and insightful suggestions to further strengthen the manuscript.

      Regarding the impact of ITCH-mediated ubiquitination of E and M on virus assembly and release, our data support a model in which ITCH promotes K63-linked ubiquitination of the E and M proteins, facilitating their recruitment to p62-positive autophagosomal compartments. This recruitment likely enhances the spatial proximity and interaction frequency of structural proteins within assembly sites, thereby promoting efficient virion assembly and subsequent release via autophagosome-dependent secretory pathways.

      We agree that identifying the specific lysine residues in M and E targeted by ITCH and generating mutant VLPs or recombinant viruses would provide a more direct mechanistic link. These are important and technically demanding experiments that require extensive mutagenesis and reverse genetics approaches. While beyond the scope of the current study, we fully acknowledge their value and plan to pursue these directions in future work to further refine the mechanistic understanding of ITCH-dependent ubiquitination during coronavirus assembly.

      Regarding the reliance on ITCH overexpression systems, we acknowledge the reviewer’s concern that ectopic ITCH expression may affect Golgi integrity and vesicular trafficking. Indeed, our recent studies [1, 2] demonstrate that ITCH catalytic activity disrupts Golgi structure, resulting in altered furin distribution and impaired cathepsin L maturation. These findings provide mechanistic context for the phenotypes observed in the present study and suggest that ITCH regulates host protease localization through defined cellular pathways rather than nonspecific overexpression artifacts. We have now expanded the Discussion section (last paragraph) to clarify this mechanistic framework.

      Importantly, SARS-CoV-2 infection itself significantly activates endogenous ITCH, and therefore our ectopic expression system likely mimics infection-induced ITCH activation rather than representing a purely artificial condition. In addition, key phenotypes, such as reduced viral replication and altered structural protein behavior, are consistently observed in ITCH-deficient cells, supporting the physiological relevance of ITCH activity in the viral life cycle.

      Regarding cathepsin L (CTSL) maturation, we have expanded the Discussion to clarify how impaired CTSL activity may influence viral assembly and egress. ITCH inhibits CTSL maturation, thereby reducing excessive spike cleavage into smaller fragments. Although CTSL-mediated spike processing facilitates genome release following endocytosis [7, 8], CTSL is a lysosomal protease, and lysosomes are exploited by β-coronaviruses as egress organelles [9]. Excessive lysosomal proteolysis may therefore compromise virion integrity during egress. In this context, ITCH-mediated inhibition of CTSL maturation may preserve spike stability within assembling or trafficking virions, thereby promoting the production and release of infectious particles during the replication phase.

      Regarding quantification and statistical analysis of immunofluorescence data, we appreciate this important point. In the revised manuscript, we have included expanded image panels with increased cell numbers and quantitative colocalization analyses to enhance the rigor of these observations.

      Reviewer #3 (Public review):

      Summary:

      Xiang et al. investigated the role of ubiquitin E3 ligase ITCH in SARS-CoV-2 replication. First, they described the role of ITCH on the structural proteins. Here, the ubiquitination of E and M (but not S) leads to an enhanced interaction and presumably virion assembly. In addition, E and M ubiquitination seems to be necessary for p62-guided sequestration into autophagosomes for secretion. Furthermore, ITCH regulates S proteolytic cleavage by changing furin localization and inhibiting CTSL protease maturation. In addition, SARS-CoV-2 infection upregulates ITCH phosphorylation, whereas knockout of ITCH reduces SARS-CoV-2 replication.

      Strengths:

      The proposed study is of interest to the virology community because it aims to elucidate the role of ubiquitination by ITCH in SARS-CoV-2 proteins. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our knowledge of ubiquitination's diverse functions in cell biology.

      Weakness:

      The involvement of ubiquitin ligases in SARS-CoV-2 replication is not entirely new (see E3 Ubiquitin Ligase RNF5; Yuan et al., 2022; Li et al., 2023). While the data generally support the conclusions, additional work is needed to confirm the role of ITCH in SARS-CoV-2 replication in a biologically relevant context. The vast majority of data is based on transient overexpression experiments of ITCH, which ultimately leads to massive ubiquitination of several viral and host cell factors, including potentially low-affinity substrates not typically recognized under physiological conditions. In addition to that, nearly all experiments were done in cells co-overexpressing ITCH and the viral structural proteins (or cellular proteases) in HEK293T cells. Therefore, a proteomic analysis of protein ubiquitination in a) SARS-CoV-2-infected cells (ideally several cell types) and b) SARS-CoV-2-infected v2T-ITCH-KO cells would verify the ITCH-related ubiquitination of e.g., E and M and would strengthen the whole manuscript. In addition, the few key experiments using SARS-CoV-2 infected cells were performed in VeroE6 cells, which are neither human nor lung-derived. Only in one experiment were lung-derived Calu3 cells included.

      Moreover, the manuscript names ITCH as a central regulator of SARS-CoV-2 replication. If ITCH is beneficial for E and M interaction and thereby aids virion assembly, showing its effect on VLP production would be desirable. Clarifications regarding data acquisition and data analysis could strengthen the manuscript and its conclusions.

      We sincerely thank the reviewer for the thoughtful evaluation and for highlighting the importance of demonstrating physiological relevance.

      We agree that the involvement of E3 ubiquitin ligases in SARS-CoV-2 replication is not entirely unprecedented. Accordingly, we have expanded the Introduction to discuss RNF5 and other E3 ligases previously implicated in SARS-CoV-2 biology (e.g., Yuan et al., 2022; Li et al., 2023), thereby clarifying how ITCH differs mechanistically.

      Regarding the reliance on transient overexpression systems, we acknowledge the reviewer’s concern. Importantly, SARS-CoV-2 infection itself significantly induces ITCH phosphorylation and activation. Therefore, our ectopic expression system likely mimics infection-driven ITCH activation rather than representing a purely artificial condition. Moreover, key findings, including reduced viral replication and diminished E/M ubiquitination, were validated in ITCH knockout cells, supporting the physiological relevance of ITCH-dependent structural protein ubiquitination under endogenous conditions.

      We appreciate the suggestion to perform a global proteomic analysis of ubiquitinated proteins in (i) SARS-CoV-2-infected cells and (ii) SARS-CoV-2-infected ITCH-KO cells. Such analyses would indeed provide a comprehensive and unbiased assessment of ITCH-dependent ubiquitination events. While this approach is beyond the scope of the current study, we fully recognize its value and plan to pursue it in future investigations to further refine the mechanistic understanding of ITCH-mediated ubiquitination during coronavirus assembly.

      With respect to the cellular models used, Vero E6/TMPRSS2 cells are widely established for SARS-CoV-2 propagation due to their robust viral replication, rapid growth, and reduced culture-adapted mutations. Compared with Calu-3 cells, which grow more slowly and may acquire specific adaptations in certain viral genes during prolonged passage, Vero E6/TMPRSS2 cells maintain high viral stability and reproducibility, making them suitable for mechanistic studies. Nevertheless, we agree that human lung-derived systems are highly relevant, and we have included Calu-3 cell data where feasible to support translational relevance.

      Regarding the role of ITCH in virion assembly, our data in Fig. 2 demonstrate that ITCH-mediated K63-linked ubiquitination enhances the interaction between E and M proteins, supporting a functional role in virus-like particle (VLP) formation. We agree that direct visualization and quantification of VLP production by EM would further strengthen this conclusion. Such experiments require additional optimization and will be pursued in future work to provide more direct structural evidence.

      Finally, in response to the reviewer’s comments on data acquisition and analysis, we have expanded image panels, increased the number of quantified cells, and included quantitative colocalization analyses with appropriate statistical evaluation in the revised manuscript to enhance rigor and reproducibility.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the infectivity of SARS-CoV-2 generated in cell lines expressing or lacking ITCH to investigate the effects of ITCH on infectivity, possibly by measuring RNA to PFU ratio and determining the S cleavage pattern in purified virions.

      We re-measured the viral infectious titer and genomic copy number in the culture medium of vT2-WT and vT2-KO cells infected at an MOI of 0.0001 for 24 h. ITCH ablation reduced the viral copy number by approximately 8-fold (Fig. 6B), while the infectious titer (TCID₅₀) decreased by at least 25-fold (Fig. 6A), indicating that loss of ITCH markedly impairs the formation of infectious viral particles. This finding is consistent with the role of ITCH in promoting Spike (S) protein cleavage.

      As suggested, to assess the S cleavage pattern in secreted virions, we precipitated proteins from the culture medium of SARS-CoV-2–infected cells with or without ITCH expression. Analysis of the precipitated S proteins revealed that the loss of ITCH markedly altered the integrity of full-length S in SARS-CoV-2 virions (Fig. S7A).

      (2) The authors should strengthen the connection between ubiquitination of structural proteins and viral egress by measuring infectious virus particles in the supernatants from cells with or without ITCH expression by plaque assay. However, this cannot be accurately achieved without performing the experiment described in point 1 as cleavage of spike and infectivity would affect the results.

      While a plaque assay was not performed, we quantified infectious viral particles in the supernatants using the TCID₅₀ assay. These analyses showed that loss of ITCH resulted in a marked reduction in infectious virion production (>25-fold; Fig. 6A). In contrast, viral genomic copy numbers, which reflect both infectious and non-infectious particles, were reduced by approximately 8-fold (Fig. 6B). The disproportionate reduction in infectious titer relative to viral copy number (approximately threefold difference) is consistent with a defect in virion infectivity, most likely due to impaired S cleavage in the absence of ITCH (Fig. S7A). The reduction in viral copy numbers suggests that ITCH-dependent ubiquitination of viral structural proteins contributes to efficient viral assembly and egress.
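
      The arithmetic behind this comparison can be sketched as follows. This is only an illustration that uses the approximate fold reductions quoted in the response (at least 25-fold for TCID₅₀, about 8-fold for genome copies); the function name is hypothetical and the calculation is not taken from the manuscript.

```python
def relative_infectivity_loss(titer_fold_reduction, genome_fold_reduction):
    """Fold drop in infectious units per genome copy (KO relative to WT).

    Both arguments are fold reductions (WT / KO), as quoted in the response.
    """
    if titer_fold_reduction <= 0 or genome_fold_reduction <= 0:
        raise ValueError("fold reductions must be positive")
    return titer_fold_reduction / genome_fold_reduction


# Approximate values quoted in the response (assumed here, not re-measured).
loss = relative_infectivity_loss(titer_fold_reduction=25.0, genome_fold_reduction=8.0)
print(f"Infectivity per genome copy drops about {loss:.1f}-fold")
```

      Because the titer reduction is reported as a lower bound, the resulting ratio of roughly 3 should also be read as a lower bound.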

      (3) The authors should strengthen the connection between ubiquitination of structural proteins and virion assembly by EM.

      We appreciate the reviewer’s insightful comment. However, detecting ubiquitinated virions in vitro via electron microscopy (EM) remains technically challenging. At present, our laboratory has not yet established an EM-based system optimized for SARS-CoV-2 virion analysis. Moreover, it is also possible that ubiquitin chains present on virions may be cleaved during or after the viral egress process, further complicating their detection.

      Reviewer #2 (Recommendations for the authors):

      Supp. Figure 2: the authors should provide sequencing data for both ITCH-KO clones for consistency.

      The sequences for both ITCH-KO clones have now been included (Fig. S2C).

      Figure 2: All interaction data between structural proteins and p62 rely on ITCH overexpression. It would be helpful to include data in ITCH-KO cells as controls to validate these findings.

      As suggested, we performed E-based immunoprecipitation in wild-type (WT) and ITCH-knockout (KO) cells and found that E pulled down less p62 in the absence of ITCH, confirming that ITCH-mediated ubiquitination of E facilitates its interaction with p62 (Fig. 3C).

      Figure 3H: Verify the middle LC3B panel, as it does not match the merge panel. Please, correct any discrepancies.

      We thank the reviewer for pointing out this error. Fig. 3H (now Fig. 3J) has been corrected accordingly.

      Figure 4F: the labeling of the different panels seems incorrect.

      We have corrected the figure labeling.

      The authors should perform cell viability assays in clomipramine-treated cells. In addition, the authors should clarify whether clomipramine's antiviral effects depend on ITCH expression, given the comparable virus copy numbers in treated WT (Fig. S7B) and ITCH-KO cells (Fig. S7C).

      We thank the reviewer for this helpful comment. As shown in Author response image 1, while clomipramine (Clom) treatment for 48 hours resulted in a modest reduction in cell number compared with the DMSO control, no apparent cell death was detected under these conditions.

      Author response image 1.

      Vero-TMPRSS2 (A) or Vero-ITCH-KO (B) cells were treated with DMSO or clomipramine (Clom) for 48 h, and cell viability was assessed by calcein AM staining (n = 3).

      Reviewer #3 (Recommendations for the authors):

      Results:

      Fig. 2A and 2E display contradictory results, with different outcomes depending on the bait used. In my opinion, in both approaches, the overexpressed ITCH should be able to ubiquitinate M and E (since they are co-expressed). However, the interaction of E and M is not affected by the overexpression of ITCH or ITCH-CS when E is used as a bait (Fig. 2A). In contrast, the interaction of E and M is enhanced in the presence of overexpressed ITCH (Fig. 2E) when M is used as a bait.

      We thank the reviewer for pointing this out. It should be noted that the blots display only the major (un-ubiquitinated) bands of E and M. When M was used as the bait, more E (main band, un-ubiquitinated form) was co-precipitated in the presence of ectopically expressed ITCH. In contrast, when E was used as the bait, comparable levels of M (main band, un-ubiquitinated form) were detected regardless of ITCH expression. These results suggest that ubiquitin-modified M can bind more E, whereas ubiquitin-modified E does not significantly affect its interaction with M. A more detailed explanation has been added to the revised text.

      Fig.3A+3F: The authors claim a reduced E secretion when ITCH-KO cells or shRNA-treated p62 cells are used. I believe an input loading control of the supernatant displaying an equal amount of e.g. BSA is missing.

      In response to the reviewer’s suggestion, we have now included Coomassie Brilliant Blue (CBB) staining of the culture medium (now shown in Fig. 3A and Fig. 3F).

      Fig.3B: ITCH does not interact with E (or M) alone in the displayed data. The data is comparable with data observed for the interaction with S (Supp.4A). However, the author claims that ITCH interacts with M and E but not S (page 11).

      We would like to clarify that in ECL-based Western blotting, strong signals can mask weaker ones due to contrast limitations. In this experiment, ectopic expression of ITCH produced a strong signal that obscured the endogenous ITCH band. Upon longer exposure, the endogenous ITCH signal becomes visible. Additionally, our data presented in Fig. 1 and the new data in Fig. 3C demonstrate the interaction between the relevant proteins.

      Fig 3F: A scrambled control is missing. Moreover, it would be desirable to see if overexpression of p62 would enhance E release to verify that ITCH ubiquitination and p62-positive autophagosomes are necessary for E release.

      We appreciate the reviewer’s comment. Proteins in the culture medium were precipitated using TCA, and Coomassie Brilliant Blue (CBB) staining has been included (now shown in Fig. 3F). Additionally, E release was examined in the presence of overexpressed p62, and the results showed that p62 overexpression increased the level of E detected in the medium (now shown in Fig. 3G).

      Fig.3: Overall, an experiment using, e.g. cycloheximide (protein synthesis inhibitor) and MG132 (proteasome inhibitor) would strengthen the hypothesis that E and M are not degraded in a lysosome after ITCH overexpression. In my opinion, a colocalization experiment with LAMP1 is unsuitable to draw this conclusion. Would the overexpression of a deubiquitinating enzyme diminish M, E and p62 interaction? Does ITCH/p62 only regulate the release of the overexpressed single E or M protein, or does it also affect VLP release? An experiment analyzing purified VLPs produced in ITCH- or ITCH-CS overexpressing cells would be desirable.

      We thank the reviewer for these important questions. As suggested, we performed additional CHX and MG132 experiments. As shown in Fig. 3H and Fig. S3I, degradation of both E and M proteins was blocked by MG132 treatment, indicating that they are degraded via the proteasome pathway. Notably, MG132 treatment did not rescue the ITCH-mediated decrease of E/M levels, suggesting that the ITCH-dependent reduction of E and M is not mediated through the proteasome pathway. In addition, our recent back-to-back studies [1, 2] demonstrated that ITCH overexpression inhibits lysosomal function by impairing hydrolase maturation, suggesting that ITCH-mediated ubiquitination of E or M is unlikely to promote their degradation through the lysosomal pathway. Together, these data suggest that ITCH-mediated reduction of E and M is not due to enhanced degradation but is instead associated with their secretion.

      Overexpression of deubiquitinating enzymes specifically targeting E or M (which remain to be identified) would likely reduce their interaction with p62.

      Our data (Fig. 2) indicate that ITCH-mediated ubiquitination of E and M enhances their mutual interaction, supporting a role for this process in virus-like particle (VLP) formation. p62 would then facilitate the release of VLPs by promoting the secretion of ubiquitinated E and M.

      Fig.4A: PPC site mutation indicated in yellow. There is no yellow color.

      We have revised the label to read “PPC site mutation indicated in red and green”.

      Fig.4C: Why should the overexpression of ITCH or ITCH-CS affect the S protein cleavage when the cleavage site is anyhow mutated?

      In this analysis, we aimed to verify that neither ITCH nor ITCH-CS affects the cleavage pattern of the mutated S protein. As these data are already presented in Fig. 4D (now Fig. 4C), the redundant result has been removed, and the corresponding description has been added to the revised text.

      Fig.4C: Lysates from the single expression of S wt protein (-ITCH/ +ITCH-CS; as indicated in Fig.4B) is missing for comparison to S mut protein.

      As these controls and related data are already presented in Fig. 4D (now Fig. 4C), the redundant result here has been removed.

      Fig. 4D: Lane 5 and Lane 7 are labeled similarly. ITCH+ in Lane 5 needs to be removed.

      We thank the reviewer for pointing out this error. The labeling (now Fig. 4C) has been corrected.

      Fig 4G: A theoretical MOI of 1 does not lead to an infection of all cells. Therefore, including a third marker for infection control, e.g., N protein, would be helpful. This would clarify whether the changes in furin localization are due to infection.

      We thank the reviewer for raising this point. Our goal was to examine whether SARS-CoV-2 infection affects the localization of furin (mouse antibody) relative to the Golgi marker (rabbit antibody). As suitable E, N, or M antibodies raised in goat or donkey were not available, we could not include those markers in this experiment. However, we did confirm M protein expression in parallel, and the infection efficiency was higher than 80% (Author response image 2). To further validate that the observed changes in furin localization were due to viral infection, we have now included additional images showing a larger field of view containing more cells.

      Author response image 2.
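
      The reviewer's point about MOI follows from the standard Poisson model of infection, in which the expected fraction of cells receiving at least one infectious particle is 1 - exp(-MOI). A minimal sketch under idealized assumptions (every cell equally susceptible, a single round of infection); the function name is illustrative and not part of the manuscript:

```python
import math


def fraction_infected(moi):
    """Expected fraction of cells hit by >= 1 infectious particle (Poisson model)."""
    if moi < 0:
        raise ValueError("MOI must be non-negative")
    return 1.0 - math.exp(-moi)


for moi in (0.1, 1.0, 3.0):
    print(f"MOI {moi}: {fraction_infected(moi):.1%} of cells infected")
```

      Under this model an MOI of 1 corresponds to roughly 63% of cells infected in a single round, which is why a per-cell infection marker is informative. Measured efficiencies can differ from this estimate, for example when the virus spreads during the incubation period.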

      Fig.4: Generally, the colocalization of proteases with TGN46 should be analyzed quantitatively using, for example, Manders' overlap coefficient. This would be needed to draw the conclusion stated in the manuscript.

      We appreciate the reviewer’s suggestion. We have now included the colocalization analysis in Fig. 4E and F.
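
      For readers unfamiliar with the metric the reviewer names, Manders' overlap coefficient for two channels is the sum of pixelwise intensity products divided by the square root of the product of the summed squared intensities. A minimal pure-Python sketch on toy intensity lists; the function name and the data are illustrative, and this is not necessarily the procedure used for Fig. 4E and F:

```python
import math


def manders_overlap(channel_a, channel_b):
    """Manders' overlap coefficient: sum(a*b) / sqrt(sum(a^2) * sum(b^2))."""
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must have the same number of pixels")
    numerator = sum(a * b for a, b in zip(channel_a, channel_b))
    denominator = math.sqrt(
        sum(a * a for a in channel_a) * sum(b * b for b in channel_b)
    )
    if denominator == 0:
        raise ValueError("at least one channel has no signal")
    return numerator / denominator


# Toy flattened pixel intensities (hypothetical values, not real image data).
furin = [0, 10, 20, 30]
tgn46 = [0, 10, 20, 30]      # same distribution as furin
displaced = [30, 0, 0, 0]    # signal only where furin has none

print(manders_overlap(furin, tgn46))      # fully overlapping channels
print(manders_overlap(furin, displaced))  # non-overlapping channels
```

      The coefficient ranges from 0 (no shared signal) to 1 (identical distributions). In practice it is sensitive to background, so images are usually thresholded or background-subtracted first.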

      Fig.4/5: Overview IF pictures displaying additional cells would be desirable to clarify furin/cathepsin L localization in ITCH/ITCH-CS expressing cells. Otherwise, it looks (in my opinion) very subjective.

      In response to the reviewer’s suggestion, we have included additional images with a larger field of view encompassing more cells for Figs. 4 and 5 (presented in Fig. S5B and S5H).

      Fig.5D/G: MOI is missing in the figure legend.

      As suggested, the MOI information has been added to the figure legend.

      Fig.5D/G/6C/F: Infection control (e.g., N-protein) is missing in the Western Blots.

      We have added M protein as an infection control to the figures.

      Fig.6: Why is the overall amount of ITCH reduced during the course of infection?

      We thank the reviewer for raising this point. As shown in Fig. 6C and F, ITCH was significantly activated, as indicated by its phosphorylation at the T222 site during viral infection. This activation promotes ITCH self-ubiquitination.

      Fig.6A: Would an overexpression of ITCH enhance viral replication?

      Moderate upregulation of ITCH promotes viral replication, whereas excessive ITCH overexpression leads to cell death, which in turn partially reduces viral titers.

      Discussion:

      Is there an explanation of how ITCH changes furin localization and CTSL maturation?

      Our recent back-to-back studies [1, 2] demonstrated that ectopic ITCH expression disrupts Golgi integrity, resulting in altered furin distribution and impaired CTSL maturation. The relevant discussion has now been incorporated into the revised text (last paragraph of the Discussion section).

      It would also be helpful to discuss the role of other known ubiquitin ligases like RNF5 in the replication of SARS-CoV-2 and other CoVs. Since the pandemic began, many interactome and host-factor studies in various cell types have been published. None of these studies identified ITCH so far. Could you comment on this?

      As suggested, we have included additional known ubiquitin ligases involved in SARS-CoV-2 replication and in other viral systems (see the third paragraph of the Introduction).

      Overall, in my opinion, the figure legends need to be improved. It is often not clear if ITCH is endogenously detected or overexpressed.

      We thank the reviewer for the helpful suggestion. Additional details have been incorporated into the figure legends.

      (1) Xiang Q, Lu Y, Wang H, Chen H, Chen P, Zhao X, et al. ITCH regulates Golgi integrity and proteotoxicity in neurodegeneration. Science Advances 2025; 11:eado4330.

      (2) Xiang Q, Liu Y, Wang J. Golgi fragmentation driven by the USP11-ITCH axis triggers autolysosomal failure in neurodegeneration. Autophagy 2026.

      (3) Peacock TP, Goldhill DH, Zhou J, Baillon L, Frise R, Swann OC, et al. The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets. Nature Microbiology 2021; 6:899-909.

      (4) Zhang L, Jackson CB, Mou H, Ojha A, Peng H, Quinlan BD, et al. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity. Nature Communications 2020; 11:1-9.

      (5) Plante JA, Liu Y, Liu J, Xia H, Johnson BA, Lokugamage KG, et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature 2021; 592:116-21.

      (6) Daniloski Z, Jordan TX, Ilmain JK, Guo X, Bhabha G, Sanjana NE. The Spike D614G mutation increases SARS-CoV-2 infection of multiple human cell types. eLife 2021; 10:e65365.

      (7) Jaimes JA, Millet JK, Whittaker GR. Proteolytic cleavage of the SARS-CoV-2 spike protein and the role of the novel S1/S2 site. iScience 2020; 23:101212.

      (8) Zhao M-M, Yang W-L, Yang F-Y, Zhang L, Huang W-J, Hou W, et al. Cathepsin L plays a key role in SARS-CoV-2 infection in humans and humanized mice and is a promising target for new drug development. Signal Transduction and Targeted Therapy 2021; 6:1-12.

      (9) Ghosh S, Dellibovi-Ragheb TA, Kerviel A, Pak E, Qiu Q, Fisher M, et al. β-Coronaviruses use lysosomes for egress instead of the biosynthetic secretory pathway. Cell 2020; 183:1520-35.e14.

    1. eLife Assessment

      This study presents an important finding regarding how partner preference formation and pair bonding behavior are related to the oxytocin receptor gene expression in the NAc and paraventricular nucleus of the hypothalamus in prairie voles. The evidence supporting this claim is solid but could benefit from increased sample size and more thorough behavioral phenotyping. This study will be of interest to social scientists and neuroscientists who work on pair bonding and oxytocin.

    2. Reviewer #1 (Public review):

      Summary:

      In this remarkable study, the authors use some of their recently-developed oxytocin receptor knockout voles (Oxtr1-/- KOs) to re-examine how oxytocin might influence partner preference. They show that shorter cohabitation times lead to decreased huddling time and partner preference in the KO voles, but with longer periods preference is still established, i.e., the KO animals have a slower rate of forming preference, or are less sensitive to whatever cues or experiences lead to the formation of the pair bond as measured by this assay. This helps relate the authors' recent study to the rest of the literature on oxytocin and partner preference in prairie voles. To better understand what might lead to slower partner preference, they quantified changes to the durations and frequency of huddling. In separate assays they also found that Oxtr1-/- KOs interacted more with stranger males than wild-type females. In a partner choice assay they found that wild-type males prefer wild-type females more than Oxtr1-/- KO females. They then performed bulk RNA-Seq profiling of nucleus accumbens of both wild-type and Oxtr1-/- KO males and females, either housed with animals of the same sex or paired with a wild-type of the opposite sex. 13 differentially expressed genes were identified, mostly due to downregulation in wild-type females. These genes were also identified in a module lost in the Oxtr1-/- voles by correlated expression profiling. They also compared results of transcriptional profiling in female and male wild-type vs Oxtr1-/- voles (independently of bonding state), and found hundreds of differentially expressed genes in nucleus accumbens, mostly in females and often with some relation to neural development and/or autism. Some of the reduction in transcript was confirmed with in situs, as well as compared to changes in transcription in the lateral septum and paraventricular nucleus (PVN) of the hypothalamus. Finally, they find fewer oxytocin+ and AVP+ neurons in the anterior PVN.

      Strengths:

      This is an important study helping to reveal the effects of oxytocin receptor knockout on behavior and gene expression. The experiments are thorough and reveal a surprising number of genetic and anatomical differences, with some sexual dimorphism as well, and the authors have more carefully examined the behavioral changes after shorter and longer periods of partner preference formation.

      Weaknesses:

      It is surprising that, given all the genetic changes identified by the authors, the behavioral phenotypes are fairly mild. The extent of gene changes also might be under-reported given the variability in the behavior and the relatively low number of animals profiled.

      Comments on revisions:

      No further recommendations. I commend the authors for finding the typos in their first version and correcting the manuscript.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this remarkable study, the authors use some of their recently-developed oxytocin receptor knockout voles (Oxtr1-/- KOs) to re-examine how oxytocin might influence partner preference. They show that shorter cohabitation times lead to decreased huddling time and partner preference in the KO voles, but with longer periods preference is still established, i.e., the KO animals have a slower rate of forming preference or are less sensitive to whatever cues or experiences lead to the formation of the pair bond as measured by this assay. This helps relate the authors' recent study to the rest of the literature on oxytocin and partner preference in prairie voles. To better understand what might lead to slower partner preference, they quantified changes to the durations and frequency of huddling. In separate assays, they also found that Oxtr1-/- KOs interacted more with stranger males than wild-type females. In a partner choice assay, they found that wild-type males prefer wild-type females more than Oxtr1-/- KO females. They then performed bulk RNA-Seq profiling of nucleus accumbens of both wild-type and Oxtr1-/- KO males and females, either housed with animals of the same sex or paired with a wild-type of the opposite sex. 13 differentially expressed genes were identified, mostly due to downregulation in wild-type females. These genes were also identified in a module lost in the Oxtr1-/- voles by correlated expression profiling. They also compared results of transcriptional profiling in female and male wild-type vs Oxtr1-/- voles (independently of bonding state) and found hundreds of differentially expressed genes in nucleus accumbens, mostly in females and often with some relation to neural development and/or autism. Some of the reduction in the transcript was confirmed with in-situs, as well as compared to changes in transcription in the lateral septum and paraventricular nucleus (PVN) of the hypothalamus. Finally, they find fewer oxytocin+ and AVP+ neurons in the anterior PVN.

      Strengths:

      This is an important study helping to reveal the effects of oxytocin receptor knockout on behavior and gene expression. The experiments are thorough and reveal a surprising number of genetic and anatomical differences, with some sexual dimorphism as well, and the authors have more carefully examined the behavioral changes after shorter and longer periods of partner preference formation.

      We thank Reviewer #1 for the positive assessment of the study’s significance and for recognizing the value of our behavioral and transcriptional analyses in refining the role of oxytocin signaling in pair bonding.

      Weaknesses:

      It is surprising that given all the genetic changes identified by the authors, the behavioral phenotypes are fairly mild. The extent of gene changes also might be underreported given the variability in the behavior and relatively low number of animals profiled.

      Pair bonding is a robust behavior composed of distinct modules that are supported by redundant and compensatory neural pathways. Our findings support a model in which Oxtr functions in parallel with other mechanisms to modulate specific components of social attachment. We have addressed this point in the Discussion. We have also updated our Results and Methods sections to more clearly reflect our cohort size, which is comparable to that of similar studies.

      Reviewer #1 (Recommendations for the authors):

      How do the wild-type males 'know' which animal is which during the three-chamber assay test of Figure 4B? Do the Oxtr1-/- KO females act in some way different from the wild types in this experiment?

      We thank the reviewer for this question. During follow-up analyses prompted by reviewer requests to characterize the behaviors underlying the apparent bias in WT male choice, we discovered a labeling error in the metadata used to analyze these assays. The error flipped the genotypes of the tethered stimulus animals at the ends of the chamber. After correcting this error and reanalyzing the data, we find that naïve WT males do not show a significant preference for naïve WT females over naïve Oxtr<sup>1-/-</sup> females. We have reconfirmed the metadata used in all assays in this study; no other datasets or conclusions are affected.

      While overall choice frequency is equivalent for males and females, our revised analyses demonstrate that Oxtr loss nonetheless alters the dynamics of social interactions in a sex-specific manner. In particular, the presence of an Oxtr<sup>1-/-</sup> male significantly alters WT females’ social behavior—enhancing prosocial engagement and reducing aggression—independent of which male is ultimately chosen. These findings support the conclusion that Oxtr function modulates early reciprocal social interactions rather than categorical choice outcomes.

      MOAT and LOAT seem like cumbersome acronyms, more so than something simpler like vole 1 vs vole 2.

      We have replaced these acronyms throughout the manuscript with simpler, descriptive terminology: winner (MOAT) and loser (LOAT).

      Only three animals per condition seemed to have been used for RNA-Seq studies in Figure 5. Given the high behavioral variability in the earlier figures, did the authors screen for animals with exemplar or similar behavior within groups? The lack of significance of other genes or across other groups might just be due to a low-powered experiment given the high behavioral and genetic variability.

      We thank the reviewer for raising the important point regarding behavioral preselection, which has been performed in some similar studies. For our study, animals were not preselected based on exemplar or matched behavioral performance prior to tissue collection, as doing so would risk introducing variation in gene expression patterns due to the experience of complex social interactions. Instead, given that our prairie vole lines are maintained on an outbred background, tissue from three animals was pooled for each RNA-seq sample to reduce inter-individual variability and to capture representative transcriptional states within each experimental group. While this approach increases robustness to individual variability, we acknowledge that it may limit sensitivity to detect low-expression, behavior-linked gene transcripts.

      On lines 426-429, the authors state that "While there was no significant difference in Oxtr transcript levels by genotype (padj = 0.753), consistent with minimal nonsense-mediated decay despite a premature stop codon, we have previously shown that no functional protein is produced in Oxtr1-/- animals (52)." This assertion could use strengthening, even if just to explain how this was verified in their previous publication. What is the evidence for nonsense decay and a full knockout of functional receptors at the protein level?

      We agree that this point benefits from clarification. Although Oxtr transcript levels were not significantly different by genotype (padj = 0.753), consistent with minimal nonsense-mediated decay, transcript abundance alone does not reflect receptor functionality. In our prior study, we directly assessed Oxtr protein function using receptor autoradiography and found a complete absence of specific ligand binding in Oxtr<sup>1-/-</sup> animals across brain regions that show robust Oxtr binding in wild-type voles, demonstrating a full loss of functional receptor protein. We have clarified this in our manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript uses a recently published oxytocin receptor null prairie vole line to examine the effects of this mutation on pair bonding behavior and PVN gene expression. Results reveal that Oxtr influences early courtship behavior and partner preference formation in a sex-specific manner and suppresses promiscuity toward novel potential mates. PVN gene expression varies between Oxtr null and WT prairie voles.

      Strengths:

      Behavioral analyses extend beyond the typical reporting of frequency and duration. The gene expression models and analyses are well-done and convincing. The experimental designs and approaches are strong.

      We thank Reviewer #2 for highlighting the strengths of the gene expression modeling and behavioral analyses.

      Weaknesses:

      More details and background literature explaining the role of the Oxt system in pair bonding behaviors are necessary, particularly for the Introduction. The authors overstate several times that Oxtr expression is not necessary for partner preference formation, based on their previous findings. However, it does appear to be necessary, particularly after the short cohabitation. Thus, the nuanced answer may be that Oxt accelerates partner preference formation. Improving the presentation of the statistics and figures will make the manuscript more reader-friendly.

      We thank the reviewer for this thoughtful feedback and agree that additional background on the oxytocin (Oxt) system’s role in pair bonding will strengthen the manuscript. We have revised the introduction to expand our discussion of prior pharmacological and comparative studies suggesting that Oxt signaling modulates multiple components of pair bonding.

      Finally, in response to the reviewer’s suggestion, we have improved the presentation of figures and statistical reporting by interlacing figures with figure legends and updating the supplementary statistics table.

      Reviewer #2 (Recommendations for the authors):

      Major concerns

      (1) The Introduction provides a "broad strokes" approach to link the oxytocin and vasopressin systems as neuromodulators of social attachment processes. This study is a follow-up to a recent publication by the senior authors' groups which reported that the Oxtr null prairie voles were able to form typical pair bonds. Now, the authors are revisiting the same question by developing a series of behavioral assays to probe distinct aspects of pair bonding behavior. However, the Introduction lacks a nuanced examination of how the oxytocin system has been shown to regulate an array of social behaviors in prairie voles and other social species.

      We thank the reviewer for this observation and agree that the original Introduction did not capture the breadth and nuance of oxytocin system involvement in social behavior. We have substantially revised the Introduction in response to the reviewer’s suggestion to include a more detailed discussion of the role played by oxytocin signaling in social behaviors displayed across multiple phyla, including during the early stages of pair bonding.

      (2) In addition, there seems to be relevant viral Oxtr KD and KO studies in prairie voles which could be referenced to reflect differences between acute pharmacological Oxtr inhibition and prolonged viral KD of Oxtr on behavioral outcomes. This could also be put into context with the authors' first paper in prairie voles and others' work with mice showing how congenital Oxtr null rodent models may result in behavioral changes that are not reflected in the pharmacological or viral manipulation research. This could help justify the approach of the current study.

      We thank the reviewer for suggesting this comparison and have included a section in the Discussion comparing pharmacological manipulations and global knockouts, as well as the discrepancy in phenotypes that arises from these methods. This expanded discussion clarifies why a congenital genetic model provides complementary insights: it allows us to identify which components of pair bonding are robust to developmental loss of Oxtr and which remain sensitive, thereby distinguishing between Oxtr-dependent behavioral modules and those supported by parallel mechanisms. Additionally, we have included viral manipulations of Oxtr in prairie voles during the early phase of interactions between the sexes in the Introduction, to contextualize our study in the broader field.

      (3) On lines 129-130: The authors state, "We previously found that Oxtr is not required for the display of partner preference following 1 week of cohabitation". While this is the general conclusion of their previous publication, this seems like a rather large overgeneralization. There are many studies that have documented the functional regulation and necessity of the Oxt system for partner preference behavior in prairie voles. Therefore, it would be more appropriate to state that their previous study demonstrated that "Oxtr null prairie voles are able to develop a partner preference", but not that Oxtr is not necessary for partner preference formation. This may be a question about when the KO occurs, whether it be congenital or conditional.

      (4) This statement is repeated in Lines 350-352. However, the authors can now qualify this statement at this point in the manuscript with their new data which suggests that Oxtr null voles fail to form a partner preference after short cohabitation, but WT still form such preferences. This would suggest the qualification of this statement should be on the onset of partner preference formation as Oxtr is necessary for partner preference formation after a "short" cohabitation. Therefore, both findings are more in line with previous results which suggest that Oxt signaling accelerates partner preference formation.

      We have revised this language throughout the manuscript to state that our prior work demonstrated that Oxtr null voles are capable of forming a partner preference after extended cohabitation.

      (5) It appears Supplementary Table 1 is not scaled to the page size, so not all statistical results are clear. This limits the accuracy of my review.

      This table has been reformatted to ensure all statistical results are properly scaled to page size.

      (6) It is not always clear what statistical analyses are being performed. For example, how were the data in Figures 4G-H analyzed? Which statistics were used? The output should be more readily available.

      During follow-up behavioral analyses prompted by Reviewer #1's requests to characterize the basis of the apparent WT male bias, we discovered a labeling error in the metadata associated with a subset of naïve three-chamber choice assays. In these cases, the genotypes of the tethered stimulus animals had been inadvertently flipped. After correcting this error and reanalyzing the data, we find that naïve WT males do not show a significant preference for naïve WT females over naïve Oxtr1-/- females. We have rechecked the metadata for all assays included in this study and confirmed that this was the only instance in which such an error occurred. We further analyzed the temporal dynamics of naïve choice and found that Oxtr function modulates early reciprocal social interactions but does not affect the genotype ultimately chosen.

      To improve the clarity of the statistical analyses performed, we have reformatted our presentation of figure legends and our statistics table. All statistical tests, sample sizes, and relevant parameters (including exact tests used, correction methods where applicable, and definitions of units of analysis) are explicitly stated in the figure legends and compiled in the supplementary statistical summary table, in accordance with eLife reporting guidelines.

      (7) Oxytocin plays a critical role in development as early as embryogenesis. It may be useful to frame some of the Introduction and Discussion recognizing the congenital deletion of Oxtr may affect much of development. With that in mind, it is not surprising to see changes in gene expression associated with neurodevelopmental disorders.

      We now explicitly acknowledge in both the Introduction and Discussion that congenital Oxtr deletion likely impacts neural development, which provides context for the observed enrichment of neurodevelopmental gene expression changes.

      Minor concerns

      (1) It was not clear why vasopressin was referenced in the Introduction. Specifically, the study documents that Oxtr null prairie voles have a reduction in Avp neurons in the PVN, which would suggest some aspects of Oxt signaling regulate Avp expression. However, the Introduction is not focused on how Oxt regulates the Avp system but rather on how each is a modulator of social attachment. It would improve the justification of this study to focus on Avp expression if the Introduction presented this concept.

      We thank the reviewer for pointing out the need for greater clarity around our reference to vasopressin (Avp) in the Introduction. In the Introduction, we simply state that the potential for pair bonding is correlated with the patterns of expression of Oxtr and V1ar. The goal of this study was to find evidence of behavior and gene expression changes due to the chronic loss of Oxtr, which led to our finding that a population of Avp neurons is lost in animals lacking Oxtr. As we did not intend to justify our study on this basis, we have clarified our Discussion to include previous studies where OT manipulation affects Avp neurons.

      (2) Figures and supplemental figures need figure legends.

      We have re-arranged the figure legends for each figure (including the supplementary figures) to follow the figures for easier readability and accessibility.

      (3) Figure 1 Timeline is focused more on the male timeline with "bond formation" and "bond maintenance" reflecting the days required to form a partner preference for males. The figure should be revised to reflect similar time points for female pair bonding.

      Figures have been revised to reflect each sex's bonding timeline.

      (4) Figure 1 has a color theme with females represented by red/pink and males represented by dark/light blue. However, this is not true for Figures 1C and 1D. Please revise these color schemes.

      Color schemes have been standardized across all figures.

      (5) It is not clear what is being graphed in Figures 2 and 3. The duration graphs have many more data points than the frequency graphs. Can this be explained?

      We thank the reviewer for pointing out this lack of clarity. The difference in the number of data points reflects how these measures are defined. Duration plots are generated at the level of individual huddle events, specifically pooling all huddles whose duration falls within the top quartile for a given animal, whereas frequency plots are generated at the level of individual animals and therefore contain one data point per subject. As a result, duration graphs necessarily include more data points than frequency graphs. The figure legends and Methods section now explicitly state the unit of analysis for each metric and clarify why the number of data points differs between duration and frequency plots.
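
      As a rough illustration of this distinction (not the authors' analysis code), the sketch below counts data points both ways. The animal IDs, bout durations, and the choice of an inclusive 75th-percentile cutoff are all assumptions made for the example; the manuscript may define the top quartile differently.

```python
from statistics import quantiles

# Hypothetical huddle-bout durations in seconds, keyed by animal ID.
# All values are invented for illustration.
huddles = {
    "vole_A": [12, 40, 95, 300, 610, 45, 20, 880],
    "vole_B": [5, 15, 60, 240],
    "vole_C": [30, 35, 500, 700, 90, 1200, 15, 25, 60, 410, 75, 33],
}

def top_quartile_bouts(durations):
    """Return the bouts at or above this animal's own 75th-percentile duration."""
    # Assumption: quartiles use the 'inclusive' interpolation convention.
    q3 = quantiles(durations, n=4, method="inclusive")[2]
    return [d for d in durations if d >= q3]

# Frequency plot: the unit of analysis is the animal, so one point per subject.
frequency_points = [len(bouts) for bouts in huddles.values()]

# Duration plot: the unit of analysis is the huddle event, so every
# top-quartile bout from every animal contributes its own point.
duration_points = [
    d for bouts in huddles.values() for d in top_quartile_bouts(bouts)
]

print(len(frequency_points), "frequency points")
print(len(duration_points), "duration points")
```

      With pooling at the event level, the duration plot ends up with more points than the frequency plot whenever animals contribute more than one top-quartile bout on average.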

      (6) What are the black bars in Figure 4H meant to represent?

      We thank the reviewer for this question. In the original submission, the black bars in Figure 4H were intended to indicate time periods showing statistically significant convergence in the chooser’s preference for the MOAT (More Of Assay Time, now winner) animal, based on the sliding preference index analysis. However, as mentioned above, during revision we identified a metadata error affecting the dataset used to generate this figure. After correcting the error, the figure was fully reanalyzed and regenerated. As a result, Figure 4H now presents a different analysis and no longer includes these black bars, and the conclusions drawn from this panel have been revised accordingly. The updated figure, legend, Results text, and statistics table now accurately reflect the new analysis.

    1. eLife Assessment

      This important study shows that an odorant that is typically thought of as a repellant actually activates both attractant and repellant olfactory neurons in C. elegans. Convincing evidence is provided that nematode worms can integrate signals in different sensory pathways to drive different behavioral responses to the same cue. These findings will be of interest to scientists interested in combinatorial coding in sensory systems.

    2. Reviewer #1 (Public review):

      The authors investigated the response of worms to the odorant 1-octanol (1-oct) using a combination of microfluidics-based behavioral analysis and whole-network calcium imaging. They hypothesized that 1-oct may be encoded through two simultaneous, opposing afferent pathways: a repulsive pathway driven by ASH, and an attractive pathway driven by AWC, with the ultimate chemotactic outcome likely determined by the balance between these two pathways.

      It is not surprising that 1-octanol is encoded as attractive at low concentrations and repulsive at higher concentrations. However, the novel aspect of this study is the discovery of the combinatorial coding of 1-oct in the periphery, where it serves as both an attractant and a repellent. Furthermore, the study uses this dual encoding as a model to explore the neural basis of sensory-driven behaviors at a whole-network scale in this organism. The basic conclusions of this study are well supported by the behavioral and imaging experiments, though there are certain aspects of the manuscript that would benefit from further clarification.

      A key issue is that several previous studies have demonstrated a combinatorial and concentration-dependent coding of odorant sensing in the nematode peripheral nervous system. Specifically, ASH and AWC are the primary receptors for repellent and attractive responses, respectively. However, other neurons such as AWB, AWA, and ADL are also involved in the coding process. These neurons likely communicate with different interneurons to contribute to 1-oct-induced outputs. The authors' conclusion that loss of tax-4 reduces attractive responses and that osm-9 mutants reduce repulsive responses is not entirely convincing. TAX-4 is required for both AWC (an attractive neuron) and AWB (a repulsive neuron), and osm-9 is essential for ASH, ADL, and AWA (attraction-associated). Therefore, the observed effects on the attractive and repulsive responses could be more complex. Additionally, the interpretation of results involving the use of IAA to reduce the contribution of AWC at lower concentrations lacks clarity.

      The authors did not observe any increased correlation between motor command interneurons and sensory neurons, which is consistent with the absence of a consistent relationship between state transitions and 1-oct application. Furthermore, they did not observe significant entrainment of AIB activity with the 2.2 mM 1-oct application. This might be due to the animals being anesthetized with 1 mM tetramisole hydrochloride, which could affect neural activity and/or feedback from locomotion.

      Comments on revisions:

      The authors have addressed all my previously raised concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used whole-network imaging to identify sensory neurons that responded to the repellant 1-octanol. While several olfactory neurons responded to the initial onset of odor pulses, two neurons consistently responded to all the pulses, ASH and AWC. ASH typically activates in response to repellants, and AWC typically activates in response to the removal of attractants. However, in this case, AWC activated in response to the removal of 1-octanol, which was unexpected because 1-octanol is a harmful repellant to the worm. The authors further investigated this phenomenon by testing different concentrations of 1-octanol in a chemotaxis assay, and found that at lower (less harmful) concentrations the odor is actually an attractant, but becomes repulsive at higher concentrations. The amplitude of the ASH response appeared to be modulated by concentration, but this was not true for AWC. The authors propose a model where the behavioral response of the worm is the result of integrating these two opposing drives, where repulsion is a result of the increased ASH activity overriding the positive drive from AWC. The authors further tested this theory by testing mutants that ablated the AWC response (tax-4 or AWC::HisCl) or ASH response (osm-9 or ASH::HisCl). The chemo-silencing (HisCl) and tax-4 experiments were consistent with their hypothesis, while the osm-9 mutation had a limited impact on chemotaxis behavior, highlighting the potential role of osm-9-independent signaling in ASH in response to 1-octanol. While the interneuron(s) that integrate these signals to influence behavior were not identified, the authors did find that increasing concentrations of 1-octanol did increase the likelihood of AVA activity, a neuron which drives reversals (and hence, behavioral repulsion).

      Strengths:

      This was simple and elegant work that identified specific neurons of interest which generated a hypothesis, which was further tested with mutants that altered neuronal activity. The authors performed both neuronal imaging and behavioral experiments to verify their claims.

      Weaknesses:

      The authors note that other sensory neurons likely contribute to 1-octanol chemotaxis. Given the NeuroPAL data, it would have been nice to identify these other neurons as well. However, the reviewer is aware that this is tangential to the primary focus of this study.

    4. Reviewer #3 (Public review):

      Summary:

      This work describes how two chemosensory neurons in C. elegans drive opposite behaviors in response to a volatile cue. Because they have different concentration dependencies, this leads to different behavioral responses (attraction at low concentration and repulsion at high concentration). It has been known that many odorants that are attractive at low concentrations are aversive at high concentrations, and the implicated neurons (at least AWC for attraction and ASH for repulsion) have been well established. Nonetheless, by studying behavior and neural responses in a common context (odor pulses, as opposed to gradients), this work provides a clear picture of how these sensory neurons may guide the dose-dependent response by separately modulating odor entry and odor exit behaviors.

      Strengths:

      (1) This work provides good evidence that worms are attracted to low concentrations and repelled by high concentrations of 1-oct. Calcium imaging also makes it clear that dose-dependence of this response is stronger for ASH than AWC.

      (2) This work presents calcium imaging and behavior with the same stimulus (sudden pulses in volatile odor concentration), while previous studies often focus on using neuronal responses to pulses to understand navigation of gentle gradients.

      Weaknesses:

      (1) As a whole it is not clear precisely how important AWC is (compared to other cells) for the attractive response (as the authors correctly acknowledge).

      (2) The evidence that AIB minus AVA contains relevant information is weak. It appears the entrainment index in Fig. 6H for AIB-AVA could easily be explained by the negative entrainment between AVA and the stimulus (along with no effect or role for AIB). This is suggested by the similar p-values and similar distribution of random EIs (stretched and mirrored) between the first and last rows of this figure.

      (3) The model in Figure 7 would be strengthened if it was demonstrated that IAA is attractive when worms are saturated in a 1/10^4 concentration. Panel 7G (and ref. 39) indicate that 10^-4 IAA activates ASH, which would suggest a different explanation for the change from attraction to repulsion in 7C.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      …other neurons such as AWB, AWA, and ADL are also involved in the coding process. These neurons likely communicate with different interneurons to contribute to 1-oct-induced outputs. The authors' conclusion that loss of tax-4 reduces attractive responses and that osm-9 mutants reduce repulsive responses is not entirely convincing. TAX-4 is required for both AWC (an attractive neuron) and AWB (a repulsive neuron), and osm-9 is essential for ASH, ADL, and AWA (attraction-associated). Therefore, the observed effects on the attractive and repulsive responses could be more complex. Additionally, the interpretation of results involving the use of IAA to reduce the contribution of AWC at lower concentrations lacks clarity. A more effective approach might involve using transgenically expressed miniSOG or histamine (HisCl1) to specifically inhibit AWC neurons.

      We agree that the sensory inputs into chemotactic behavior are likely more complex, involving other neurons besides ASH and AWC. We now explicitly discuss this possibility in the Discussion (lines 449-467).

      We have also utilized transgenically expressed HisCl1 in ASH and AWC to address this concern. Crucially, we observe that some of the effects of the broad mutations are reproduced by inactivating ASH and AWC. This finding validates our overall hypothesis that sensory-driven behavior is a balance of simultaneous afferent inputs of opposite valence AND shows that ASH and AWC are involved as expected. We are currently performing a comprehensive analysis of sensory inputs into locomotory decision making, including the neurons mentioned in the Reviewer’s comment.

      We also agree that using IAA is not a very clean way to inactivate AWC. The AWC HisCl results referenced above should alleviate this concern. However, the IAA result does put our findings into a broader context of multi-sensory integration which demonstrates the potential usefulness and selective advantages of the dual-input coding architecture that we are hypothesizing.

      Furthermore, they did not observe significant entrainment of AIB activity with the 2.2 mM 1-oct application. This might be due to the animals being anesthetized with 1 mM tetramisole hydrochloride, which could affect neural activity and/or feedback from locomotion. 

      We now mention these caveats: “It is possible that immobilization and anesthetization may be affecting AIB responses to sensory activity and/or proprioceptive feedback from locomotion. However, it is also possible that motor feedback from RIM was obscuring the sensory signal.” (line 357)

      It is unclear whether subtracting AVA activity from AIB activity provides a valid measure. Similarly, it is unclear how the behavioral data from freely moving worms compares to the whole-network calcium imaging results obtained from immobilized worms.

      Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)
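
      The logic of the subtraction can be sketched with synthetic traces. The example below is an illustration under stated assumptions, not the model of Ray and Gordus or the authors' pipeline: the kernels, gains, noise level, and the two-state stand-in for AVA are all hypothetical. AIB is built as the sum of a convolved motor trace and a convolved stimulus trace, and subtracting AVA is then compared against the raw AIB trace for how well each tracks the stimulus.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def convolve_causal(signal, kernel):
    """Causal discrete convolution, truncated to len(signal)."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * signal[t - k]
        out.append(acc)
    return out


def exp_kernel(tau, length):
    """Exponential decay kernel normalised to unit sum."""
    raw = [math.exp(-k / tau) for k in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


rng = random.Random(1)
stim = ([1.0] * 30 + [0.0] * 30) * 20  # hypothetical odor pulses

# Hypothetical motor state (stand-in for AVA): random two-state switching,
# generated independently of the stimulus by construction.
state, ava = 0.0, []
for _ in stim:
    if rng.random() < 0.05:
        state = 1.0 - state
    ava.append(state)

# Assumed additive model: AIB = motor kernel * AVA + gain * (sensory kernel * stim) + noise.
motor_part = convolve_causal(ava, exp_kernel(1.0, 5))
sensory_part = convolve_causal(stim, exp_kernel(3.0, 15))
aib = [m + 0.5 * s + rng.gauss(0, 0.1) for m, s in zip(motor_part, sensory_part)]

# The subtraction: remove the motor state to expose the sensory component.
residual = [b - a for a, b in zip(ava, aib)]
r_raw = pearson(stim, aib)
r_resid = pearson(stim, residual)
print(f"stimulus vs AIB: {r_raw:.2f}  stimulus vs AIB-AVA: {r_resid:.2f}")
```

      The subtraction helps here because the assumed motor kernel is short and the simulated AVA is unrelated to the stimulus. If AVA itself tracks the stimulus, the residual inherits that relationship, so the result depends on those two assumptions.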

      The relationship between network activity in freely moving worms and immobilized worms has been explored by Kato et al 2015 (Cell 163:656-669); we now refer to this work: “These transitions are related to network state changes which drive spontaneous reversals during foraging in freely moving worms. Immobilization and anesthetization, necessary for confocal imaging, distort certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback. However, the intrinsic motor programs remain intact under these conditions.” (lines 131-136)

      Reviewer #2 (Public review):

      tax-4, but not osm-9 mutants were used in chemotaxis and imaging assays. It would have been nice to have osm-9 results as well for these assays. The mutants are not specific to AWC and ASH. Cell-specific rescue of these neurons would have strengthened the proposed model.

      Osm-9 data are now included in the chemotaxis assays (Fig. 4E).

      Cell-specific HisCl data are now included for ASH and AWC (Fig. 4F, G, 5D), confirming our proposed model.

      Limited tax-4 data were included in the imaging (Fig. 6), but unfortunately, NeuroPAL imaging in tax-4 has proven to be technically difficult. NeuroPAL images in the tax-4 background appear different, perhaps because of developmental effects on gene expression due to the lack of sensory input (recall that the NeuroPAL color scheme is based on the relative expression levels of 40+ neuronal promoters). Inactivation of individual sensory neurons using HisCl1 or other transgenes may be the simpler approach.

      The Results and Discussion have been significantly rewritten to incorporate these new data.

      We are currently working on a comprehensive study of the sensory inputs into locomotory decision making in the context of chemosensation, which we expect to reveal roles of other neurons besides ASH and AWC and provide a fuller picture of the complexities of this system.

      Reviewer #3 (Public review):

      (1) It is not clear precisely how important AWC is (compared to other cells) for the attractive response, though the presence of odor-off behavior implicates it. This could be resolved by looking at additional mutants (tax-4 is broad).

      We have addressed this concern using transgenically-expressed HisCl1 which has demonstrated a clear role for AWC in overall chemotaxis and locomotory decision making upon encountering the 1-oct/buffer interface in microfluidics devices (Fig. 4F, G, 5D).

      (2) Relatedly, dose-dependent chemotaxis data (Figure 4C, D) should be provided for osm-9 animals to get a sense of the degree to which dose-dependence is explained by ASH.

      Osm-9 data now included (Fig. 4E)

      The Results and Discussion have been significantly rewritten to incorporate these new data.

      (3) Figure 4A, B should include average traces with errors, as there are several ways the responses can vary across conditions.

      Averaged traces with error bars now shown (Fig. 4A, B)

      (4) The data in Figure 6G does not appear to have error bars.

      Error bars now shown for 6G

      Also, it would help to include a more conventional demonstration of AIB responding to stimuli (e.g. averaging stimulus-aligned responses as a percent of the fluorescence value at stimulus onset to perform the desired subtraction).

      Fig. 6G top panel shows the stimulus-aligned responses of AIB with no subtraction performed. The 6 sequential stimulations are shown as a single continuous trace, consistent with the experimental protocol utilized. Averaging was performed across the 12 individuals of the sample set. However, we did not calculate the average of responses within a dataset (i.e. first plus second plus third etc.) to avoid obscuring any sensitization/desensitization that might be occurring with multiple stimuli.

      Subtracted calcium traces are harder to interpret. As it stands, the evidence that sensory signals are persisting in AIB and not being shunted by proprioceptive feedback in microfluidic devices is not strong.

      Addressing the point about proprioceptive feedback in microfluidics devices, the following sentence was added in the Results section: “Immobilization distorts certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback, but the intrinsic motor programs remain intact.” (lines 131-136).

      To add context for the AIB-AVA subtraction, Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 1: The number of replicates (n) is missing.

      In Fig. 1D, only a single trial is shown as a representative example rather than averages, which would necessitate error bars. The Results and figure legend text have been updated to clarify this, and the average CI is now included in the first Results section (lines 111, 976).

      Figure 4: The sample size (n = 3-5) is relatively small, which may limit the statistical power.

      Sample size was increased to 5 for all data points shown on the new graph (Fig. 4E), as noted in the figure legend (line 1019).

      Figure 4: The 0.22 mM concentration significantly affects both AWC and ASH. It is also unclear whether this concentration also affects other neurons, such as AWB, ADL, and AWA.

      We have not performed an exhaustive analysis of other neurons in these datasets. These analyses are difficult and time-consuming, so we have opted to present a dataset which supports our hypothesis that multiple afferent pathways of opposite valence act in a balanced way to drive chemotaxis. We are currently performing an in-depth analysis of the sensory inputs into the circuit, which we expect to present in a future study.

      Reviewer #2 (Recommendations for the authors):

      The tax-4 and osm-9 experiments are great, but I recommend clarifying that tax-4 and osm-9 are expressed in other neurons as well. The text gives the impression that these mutants are specific to AWC and ASH, respectively. The authors should note these caveats.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The authors should also provide the code used to interpret their results.

      Code will be provided through Zenodo.org.

      Reviewer #3 (Recommendations for the authors):

      It would help to clarify (early on) the degree to which you are attributing responses to particular cells (e.g. AWC) as opposed to a class of cells with AWC as an example.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The NeuroPAL imaging and analysis (especially Figures 3D, E) is a bit distracting and appears non-essential. If possible, it would help to combine Figures 2 and 3 with a focus on panels 3ABC to streamline the narrative.

      We would prefer to keep the present format so the reader can appreciate the power of the whole-brain approach for analyzing network activity and behavioral outputs in the context of sensory-motor responses. Specifically, our insight that attractive and aversive afferent inputs were activated simultaneously was wholly dependent on this approach. Otherwise, there would have been little to no reason for examining AWC activity at aversive 1-oct concentrations, which was essentially the foundation of the study.

      To highlight this point, we have added the following sentence in the Discussion: “This novel insight highlights the value of the whole-brain approach (enabled by the NeuroPAL system) for studying the network dynamics underlying sensory driven behaviors.” Lines 431-433.

    1. eLife Assessment

      The findings in this paper provide solid support for a hypothesis that has valuable implications at the intersection of value-based and social decision-making. The findings suggest that the brain processes rewards received for effort differently when they are earned for themselves versus someone else.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      The Authors test the hypothesis, using an effort-exertion and an effort-based decision-making task while recording brain dynamics with EEG, that the brain processes reward outcomes for effort differentially when they are earned for themselves versus others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants report feeling less subjective effort, but also more disliking of tasks when they were earning rewards for others versus self. The concern is that participants worked with less vigor during other-versus-self trials and this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons), they did so more sluggishly when earning rewards for others. The Authors argue that this reflects less motivation when working for others, which is a plausible explanation. The Authors also try to rule out this diminished vigor as a confounding explanation by showing that the two-way interaction remains even when including reaction times (and also self-reported task liking) as covariates. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

    3. Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors test the hypothesis, using an effort-exertion and an effort-based decision-making task while recording brain dynamics with EEG, that the brain processes reward outcomes for effort differentially when they are earned for themselves versus others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants report feeling less subjective effort, but also more disliking of tasks when they were earning rewards for others versus self. The concern is that participants worked with less vigor during other-versus-self trials and this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons), they did so more sluggishly when earning rewards for others. The Authors argue that this reflects less motivation when working for others, which is a plausible explanation. The Authors also try to rule out this diminished vigor as a confounding explanation by showing that the two-way interaction remains even when including reaction times (and also self-reported task liking) as covariates. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

      We thank Reviewer #1 for the continued positive assessment and for continuing to highlight the caveat regarding the potential influence of differential vigor on the observed RewP interaction effects.

      We agree that a caveat is warranted. As detailed in our previous response (R5), we had already conducted control analyses addressing this concern; however, we acknowledge that these results were not incorporated into the manuscript itself. We have now addressed this by adding the covariate analyses to the Results section, along with an explicit caveat in the Discussion.

      Before describing the specific revisions, we would like to offer a minor clarification: the covariates in our control analyses were trial-by-trial response speed and self-reported effort ratings, rather than task liking ratings as noted in the summary above. Neither response speed nor effort rating predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged. However, as the reviewer rightly pointed out, covariates may not fully capture the effects of differential motivation. Specifically, we have made the following revisions:

      First, we added the covariate control analyses to the Results section: “To rule out the possibility that the differential vigor between self- and other-benefiting trials drove the Recipient × Effort and Recipient × Effort × Magnitude interactions on the RewP, we conducted two control analyses by including trial-by-trial response speed and subjective effort ratings as separate covariates in the RewP model. Neither response speed (b = -0.07, p = .641) nor effort rating (b = 0.10, p = .186) predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged (see Supplementary Table S3 for full regression estimates)” (page 12, para. 1).
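
      The logic of this covariate check can be sketched on simulated trials. The example below is a simplified illustration, not the reported analysis: it swaps the mixed-effects model for plain ordinary least squares with no participant random effects, and the effect sizes, coding scheme, and variable names are assumptions chosen for the demonstration. It fits the interaction with and without a response-speed covariate that differs by recipient but has no direct effect on the outcome.

```python
import random


def ols(X, y):
    """Ordinary least squares via normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for idx in range(p):
            if idx != col:
                f = A[idx][col]
                A[idx] = [vr - f * vc for vr, vc in zip(A[idx], A[col])]
    return [A[i][p] for i in range(p)]


rng = random.Random(2)
rows_base, rows_cov, y = [], [], []
for _ in range(2000):
    recipient = rng.choice([-0.5, 0.5])  # assumed coding: other = -0.5, self = +0.5
    effort = rng.choice([-0.5, 0.5])     # assumed coding: low = -0.5, high = +0.5
    # Assumption: speed differs by recipient but does not drive the outcome.
    speed = 0.6 * recipient + rng.gauss(0, 1)
    # Assumption: the outcome depends only on the Recipient x Effort product.
    rewp = 1.0 + 2.0 * recipient * effort + rng.gauss(0, 1)
    rows_base.append([1.0, recipient, effort, recipient * effort])
    rows_cov.append([1.0, recipient, effort, recipient * effort, speed])
    y.append(rewp)

interaction_base = ols(rows_base, y)[3]  # interaction without the covariate
b_cov = ols(rows_cov, y)
interaction_cov = b_cov[3]               # interaction with the covariate
speed_coef = b_cov[4]                    # covariate coefficient
print(f"interaction: {interaction_base:.3f} -> {interaction_cov:.3f}, "
      f"speed: {speed_coef:.3f}")
```

      When the covariate has no direct effect, as assumed here, its coefficient stays near zero and the interaction estimate barely moves. The sketch does not address the case raised by the reviewer, in which the measured covariate only partly captures the underlying motivation.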

      Second, we added a caveat to the Discussion section acknowledging this alternative explanation, which reads, “Another concern is that participants exhibited less vigor when working for others, as indicated by slower response speed and lower subjective effort ratings for other- versus self-benefiting trials. Although our control analyses confirmed that neither covariate predicted RewP amplitudes and the critical interactions remained significant, covariates may not fully capture the effects of differential motivation, and this alternative explanation cannot be entirely ruled out” (page 22, para. 2, lines 9–12; page 23, para. 1).

      Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We sincerely appreciate Reviewer #2’s positive evaluation of our manuscript and thank the reviewer for recognizing the strength of our experimental design and analysis approach.

    1. eLife Assessment

      This landmark study investigates how patterned human gastruloids can provide insights into neural tube closure. Using a screen, the authors identified positive and negative regulators and defined the epistasis among them, building on an optimized micropattern-based gastruloid protocol and CRISPRi. This technical tour de force is exceptional and one of the first studies to reveal new knowledge on human development through embryo models.

    2. Reviewer #1 (Public review):

      Summary:

      This is a wonderful and landmark study in the field of human embryo modeling. It uses patterned human gastruloids to conduct a functional screen on neural tube closure, identifies positive and negative regulators, and defines the epistasis among them.

      Strengths:

      This was achieved by optimizing the micro-pattern-based gastruloid protocol for high efficiency, and then optimizing it to conduct and deliver CRISPRi without disrupting the protocol. This is a technical tour de force as well as one of the first studies to reveal new knowledge on human development through embryo models, which has not been done before.

      Weaknesses:

      A minor one. One can never find out whether findings from in vitro human embryo models can be revalidated in humans in vivo, for obvious and justified ethical reasons. However, the authors acknowledge this in the "limitations of study" section.

      Comments on revisions:

      The authors have adequately addressed all comments raised.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a wonderful and landmark study in the field of human embryo modeling. It uses patterned human gastruloids and conducts a functional screen on neural tube closure, and identifies positive and negative regulators, and defines the epistasis among them.

      Strengths:

      The above was achieved following optimization of the micro-pattern-based gastruloid protocol to achieve high efficiency, and then optimized to conduct and deliver CRISPRi without disrupting the protocol. This is a technical tour de force as well as one of the first studies to reveal new knowledge on human development through embryo models, which has not been done before.

      The manuscript is very solid and well-written. The figures are clear, elegant, and meaningful. The conclusions are fully supported by the data shown. The methods are well-detailed, which is very important for such a study.

      Thank you for this feedback! We are excited for the possibilities of this method to discover genes required for various morphogenetic processes associated with human embryonic development.

      Weaknesses:

      This reviewer did not identify any meaningful, major, or minor caveats that need addressing or correcting.

      A minor weakness is that one can never find out if the findings in human embryo models can be in vitro revalidated in humans in vivo. This is for obvious and justified ethical reasons. However, the authors acknowledge this point in the section of the manuscript detailing the limitations of their study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript is a technical report on a new model of early neurogenesis, coupled to a novel platform for genetic screens. The model is more faithful than others published to date, and the screening platform is an advance over existing ones in terms of speed and throughput.

      Thank you for this feedback! We agree that the robust symmetry breaking observed in our model, the comparisons to the human embryo in our cell type analysis, and the ability to conduct large-scale genetic screens represent advancements in the modeling of human neural tube closure that may be built upon in the future.

      Strengths:

      It is novel and useful.

      Weaknesses:

      The novelty of the results is limited in terms of biology, mainly a proof of concept of the platform and a very good demonstration of the hierarchical interactions of the top regulators of GRNs.

      The value of the manuscript could be enhanced in two ways:

      (1) by showing its versatility and transforming the level of neural tube to midbrain and hindbrain, and looking at the transcriptional hierarchies there.

      We thank the reviewer for this valuable suggestion and will keep this in mind for future work. As accurate answers to this question would require the development of robust midbrain and hindbrain organoid models, we believe that this question is outside the scope of the present work.

      (2) by relating the patterning of the organoids to the situation in vivo, in particular with the information in reference 49. The authors make a statement "To compare our findings with in vivo gene expression patterns, we applied the same approach to published scRNA-seq data from 4-week-old human embryos at the neurula stage" but it would be good to have a more nuanced reference: what stage, what genes are missing, what do they add to the information in that reference?

      We agree that a more comprehensive comparison of in vitro and in vivo data would add value to the study. We have added an analysis of the human Week 3 data, as neurulation occurs between Weeks 3 and 4 of human embryogenesis (new Figure 1F). We see our in vitro cell types in both datasets. We also included volcano plots in our supplementary figure to show major differences in gene expression (new Figure S1G). Somewhat surprisingly, embryo samples show higher expression of hemoglobin subunits and other hypoxia-related genes than organoids do, which may indicate hypoxic stress during sample handling during ex vivo experimentation (Schelshortn, et al., 2008) or alternatively, reflect differences in the metabolic environment between embryos and organoids. We did not find any differences that would have affected our transcription factor candidate selection.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers were very enthusiastic about the work and provided suggestions for textual changes that will clarify the figures, methods, and results for readers.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1:

      (a) What is the orientation of the images in 1C?

      We have specified in the text and figure legend that this is a top-down view of an outer organoid.

      In this panel, what is the problem with ZO-1 in D4?

      We believe this is non-specific staining of dead cells that shed into the lumen during folding and closure. We have added this interpretation to the figure legend and added two supplementary time-lapse videos (new Supplementary Video 1 and new Supplementary Video 2) of organoid closure that show dead cells being shed into the lumen, in support of this interpretation.

      (b) What is the three-dimensional organization of these structures, if any? Or are they two-dimensional? In a way, this also refers to 1C.

      We have clarified in the text and figure legend that these organoids are three dimensional, and that Fig. 1B-C are top-down views.

      (c) Why can't we see FOXG1 amidst the forebrain markers? This is a very characteristic one.

      We see sparse FOXG1 expression in the human embryo samples at Week 4 (new Figure 1F), which may indicate that FOXG1 expression is upregulated later in the human embryo, after neural tube closure. We do, however, see high levels of other forebrain-associated transcription factors by this time, including OTX2, LHX2, and SIX3.

      (d) The Figure 1 legend needs to be clear about the issues raised here.

      We have updated the Figure 1 legend to address these points.

      (2) Figure 2, could they explain in the text better how they organize the ML gene expression? What are their criteria?

      We thank the reviewer for catching this critical omission. We have added details of our mediolateral axis generation to the Methods section under “Single cell RNA sequencing analysis.”

      (3) Explain how and why the 77 genes were picked up?

      We have clarified at our first mention of 77 genes that this is a subset of our original 78 candidate genes, which were selected as described in the text (last paragraph in the results section “Identifying transcription factor candidates for regulation of anterior neurulation”). We have added a line in the Methods section noting that we were unable to clone a functional guide plasmid against one of our candidates (NR6A1).

      (4) The authors mention the value of the geometry and the mechanics in neural tube closure, but they make no attempt to unravel these inputs, or at least the genes, from their screen, associated with them.

      We have rewritten this discussion of the literature to emphasize the active role of the neural ectoderm compared to the surface ectoderm, in order to justify the genetic analysis of the neural ectoderm rather than the surface ectoderm. We have clarified that our goal is to find upstream developmental drivers (transcription factors) of folding and closure, rather than investigate mechanical mechanisms of this process.

    1. eLife Assessment

      This important study employed a multi-stage behavioural paradigm of increasing cognitive complexity to investigate the role of inhibitory interneurons in the medial prefrontal cortex (mPFC) in avoidance behaviour in mice. The authors used imaging and optogenetic techniques, combined with this behavioural task, to show that mPFC interneurons are necessary for encoding but not for executing avoidance under threat. The evidence supporting these claims is compelling, and findings will be of interest to researchers in behavioural and systems neurosciences.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the role of the medial prefrontal cortex (mPFC) in generating goal-directed actions under threat, using a progressive behavioral paradigm, neural recordings, and optogenetic inhibition in mice. The authors demonstrate that while mPFC GABAergic neurons strongly encode cues, actions, and errors, particularly under high cognitive demand, this neural activity is not causally required for executing avoidance behaviors. By rigorously controlling for movement and arousal, the researchers found that much of the observed mPFC signaling actually reflects baseline behavioral states rather than the generation of the actions themselves. This dissociation between encoding and causality challenges traditional views of mPFC as an executive controller of action and provides a nuanced understanding of its role in evaluative and contextual processing.

      Strengths:

      The behavioral paradigm employed in this study is one of its greatest strengths, offering a rigorous, progressive, and well-controlled framework to dissect the neural mechanisms underlying avoidance under threat. This three-phase task design is particularly well-suited to tease apart the contributions of learning, discrimination, and cognitive load to both behavior and neural activity.

      By tracking movement (speed, rotations) and including it as a covariate in statistical models, the authors also underscore the need to control for movement and baseline activity when interpreting cortical signals, which is relevant for all studies of brain-behavior relationships, ensuring that behavioral changes are not due to general arousal or motor activity.

      Finally, the study combines multiple advanced techniques (fiber photometry, single-cell calcium imaging with miniscopes, and two distinct optogenetic inhibition methods) to provide a comprehensive look at both neural encoding and causal necessity.

      Weaknesses:

      The authors conclude that mPFC is not required for avoidance, based on the minimal behavioral effects of optogenetic inhibition. While this interpretation is supported by the data, the choice of viral constructs could lead to an underestimation of the mPFC's role for several reasons. Specifically, the efficacy of eArch3.0 inhibition was not verified beyond histology, and its non-cell-type-specific nature could lead to disinhibition or compensatory activity in downstream regions. Although the authors' use of visual cortex (V1) inhibition as a control suggests that broad cortical inhibition does not impair avoidance, subcortical compensation cannot be ruled out. Additionally, Vgat-ChR2 targets only GABAergic neurons, potentially missing glutamatergic contributions. Addressing these limitations in the Discussion section would strengthen the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Sajid et al. describes a comprehensive behavioral, imaging, and optogenetic dataset investigating the role of the mPFC in avoidance and escape behaviors. Although many movement- and task-related variables are encoded by mPFC GABAergic neurons, the main conclusion is that they are unlikely to control behavioral output.

      Strengths:

      The manuscript is generally well executed and plausible in its conclusions. It provides an alternative viewpoint to many articles describing the involvement of mPFC in behavior, based on a complex multi-stage behavioral paradigm acquired and analyzed in an unbiased way.

      Weaknesses:

      This reviewer sees three main weaknesses.

      (1) There are few details on the linear mixed models in the methods. This section could be improved by including a mathematical description. More importantly, the reader never learns how accurately the models capture the data. Given that most conclusions rely on the models, it seems central to address this point carefully. For example, what is the explained variance (marginal and conditional)? Were the nested models compared to non-nested ones (e.g., by AIC)? What are the specific outputs of the likelihood ratio tests briefly mentioned in the methods?

      (2) For several figures, there is a disconnect with the main text, in the sense that it is difficult to understand how statements in the main text connect with specific figure panels or bars in their graphs. This is particularly the case for the most complex figures, e.g., Figures 3, 4, and their supplements. It would be beneficial to introduce subfigure labels (A1, etc.) and state explicitly in the main text which figure panel is described (in parentheses). Alternatively, break down the figures into multiple ones to decrease ambiguity. This is important because it will help the reader better assess the strength of the results.

      (3) It does not appear that the code and data used to produce the figures are made available. That would be very beneficial, given the complexity of the analysis and dataset collection procedures. It would also help readers better understand the results and probe their validity.
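
      For reference, the quantities requested in point (1) are commonly defined as sketched below. This is a minimal illustration that assumes a random-intercept model with a single grouping factor (for example, mouse); the symbols are illustrative, and the authors' actual model structure is not specified in this review. Here, $\sigma_f^2$ denotes the variance of the fixed-effect predictions, $\sigma_u^2$ the random-intercept variance, $\sigma_\varepsilon^2$ the residual variance, $k$ the number of estimated parameters, and $\hat{L}_0$, $\hat{L}_1$ the maximized likelihoods of the reduced and full nested models.

```latex
% Assumed model form (illustration only, not the authors' specification)
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

% Marginal (fixed effects only) and conditional (fixed + random) explained variance
R^2_{\mathrm{marginal}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2 + \sigma_\varepsilon^2},
\qquad
R^2_{\mathrm{conditional}} = \frac{\sigma_f^2 + \sigma_u^2}{\sigma_f^2 + \sigma_u^2 + \sigma_\varepsilon^2}

% Model comparison: AIC and the likelihood ratio statistic for nested models
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\Lambda = -2\left(\ln\hat{L}_0 - \ln\hat{L}_1\right)
```

      Under the null hypothesis, $\Lambda$ is compared against a $\chi^2$ distribution whose degrees of freedom equal the difference in the number of parameters between the two nested models; this is an asymptotic approximation.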

    4. Reviewer #3 (Public review):

      I first want to state that I am not an expert in the field, making it hard for me to provide informed comments on the value of the scientific results. But from where I stand, the study seems very carefully designed, very well controlled, and the statistical methodology used across the manuscript is strong and sound.

      Summary:

      The authors investigated the role of PFC interneurons in cue-guided behaviour under threat. They designed a behavioural task with increasing levels of difficulty that allows them not only to correlate the activation of cortical interneurons with different parameters of the task, but also to assess whether this correlation changes with increasing cognitive load. They carefully take into account confounding factors such as movement and show that neuronal activity is indeed strongly driven by movement. By using generalised linear models throughout their manuscript, the authors could include movement as a confounding factor in their statistical analysis, which then allowed them to correlate interneuron activity with task-specific parameters. First, using fibre photometry to image bulk activity of the interneurons and comparing responses in the PFC and the visual cortex, they find that PFC neurons show stronger punishment-related activation than neurons in the sensory cortex. Interestingly, under high cognitive demand, PFC interneurons show cue-specific activation, which could reflect the involvement of the PFC in cue-selective action selection.

      In a second set of experiments, they use a miniscope to image individual interneurons. They classified interneurons not by their expression of specific markers, as is usually done, but by their correlation with movement. Using this classification, they identify clusters of neurons that show activity modulation related to various behavioural parameters.

      Lastly, they performed optogenetic manipulations to silence the PFC during cue-guided behaviour and showed little behavioural effect of the manipulation, which they suggest means the PFC is not involved in taking action in this task.

      Strengths:

      The design of the study is backed by convincing arguments from the authors. The confounding factors are carefully taken into account and integrated into state-of-the-art statistics. The results thus appear robust and reliable. The authors do not overinterpret their results; quite the contrary, they are prone to toning down the interpretation of statistically significant results and they warn the readers about potential misinterpretation or confounding factors. The discussion makes for very interesting and informative reading.

      Weaknesses:

      The main weakness, in my view, lies in the Results section. In the figures, the authors do not present any raw data, and the plots are shown as mean ± SEM without displaying the distribution of individual data points. It is both a strength and a weakness that the authors do not attempt to guide the reader through the Results section and instead present the findings with very little emphasis on the key outcomes of the GLM. While this approach is arguably the most transparent way to report results, it also makes the section quite difficult to follow and may discourage readers.

      I would recommend rewriting the Results section to make it more accessible to a broader audience. A similar issue applies to the figures: presenting all plots reflects a commendable commitment to transparency, but it would greatly benefit from a clearer narrative. As it stands, it is difficult to grasp the message of each figure by simply browsing through them.

    1. eLife Assessment

      This important technical study introduces SCOPE, an optics-free spatial reconstruction method based on bidirectional sender and receiver oligonucleotides on barcoded hydrogel beads. By sequencing proximity-encoded chimeric molecules, the authors computationally reconstruct 2D and 3D spatial information at an impressive scale. The technical demonstrations in synthetic bead systems are convincing and establish proof-of-principle that large spatial domains can be reconstructed without microscopy. The methodological advance is clear and the scale is impressive. Direct validation in biological samples would help clarify what additional limitations on applicability may exist. This work will be of interest to those working on spatial mapping.

    2. Reviewer #1 (Public review):

      Summary:

      Liao et al. present SCOPE (Spatial reConstruction via Oligonucleotide Proximity Encoding), a method for reconstructing spatial organization from diffusion-defined DNA barcode interactions without the use of optical imaging. In SCOPE, hydrogel beads bearing unique DNA barcodes contain both "sender" and "receiver" oligonucleotides. Upon enzymatic release, sender oligos diffuse locally and hybridize to receiver oligos on neighboring beads, forming chimeric molecules that encode spatial proximity. Sequencing these products yields an interaction matrix, which is then used to reconstruct a spatial coordinate map.

      The authors demonstrate reconstruction of synthetic two-dimensional shapes, a large multicolor Snellen eye chart, and the interior surface of three-dimensional molds. The work expands the conceptual and experimental landscape of optics-free spatial sequencing.
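
      As a rough illustration of the step in which sequenced chimeric molecules become an interaction matrix, the sketch below counts sender-receiver barcode pairs into a symmetric bead-by-bead matrix. It is not the authors' pipeline: the function name `interaction_matrix`, the toy barcodes, and the choice to symmetrize counts are all assumptions made for this example.

```python
from collections import Counter


def interaction_matrix(pairs):
    """Count (sender, receiver) barcode pairs into a symmetric matrix.

    `pairs` is a list of (sender_barcode, receiver_barcode) tuples, one per
    sequenced chimeric molecule. Symmetrizing is an assumption of this sketch.
    """
    beads = sorted({barcode for pair in pairs for barcode in pair})
    index = {barcode: i for i, barcode in enumerate(beads)}
    matrix = [[0] * len(beads) for _ in beads]
    for (sender, receiver), count in Counter(pairs).items():
        i, j = index[sender], index[receiver]
        matrix[i][j] += count
        if i != j:
            # Mirror the count so that A->B and B->A both contribute to A-B.
            matrix[j][i] += count
    return beads, matrix


# Toy reads (hypothetical barcodes): A and B are close, B and C are close,
# and A and C never interact.
reads = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "C"), ("C", "B")]
beads, matrix = interaction_matrix(reads)
for bead, row in zip(beads, matrix):
    print(bead, row)
```

      In this toy case, larger counts stand in for closer beads; a reconstruction method would then embed the beads so that strongly interacting pairs end up near each other.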

      Strengths:

      SCOPE employs bidirectional sender and receiver oligonucleotides on every bead, rather than using asymmetric transmitter-receiver architectures found in other diffusion-based methods. The symmetric design may improve detection sensitivity and reconstruction strategies, and represents a meaningful variation on optics-free spatial encoding.

      A notable strength of this study is the physical scale achieved. The authors reconstruct a Snellen chart spanning approximately 704 mm² and demonstrate molded 3D structures on the order of 75-100 mm³. Although some larger-scale warping is evident, and is discussed as potentially due to non-uniform diffusion, the relative local positioning across these large areas appears impressively accurate.

      The authors extend reconstruction beyond two-dimensional arrays to three-dimensional molded surfaces. This demonstrates that the assay and the computational methods for interpreting proximity graphs can support non-planar spatial relationships, expanding the scope of optics-free spatial inference.

      Weaknesses:

      Although the method is discussed in the context of spatial genomics and potential tissue applications, it is currently demonstrated only on engineered two-dimensional bead arrays and three-dimensional shapes fabricated in molds. It remains unclear how SCOPE would perform in heterogeneous biological environments, where diffusion may exhibit additional non-uniformities. A biological proof-of-concept, even limited in scope, would help define the method's strengths and limitations more clearly.

      The reconstruction of three-dimensional structures lacks strong sampling from volume interiors. This is speculated to be due to several possible factors; however, this limitation constrains the method to reconstruction of volume surfaces rather than comprehensive three-dimensional profiling.

      The reconstruction workflow involves multiple preprocessing steps and embedding choices. While these appear to work well for synthetic shapes with known geometry, it is less clear how parameter choices would be made in contexts where ground truth is unknown. Clarifying how reconstruction robustness is assessed without prior knowledge of spatial structure would help readers understand how the method could be practically deployed, particularly in more heterogeneous tissue contexts.

    3. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Liao et al. present SCOPE (Spatial reConstruction via Oligonucleotide Proximity Encoding), a method for reconstructing spatial organization from diffusion-defined DNA barcode interactions without the use of optical imaging. In SCOPE, hydrogel beads bearing unique DNA barcodes contain both "sender" and "receiver" oligonucleotides. Upon enzymatic release, sender oligos diffuse locally and hybridize to receiver oligos on neighboring beads, forming chimeric molecules that encode spatial proximity. Sequencing these products yields an interaction matrix, which is then used to reconstruct a spatial coordinate map.

      The authors demonstrate reconstruction of synthetic two-dimensional shapes, a large multicolor Snellen eye chart, and the interior surface of three-dimensional molds. The work expands the conceptual and experimental landscape of optics-free spatial sequencing.

      Thank you for this accurate summary of the work.

      Strengths:

      SCOPE employs bidirectional sender and receiver oligonucleotides on every bead, rather than using asymmetric transmitter-receiver architectures found in other diffusion-based methods. The symmetric design may improve detection sensitivity and reconstruction strategies, and represents a meaningful variation on optics-free spatial encoding.

      A notable strength of this study is the physical scale achieved. The authors reconstruct a Snellen chart spanning approximately 704 mm² and demonstrate molded 3D structures on the order of 75-100 mm³. Although some larger-scale warping is evident, and is discussed as potentially due to non-uniform diffusion, the relative local positioning across these large areas appears impressively accurate.

      The authors extend reconstruction beyond two-dimensional arrays to three-dimensional molded surfaces. This demonstrates that the assay and the computational methods for interpreting proximity graphs can support non-planar spatial relationships, expanding the scope of optics-free spatial inference.

      Thank you for highlighting these strengths of SCOPE.

      Weaknesses:

      Although the method is discussed in the context of spatial genomics and potential tissue applications, it is currently demonstrated only on engineered two-dimensional bead arrays and three-dimensional shapes fabricated in molds. It remains unclear how SCOPE would perform in heterogeneous biological environments, where diffusion may exhibit additional non-uniformities. A biological proof-of-concept, even limited in scope, would help define the method's strengths and limitations more clearly.

      We concur with the reviewer that a biological proof-of-concept is a key next step, and that diffusion will be more heterogeneous in this more complex environment. To this end, we are actively working to further develop SCOPE for use in tissue sections, with the goal of capturing transcriptomes, accessible chromatin, and genomes. As part of this work, we also hope to systematically explore a range of tissue permeabilization and tissue clearing approaches to mitigate the impact of heterogeneity on performance.

      The reconstruction of three-dimensional structures lacks strong sampling from volume interiors. This is speculated to be due to several possible factors; however, this limitation constrains the method to reconstruction of volume surfaces rather than comprehensive three-dimensional profiling.

      Thank you for highlighting this important limitation. The 3D reconstructions are indeed constrained by undersampling of volume interiors. We anticipate that this might be addressed via relatively minor adjustments to the protocol, e.g. using light- or base-labile linkers to trigger oligo release, with the expectation that this will improve reaction consistency throughout the volume. However, even if we are unable to resolve this issue, we note that surface-resolved reconstructions may be useful for some goals, e.g. embedding a bead-packed gel within a tissue lumen, such as the gut. This could enable surface beads to capture RNA transcripts from adjacent cells, while bead–bead associations serve to define the surface topology.

      The reconstruction workflow involves multiple preprocessing steps and embedding choices. While these appear to work well for synthetic shapes with known geometry, it is less clear how parameter choices would be made in contexts where ground truth is unknown. Clarifying how reconstruction robustness is assessed without prior knowledge of spatial structure would help readers understand how the method could be practically deployed, particularly in more heterogeneous tissue contexts.

      Thank you for the opportunity to clarify. The computational pipeline used for 2D SCOPE reconstruction is designed to operate on a standardized input format and can be applied to arbitrary datasets without prior knowledge of spatial structure. For example, as shown in Figure 3, both the circle and “swoosh” geometries were reconstructed using the same algorithm and identical initial parameters. While certain hyperparameters are pre-specified (e.g. the number of k-nearest neighbors used to compute the pairwise distance matrix for UMAP), these are fixed across datasets. Other parameters, such as UMAP’s “min_dist,” are selected via an automated heuristic grid search that proceeds without user intervention. The agreement with ground truth in these controlled settings, together with the reproducibility of stochastic reconstructions (see Figure 3E-F), supports the robustness of the approach.

      Importantly, there was one exception. Reconstruction of the Snellen eye chart dataset required a manual step, involving an initial 3D UMAP embedding followed by a 2D projection to “flatten” the result. We suspect this reflects radial non-uniformities in sender/receiver oligo diffusion at larger spatial scales. Addressing such confounders algorithmically by explicitly modeling diffusion heterogeneity represents an important area for future work, with the goal of entirely eliminating the need for manual intervention.

      Finally, we note that these benchmark shapes represent somewhat contrived examples, and the geometries encountered in practice may often be much less complex. For example, in conventional spatial genomics, the geometry consists of a bead monolayer forming a flat, regular surface on a rectangular slide of known dimensions. Regardless of the tissue architecture overlaid on this surface, the reconstruction problem is defined by the bead monolayer itself, inferred through sender-receiver interactions.

    1. eLife Assessment

      This important study investigates how the brain categorizes written words from different writing systems (e.g., alphabetic vs. non-alphabetic), shedding potential light on the neural basis of language's social‑categorization function. Overall, the evidence supporting the authors' claims is solid, though some analyses and key interpretations would benefit from fuller justification.

    2. Reviewer #1 (Public review):

      Summary:

      This study demonstrates, through a series of EEG and MEG experiments, that the human brain automatically categorizes words from alphabetic and non-alphabetic languages, and it unpacks the neural mechanisms of this process from multiple angles. The work examines not only univariate repetition-suppression (RS) effects, but also how repeating or alternating languages influences the representational similarity of words within and across language categories.

      Strengths:

      The univariate RS effects across multiple experiments lend support to some of the main conclusions.

      Weaknesses:

      I have reservations about the logic underlying the multivariate analyses, and I believe the implications of the control experiments merit fuller discussion.

      (1) Question 1: Logic of the multivariate analyses

      The original text states:

      "The processing of intra-language similarity was quantified as correlation distances between neural responses to two words of the same language, which occurred more frequently and would be inhibited in the Rep-Cond (vs. Alt-Cond) due to habituation (Fig. 1c)...".

      I argue that this passage conflates two levels. Building a representational dissimilarity matrix (RDM) is a data-analysis step; it cannot be equated with a cognitive computation. Hence, there is no sense in which this computation occurs "more frequently" in one condition. RDM construction rests on the pairwise similarity of activity patterns, so even if a task engaged no cognitive computation of representational similarity, we could still compute an RDM. Conversely, if a task factor alters the RDM, we must explain how that factor changes the underlying neural patterns, not claim that it triggers specific cognitive processing. Therefore, I neither understand what "more frequent processing" the authors refer to, nor accept their account of the multivariate results.

      The multivariate result pattern, briefly, is that distances between words, both within and across languages, are larger under the repetition condition. One plausible interpretation is that a word representation comprises two parts: language-type (alphabetic vs. non-alphabetic) and fine-grained identity features (visual shape, orthography, semantics, phonology, etc.). Repetition of language type may, via RS, reduce the weight of the first component, thereby increasing the relative contribution of fine-grained features and amplifying inter-word differences. This could explain the multivariate findings.

      (2) Question 2:

      For unlearned languages, people cannot distinguish lexical from sub-lexical levels. What, then, determines (i) the RS-effect difference between letters and radicals in familiar languages and words in unlearned ones, and (ii) the similarity of repetition effects between words in unlearned and familiar languages? An explicit account is needed.
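
      To make the point in Question 1 concrete, the sketch below builds an RDM from correlation distances (1 minus the Pearson correlation) between activity patterns. It illustrates that the RDM is purely a data-analysis product that can be computed for any set of patterns. The patterns are invented toy numbers, and the function names are hypothetical; this is not the authors' analysis code.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson r for two equal-length, non-constant patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant pattern has zero variance and would divide by zero here.
    return 1.0 - cov / (norm_x * norm_y)


def rdm(patterns):
    """Pairwise correlation distances between all patterns (diagonal = 0)."""
    n = len(patterns)
    return [
        [0.0 if i == j else correlation_distance(patterns[i], patterns[j])
         for j in range(n)]
        for i in range(n)
    ]


# Toy "sensor patterns" for three words (made-up values).
patterns = [
    [1.0, 2.0, 3.0, 4.0],  # word 1
    [2.0, 4.0, 6.0, 8.0],  # word 2: scaled copy of word 1, so r = 1
    [4.0, 3.0, 2.0, 1.0],  # word 3: reversed word 1, so r = -1
]
matrix = rdm(patterns)
for row in matrix:
    print([round(value, 3) for value in row])
```

      Words 1 and 2 have a distance of about 0 and words 1 and 3 a distance of about 2, regardless of what cognitive process, if any, produced the patterns. Any change in these distances across conditions therefore has to be explained through changes in the underlying patterns.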

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates how the human brain categorizes visual words from distinct writing systems (alphabetic vs. non-alphabetic) as a neural basis for the social-categorization function of language. Using a repetition suppression paradigm combined with electroencephalography and magnetoencephalography, the authors conducted nine experiments with independent participants to identify the neural network underlying language-based categorization, characterize its temporal dynamics, and test whether this process operates independently of linguistic properties such as semantic meaning and pronunciation.

      Strengths:

      (1) The study employs a well-validated design with clear control conditions and systematically manipulates key variables, including writing system, language familiarity, and native language background. The use of nine experiments with independent participant samples strengthens the reliability and replicability of the results.

      (2) The work combines EEG and MEG, cross-validating findings across imaging modalities to support the reported neural effects. A combination of univariate, multivariate, and connectivity analyses is used to characterize neural responses and network interactions.

      (3) Results are consistent across multiple language groups and for both familiar and unfamiliar languages, supporting the generalizability of the identified neural mechanism beyond specific languages or prior experience.

      Weaknesses:

      The authors provide compelling evidence that the identified neural network supports the categorization of words by language, including computations of intra-language similarity and inter-language difference. However, the conceptual framing of this finding as directly reflecting the social-categorization function of language may be premature. While the task captures spontaneous language categorization, it does not involve social evaluation or intergroup processes. The connection to social categorization is inferred from prior literature rather than demonstrated within the current experimental design. Clarifying this distinction would strengthen the conceptual precision of the manuscript.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates, through a series of EEG and MEG experiments, that the human brain automatically categorizes words from alphabetic and non-alphabetic languages, and it unpacks the neural mechanisms of this process from multiple angles. The work examines not only univariate repetition-suppression (RS) effects, but also how repeating or alternating languages influences the representational similarity of words within and across language categories.

      Strengths:

      The univariate RS effects across multiple experiments lend support to some of the main conclusions.

      Weaknesses:

      I have reservations about the logic underlying the multivariate analyses, and I believe the implications of the control experiments merit fuller discussion.

      (1) Question 1: Logic of the multivariate analyses

      The original text states:

      "The processing of intra-language similarity was quantified as correlation distances between neural responses to two words of the same language, which occurred more frequently and would be inhibited in the Rep-Cond (vs. Alt-Cond) due to habituation (Fig. 1c)...".

      I argue that this passage conflates two levels. Building a representational dissimilarity matrix (RDM) is a data-analysis step; it cannot be equated with a cognitive computation. Hence, there is no sense in which this computation occurs "more frequently" in one condition. RDM construction rests on the pairwise similarity of activity patterns, so even if a task engaged no cognitive computation of representational similarity, we could still compute an RDM. Conversely, if a task factor alters the RDM, we must explain how that factor changes the underlying neural patterns, not claim that it triggers specific cognitive processing. Therefore, I neither understand what "more frequent processing" the authors refer to, nor accept their account of the multivariate results.

      The multivariate result pattern, briefly, is that distances between words, both within and across languages, are larger under the repetition condition. One plausible interpretation is that a word representation comprises two parts: language-type (alphabetic vs. non-alphabetic) and fine-grained identity features (visual shape, orthography, semantics, phonology, etc.). Repetition of language type may, via RS, reduce the weight of the first component, thereby increasing the relative contribution of fine-grained features and amplifying inter-word differences. This could explain the multivariate findings.

      Thank you for these insightful comments regarding the logic of the multivariate analyses. In the revision, we will clarify that the multivariate analyses were conducted to assess correlation distances between neural responses to pairs of words, either within the same language or across different languages. The processing of intra-language similarity was assessed rather than defined by conducting the multivariate analyses. We will further elaborate the rationale underlying our experimental design, specifically why the processing of intra-language similarity is expected to occur more frequently in the repetition condition (Rep-Cond) than in the alternation condition (Alt-Cond).

      We also appreciate the alternative account of the observed neural repetition suppression (RS) effects in terms of language-type versus fine-grained identity feature processing. This perspective will be incorporated into the revised Discussion. In particular, we will outline the patterns of neural activity predicted by an account that assumes an increasing contribution of fine-grained features, and evaluate the extent to which our findings are consistent with these predictions.

      (2) Question 2:

      For unlearned languages, people cannot distinguish lexical from sub-lexical levels. What, then, determines (i) the RS-effect difference between letters and radicals in familiar languages and words in unlearned ones, and (ii) the similarity of repetition effects between words in unlearned and familiar languages? An explicit account is needed.

      Thank you for this helpful suggestion. In the revised manuscript, we will include a dedicated paragraph addressing these two issues. Specifically, we will provide a more precise account of the differences in repetition suppression (RS) effects between letters and radicals in familiar languages, as well as the similar RS effects observed for unlearned and familiar languages. These additions will help clarify the interpretation of the neural RS effects associated with visual word processing and strengthen the theoretical implications of our findings.

      Reviewer #2 (Public review):

      Summary:

      This study investigates how the human brain categorizes visual words from distinct writing systems (alphabetic vs. non-alphabetic) as a neural basis for the social-categorization function of language. Using a repetition suppression paradigm combined with electroencephalography and magnetoencephalography, the authors conducted nine experiments with independent participants to identify the neural network underlying language-based categorization, characterize its temporal dynamics, and test whether this process operates independently of linguistic properties such as semantic meaning and pronunciation.

      Strengths:

      (1) The study employs a well-validated design with clear control conditions and systematically manipulates key variables, including writing system, language familiarity, and native language background. The use of nine experiments with independent participant samples strengthens the reliability and replicability of the results.

      (2) The work combines EEG and MEG, cross-validating findings across imaging modalities to support the reported neural effects. A combination of univariate, multivariate, and connectivity analyses is used to characterize neural responses and network interactions.

      (3) Results are consistent across multiple language groups and for both familiar and unfamiliar languages, supporting the generalizability of the identified neural mechanism beyond specific languages or prior experience.

      Weaknesses:

      The authors provide compelling evidence that the identified neural network supports the categorization of words by language, including computations of intra-language similarity and inter-language difference. However, the conceptual framing of this finding as directly reflecting the social-categorization function of language may be premature. While the task captures spontaneous language categorization, it does not involve social evaluation or intergroup processes. The connection to social categorization is inferred from prior literature rather than demonstrated within the current experimental design. Clarifying this distinction would strengthen the conceptual precision of the manuscript.

      Thank you for raising this important point. In the revised Discussion, we will include an additional paragraph to clarify several related issues. First, prior research suggests that language can serve as a socially relevant category cue. Second, these findings imply that rapid categorization of words by language may occur in the human brain. Third, our results identify a neural network supporting such rapid language-based categorization but do not directly test how this process relates to social categorization. Highlighting these points will help delineate the scope of our findings and point to important directions for future research.

      We'll work on a revision of the manuscript and will submit the revision when it's ready.

    1. eLife Assessment

      This important study reports that an oncogenic population in an epithelium can either be repressed or spread, depending on the tissues. This is explained by hypothesising the existence of a heterotypic tension at the boundary between different cell types, and supported by pharmacological perturbations and numerical simulations using the vertex model. The solid study conveys a key message, although some uncertainty remains regarding the origin of the heterotypic tension in relation to acto-myosin organisation in the boundary cells.

    2. Reviewer #1 (Public review):

      Summary:

      The behaviour of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within an MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vertex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.

      Strengths:

      Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.

      Weaknesses:

      Although not calling into question the main message of this study, there are a few issues that one may want to address:

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in segregation of oncogenic cells.

      (2) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      (3) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      (4) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Figure 2b). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      Comments on revisions:

      There is still one last point that should be made even clearer:

      The system is being modelled based on the principle of INTERFACIAL TENSION, a description pioneered by the works of Steinberg and of Harris, and nicely conceptualized by Brodland (2002). Now the observed behaviour is a perfect case of sorting based on higher interfacial tension AT the boundary between cell types (with nice additional documentation of local actin and myosin enrichment in the revised manuscript). What needs to be made crystal clear is that this is NOT equivalent to the model of DITH ("DIFFERENTIAL INTERFACIAL TENSION HYPOTHESIS") (Brodland 2002, Krieg et al 2008). It is important to stop using DITH in this context, as it leads to confusion and misinterpretations. Indeed, DITH predicts cell/tissue sorting based on differences in interfacial tension WITHIN the two cell types. While DITH accounts for relative POSITIONING (one tissue engulfing the other), it is now established that this is not the motor for cell sorting and tissue segregation, the key parameter being heterotypic tension at the heterotypic interface. I thus invite the authors to avoid the terms "differential"/DITH, and rather use either "interfacial tension", or specifically "HIGH HETEROTYPIC INTERFACIAL TENSION".

      Related: the authors correctly cite Canty et al NatComm2017 when discussing this phenomenon. I suggest adding a further key supporting reference: "D.M. Sussman, J.M. Schwarz, M.C. Marchetti, M.L. Manning, Soft yet sharp interfaces in a vertex model of confluent tissue, Phys. Rev. Letters 120 (2018) 058001". One may also include another pioneering work in Drosophila: "M. Aliee, J.C. Roper, K.P. Landsberg, C. Pentzold, T.J. Widmann, F. Julicher, C. Dahmann, Physical mechanisms shaping the Drosophila dorsoventral compartment boundary, Curr. Biol. 22 (2012) 967-976."

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.

      Strengths:

      * Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia

      * Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia

      Weaknesses:

      * It is unclear what is the mechanistic origin of the shape-tension coupling, which is used in the vertex model, and how important that coupling is for the presented results. Authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure and stress fibers would not form. Authors should better justify the use of the shape-tension coupling in the model, since most of the observed behavior is already captured by the differential tension even if there is no shape-tension coupling.

      * The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way it would be easier to determine whether the observed differences in simulations are statistically significant.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      We thank the reviewer for this important point. We agree that our experimental conditions do not fully recapitulate the in vivo architecture of either breast or bronchial epithelia. As the reviewer points out, the two cell lines need specific culture conditions to grow in an in vivo-like architecture, such as acinar structures for mammary tissue and a pseudostratified architecture for bronchial tissue, and it would certainly be interesting to grow the cell lines in these organotypic architectures and study the fate of oncogenic mutant cells. However, this would be an independent study of its own and is outside the scope of the current manuscript. Here, we intend to compare these two well-established epithelial lines from mammary and bronchial epithelial tissues, with distinct intrinsic mechanical and organisational properties, in minimal culture conditions, and to study how simply having two different sources of epithelial cells can change the fate of oncogenic cells present in the wild-type population. We have now also performed experiments with the MDCK cell line, which, unlike the BEAS2B line, has well-defined cell-cell adhesions [Supplementary figure 4a] and epithelial morphology, and have shown that the fate of HRasV<sup>12</sup> mutants is different here as well, as compared to the MCF10A cell line.

      (2) As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in the segregation of oncogenic cells.

      We agree with the reviewer and, in line with this suggestion, we have repeated the key experiments using Madin-Darby Canine Kidney (MDCK) cells, a well-established model epithelial cell line. Our results show that even though MDCK cells have significantly distinct properties compared to BEAS2B cells (MDCK being more epithelial-like than BEAS2B), the dynamics of the HRasV<sup>12</sup> clusters in both these systems are similar [Supplementary figure 4b], and distinctly different from those in the mammary epithelial cells (MCF10A). We did not observe the formation of an actin belt around HRasV<sup>12</sup> clusters in MDCK monolayers, whereas such a belt does form in MCF10A monolayers. Additionally, in MDCK cells, the HRasV<sup>12</sup> mutant clusters are not under compaction or jamming; instead, they form protrusions similar to the ones seen in BEAS2B monolayers. These results solidify our hypothesis of tissue-specific differences in the mechanics of cancer initiation.

      (3) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      We thank the reviewer for this insightful comment. As correctly noted, Brodland’s 2002 work provided a foundational formulation of the Differential Interfacial Tension Hypothesis (DITH), which frames tissue organization in terms of effective interfacial tensions.

      While in its original form DITH emphasised segregation as a consequence of global differences in the intrinsic (bulk) tensions of juxtaposed tissues, our results specifically show that segregation is determined by local interfacial mechanics between transformed and host cells. These local interfacial dynamics, however, are related to the global contractility of the cells. In our experiments with blebbistatin, reducing global contractility inhibited the formation of the interfacial actomyosin belt, which serves as the source of the interfacial tension between healthy and mutant populations, and we observed a corresponding loss in the efficiency of segregation. Therefore, the differences in local interfacial mechanics stem from the intrinsic global contractility of the cells discussed here.

      We have also stated this distinction more clearly in the Discussion, noting explicitly that while DITH provided the foundation for conceptualizing tissue mechanics, our findings on interactions between transformed and healthy cells specifically demonstrate that a higher efficiency of segregation is driven by high heterotypic interfacial tension at the tissue boundary.

      (4) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      We agree that a detailed visualisation of actomyosin distribution would strengthen our conclusions. We have now added a few more images of the interface to the Supplementary Data [Supplementary figure 5], which show that cortical actin accumulates in individual cells at the wild-type cell-mutant cell interface, and that actin levels go up in both wild-type and mutant populations at the interface. This is also clear from the quantifications of the different regions of interest [Figure 2e], which were obtained by segmenting individual cells in these regions and quantifying the actin intensity in each cell.

      (5) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Author response image 2). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      We thank the reviewer for raising this important point. While a direct experimental dissection of how HRasV<sup>12</sup> mutation affects actin levels in BEAS2B and MCF10A cells individually is beyond the scope of the present study, we do not rule out the possibility that a HRasV<sup>12</sup> mutation may exert cell-type-specific biochemical effects on actin regulation in these two epithelial systems.

      Although the difference in actin between the mutants and the wild-type cells has not been incorporated into the model presented in the manuscript, we have now shown how actin levels change in response to the interfacial tension formed between the mutant and wild-type cells by adding a mechanochemical feedback to the model. Rather than prescribing intrinsic differences in actin levels between mutant and wild-type cells, we asked whether the feedback between the actin cytoskeleton and mechanical stress alone is sufficient to generate the observed actin reorganization. To address this, we incorporate a mechanochemical feedback loop (MCFL-I), originally developed in our earlier work [35], into the vertex model framework. This feedback captures the experimentally observed coupling between cell shape, actomyosin organization, and mechanical stress (i.e., heterotypic interfacial tension), and has previously been shown to reproduce biologically realistic epithelial behaviours such as dynamic cell shapes and heterogeneous actomyosin distributions [35].

      In this framework, actin is not introduced as an explicit or intrinsic variable. Instead, changes in actomyosin organization emerge dynamically in response to mechanical stresses. Specifically, MCFL-I allows the preferred area and preferred perimeter of cells to evolve depending on cell shape and actomyosin binding, rather than remaining fixed. From these evolving parameters, we compute the normalized contractility, which we interpret as a proxy for bulk actin, and the normalized line tension, which we interpret as a proxy for junctional actin. These normalized quantities provide size-independent measures of actomyosin organization across the tissue.

      The equations for MCFL-I can be written as:

      Thus, with MCFLs, the vertex model does not have fixed 𝐴<sub>0</sub> and 𝑃<sub>0</sub>. The cells dynamically change these parameters depending on the vertex model dynamics. The constitutive relations are given below [1]:

      Here, the fraction of myosin bound to actin is a function of cell area 𝐴. This nonlinear dependence arises from the load- or strain-dependent binding of myosin to actin, and one model parameter is proportional to the binding affinity of myosin to actin in the absence of any strain. We consider this parameter to be the same for both mutant and wild-type cells. Importantly, both mutant and wild-type cells obey identical mechanochemical rules in the model. Differences in actin organization arise solely due to differences in mechanical stress generated by differential interfacial tension. Positive differential interfacial tension compresses mutant cells within clusters. This will lead to different 𝐴<sub>0</sub> and 𝑃<sub>0</sub> across the monolayer via MCFL-I, and thus reduced bulk actin and increased junctional actin [Appendix figure 4], consistent with experimental observations. Conversely, when differential interfacial tension is weak or negative, mutant and wild-type cells experience similar stresses, and the model predicts minimal differences in actin organization [Appendix figure 5].

      Thus, while HRasV<sup>12</sup>-dependent biochemical effects may indeed differ between BEAS2B and MCF10A cells, our results demonstrate that mechanical interactions at mutant–wild-type interfaces are sufficient to generate distinct actin signatures in the two tissues, without invoking cell-type-specific actin regulation. We have added the details of the mechanochemical feedback loop in the model to the Appendix to emphasize that the model tests the sufficiency of mechanics-driven actin reorganization rather than excluding additional biochemical contributions.

      Even for Λ > 0, the normalized line tension may appear to be negative. This is, however, just an artefact of the colorbar limits we have used to compare with the Λ < 0 case. If we plot with different colorbar limits, we see that the interface has a positive normalized line tension, as shown in Author response image 1.

      Author response image 1.

      Reviewer #2 (Public review):

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling.

      The reviewer is correct in stating that most of the observed behaviour is already captured by the differential tension, without the shape-tension coupling. However, the shape-tension coupling has been used here in accordance with the experimental observation that the cells at the interface are aligned and elongated along the interface [Fig. 2h], which cannot be captured without the shape-tension coupling. The difference between shape indices of cells at the interface and away from the boundary is plotted versus the interfacial tension in the case of no shape-tension coupling [Appendix figure 2]. The red dashed line represents the experimental value of the shape index difference. The blue line is the shape index difference between two randomly chosen groups of cells (each group containing half of the total number of cells). At zero line tension, the difference in shape index between interface cells and cells away from the interface is the same as that between randomly chosen groups of cells, which is expected since there should be no interface at zero line tension. The data without shape-tension coupling presented here are averaged over 19 seeds. Although the results without shape-tension coupling reach experimental values at high enough differential tension [Appendix figure 3], a closer inspection of the simulation results shows that the cells are just squeezed and are aligned perpendicular to the interface, which is contrary to what is seen in experiments [Fig. 2h].
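
      The comparison described above (interface cells versus the rest, with a random-split control) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the commonly used shape index p = P/√A (perimeter over the square root of area), which is not defined in this response, and the function names and the `is_interface` flags are hypothetical.

```python
import math
import random

def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon given as a list of (x, y)."""
    area2 = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0               # shoelace term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(area2) / 2.0, perimeter

def shape_index(vertices):
    """Shape index p = P / sqrt(A) (assumed definition)."""
    area, perimeter = polygon_area_perimeter(vertices)
    return perimeter / math.sqrt(area)

def mean(values):
    return sum(values) / len(values)

def shape_index_difference(cells, is_interface):
    """Mean shape index of interface cells minus that of all other cells."""
    p = [shape_index(c) for c in cells]
    at_interface = [pi for pi, flag in zip(p, is_interface) if flag]
    elsewhere = [pi for pi, flag in zip(p, is_interface) if not flag]
    return mean(at_interface) - mean(elsewhere)

def random_split_difference(cells, rng):
    """Control: shape index difference between two random halves of the cells."""
    p = [shape_index(c) for c in cells]
    rng.shuffle(p)
    half = len(p) // 2
    return mean(p[:half]) - mean(p[half:2 * half])
```

      If the interface-versus-bulk difference is indistinguishable from the random-split value, as reported at zero line tension, the interface cells are not geometrically distinct from the rest of the tissue.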

      Calculating the average of the absolute value of the dot product of the nematic director and the interface edge for simulations with and without shape-tension coupling [Appendix figure 3] clearly shows that with shape-tension coupling, the cells align and elongate along the interface as is seen in experiments, given by an interface dot product value > 0.5 at high enough line-tension values. Further, shape-tension coupling, or biased edge tension, has been used before to model cell elongation during embryo elongation [45], and here we use it as an active line-tension force, which elongates cells along the interface, in addition to the differential tension, which is passive. This additional quantification of the alignment and elongation of cells along the interface will be added to the Appendix.
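
      One way to compute such an alignment measure is sketched below. It is an illustration under stated assumptions, not the authors' implementation: the nematic director is taken here as the principal axis of the second-moment tensor of a cell's vertices about their mean position, and the function names are hypothetical. A value of 1 means the cell is elongated along the interface edge and 0 means it is elongated perpendicular to it.

```python
import math

def nematic_director(vertices):
    """Unit vector along the long axis of a cell (principal axis of the
    second-moment tensor of its vertices; an assumed definition)."""
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    sxx = sum((x - cx) ** 2 for x, _ in vertices) / n
    syy = sum((y - cy) ** 2 for _, y in vertices) / n
    sxy = sum((x - cx) * (y - cy) for x, y in vertices) / n
    # Orientation of the largest eigenvector of a symmetric 2x2 tensor.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def alignment(vertices, edge_start, edge_end):
    """|director . edge direction| for one cell and one interface edge."""
    dx, dy = nematic_director(vertices)
    ex = edge_end[0] - edge_start[0]
    ey = edge_end[1] - edge_start[1]
    return abs(dx * ex + dy * ey) / math.hypot(ex, ey)

def mean_alignment(cells_with_edges):
    """Average alignment over (vertices, edge_start, edge_end) triples."""
    values = [alignment(c, a, b) for c, a, b in cells_with_edges]
    return sum(values) / len(values)
```

      The absolute value makes the measure insensitive to the sign of the director, which is appropriate for a nematic (head-tail symmetric) quantity.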

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      The difference in shape indices between the interfacial and bulk cells in simulations has now been calculated over 11 different seed values. The observed differences in simulations, along with the standard deviations have been plotted in Figure 4b. This figure will be updated to include the standard deviations. The nonzero difference in shape index in the absence of differential line tension for low values of stress threshold is due to the shape-tension coupling acting even at low differential tension. Thus, a non-zero, sufficiently high value of the stress threshold is required in our model with shape-tension coupling. This has also been stated in section 4 of the paper. The importance of the shape-tension coupling has been stated in response to the previous point.
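
      The seed-averaging procedure requested by the reviewer can be sketched as below. This is only an illustration: `simulate` stands for any hypothetical function that runs one simulation for a given seed and returns a scalar observable (for example, the shape index difference), and the sample standard deviation is an assumed choice.

```python
import statistics

def summarize_over_seeds(simulate, seeds):
    """Run one simulation per seed and return (mean, sample standard deviation).

    `simulate` is a hypothetical callable mapping a seed to a scalar observable.
    At least two seeds are needed for the sample standard deviation.
    """
    values = [simulate(seed) for seed in seeds]
    return statistics.mean(values), statistics.stdev(values)
```

      Reporting both numbers for each parameter value makes it possible to judge whether a nonzero difference at zero differential tension exceeds the seed-to-seed scatter.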

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.

      The line tension for each edge can be calculated as .

      Although the line tension distributions look similar to the ones obtained from Bayesian Force Inference, a better comparison is between the normalized line tension and the actin seen in experiments, as discussed in our response to point (4) raised by Reviewer 1.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors claim that the negative tension Lambda<0 resembles the Beas2b phenotype. This is not consistent with the expression of actin in Figure 2f, which seems very similar in all four regions of interest (ROIs). Also, the segregation index data for Beas2b in Figure 1h looks very different from the demixing parameter in Figure 4f for the negative value of Lambda.

      In the model presented in the previous version of the manuscript, actin differences have not been incorporated. We have only added an interfacial line tension, which might arise only at the interface between cells. In response to comment (4) from Reviewer 1, we have considered a vertex model with mechanochemical feedback and interfacial line tension to understand how actin distribution in the tissue is affected by interfacial tension. The results presented match very well with experimental images.
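
      To make the notion of an interfacial line tension concrete, the sketch below adds an energy Λ · (total length of heterotypic edges) to a tissue, which is a common way of writing such a term in vertex models. It is an assumption for illustration only: the exact energy functional used in the manuscript is not reproduced in this response, and the `Edge` type and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    length: float
    cell_types: tuple  # types of the two cells sharing the edge, e.g. ("wt", "mut")

def is_heterotypic(edge):
    """True if the edge separates two cells of different type."""
    first, second = edge.cell_types
    return first != second

def interfacial_energy(edges, lam):
    """Assumed interfacial term: lam times the total heterotypic edge length."""
    return lam * sum(e.length for e in edges if is_heterotypic(e))
```

      With this form, Λ > 0 penalizes mutant-wild-type contact length and favours a compact, sharply bounded cluster, whereas Λ < 0 rewards such contacts and favours mixing, in line with the two phenotypes discussed here.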

      The reviewer has rightly pointed out that the segregation index (SI) data presented in Fig. 1h have a different trend compared to those in Fig. 4f. However, it is essential to note that in the simulation, the initial condition is one in which the mutant cluster is already fully segregated, so the SI starts from its fully segregated value at the initial time point. This is not the case in experiments at initial time points. Thus, the two plots are not directly comparable and only show how SI changes in our simulations. It is more effective to compare the final time points in Fig. 2f with those in Fig. 4e, where we observe that Mcf10a has a higher SI compared to Beas2b, and the case with Λ > 0 has a higher SI than the case with Λ < 0. This supports our claim that Λ < 0 resembles the Beas2b phenotype and Λ > 0 resembles the Mcf10a phenotype.

      (2) It is unclear how the threshold pressure Pi_0 is implemented for the shape-tension coupling in the vertex model. Is the value of the additional tension gamma_ij equal to 0 if the internal pressure is below that threshold?

      The stress threshold is implemented for the shape-tension in the vertex model in the following way. The line tension forces can be written as:

      If the stress on a cell is below the threshold, then the additional tension gamma_ij is zero for that cell.

      (3) In vertex model simulations, the authors use identical parameters for wild-type and mutant cells. This does not seem to be consistent with experimental observations in Figure 2, where the expression of actin is different, and also, cell shape indices are different for the wild-type and mutant cells. The authors should comment on how that choice affects their simulation results.

      We thank the reviewer for this comment. As noted in our response to comment 4 from Reviewer 1, we have now run our simulations after adding a mechanochemical feedback to the model. Here, both wild-type and mutant cells follow identical mechanochemical rules within the vertex model. This choice does not imply that the cells are mechanically identical in the tissue; rather, it allows us to test whether differences in cell shape and actin organization can emerge purely from mechanical interactions.

      By incorporating the mechanochemical feedback loop (MCFL-I), the model captures how heterotypic interfacial tension redistributes mechanical stresses between mutant and wild-type cells. These stresses lead to differences in cell area, perimeter, and shape, which are then translated via MCFL-I into distinct bulk and junctional actin signatures. Consequently, even though the intrinsic parameters are the same, the emergent mechanical environment reproduces the experimentally observed differences in actin intensity and cell shape indices (as shown in Figure 2).

      Thus, our approach demonstrates that the experimentally observed heterogeneity between mutant and wild-type cells can arise solely from interface-driven mechanical effects, without prescribing any cell-type-specific parameters in the model.

      (4) Also provide data for cell line tensions in the vertex model, which can then be compared with the experimental data in Figure 2. This is especially important because the differential cell line tension at the interface of mutants and wild-type cells seems to be playing a very important role.

      The cell tensions from the vertex model have been plotted in the response to main comment (3) from Reviewer 2. Since the interfacial tension has been included as an extra term in the vertex model by hand, it is not trivial to compare the line tensions from the vertex model directly to the experimental data. However, the tensions can be assessed from the normalised tension and normalised contractility plotted in response to comment (4) from Reviewer 1. Those plots are from a vertex model with mechanochemical feedback, and they match well with experimental actin images.

      (5) In Figure 2j, the authors should report the relative cell pressure and line tension for all four ROIs. The data is only shown for the wild-type cells and for mutants in clusters, even though the figure caption states that the data is presented for all four ROIs. It would also be useful to report the cell tension at the interface between the mutant cells and wild-type cells since this is the key parameter for the vertex model simulations.

      We agree and have updated the graph [Figure 2j].

      (6) The tangential motion of cells around oncogenic clusters only shows up towards the end of Supplementary Video 3. It is unclear whether this is a transient effect or whether this tangential motion would persist for a longer time.

      We thank the reviewer for raising this point. In our experiments, tangential cell motion in the wild-type population along the boundary of the oncogenic cluster consistently emerges as the oncogenic cluster becomes compacted. We have plotted the tangential velocity of interfacial wild-type cells over time (Supplementary Fig. 6b) and show that this motion persists at the cluster-wild-type interface until the end of the time-lapse recordings in all cases.

      (7) It is very awkward that the authors are representing an integral of the tangential velocity over different loops in Figures 3c and 4i. Thus, it is very hard to separate how much of the increase in the integrated velocity is due to larger loops and how much is due to changes in the average tangential velocity. Since different loops have different perimeters, it would have been better to report the average tangential velocity by dividing the integrated tangential velocity by the perimeter length of each loop. In the methods, the authors state that the concentric circles go from the center to a point twice the radius of the mutant cluster, but this is not consistent with the image in Figure 3c, where the concentric circles seem to go only to the boundary of the mutant cluster.

      We thank the reviewer for raising the point regarding the dependence of the loop-integrated tangential velocity on the perimeter length. While the circulation (loop-integrated tangential velocity) indeed scales with loop size, it increases with radius only if tangential velocity components are directionally coherent along the loop.

      In our data, concentric-loop analysis centered on mutant clusters reveals a systematic increase in tangential motion with radius, with the largest values occurring at the outermost loops corresponding to the cluster–tissue interface. In contrast, applying the identical analysis to randomly selected wild-type regions does not yield any monotonic increase with radius, despite the increasing perimeter of the loops, and instead shows fluctuations around zero. This control demonstrates that the observed increase around mutant clusters is not a trivial geometric consequence of larger loop size but reflects the emergence of coherent tangential motion specifically at the mutant cluster boundary.

      To further address the reviewer’s concern, we additionally computed the mean tangential velocity by normalizing the loop-integrated tangential velocity by the loop perimeter. As shown in Supplementary Fig. 6a, this normalization preserves the same qualitative trend: tangential motion peaks near the periphery of mutant clusters, whereas no such trend is observed in wild-type regions. We therefore conclude that both metrics capture the same physical phenomenon: enhanced tangential cell motion localized to the mutant cluster boundary, consistent with the behavior observed in the time-lapse videos.
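
      The relationship between the two metrics discussed here (loop-integrated tangential velocity and its perimeter-normalized mean) can be illustrated with a minimal sketch. It assumes a velocity field available as a function of position and a circular loop. The function names and the rigid-rotation example field are hypothetical and are not taken from the authors' analysis code.

```python
import math

def loop_tangential_metrics(velocity, center, radius, n_samples=360):
    """Return (circulation, mean tangential velocity) on a circular loop.

    velocity: callable (x, y) -> (vx, vy); a hypothetical stand-in for a
    measured velocity field interpolated at the loop positions.
    """
    cx, cy = center
    perimeter = 2.0 * math.pi * radius
    dl = perimeter / n_samples  # arc length represented by each sample
    circulation = 0.0
    for k in range(n_samples):
        theta = 2.0 * math.pi * k / n_samples
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        vx, vy = velocity(x, y)
        # Unit tangent of the loop, counter-clockwise orientation.
        tx, ty = -math.sin(theta), math.cos(theta)
        circulation += (vx * tx + vy * ty) * dl
    # Dividing by the perimeter removes the purely geometric scaling.
    return circulation, circulation / perimeter

def rigid_rotation(x, y, omega=0.5):
    """Illustrative coherent field: rotation about the origin."""
    return (-omega * y, omega * x)

if __name__ == "__main__":
    for r in (1.0, 2.0, 4.0):
        circ, mean_vt = loop_tangential_metrics(rigid_rotation, (0.0, 0.0), r)
        print(f"r={r}: circulation={circ:.3f}, mean tangential velocity={mean_vt:.3f}")
```

      For a directionally coherent field such as this rotation, both quantities grow with radius, the circulation faster than the mean. For an incoherent field the tangential components cancel along the loop, so neither metric grows systematically with loop size.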

      Author response image 2.

      From simulation data

      (8) The authors should comment on how jamming and unjamming are related to shape indices because some readers may not be familiar with them.

      We have updated the same in the text of Results 2.

      (9) In the captions of Figure 3, the authors state that the bronchial epithelium gets kinetically arrested. This is not evident from the data in Figure 3d, where the velocity magnitude drops just a little bit for the bronchial epithelium, and it remains much higher compared to the mammary epithelium at long times.

      We agree with this comment: describing Beas2b cells as kinetically arrested is misleading, since their motion remains much higher even after the initial drop. We have updated the text in the caption accordingly.

      (10) It is unclear why the authors have used the segregation index for analyzing experiments and the demixing parameter for analyzing simulations. Both parameters are trying to quantify the same thing, so it would have been better to use the same quantity for both experiments and simulations to enable easier comparison.

      We agree that using the same quantity for both experiments and simulation would enable easier comparison. Thus, we have replaced the demixing parameter with segregation index in Figure 4. 

      (11) It is unclear what experimental data were used for shape indices in Figure 4c. Was it the data from Mcf10a or Beas2b? It is also unclear which ROIs were used because different ROIs have very different shape indices in experiments, according to Figure 2e,f.

      We have used the experimental Δ(shape index) = 0.75, which is a rough estimate of the difference between the shape indices for ROI 2 (interface) and for ROI 1, ROI 3 and ROI 4 (away from the interface) from Fig. 2e for Mcf10a.

      (12) The authors find that the differences in shape indices are non-zero even for Lambda=0 for some threshold pressure parameters Pi_0 in Figure 4c. This should not happen because all the cells are identical in that case. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. How is this simulation data obtained? Is it from a single simulation, or is this averaged over a certain number of simulations? Authors should perform multiple simulations and report both the mean values and the standard deviation.

      We have addressed this in the response under main comments (1) and (2) from Reviewer 2.

      (13) It is unclear how the cell extrusion was simulated in the vertex model.

      Extrusion probability calculation: Simulations with a single mutant cell were run for a range of differential interfacial line tension values (Λ = 0, 0.1, 0.4, 0.8, 1.2, 1.6) with shape tension coupling. Each simulation was run until the area of the mutant cell fell below a threshold area of 0.1, at which point we consider the mutant cell to be extruded. Nine different random initial seeds were run and analysed. Each seed gives a binary result: either extruded or not. These outcomes were used to calculate the extrusion probability. We have added this section to the Appendix.
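
      The calculation described above can be sketched as follows. This is a minimal illustration, assuming each seed yields a time series of the mutant cell's area. The function names and the example trajectories are hypothetical placeholders, not simulation output from the study.

```python
def is_extruded(area_trajectory, threshold=0.1):
    """A seed counts as extruded if the mutant cell area ever drops below the threshold."""
    return any(area < threshold for area in area_trajectory)

def extrusion_probability(trajectories, threshold=0.1):
    """Fraction of seeds (binary outcomes) in which the mutant cell was extruded."""
    if not trajectories:
        raise ValueError("at least one seed is required")
    outcomes = [is_extruded(traj, threshold) for traj in trajectories]
    return sum(outcomes) / len(outcomes)

if __name__ == "__main__":
    # Hypothetical area trajectories for nine seeds at one value of Lambda.
    shrinking = [1.0, 0.6, 0.3, 0.05]   # crosses the threshold
    stable = [1.0, 0.9, 0.95, 0.9]      # never crosses the threshold
    seeds = [shrinking] * 3 + [stable] * 6
    print(extrusion_probability(seeds))
```

      With only nine binary outcomes per Λ value, the estimate can change only in steps of 1/9, which is worth keeping in mind when comparing neighbouring Λ values.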

      (14) The authors claim that HRas^V12 clusters in bronchial epithelium grew on top of one another, but it is not clear how this can be observed in Figure 2b or in any other Figure.

      We thank the reviewer for raising this point. Our original statement that cells were growing on top of each other was based on observations from the Z-stack images, which allowed us to resolve cell positions along the apico–basal axis. However, since these Z-stack data are not included in the current manuscript, we agree that this claim cannot be directly supported by the figures shown. We have therefore removed this statement from the text and restricted our conclusions to what is directly supported by the presented data.

      (15) In the main text, the authors state that bronchial epithelial cells exhibited higher F-actin intensities compared to mammary bronchial cells, but this difference is not statistically significant according to Figure 5e.

      We agree with the reviewer and have changed the text accordingly: although the F-actin intensities appeared visually higher in the bronchial epithelium, the difference was not statistically significant.

      (16) The definition of eccentricity is incorrect in the text. The authors state that the eccentricity is quantified as the ratio of the length of the minor axis to the major axis of an ellipse. According to this definition, the eccentricity would be 1 for a circle and not 0.

      We have updated the definition of eccentricity in the text to the correct one, including the correct equation.
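
      For reference, the reviewer's point can be checked with the standard geometric definition of the eccentricity of an ellipse, e = sqrt(1 - (b/a)^2), with a the major and b the minor axis length. The sketch below assumes this is the corrected definition. The response does not reproduce the equation, so the function name and form are illustrative.

```python
import math

def eccentricity(major_axis, minor_axis):
    """Eccentricity of an ellipse: 0 for a circle, approaching 1 when highly elongated."""
    if major_axis <= 0 or minor_axis <= 0:
        raise ValueError("axis lengths must be positive")
    # Tolerate swapped arguments by ordering the axes.
    a, b = max(major_axis, minor_axis), min(major_axis, minor_axis)
    return math.sqrt(1.0 - (b / a) ** 2)

if __name__ == "__main__":
    print(eccentricity(2.0, 2.0))  # circle
    print(eccentricity(5.0, 3.0))  # elongated ellipse
```

      By contrast, the plain ratio b/a equals 1 for a circle, which is the inconsistency the reviewer noted.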

      (17) It is unclear whether the active force F_act is used in the vertex model simulations. The active force is defined, but then its value is never specified. Note that the motility force is also an active force, so it is unclear why the motility and active forces were separated.

      In our model, the line tension force arising from the shape tension coupling is the active force. We agree that the motility force is also an active force; however, in the absence of any directional movement, as in the homeostatic tissues discussed here, we have neglected the motility force in the model presented here.

      (18) The authors use inconsistent naming for different types of epithelia throughout the manuscript. Mcf10a cells are referred to as either mammary epithelium or breast epithelium, and Beas2b cells are referred to as either lung epithelium or bronchial epithelium. Because of the very broad spectrum of journal readers, it may not be obvious to all readers that different names refer to the same cell types.

      We have updated the text to keep the naming consistent throughout.

      (19) Many references to individual figure panels in the main text are incorrect. The authors should carefully check all the references to figures.

      We apologize for these errors. We have updated the incorrect references after carefully reviewing the entire manuscript.

      (20) In Figure 5, panel b is incorrectly labeled as d.

      We have corrected the same.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of a major research question: whether collagen can be directly imaged with MRI. The evidence supporting the conclusion is compelling, with methods, data, and analyses that are more rigorous than those currently considered state-of-the-art. The work will be of high interest to MR physicists and clinicians, as collagen is the most abundant protein in the human body and plays an essential role in health.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of this work is to directly image collagen in tissue using a new MRI method with positive contrast. The work presents a new MRI method that allows very short, powerful radio frequency (RF) pulses and very short switching times between transmission and reception of radio frequency signals.

      Strengths:

      The experiments with and without removal of 1H hydrogen, which is not firmly bound to collagen, on tissue samples from tendons and bones are very well suited to prove the detection of direct hydrogen signals from collagen. The new method has great potential value in medicine, as it allows for better investigation of ageing processes and many degenerative diseases in which functional tissue is replaced by connective tissue (collagen).

      Comments on revisions:

      All points of criticism in the reviews were answered very well and led to further improvement of the article.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents direct magnetic resonance imaging (MRI) of collagen, which is not possible with conventional MRI or other tomographic imaging modalities.

      Strengths:

      The experimental work is impressive, and the presentation of results is clear and convincing.

    4. Reviewer #3 (Public review):

      The paper is well written and well presented. The topic is important, and its significance is explained succinctly and accurately. I am only capable of reviewing the clinical aspects of this work which is very largely technical in nature. Several clinical points are worth considering:

      (1) Tendons typically display large magic angle effects as a result of their highly ordered collagen structure (cortical bone much less so) and so it would have been of interest to know what orientation the tendons had to B<sub>0</sub> (in vitro and in vivo). This could affect the signal level at the longer echo time and thus the signal on the subtracted images.

      (2) The in vivo transverse image looks about mid-forearm where tendons are not prominent. A transverse image of the lower forearm where there is an abundance of tendons might have been preferable.

      (3) The in vivo images show the interosseous membrane as high signal on both the shorter and longer TE images. The structure contains ordered collagen with fibres at different oblique angles to the radius and ulna and thus potentially to B<sub>0</sub>. Collagen fibres may have been at an orientation towards the magic angle and this may account for the high signal on the longer TE image, and the low signal on the subtracted image.

      (4) Some of the signals attributed to muscle may be from an attachment of the muscle to aponeurosis.

      (5) There is significant collagen in subcutaneous tissues so the designation "skin" may more correctly be "skin and subcutaneous tissue".

      (6) Cortical bone is very heterogeneous with boundaries between hard bone and soft tissue with significant susceptibility differences between the two across a small distance. This might be another mechanism for ultrashort T<sub>2</sub>* tissue values in addition to the presence of collagen. The two effects might be distinguished by also including a longer TE spin echo acquisition.

      Solid cortical bone may also have an ultrashort T<sub>2</sub>* in its own right.

      (7) It may be worth noting that in disease T<sub>2</sub>* may be increased. As a result, the subtraction image may make abnormal tissue less obvious than normal tissue. Magic angle effects may also produce this appearance.

      (8) It may be worth distinguishing fibrous connective tissue (loose or dense) which may be normal or abnormal, from fibrosis which is abnormal accumulation of fibrous connective tissue in damaged tissue. Fibrosis typically has a longer T<sub>2</sub> initially and decreases its T<sub>2</sub>* over time. In places, the context suggests that fibrous connective tissue may be more appropriate than fibrosis.

      Overall, the paper appears very well constructed and describes thoughtful and important work.

      Comments on revisions:

      The responses to my criticisms are well thought out and are fine as far as I am concerned.

      I suggest in Figure 5 line 6 changing "trabecular bone" to "trabecular bone marrow".

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The aim of this work is to directly image collagen in tissue using a new MRI method with positive contrast. The work presents a new MRI method that allows very short, powerful radio frequency (RF) pulses and very short switching times between transmission and reception of radio frequency signals.

      Strengths:

      The experiments with and without the removal of 1H hydrogen, which is not firmly bound to collagen, on tissue samples from tendons and bones, are very well suited to prove the detection of direct hydrogen signals from collagen. The new method has great potential value in medicine, as it allows for better investigation of ageing processes and many degenerative diseases in which functional tissue is replaced by connective tissue (collagen).

      Weaknesses:

      It is clear that, due to the relatively long time intervals between RF excitation and signal readout, standard hardware in whole-body MRI systems can only be used to examine surrounding water and not hydrogen bound to collagen molecules.

      We agree that this is a regrettable situation (see also Discussion section). We are hoping that current and future efforts of MRI manufacturers towards improved hardware will eventually enable the technique for broader application.

      Reviewer #2 (Public review):

      Summary:

      This work presents direct magnetic resonance imaging (MRI) of collagen, which is not possible with conventional MRI or other tomographic imaging modalities.

      Strengths:

      The experimental work is impressive, and the presentation of results is clear and convincing. Through a series of thoughtfully prepared experiments, I found the evidence that the images reflect direct measurements of collagen to be highly compelling.

      Due to the technical demands, direct collagen imaging is unlikely to become widespread for routine clinical work, at least not anytime soon. That said, this work is nonetheless transformative and will likely be highly significant for research and perhaps clinical trials.

      Reviewer #3 (Public review):

      The paper is well written and well presented. The topic is important, and its significance is explained succinctly and accurately. I am only capable of reviewing the clinical aspects of this work, which is very largely technical in nature. Several clinical points are worth considering:

      (1) Tendons typically display large magic angle effects as a result of their highly ordered collagen structure (cortical bone much less so), and so it would have been of interest to know what orientation the tendons had to B<sub>0</sub> (in vitro and in vivo). This could affect the signal level at the longer echo time and thus the signal on the subtracted images.

      We have added arrows in the images showing the direction of the main magnetic field. For the in vivo case, the subject lay in the superman position, with B0 pointing from the hand towards the shoulder.

      (2) The in vivo transverse image looks about mid-forearm, where tendons are not prominent. A transverse image of the lower forearm, where there is an abundance of tendons, might have been preferable.

      We have added a distal view of the forearm, where more tendon structures are observed.

      (3) The in vivo images show the interosseous membrane as a high signal on both the shorter and longer TE images. The structure contains ordered collagen with fibres at different oblique angles to the radius and ulna, and thus potentially to B<sub>0</sub>. Collagen fibres may have been at an orientation towards the magic angle, and this may account for the high signal on the longer TE image and the low signal on the subtracted image.

      This is certainly an interesting take. While the magic angle effect is well established for collagen-bound water, the orientation effects on the macromolecular collagen signal are still to be investigated. Our initial experience so far suggests that the direct collagen signal is not as sensitive to orientation as the bound water.

      Regarding the described observation for the interosseous membrane, we expect the high signal to come from collagen-bound water (yet not quite at the magic angle), which hardly decays between the two TEs because their difference is small compared with the T2* of this signal. Hence, this signal is removed in the subtraction image, and only the macromolecular collagen signal remains, which appears to be very low. Working with samples of the interosseous membrane may provide further insights into why this is the case.

      (4) Some of the signals attributed to the muscle may be from an attachment of the muscle to the aponeurosis.

      We have added the aponeurosis as a possible signal contributor in the muscle tissue.

      (5) There is significant collagen in subcutaneous tissues, so the designation "skin" may more correctly be "skin and subcutaneous tissue".

      We have updated the label accordingly.

      (6) Cortical bone is very heterogeneous, with boundaries between hard bone and soft tissue with significant susceptibility differences between the two across a small distance. This might be another mechanism for ultrashort T<sub>2</sub>* tissue values in addition to the presence of collagen. The two effects might be distinguished by also including a longer TE spin echo acquisition.

      Solid cortical bone may also have an ultrashort T<sub>2</sub>* in its own right.

      The described effect is clearly of importance for bone water but has a negligible effect on the macromolecular signal. We would like to support this with a brief, coarse estimate. T<sub>2</sub>* can be approximated by 1/T<sub>2</sub>* = 1/T<sub>2</sub> + 1/T<sub>2</sub>′, where 1/T<sub>2</sub>′ = γΔB = γΔχB<sub>0</sub> (Ref. 1).

      The susceptibility difference reported for the interface between bone and water is Δχ = 2.5 ppm (Refs. 2 and 3), which at 3T leads to T<sub>2</sub>′ ≈ 3000 μs. From our recorded FIDs, we use a T<sub>2</sub>* of 10 μs and thus obtain T<sub>2</sub> = 10.03 μs.

      As can be seen, the change in the transverse relaxation constant due to susceptibility is negligible compared to the intrinsic decay of the macromolecular collagen signal. Notably, this is not the case for the pore water signal where T<sub>2</sub>s are on the order of milliseconds (Ref. 2).
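
      The coarse estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes that γ is taken as the proton value γ/2π ≈ 42.577 MHz/T. That choice is an assumption on our part: it is not stated in the response, but it reproduces the quoted T2′ of roughly 3000 μs. The function names are illustrative.

```python
# Proton gyromagnetic ratio divided by 2*pi, in Hz/T (assumed convention, see lead-in).
GAMMA_BAR_1H = 42.577e6

def t2_prime(delta_chi, b0_tesla, gamma_bar=GAMMA_BAR_1H):
    """Susceptibility-induced decay constant from 1/T2' = gamma * delta_chi * B0, in seconds."""
    return 1.0 / (gamma_bar * delta_chi * b0_tesla)

def intrinsic_t2(t2_star, t2p):
    """Solve 1/T2* = 1/T2 + 1/T2' for T2, in seconds."""
    return 1.0 / (1.0 / t2_star - 1.0 / t2p)

if __name__ == "__main__":
    t2p = t2_prime(delta_chi=2.5e-6, b0_tesla=3.0)   # 2.5 ppm at 3 T
    t2 = intrinsic_t2(t2_star=10e-6, t2p=t2p)        # T2* = 10 microseconds
    print(f"T2' = {t2p * 1e6:.0f} us, T2 = {t2 * 1e6:.2f} us")
```

      Because T2′ is more than two orders of magnitude longer than the measured T2*, removing the susceptibility term changes the decay constant by well under one percent.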

      A footnote was added in the Introduction section regarding this topic.

      (7) It may be worth noting that in disease T<sub>2</sub>* may be increased. As a result, the subtraction image may make abnormal tissue less obvious than normal tissue. Magic angle effects may also produce this appearance.

      This is an important point regarding image interpretation. For this reason, it is advantageous that the original anatomical images prior to subtraction are also available, as they will show such effects. They can be used in conjunction with the collagen-specific image to provide further insights regarding tissue disease. Increased T<sub>2</sub>* of diseased tissue has so far been reported for the bound water components due to a reduction of dipolar interactions between bound water and collagen (Ref. 4). A potential related change in T<sub>2</sub> for the macromolecular collagen component itself is certainly of interest and an avenue to explore in future work.

      (8) It may be worth distinguishing fibrous connective tissue (loose or dense), which may be normal or abnormal, from fibrosis, which is an abnormal accumulation of fibrous connective tissue in damaged tissue. Fibrosis typically has a longer T<sub>2</sub> initially and decreases its T<sub>2</sub>* over time. In places, the context suggests that fibrous connective tissue may be more appropriate than fibrosis.

      We are aware of this important distinction. We therefore checked the manuscript for references to fibrosis, making sure that the meaning is as intended.

      Overall, the paper appears very well constructed and describes thoughtful and important work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It should be stated that various methods with very short echo times (e.g. SWIFT by Garwood et al.) have been described in the past. This work shows for the first time that direct signals from collagen can be systematically detected in tissue samples.

      We have expanded a sentence in the introduction and reference selected publications studying short-T<sub>2</sub> water signal in collagen, including SWIFT.

      (2) It should be noted that the 1H atoms bound to collagen are located at different sites (at different amino acids of the protein) of the molecule and have different frequencies, and that further signal analyses are of interest.

      We have included additional information regarding distinct resonances of proton-binding sites of collagen in the introduction. The discrete observation of such signals requires advanced NMR methodology such as magic-angle spinning and RF decoupling, which is not a suitable approach for in vivo MRI. Without such methods, the broad lineshapes overlap strongly and are rather observed as a single decaying exponential with the dipolar oscillation as we observe in the FIDs.

      (3) Is it certain that the bump at 30 microseconds comes from 'dipolar coupling'? Is the development time probably too short for chemical shift-induced interference or J-coupling effects?

      30 microseconds is an extremely short interval to accumulate phase and requires large resonance offsets to observe significant changes. To investigate the nature of the bump, we also collected data on a Bruker 7T NMR spectrometer (see Author response image 1). Overall, the same signal characteristics are observed as at 3T. In particular, the position of the bump is the same, excluding chemical shift as a source. However, at the higher field strength, chemical shift becomes significant for the signal phase, as observed by the change in the phase behavior at 50 microseconds, when the collagen component has decayed.

      While J-coupling is independent of field strength, the typical ranges are single-digit to tens of Hertz. In contrast, dipolar coupling interacts on the order of thousands of Hertz, which coincides with the values extracted from our signal model.
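
      The orders of magnitude in this argument can be made concrete with a short calculation of the phase accumulated in 30 microseconds, using phase = 2π × offset × time. The offsets used below are illustrative assumptions chosen only to match the ranges named in the text (a 3.5 ppm chemical shift, a 10 Hz J-coupling, a 5 kHz dipolar coupling). They are not values extracted from the authors' signal model.

```python
GAMMA_BAR_1H = 42.577e6  # proton gyromagnetic ratio / (2*pi), in Hz/T

def chemical_shift_hz(shift_ppm, b0_tesla, gamma_bar=GAMMA_BAR_1H):
    """Frequency offset in Hz for a chemical shift given in ppm; scales with field strength."""
    return gamma_bar * b0_tesla * shift_ppm * 1e-6

def phase_deg(offset_hz, time_s):
    """Phase in degrees accumulated at a frequency offset over a given time."""
    return 360.0 * offset_hz * time_s

if __name__ == "__main__":
    t = 30e-6  # 30 microseconds
    examples = {
        "chemical shift, 3.5 ppm at 3 T": chemical_shift_hz(3.5, 3.0),
        "chemical shift, 3.5 ppm at 7 T": chemical_shift_hz(3.5, 7.0),
        "J-coupling, 10 Hz": 10.0,
        "dipolar coupling, 5 kHz": 5000.0,
    }
    for label, offset in examples.items():
        print(f"{label}: {offset:.0f} Hz -> {phase_deg(offset, t):.2f} degrees")
```

      Under these assumed values, only the kilohertz-range offset produces a sizeable phase within 30 microseconds, and only the chemical-shift term changes with field strength.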

      To clarify this point, we extended the respective sentence in the Results section.

      Author response image 1.

      (4) It should be noted that short RF pulses have a relatively high energy content, and whether there are any particular stresses on patients during the examination (SAR, nerve stimulation?).

      SAR is an important issue in ZTE MRI. Since imaging bandwidths are large and excitation is performed with the imaging gradient switched on, broadband pulses are necessary. Hence, significant RF deposition occurs, and in vivo the flip angle often cannot be optimized for maximum signal but is instead constrained by the SAR limit. We have added an explanation in the Discussion section.

      Peripheral nerve stimulation is generated by rapid switching of strong gradients. However, ZTE sequences are usually operated without switching gradients on and off, but with only minor adjustments of the gradient direction between TR intervals. Therefore, PNS is not a relevant issue.

      (5) In the Results section, Part B, 'substantial signal intensity' should be written instead of 'substantial image intensity'.

      We have changed this as suggested.

      References

      (1) Chavhan GB, Babyn PS, Thomas B, Shroff MM, Haacke EM. Principles, techniques, and applications of T2*-based MR imaging and its special applications. Radiographics. 2009 Sep-Oct;29(5):1433-49. doi: 10.1148/rg.295095034. PMID: 19755604; PMCID: PMC2799958.

      (2) Seifert AC, Wehrli SL, Wehrli FW. Bi-component T<sub>2</sub>* analysis of bound and pore bone water fractions fails at high field strengths. NMR Biomed. 2015;28:861–872. doi: 10.1002/nbm.3305.

      (3) Hopkins JA, Wehrli FW. Magnetic susceptibility measurement of insoluble solids by NMR: magnetic susceptibility of bone. Magn Reson Med. 1997 Apr;37(4):494-500. doi: 10.1002/mrm.1910370404. PMID: 9094070.

      (4) Loegering IF, Denning SC, Johnson KM, Liu F, Lee KS, Thelen DG. Ultrashort echo time (UTE) imaging reveals a shift in bound water that is sensitive to sub-clinical tendinopathy in older adults. Skeletal Radiol. 2021 Jan;50(1):107-113. doi: 10.1007/s00256-020-03538-1. Epub 2020 Jul 8. PMID: 32642791; PMCID: PMC7677198.

    1. eLife Assessment

      This is a useful study that seeks to elucidate the molecular mechanisms underlying spinal motor circuit assembly. The authors demonstrate that loss of Onecut transcription factors in spinal motor neurons affects the size and spatial distribution of pre-motor interneurons. However, the study in its current form is incomplete: the data and analyses do not fully support the main conclusion that Onecut acts through Neurotrophin-3 to regulate interneuron development in a non-cell autonomous manner. The work will be of broad interest to cell and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Angla et al investigate the basis of observations made from previous studies where loss of Onecut (OC) transcription factors leads to changes in spinal interneuron populations that do not themselves normally express OC. The authors hypothesize that OC expression in spinal motor neurons has non-cell-autonomous effects on pre-motor interneuron (V1, V2a/b/c) population size and distribution. By knocking out OC in the motor neuron lineage (i.e., downstream of Olig2, a motor neuron progenitor marker gene), they indeed show that motor neuron-specific loss of OC expression decreases V2c interneuron number and alters the spatial distribution of V1, V2a, V2b, and V2c populations. Using bulk RNA-sequencing of WT and OC conditional knockout (cKO) motor neurons, the authors identify that the neurotrophic factor Ntf3 is downregulated by OC expression. They subsequently hypothesize that the non-cell-autonomous effects observed by loss of OC expression in motor neurons can be explained by de-repression of Ntf3. To test this, the authors conditionally knock out Ntf3 downstream of Olig2 and show that this leads to increased interneuron numbers and alters their spatial distribution, ultimately leading to dysregulation of spinal motor circuits and motor activity.

      Strengths:

      The authors use sophisticated genetic tools to precisely remove OC and Ntf3 expression in a lineage-specific manner and comprehensively assess the downstream effects across brachial, thoracic, lumbar levels of the spinal cord, as well as at two developmental timepoints, E12.5 and E14.5.

      Weaknesses:

      There are two main concerns that are not fully addressed:

      (1) Based on the effects observed with OC vs. Ntf3 cKO, it is unclear whether OC is indeed exerting its non-cell-autonomous effects via Ntf3. Knocking out both Ntf3 and OC and comparing the effects to those seen with just OC cKO alone could provide more insight on this point. Also, a quantitative summary of the effects of Ntf3 overexpression in motor neurons in the chick is lacking.

      (2) How the authors assess changes in the spatial distribution of interneurons is unclear. In Figures 2 and 4, the control distributions (despite reporting the same populations in the same regions) look different, suggesting large sample-to-sample variance in distribution. Although the authors report that several sections in each level were taken from at least three animals for each condition, it's unclear how variance within WT or cKO sections was accounted for in the final statistical evaluation. It seems at a glance that a comparison between control samples in Figure 2 and Figure 4 could report statistically significant differences, which would be problematic. A more rigorous report of sample-to-sample variance and a more in-depth explanation of the statistical methods are needed.

    3. Reviewer #2 (Public review):

      The study by Angla et al. proposes a model in which NT-3 produced by motor neurons regulates interneuron numbers and distribution in a non-cell autonomous manner. The authors demonstrate that ablation of motor neurons (MNs) and global and conditional deletion of OC transcription factors lead to changes in interneuron distribution. They identify that NT3 is upregulated after MN-specific OC deletion in RNA-seq experiments and show that olig2-cre mediated NT3 deletion leads to increased ventral interneuron numbers, altered distribution, and defects in locomotor behavior. The authors conclude that MN-derived NT3, under OC control, regulates interneuron development. While this is an intriguing hypothesis, additional experiments are needed to support it and strengthen the link between the different experiments described here.

      (1) The study primarily quantifies interneuron numbers and distribution at different levels of the spinal cord and under different genetic manipulations. Experimental details are lacking, defining how many sections were analyzed (several are noted in the methods) and how the rostrocaudal levels of the spinal cord were precisely aligned. In different figures, the values and distributions shown for controls vary quite a lot. For example, in Figure 2B vs Figure 4B, the number of FoxP2+ V1 neurons at brachial levels is ~350 vs 125. Similarly, the control distributions in 2I and 4I are quite different. This makes it challenging to determine whether the conclusions regarding the impact of each genetic manipulation on interneuron numbers and distribution are valid.

      (2) The relationship between OC and NT3 deletion data is not entirely clear. Both deletions presumably lead to changes in interneuron distribution, but is there any reverse relationship between the two that relates to relative changes in NT3 levels? The authors do not directly compare NT3 and OC KO IN distributions. Similarly, one might expect a decrease in interneuron numbers in OC mutants, which is only reported for V2c neurons. However, the image presented in Figure 2G shows an equal number of V2c INs in control and mutant.

      (3) It is not clear that the behavioral phenotypes seen in the olig2-cre mediated deletion of NT3 can be attributed to changes in interneuron development. How about a role of NT3 in oligodendrocytes? There is a big gap between the embryonic changes shown here and behavior, with no in-between circuit-level changes in locomotor circuits shown. A more restricted manipulation would be deleting TrkC from specific interneuron populations. Related to this, although TrkC is shown to be broadly expressed in ventral interneurons, it is not shown specifically to colocalize with any of the interneuron markers. The authors should validate that the receptor is expressed in the subsets that they are investigating.

      (4) The rationale for following up on NT3 seems to be the chick electroporation experiments; however, no changes in distribution are shown in those experiments, and only a very minor decrease in Chx10 interneurons. Shouldn't NT3 overexpression lead to substantial decreases in IN numbers according to the authors' model? The "data not shown", which presumably refers to distribution, would be important to show here, to further support this rationale.

      (5) The idea that NT3 downregulation causes an increase in IN numbers is not intuitive. Also, considering the DTA experiments in Figure 1, showing that MN ablation leads to a decrease in several IN subtypes and no changes in V2a neurons. It would be helpful for the reader if the authors could synthesize their results in the discussion and reconcile their experimental findings.

    4. Reviewer #3 (Public review):

      This manuscript aims to investigate cell extrinsic mechanisms that regulate the differentiation and distribution of interneuron types in the spinal cord. The authors demonstrate that the loss of motor neurons leads to changes in the number and distribution of different interneuron types, specifically V0v, V1, and V2b (but not V2a). The authors then hypothesize that this phenotype may be controlled by the action of Onecut (OC) transcription factors in motor neurons. Conditional knockout of OC1 + OC2 in motor neurons using Olig2-Cre, however, does not lead to significant changes in the numbers of V1, V2a, and V2b interneurons, although there is a change in their spatial distribution. While the authors do not check V0v neurons in OC mutants, they do check V2c, which show a reduction in number and change in distribution. Why the same neurons are not checked across experiments is unclear. The authors then analyze existing RNA-seq data to identify factors that could be mediating the effects of the OC factors in motor neurons. They identify Ntf3 as a candidate and confirm that it is upregulated in OC mutants. Conditional loss of function of Ntf3 (Olig2-Cre) leads to increases in V1, V2a, and V2b (but not V2c) interneurons and changes in the distribution of all four interneuron types. Finally, the authors demonstrate that these Ntf3 conditional mutants have major defects in motor function.

      The conclusions of this manuscript are not well supported by the data for the reasons listed below, making it difficult to assess the impact of this work on the field.

      (1) The manuscript relies heavily on quantifying numbers and the spatial distribution of interneuron populations. However, these do not seem to be consistent in control animals across experiments, making it difficult to interpret any changes observed in genetic manipulations. Specifically, in Figures 2 and 4, the same markers are being used to quantify V1, V2a, V2b, and V2c interneurons in controls vs. OC (Figure 2) or Ntf3 (Figure 4) conditional knockouts, but the numbers of neurons and their distribution in control animals are variable between these two figures. For example, there seems to be a mean of >300 V1 neurons in E12.5 brachial sections of Fig. 2 controls, but this number is <150 in Fig. 4 controls. The cell distribution scoring is similarly variable between these controls without any explanation. The same is true for E14.5 controls used in Figure S1 vs. Figure S3.

      (2) Neurotrophic factors generally promote neuronal survival. However, in this study, the loss of Ntf3 leads to increased numbers of interneurons. This finding is in disagreement with previous observations in slice cultures of spinal cords, as stated in the discussion. This discrepancy makes it even more important that the cell counts reported in the figures discussed above are robust.

      (3) The claim that phenotypes are non-cell autonomously driven by motor neurons is not well supported. In Olig2-Cre conditional knockouts of Onecut and Ntf3, there is no confirmation that the loss of these factors is specific to motor neurons. Therefore, it cannot be ruled out that other cell populations may be mediating the phenotypes.

      (4) The claim that interneuron development is regulated by OC control of Ntf3 expression in motor neurons is not well supported. The authors show that loss of OC1/2 leads to an increase in Ntf3 expression in motor neurons. If this pathway were controlling interneurons, loss of OC function and overexpression of Ntf3 would have the same phenotype, which is not the case. Additionally, it would also be expected that loss of OC function and loss of Ntf3 function would have inverse phenotypes, which is also not the case. The phenotypes from OC loss of function and Ntf3 loss of function seem distinct from one another. The authors state that too little and too much Ntf3 are both bad for interneuron development, but there is no data to support their claim that OC1/2 mutants have altered interneuron development because of higher Ntf3 expression.

      (5) It is not clear that interneurons being studied express the Ntf3 receptor TrkC, which makes it difficult to assess whether changes in Ntf3 signaling are directly responsible for the phenotype.

      (6) While the behavioral phenotypes are consistent with Ntf3 playing a role in motor circuits, there is no evidence to suggest that Ntf3's influence on premotor interneurons being studied is driving or contributing to this phenotype, as discussed by the authors.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Based on the effects observed with OC vs. Ntf3 cKO, it is unclear whether OC is indeed exerting its non-cell-autonomous effects via Ntf3. Knocking out both Ntf3 and OC and comparing the effects to those seen with just OC cKO alone could provide more insight on this point.

      In this study, we did not intend to demonstrate that Onecut transcription factors exert their non-cell autonomous action on spinal interneuron development by regulating Ntf3 expression, and we do not state in the manuscript that this is the case. We only show that Onecut factors and Ntf3, the expression of which they regulate, contribute to the non-cell autonomous regulation of spinal interneuron development by the motor neurons. We are convinced that Onecut factors could regulate multiple independent factors and pathways involved in extrinsic regulation of interneuron development, as supported by the regulation of the expression of multiple secreted factors or membrane proteins in motor neurons detected in the reported RNA-sequencing experiment (this manuscript and [1]). This possibly also includes the intercellular transfer of the Onecut homeoproteins during spinal cord development, as demonstrated in cell culture for multiple homeoproteins including human Onecut factors [2], a process that we are currently investigating. Knocking out both OC and Ntf3 in the motor neurons, beyond being technically extremely challenging (1/64 probability of obtaining triple-mutant embryos), would not allow this question to be addressed, as it would simply result in the addition of two different defects.

      Also, a quantitative summary of the effects of Ntf3 overexpression in motor neurons in the chick is lacking.

      A quantitative summary of the effects of Ntf3 overexpression in the chicken embryonic spinal cord is provided in Figure S2.

      (2) How the authors assess changes in the spatial distribution of interneurons is unclear. In Figures 2 and 4, the control distributions (despite reporting the same populations in the same regions) look different, suggesting large sample-to-sample variance in distribution. Although the authors report that several sections in each level were taken from at least three animals for each condition, it's unclear how variance within WT or cKO sections was accounted for in the final statistical evaluation. It seems at a glance that a comparison between control samples in Figure 2 and Figure 4 could report statistically significant differences, which would be problematic. A more rigorous report of sample-to-sample variance and a more in-depth explanation of the statistical methods are needed.

      The experimental procedure to analyze the spatial distribution of spinal interneurons at different stages of development is described in detail in the “Statistical analyses” paragraph of the Materials and Methods section of the manuscript, and has been repeatedly used by ourselves [3,4] and by others (see for example [5-7]) to conduct similar analyses.

      We also noticed that the distribution of the different analyzed interneuron populations in the control embryos showed some differences between the cOc1/Oc2<sup>-/-</sup> and the cNtf3<sup>-/-</sup> lines. Several parameters can account for this observation. First, this study has been conducted over a period of 15 years, with different investigators each contributing to different steps of the analysis. Second, the genetic background of these two lines is not identical, impacting both the duration of the gestation (hence, the embryonic stage of the performed analyses, even if the embryos were collected on the same gestation day) and possibly the distribution of some interneuron populations. Third, because of changes in the availability of the primary antibodies used to label the interneuron populations of interest, the same antibodies were not used throughout the study, as stated in the Materials and Methods section, although the same antibody was used by the same investigator to label the same interneuron population in each mouse line at each developmental stage.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses, which takes into account variance within control or mutant samples, will be provided in the revised version of the manuscript.

      Reviewer #2 (Public review):

      (1) The study primarily quantifies interneuron numbers and distribution at different levels of the spinal cord and under different genetic manipulations. Experimental details are lacking, defining how many sections were analyzed (several are noted in the methods) and how the rostrocaudal levels of the spinal cord were precisely aligned.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses will be provided in the revised version of the manuscript. The rostrocaudal levels of the spinal cord were precisely aligned using the distribution of Foxp1 in the Lateral Motor Columns (LMCs) at brachial or lumbar levels of the spinal cord [8,9], which will also be indicated in the revised version.

      In different figures, the values and distributions shown for controls vary quite a lot. For example, in Figure 2B vs Figure 4B, the number of FoxP2+ V1 neurons at brachial levels is ~350 vs 125. Similarly, the control distributions in 2I and 4I are quite different. This makes it challenging to determine whether the conclusions regarding the impact of each genetic manipulation on interneuron numbers and distribution are valid.

      Multiple factors may explain these observations. First, this study spans a 15-year period, with different researchers contributing to various stages of the analysis. Second, the genetic backgrounds of the two mouse lines are not identical, affecting both gestation length (thus influencing the embryonic stage at which analyses were performed, even when embryos were collected on the same gestational day) and potentially the distribution of certain interneuron populations. Third, due to changes in the availability of primary antibodies used to label the targeted interneuron populations, the same antibodies were not consistently employed throughout the study, as noted in the Materials and Methods section, though each investigator used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) The relationship between OC and NT3 deletion data is not entirely clear. Both deletions presumably lead to changes in interneuron distribution, but is there any reverse relationship between the two that relates to relative changes in NT3 levels? The authors do not directly compare NT3 and OC KO IN distributions. Similarly, one might expect a decrease in interneuron numbers in OC mutants, which is only reported for V2c neurons. However, the image presented in Figure 2G shows an equal number of V2c INs in control and mutant.

      This study was not designed to demonstrate that Onecut transcription factors influence spinal interneuron development in a non-cell-autonomous manner through Ntf3 regulation, nor do we claim this in the manuscript. Instead, we show that Onecut factors and Ntf3, whose expression they control, contribute to the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We believe Onecut factors may regulate multiple independent factors and pathways involved in the extrinsic control of interneuron development. For instance, as noted earlier [2], we observed intercellular transfer of Onecut homeoproteins during spinal cord development, suggesting alternative mechanisms for non-cell-autonomous regulation.

      The two mouse lines studied here consist, on the one hand, of a combination of OC inactivation and increased Ntf3 expression and, on the other hand, of Ntf3 inactivation. Therefore, a reverse relationship between the changes in interneuron distribution is not expected. Furthermore, gain-of-function and loss-of-function experiments in mouse models frequently generate phenotypes that are not inverse to each other [10-13].

      (3) It is not clear that the behavioral phenotypes seen in the olig2-cre mediated deletion of NT3 can be attributed to changes in interneuron development. How about a role of NT3 in oligodendrocytes? There is a big gap between the embryonic changes shown here and behavior, with no in-between circuit-level changes in locomotor circuits shown.

      We agree that the motor behavior changes that we recorded in Ntf3 conditional mutant mice are, as stated, “consistent with the hypothesis that Ntf3 produced by MNs is required to generate locomotor circuits with properly coordinated activity” but do not demonstrate a direct causal relationship. However, investigating the intrinsic activity of the spinal locomotor circuits independently from, for example, oligodendrocyte contribution may prove to be extremely challenging and was beyond the scope of this study. In addition, to the best of our knowledge, Ntf3 has not been shown to be expressed in healthy oligodendrocytes in vivo, and TrkC has not been reported to be displayed by these cells in the same conditions.

      A more restricted manipulation would be deleting TrkC from specific interneuron populations. Related to this, although TrkC is shown to be broadly expressed in ventral interneurons, it is not shown specifically to colocalize with any of the interneuron markers. The authors should validate that the receptor is expressed in the subsets that they are investigating.

      We agree that investigating the consequences of inactivating the TrkC receptor in specific interneuron populations would be extremely informative. However, this experiment is also very challenging to perform, as most of the driver lines available to target spinal interneuron populations additionally target multiple neuronal populations outside of the spinal cord that are also involved in the control of movements and could therefore induce confounding effects on motor behavior analyses [14-20].

      We thank the reviewer for suggesting a more detailed investigation of the interneuron populations that display TrkC receptors; this will be included in the revised version of the manuscript.

      (4) The rationale for following up on NT3 seems to be the chick electroporation experiments; however, no changes in distribution are shown in those experiments, and only a very minor decrease in Chx10 interneurons. Shouldn't NT3 overexpression lead to substantial decreases in IN numbers according to the authors' model? The "data not shown", which presumably refers to distribution, would be important to show here, to further support this rationale.

      Chicken spinal cord electroporation only allows spinal cord development to be studied within a limited time window, given the high mortality rate observed after longer incubation. At the stage at which we collected the electroporated embryos for analyses, interneuron migration has barely been initiated, and distribution cannot be studied yet. Consistently, we are not aware of any report of interneuron distribution analysis in electroporated chicken embryonic spinal cord, as compared to mouse embryos [3-7].

      (5) The idea that NT3 downregulation causes an increase in IN numbers is not intuitive. Also, considering the DTA experiments in Figure 1, showing that MN ablation leads to a decrease in several IN subtypes and no changes in V2a neurons. It would be helpful for the reader if the authors could synthesize their results in the discussion and reconcile their experimental findings.

      We agree; this will be included in the revised version of the manuscript.

      Reviewer #3 (Public review):

      (1) The manuscript relies heavily on quantifying numbers and the spatial distribution of interneuron populations. However, these do not seem to be consistent in control animals across experiments, making it difficult to interpret any changes observed in genetic manipulations. Specifically, in Figures 2 and 4, the same markers are being used to quantify V1, V2a, V2b, and V2c interneurons in controls vs. OC (Figure 2) or Ntf3 (Figure 4) conditional knockouts, but the numbers of neurons and their distribution in control animals are variable between these two figures. For example, there seems to be a mean of >300 V1 neurons in E12.5 brachial sections of Fig. 2 controls, but this number is <150 in Fig. 4 controls. The cell distribution scoring is similarly variable between these controls without any explanation. The same is true for E14.5 controls used in Figure S1 vs. Figure S3.

      We indeed observed variations in the quantifications and distributions of the analyzed interneuron populations in control embryos between the cOc1/Oc2<sup>⁻/⁻</sup> and cNtf3<sup>⁻/⁻</sup> lines. Several factors may explain this discrepancy. First, the study was carried out over 15 years, with different investigators contributing to distinct stages of the analysis—meaning interneuron distribution was not assessed by the same researchers in both lines. Second, the genetic backgrounds of the two lines differ, affecting gestation length (and thus the embryonic stage at analysis, even when embryos were collected on the same gestational day) as well as potentially altering the distribution of certain interneuron populations. Third, changes in the availability of primary antibodies targeting the interneuron populations of interest led to inconsistencies in antibody use across the study, as detailed in the Materials and Methods section. However, each investigator consistently used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) Neurotrophic factors generally promote neuronal survival. However, in this study, the loss of Ntf3 leads to increased numbers of interneurons. This finding is in disagreement with previous observations in slice cultures of spinal cords, as stated in the discussion. This discrepancy makes it even more important that the cell counts reported in the figures discussed above are robust.

      Considering that neurotrophic factors only support neuronal survival would strongly neglect their important function in neuronal differentiation, which has been broadly demonstrated. Severe immunotoxic ablation of motor neurons or anti-serum blockade of Ntf3 activity severely depleted inhibitory, but not excitatory, interneurons in a highly apoptosis-prone organotypic culture model of embryonic rat spinal cord slices, which was rescued by Ntf3 in the first model [21]. Opposite results were obtained in vivo by other researchers using mouse models lacking almost all MNs due to the elimination of skeletal muscles, where the number of spinal INs remained unaffected [22,23]. Combined with our results, these in vivo observations suggest that Ntf3 is involved in interneuron differentiation rather than in their survival. Consistently, Ntf3 has been shown to promote neuronal differentiation [24].

      (3) The claim that phenotypes are non-cell autonomously driven by motor neurons is not well supported. In Olig2-Cre conditional knockouts of Onecut and Ntf3, there is no confirmation that the loss of these factors is specific to motor neurons. Therefore, it cannot be ruled out that other cell populations may be mediating the phenotypes.

      Combined conditional inactivation of Oc1 and Oc2 has been reported in [1]. Conditional inactivation of Ntf3 only impacts motor neurons, as they are the only cell population in the ventral spinal cord in which this factor is produced (this study and [25-27]). Furthermore, Olig2-Cre has been shown to be active in motor neurons and in V3 interneurons (see for example [10]), which, for this reason, have not been studied as part of this project, as stated in the manuscript.

      (4) The claim that interneuron development is regulated by OC control of Ntf3 expression in motor neurons is not well supported. The authors show that loss of OC1/2 leads to an increase in Ntf3 expression in motor neurons. If this pathway were controlling interneurons, loss of OC function and overexpression of Ntf3 would have the same phenotype, which is not the case. Additionally, it would also be expected that loss of OC function and loss of Ntf3 function would have inverse phenotypes, which is also not the case. The phenotypes from OC loss of function and Ntf3 loss of function seem distinct from one another. The authors state that too little and too much Ntf3 are both bad for interneuron development, but there is no data to support their claim that OC1/2 mutants have altered interneuron development because of higher Ntf3 expression.

      This study was not aimed at proving that Onecut transcription factors mediate their non-cell-autonomous effects on spinal interneuron development through Ntf3 regulation, nor do we make this claim in the manuscript. Rather, we demonstrate that Onecut factors and Ntf3, whose expression they control, participate in the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We propose that Onecut factors likely modulate multiple independent factors and pathways involved in the extrinsic regulation of interneuron development, as evidenced by the regulation of various secreted factors and membrane proteins in motor neurons observed in our RNA-sequencing data (this study and [1]). This may also involve intercellular transfer of Onecut homeoproteins during spinal cord development, a mechanism previously shown in cell culture for several homeoproteins, including human Onecut factors [2], and which we are currently exploring.

      (5) It is not clear that interneurons being studied express the Ntf3 receptor TrkC, which makes it difficult to assess whether changes in Ntf3 signaling are directly responsible for the phenotype.

      The immunofluorescence experiment in Figure 3C shows that the TrkC receptor is present in cell populations surrounding motor neurons at E12.5, a stage at which only the pre-motor interneuron populations reported in the manuscript are present. However, we thank the reviewer for suggesting a more detailed investigation of the interneuron populations that display TrkC receptors; this will be included in the revised version of the manuscript.

      (6) While the behavioral phenotypes are consistent with Ntf3 playing a role in motor circuits, there is no evidence to suggest that Ntf3's influence on premotor interneurons being studied is driving or contributing to this phenotype, as discussed by the authors.

      We acknowledge that the motor behavior changes observed in Ntf3 conditional mutant mice—as noted—are “consistent with the hypothesis that MN-derived Ntf3 is necessary for the formation of locomotor circuits with properly coordinated activity,” but they do not establish a direct causal link. However, analyzing the intrinsic activity of spinal locomotor circuits was beyond the scope of this study.

      (1) Toch, M. et al. Onecut-dependent Nkx6.2 transcription factor expression is required for proper formation and activity of spinal locomotor circuits. Sci Rep 10, 996 (2020). https://doi.org/10.1038/s41598-020-57945-4

      (2) Lee, E. J. et al. Global Analysis of Intercellular Homeodomain Protein Transfer. Cell Rep 28, 712-722 e713 (2019). https://doi.org/10.1016/j.celrep.2019.06.056

      (3) Harris, A. et al. Onecut factors and Pou2f2 regulate the distribution of V2 interneurons in the mouse developing spinal cord. Front Cell Neurosci 13 (2019). https://doi.org/10.3389/fncel.2019.00184

      (4) Kabayiza, K. U. et al. The Onecut Transcription Factors Regulate Differentiation and Distribution of Dorsal Interneurons during Spinal Cord Development. Front Mol Neurosci 10, 157 (2017). https://doi.org/10.3389/fnmol.2017.00157

      (5) Deska-Gauthier, D. et al. Embryonic temporal-spatial delineation of excitatory spinal V3 interneuron diversity. Cell Rep 43, 113635 (2024). https://doi.org/10.1016/j.celrep.2023.113635

      (6) Bikoff, J. B. et al. Spinal Inhibitory Interneuron Diversity Delineates Variant Motor Microcircuits. Cell 165, 207-219 (2016). https://doi.org/10.1016/j.cell.2016.01.027

      (7) Hayashi, M. et al. Graded Arrays of Spinal and Supraspinal V2a Interneuron Subtypes Underlie Forelimb and Hindlimb Motor Control. Neuron 97, 869-884 e865 (2018). https://doi.org/10.1016/j.neuron.2018.01.023

      (8) Rousso, D. L., Gaber, Z. B., Wellik, D., Morrisey, E. E. & Novitch, B. G. Coordinated actions of the forkhead protein Foxp1 and Hox proteins in the columnar organization of spinal motor neurons. Neuron 59, 226-240 (2008). https://doi.org/10.1016/j.neuron.2008.06.025

      (9) Roy, A. et al. Onecut transcription factors act upstream of Isl1 to regulate spinal motoneuron diversification. Development 139, 3109-3119 (2012). https://doi.org/10.1242/dev.078501

      (10) Debrulle, S. et al. Vsx1 and Chx10 paralogs sequentially secure V2 interneuron identity during spinal cord development. Cell Mol Life Sci 77, 4117-4131 (2020). https://doi.org/10.1007/s00018-019-03408-7

      (11) Brunklaus, A. et al. in Brain Vol. 145 3816-3831 (2022).

      (12) Scekic-Zahirovic, J. et al. in EMBO J Vol. 35 1077-1097 (2016).

      (13) Wong, J. C. in Epilepsy Curr Vol. 25 347-349 (2025).

      (14) Hafler, B. P., Choi, M. Y., Shivdasani, R. A. & Rowitch, D. H. Expression and function of Nkx6.3 in vertebrate hindbrain. Brain Res 1222, 42-50 (2008). https://doi.org/10.1016/j.brainres.2008.04.072

      (15) Nardelli, J., Thiesson, D., Fujiwara, Y., Tsai, F. Y. & Orkin, S. H. Expression and genetic interaction of transcription factors GATA-2 and GATA-3 during development of the mouse central nervous system. Dev Biol 210, 305-321 (1999).

      (16) Bretzner, F. & Brownstone, R. M. in J Neurosci Vol. 33 14681-14692 (2013).

      (17) Chopek, J. W., Zhang, Y. & Brownstone, R. M. in J Neurophysiol Vol. 126 1978-1990 (2021).

      (18) Miyagi, S., Kato, H. & Okuda, A. in Cell Mol Life Sci Vol. 66 3675-3684 (2009).

      (19) French, C. A. et al. in Mol Psychiatry Vol. 24 447-462 (2019).

      (20) Khouri-Farah, N., Guo, Q., Perry, T. A., Dussault, R. & Li, J. Y. H. in Nat Neurosci Vol. 28 2022-2033 (2025).

      (21) Bechade, C., Mallecourt, C., Sedel, F., Vyas, S. & Triller, A. in J Neurosci Vol. 22 8779-8784 (2002).

      (22) Grieshammer, U., Lewandoski, M., Prevette, D., Oppenheim, R. W. & Martin, G. R. Muscle-specific cell ablation conditional upon Cre-mediated DNA recombination in transgenic mice leads to massive spinal and cranial motoneuron loss. Dev Biol 197, 234-247 (1998). https://doi.org/10.1006/dbio.1997.8859

      (23) Kablar, B. & Rudnicki, M. A. Development in the absence of skeletal muscle results in the sequential ablation of motor neurons from the spinal cord to the brain. Dev Biol 208, 93-109 (1999). https://doi.org/10.1006/dbio.1998.9184

      (24) Dutton, R., Yamada, T., Turnley, A., Bartlett, P. F. & Murphy, M. Regulation of spinal motoneuron differentiation by the combined action of Sonic hedgehog and neurotrophin 3. Clin Exp Pharmacol Physiol 26, 746-748 (1999). https://doi.org/10.1046/j.1440-1681.1999.03108.x

      (25) Buck, C. R., Seburn, K. L. & Cope, T. C. Neurotrophin expression by spinal motoneurons in adult and developing rats. J Comp Neurol 416, 309-318 (2000).

      (26) Henderson, C. E. et al. Neurotrophins promote motor neuron survival and are present in embryonic limb bud. Nature 363, 266-270 (1993). https://doi.org/10.1038/363266a0

      (27) Usui, N. et al. Role of motoneuron-derived neurotrophin 3 in survival and axonal projection of sensory neurons during neural circuit formation. Development 139, 1125-1132 (2012). https://doi.org/10.1242/dev.069997

    1. eLife Assessment

      This important paper provides novel information on the function of the Drosophila ryanodine receptor (RyR) during muscle development. The authors analyze the effects of a rare human mutation that causes myopathy that affects a conserved region of the gene. They present compelling evidence that this variant affects muscle function in flies. These results suggest that Drosophila can be used as a tool for screening additional variants.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Zmojdzian et al. provide an analysis of ryanodine receptor (RyR) expression and function in Drosophila. They also use CRISPR to engineer into flies a RyR variant of unknown significance (VUS) found in a human myopathy patient and demonstrate that it is likely a pathogenic mutation. From studies of RyR expression in embryonic and larval stages, and effects of RyR knockdown or overexpression in various muscle groups, the authors show that, in addition to its known actions in calcium-dependent excitation-contraction coupling, RyR promotes myogenesis during development.

      The key conclusions of the paper are convincing. I do not have suggestions for necessary additional experimental work, and my comments are minor. One conclusion, that RyR dysfunction may be involved in aging, is stated in multiple places, sometimes speculatively but once very forcefully. The latter is in the final paragraph of the Discussion, which states RyR "plays an instrumental anti-aging role in differentiated striated muscle". This conclusion must be tempered, as even if RyR knockdown phenotypes resemble some of those seen in aging flies, the study does not examine aged flies, and there is no mechanistic analysis that might link the two. I assume the authors would prefer to modify that sentence rather than initiate work with aging flies to prove the assertion. Finally, the use of CRISPR to test a VUS is excellent and suggests a good way of testing additional RyR variants in the future.

      Significance:

      The paper is significant in that RyR is known to be a critical protein in calcium-dependent excitation-contraction coupling but its role in developmental myogenesis is poorly studied. This study demonstrates that it is expressed during, and is important for, embryonic and larval myogenesis in the fly. RyR is also understudied in this valuable model organism, even though a P element-based mutant has been available since 2000. The mechanistic basis for the functional observations is not explored here but the work is well performed and will be of interest to investigators studying muscle development (my own field) and diseases caused by RyR mutations.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents data using the Drosophila model to analyze the effects of a rare human mutation in the gene encoding the ryanodine receptor (ryr). The authors present a nice, comprehensive phylogenetic analysis that shows the Drosophila version of Ryr to be most similar to human RYR2 and that the known "hot spots" for mutations in RYR2 coincide with highly conserved regions of the Drosophila Ryr. They characterize the functional effects of ryr knockdown and overexpression on both adult heart function and larval body wall muscle. They identified embryonic ryr expression in association with actin-stained muscle precursor cells and provide beautiful stains, which clearly showed that embryonic muscle cell development was disrupted in ryr mutants. In support of these findings, KD of Calmodulin in larva (an Ryr inhibitor) phenocopied Ryr OE. They recreated a human variant of unknown function (RyR1 p.Met4881Ile ) in the conserved region of the fly gene and tested the effect on larval muscle. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters.

      Major comments:

      (1) Fig. 1: In G there are no data for the RNAi KD situation.

      (2) Fig. 2 Authors should include Diastolic Diameters; they mention dilated cardiomyopathy but don't show the dilation. The authors should also show staining in hearts with RYR OE and RNAi. It would be nice to have some kind of quantification of disorganized myofibrils.

      (3) To evaluate and reproduce the data on the larva muscle parameters the authors should provide more details on how sarcomere length was quantified in each larva (replicates, ROI size, etc). Similarly, how were # of nuclei quantified / normalized? Importantly for these measurements, did the authors know what the contraction state of the muscles were when fixed?

      (4) Fig. 3, Are RNAi and OE in the same background? I only see one control in the graphs for the RNAi line background.

      (5) Fig. 3 How VL3 length was determined needs more detail, the Zhang ref is not adequate.

      (6) In order to be able to evaluate the data, the statistical tests used should be cited in the figure legends along with what *, ** ,*** stand for (or just provide p values).

      Significance:

      The authors nicely characterized the role of Ryr in muscle development and function and recreated a human variant of unknown function (RyR1 p.Met4881Ile ) in the conserved region of the fly gene. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters. This work supports a role for the fly model in testing potential human disease gene variants.

      Comments on Revised Version:

      The authors have very adequately addressed the points raised by all reviewers.

    4. Author response:

      General Statements

      We would like to extend our gratitude to all reviewers for their supportive feedback, which acknowledges our study as well performed, of interest to investigators studying muscle development and disease, and supportive of a role for the fly model in testing potential human disease gene variants. We also thank the reviewers for their valuable critical comments. We carefully considered all of them, performed additional experiments, and made the suggested text amendments.

      We believe these modifications substantially improve the quality of our results and enhance the general interest of our work.

      Point-by-point description of the revisions

      Reviewer #1:

      In this manuscript, Zmojdzian et al. provide an analysis of ryanodine receptor (RyR) expression and function in Drosophila. They also use CRISPR to engineer into flies a RyR variant of unknown significance (VUS) found in a human myopathy patient and demonstrate that it is likely a pathogenic mutation. From studies of RyR expression in embryonic and larval stages, and effects of RyR knockdown or overexpression in various muscle groups, the authors show that, in addition to its known actions in calcium-dependent excitation-contraction coupling, RyR promotes myogenesis during development.

      The key conclusions of the paper are convincing. I do not have suggestions for necessary additional experimental work, and my comments are minor. One conclusion, that RyR dysfunction may be involved in aging, is stated in multiple places, sometimes speculatively but once very forcefully. The latter is in the final paragraph of the Discussion, which states RyR "plays an instrumental anti-aging role in differentiated striated muscle". This conclusion must be tempered, as even if RyR knockdown phenotypes resemble some of those seen in aging flies, the study does not examine aged flies, and there is no mechanistic analysis that might link the two. I assume the authors would prefer to modify that sentence than initiate work with aging flies to prove the assertion.

      We thank the Reviewer for this comment and have removed the hypothetical anti-aging role of RyR from the concluding sentence. The modified sentence reads as follows:

      “To conclude, we report functional analysis of dRyR, the sole fruit fly RyR gene and show that in addition to ensuring contractile properties of differentiated striated muscle it plays a key pro-myogenic role during muscle development.”

      Finally, the use of CRISPR to test a VUS is excellent and suggests a good way for testing of additional RyR variants in the future.

      Minor comments:

      (1) Figure 1A: In the Introduction it is stated that non-mammalian vertebrates have two RyR genes, alpha and beta. In Fig. 1A, a single chicken and single frog gene are listed under names different than alpha or beta. The figure also focuses on RyR2 genes, yet the Introduction states that the non-mammalian vertebrate genes are homologous to RyR1 and RyR3 in mammals. The dichotomy between the text and the figure is confusing. Finally, the font used in Fig. 1A should be enlarged for better visibility.

      To avoid the dichotomy, we modified our sentence concerning the non-mammalian vertebrate RYR genes in the Introduction section. As indicated, there are two RYR genes in chicken and frog, one of which shares homology with vertebrate RYR2 and is represented in the phylogenetic tree (Fig. 1A). As requested by the reviewer, we enlarged the font in the revised Fig. 1A to ensure better visibility.

      (2) Figure 3G-I: IF to Kettin is used to reveal sarcomeres but is not mentioned in the text. This protein is not present in vertebrates (I believe) and may not be familiar to many readers. It should be described in the text when it is used.

      We are grateful to the Reviewer for reminding us to provide information about Kettin, the Drosophila counterpart of Titin. The following information has been added to the text on page 9: “…which in turn correlated with shortening of Kettin/D-Titin-labelled sarcomeres…”

      (3) Figure S2: The panels are labelled E, F, G. They should be A-D, as is used in the text.

      In the revised version of Fig. S2 panel labels were amended and the panel E view enlarged. We also provide an additional control context (C57>LacZ).

      (4) The dRyR16 allele is used in Figure 5 and S4. It is described as a hypomorph in the text on page 12 but as a null in the legend to Figure 5. Do the authors actually mean "homozygous" in the legend? The difference should be clarified.

      The dRyR<sup>16</sup> allele has been previously described as a hypomorph. Indeed, in the legend of Fig. 5 we mistakenly described it as a “null”. As suggested by the Reviewer, we have changed it to “homozygous”.

      (5) The Met codon that is mutated in the variant studied in Figure S5 and Figure 6 is position 488 in humans. It is referred to that way in the fly version also. Is that true, the actual amino acid number is identical in humans and flies? In Figure S5B, it might be worth showing the primary amino acid sequence surrounding Met488 to reveal the degree of local conservation (beyond the orange domain in that panel).

      To provide more information about conservation, we include in the revised Fig. S5 an alignment of the amino acid sequence surrounding the human RYR1 4881 variant position, which corresponds to position 4971 in the Drosophila dRyR.

      Author response image 1 shows a snapshot from a larger portion of the alignment encompassing the variant position, showing high amino acid conservation around it:

      Author response image 1.

      (6) At least two references cited in the text are not listed in the References section (Hadiatullah et al. and Nishimura et al.).

      We double-checked the reference citations, and the two indicated references are now listed in the References section.

      Reviewer #1 (Significance):

      The paper is significant in that RyR is known to be a critical protein in calcium-dependent excitation-contraction coupling but its role in developmental myogenesis is poorly studied. This study demonstrates that it is expressed during, and is important for, embryonic and larval myogenesis in the fly. RyR is also understudied in this valuable model organism, even though a P element-based mutant has been available since 2000. The mechanistic basis for the functional observations is not explored here but the work is well performed and will be of interest to investigators studying muscle development (my own field) and diseases caused by RyR mutations.

      To reinforce the mechanistic/functional side of our study, we include in the revised Fig. 5 a new panel G showing the promyogenic role of another major cellular calcium regulator, the ER calcium pump SERCA. The Lms-targeted RNAi knockdown of SERCA impairs myotube growth, resulting in a thin muscle fiber phenotype. This indicates that both dRyR-regulated cytosolic and SERCA-regulated ER store calcium levels are required to promote muscle development.

      Reviewer #2:

      Summary:

      This paper presents data using the Drosophila model to analyze the effects of a rare human mutation in the gene encoding the ryanodine receptor (ryr). The authors present a nice, comprehensive phylogenetic analysis that shows the Drosophila version of Ryr to be most similar to human RYR2 and that the known "hot spots" for mutations in RYR2 coincide with highly conserved regions of the Drosophila Ryr. They characterize the functional effects of ryr knockdown and overexpression on both adult heart function and larval body wall muscle. They identified embryonic ryr expression in association with actin-stained muscle precursor cells and provide beautiful stains, which clearly showed that embryonic muscle cell development was disrupted in ryr mutants. In support of these findings, KD of Calmodulin in larva (an Ryr inhibitor) phenocopied Ryr OE. They recreated a human variant of unknown function (RyR1 p.Met4881Ile ) in the conserved region of the fly gene and tested the effect on larval muscle. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters. This work supports a role for the fly model in testing potential human disease gene variants.

      Major comments:

      (1) Fig. 1: In G there are no data for the RNAi KD situation.

      We are grateful to the Reviewer for pointing this out. We initially did not include these data because of the large difference in crawling capacity of dRyR RNAi larvae. In the revised version of Fig. 1 we now provide dRyR-RNAi larva crawling data. Because of their inefficient crawling, the time scale in panel 1G was modified.

      (2) Fig. 2 Authors should include Diastolic Diameters; they mention dilated cardiomyopathy but don't show the dilation. The authors should also show staining in hearts with RYR OE and RNAi. It would be nice to have some kind of quantification of disorganized myofibrils.

      As requested, in the revised Fig. 2 we provide diastolic diameter measurements. We also include a systolic interval graph to give a full picture of cardiac parameters. We do not observe all signs of dilated cardiomyopathy in the dRyR-RNAi context, as there is an increase in systolic diameter but no significant change in diastolic diameter.

      We modified our comments in the text accordingly (page 7).

      “…As the diastolic diameter remained unchanged, we conclude that cardiac dRyR knockdown affects cardiac performance without causing dilated cardiomyopathy…”

      Regarding the circular myofibril pattern, we do not observe irregular myofibril orientation but rather a fuzzy and less distinct sarcomeric pattern that is difficult to quantify. We specify this in the Figure 2 legend (page 8).

      “…circular fibers in Hand>dRyR RNAi (E) context showed a fuzzy pattern suggesting an affected sarcomeric organisation…”

      Author response image 2 shows the entire view of the cardiac tube in the dRyR RNAi context (stained with phalloidin), in which, in spite of less distinct circular myofibrils, no obvious differences from wt are observed.

      Author response image 2.

      (3) To evaluate and reproduce the data on the larva muscle parameters the authors should provide more details on how sarcomere length was quantified in each larva (replicates, ROI size, etc). Similarly, how were # of nuclei quantified / normalized? Importantly for these measurements, did the authors know what the contraction state of the muscles were when fixed?

      We add the requested information to the Materials and Methods section:

      “Muscle characteristics measurements:

      All analyses of muscle length and sarcomere size were performed on fixed larval muscle preparations in a relaxed state. Acquired confocal images were analysed in Fiji using the line tool. The Analyze – Measure tool was then applied to obtain muscle length values and measurements were analysed with Prism. Sarcomere size and number were calculated using the Analyze – Plot Profile Fiji tool. The sarcomere size was measured between peaks corresponding to the Z-disc (revealed with a Z-line-specific marker) on approximately 100 µm of muscle length. Sarcomere measurements were then analysed with Prism.

      DAPI-stained nuclei were counted in Z-stacks of confocal views of VL3 larval muscle and data analysed with Prism. About 30 larval muscles from 6-8 larval filets were analysed for each measurement.”

      Statistics

      All statistical analyses were performed using Prism (v9.5.1, GraphPad Software, La Jolla, CA, USA). The t test was used to compare control to variant context and one-way ANOVA tests were used for comparisons with more than two datasets. Bar plots represent the mean and the standard deviation. On the figures, statistical comparisons of sample vs control are indicated as ****: P ≤ 0.0001; ***: P ≤ 0.001; **: P ≤ 0.01; *: P ≤ 0.05; ns: P > 0.05.
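
      As an illustration of the sarcomere measurement described in the quoted Methods (spacing between Z-disc peaks in an intensity profile), the following is a minimal Python sketch and not the authors' Fiji workflow. The function names, the synthetic cosine profile, the 0.1 µm pixel size and the peak threshold are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def find_peaks(profile: Sequence[float], min_height: float) -> List[int]:
    """Return indices of strict local maxima at or above min_height.

    Hypothetical helper: flat-topped peaks in real data would need smoothing
    or plateau handling, which is deliberately left out of this sketch.
    """
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] >= min_height
        and profile[i] > profile[i - 1]
        and profile[i] > profile[i + 1]
    ]


def sarcomere_lengths(
    profile: Sequence[float], pixel_size_um: float, min_height: float
) -> List[float]:
    """Distances (in µm) between consecutive Z-disc peaks along a line profile."""
    peaks = find_peaks(profile, min_height)
    return [(b - a) * pixel_size_um for a, b in zip(peaks, peaks[1:])]


# Synthetic profile (assumption): one intensity peak every 30 pixels.
demo_profile = [math.cos(2 * math.pi * i / 30) for i in range(100)]
demo_lengths = sarcomere_lengths(demo_profile, pixel_size_um=0.1, min_height=0.5)
demo_mean = sum(demo_lengths) / len(demo_lengths)
```

      With real data, the profile would be the exported Plot Profile values and the pixel size would come from the image calibration.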

      (4) Fig. 3, Are RNAi and OE in the same background? I only see one control in the graphs for the RNAi line background.

      We agree. To avoid potential bias between the RNAi and OE genetic contexts, we now provide an additional OE control (C57>lacZ) in the revised version of Fig. 3.

      Thus, two controls, one for the RNAi and one for the OE context, are now included.

      (5) Fig. 3 How VL3 length was determined needs more detail, the Zhang ref is not adequate.

      We thank the Reviewer for this comment and now provide more details about the method used to calculate VL3 length (new paragraph in Materials and Methods); see also our answer to point 3. The Zhang et al. reference relates to the quantification of the mitochondrial pattern.

      (6) In order to be able to evaluate the data, the statistical tests used should be cited in the figure legends along with what *, ** ,*** stand for (or just provide p values).

      We have now added information about the statistical tests to the figure legends, in addition to the specific paragraph in the Materials and Methods section (answer to point 3).
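
      The star notation given in the Statistics paragraph can be written as a small lookup. This is a hypothetical helper (the name `significance_label` is an assumption made for illustration); the analyses themselves are reported as having been performed in Prism.

```python
def significance_label(p: float) -> str:
    """Map a P value to the star notation quoted in the Statistics paragraph."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    # Thresholds are checked from the most to the least stringent.
    for threshold, label in (
        (0.0001, "****"),
        (0.001, "***"),
        (0.01, "**"),
        (0.05, "*"),
    ):
        if p <= threshold:
            return label
    return "ns"
```

      For example, `significance_label(0.03)` returns `"*"` and `significance_label(0.2)` returns `"ns"`.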

      Minor comments:

      (1) Need more detail in the figures, e.g. add what colors go with which stain to the picture.

      We provide this information in the revised version of the figure legends.

      (2) Page 13, (Fig. ?F, G).

      We apologize for this mistake and have added the figure number (Fig. 5).

      (3) Fig. 4 "partially co-localizing with actin".... this is confusing and probably an overstatement based on the staining pattern in a whole embryo and not on an optical section or a higher power image with a more restricted field of view.

      We agree and have removed this statement from the Fig. 4 legend.

      (4) Some of the graphs are a bit small, recommend reducing the statistical comparison brackets to straight lines, which eliminates a lot of white space and would allow the graphs to be enlarged.

      We increased the size of the graphs in the revised Fig. S2 and Fig. 5.

      Reviewer #2 (Significance):

      The authors nicely characterized the role of Ryr in muscle development and function and recreated a human variant of unknown function (RyR1 p.Met4881Ile) in the conserved region of the fly gene. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters. This work supports a role for the fly model in testing potential human disease gene variants. The reviewer’s field of expertise is in Drosophila genetics and in the use of the fly as a model system for understanding the genetic networks contributing to muscle structure and function at the cellular level.

      Reviewer #3:

      Summary

      This paper examines the Drosophila Ryanodine Receptor (RyR or dRyR). Ryanodine receptors are enormous channel proteins that mediate calcium efflux from the endoplasmic reticulum and sarcoplasmic reticulum. One goal of the work is to describe salient developmental features of Drosophila RyR (i.e., where it localizes in the cell and how it contributes to muscle development and function) and to refine knowledge from prior reports. Many of the analyses toward that goal are well done; this reviewer especially likes the examination of how muscles develop (Fig. 5).

      Another goal is to compare this information with what is known about mammalian RyRs. There seems to be a lot in common between Drosophila and mammalian RyRs. The paper finishes by taking a human ryanodine receptor variant of unknown significance and generating the corresponding amino-acid substitution in Drosophila RyR. The substitution has some phenotypic consequences for fly coordination, so the authors conclude that the human variant is likely to be pathogenic.

      In terms of investigation, a refined description of RyR biology is welcome. Ryanodine receptors are critical contributors/mediators of intracellular calcium signaling processes. Understanding their properties can help to contextualize the results of studies where calcium dynamics are at play. This is true for both Drosophila and non-Drosophila work. For this version of the paper, there are several statements that should be edited, both in terms of accuracy and in terms of reporting prior knowledge. Additionally, some experiments are missing controls or reagent verification. Importantly, the anti-RyR antibody needs supporting information regarding its specificity.

      Main Comments

      (1) The paper does not fully state what has been done before in terms of studying Drosophila ryanodine receptor expression. In comparing the work on ryanodine receptors in vertebrates versus Drosophila, the authors write, "By contrast, no systematic analyses have yet been performed to assess the expression of the sole Drosophila dRyR gene." I was a little surprised by this sentence, so I examined the literature. There are hundreds of Drosophila publications that mention the ryanodine receptor in some way, but they are not about gene expression. As stated, the sentence might depend on what the authors mean by "systematic analyses." Two early works are relevant here: the Hasan and Rosbash, 1992 paper and the Sullivan et al., 2000 paper. Both are cited in this study. And both of these early papers addressed RyR gene expression, so that fact should be acknowledged up front.

      We agree with the Reviewer that a large number of publications mention the Drosophila ryanodine receptor, and that the two identified by the Reviewer provide information about Drosophila RyR expression. We refer to both of them and follow the Reviewer’s suggestion to further acknowledge their work. The modified sentence in the text reads as follows:

      “…in spite of early works by Hasan and Rosbash (1992) and Sullivan et al., (2000) no systematic analyses have yet been performed to assess the developmental expression pattern of the sole Drosophila dRyR gene…”

      By “systematic analyses” we mean analyses of dRyR expression at both transcript and protein levels during embryonic development and in differentiated muscles.

      (2) (Related) I examined those two early papers to cross-check the extent of analysis done previously. The text of Hasan and Rosbash reports in situ examination of RyR transcript using a digoxigenin probe (though the online version of that 1992 paper seems to have left out the relevant mesodermal and muscle images referenced in the paper, in favor of duplicating Figure 5 three times - I emailed Development to alert them). More relevant, several experiments executed in the Sullivan paper agree closely with the current paper. As such, it needs more complete referencing. The Sullivan paper showed short, round larvae in mutants (Fig. 1 of Sullivan); ubiquitous mRNA, strongly in muscle and mesoderm (Fig. 2 of Sullivan); impaired muscle function in mutants (Fig. 3 of Sullivan), and impaired larval heart rate (Fig. 4 of Sullivan).

      The Sullivan et al. paper is indeed a reference paper for Drosophila RyR. Our data are, however, largely novel and/or substantially extend those reported by Sullivan. Notably, we show for the first time the developmental dRyR protein expression pattern in embryos and in larval filets, we analyse the expression of dRyR isoform transcripts, and we provide for the first time embryonic muscle phenotype analyses that shed light on the so far under-investigated developmental function of dRyR.

      We follow the Reviewer’s suggestion and provide additional citations of this work in the revised version:

      “…attenuation of dRyR (C57>dRyR RNAi) led to a significantly reduced larva body length (Fig. 3B, M) compared to control (Fig. 3A, Q), an observation that correlates with previously observed (Sullivan et al., 2000) reduced body size of dRyR<sup>16</sup> mutant larvae…”.

      “…our data extend previous observations of affected muscle contractility in RyR mutants (Sullivan et al., 2000)…”

      “…Overall, observed dRyR loss-of-function heart phenotypes with a slow heart rate and increased arrhythmia correlate with impaired cardiac function in RyR mutant larvae (Sullivan et al., 2000)…”

      (3) Fig. 1B-D (antibody staining): There are puzzles with this experiment. The first is with the anti-Dlg channel. Dlg is a core component of the NMJ postsynaptic density, and the antibody reveals a bright cage of Dlg around the boutons. But with the muscle images in Figure 1B, there are no boutons apparent (unless they are so far out of focus as to be invisible).

      Indeed, Dlg also stains postsynaptic NMJs at the muscle surface. Fig. 1B shows more internal optical sections to reveal T tubules, so the Dlg-positive NMJs are out of focus.

      The second question centers on the dRyR antibody. The results state, "We first tested the expression of dRYR at the protein level." This sentence appears immediately after the sentence for gene expression from point 1. Technically, this antibody will help determine protein localization, not gene expression. But more importantly, there is no supporting/verifying information about this guinea pig anti-dRYR antibody. The methods state that it was provided by Robert Scott from NIMH. But there is no accompanying citation, no information about the antigen used to raise the antibody, and no negative control (either mutant or RNAi) to show that the staining is specific. If this is a published anti dRyR antibody that already meets the standards of specificity, that should be made clear, and the citation should be given. But if not, the information and data about the production of the antibody and the testing of its quality needs to be shared.

      We apologize for this omitted citation. The anti-dRyR antibody has been previously described and its specificity tested in Gao et al. (2013). The corresponding author of that paper, David J. Sandstrom, has left NIMH, and the anti-dRyR antibodies are currently curated by Rob Scott from Benjamin White’s lab at NIMH.

      He generously sent us a sample of this antibody. We have added this information to the Materials and Methods section.

      (4) Fig. S1: Similar to the antibody, is there a negative control probe that does not reveal this expression pattern? There are any number of probes or secondary antibodies that non-specifically label Drosophila muscles in patterns just like this.

      We are confident that the HCR probes are working properly, as they reveal a dRyR transcript expression pattern that is consistent with the dRyR protein expression pattern. In addition, they show differential expression in embryos.

      Author response image 3 shows the control HCR ISH experiment with a probe that detects Apterous transcripts (specific for a subset of embryonic muscles and not present in L3 larval muscles).

      Author response image 3.

      A comparison between Ap HCR (A, A’) and dRyR Ex23 HCR (E, E’) signals.

      Minor Comments

      (1) "Overall, observed dRYR loss-of-function heart phenotypes...are reminiscent of those associated with aging (Nishimura et al., 2010), indicating that dRyR RNAi-induced impairment of Ca2+ homeostasis contributes to cardiac aging..." The conclusion of the sentence does not logically follow from the first part. This is because the tests conducted here were on rhythm, not on calcium homeostasis and cardiac aging.

      So, the tests cannot definitively say anything about those latter phenotypes.

      To address this comment, we modified the concluding sentence as follows:

      “…We hypothesize that dRyR RNAi-induced impairment of Ca2+ homeostasis could contribute to cardiac aging, for which Drosophila is a recognized model (Nishimura et al., 2011).”

      (2) Fig. S2 (bar graph): "% of total" - Is this supposed to refer to the percentage of the total muscle area that is positive for ATP5a staining? That should be clarified.

      We provide clarification in the Fig. S2 legend: “% of total” means the percentage of the measured muscle area that is positive for ATP5a staining.

      (3) Fig. 3M, should say length

      Done

      (4) Fig. 5A legend - See Sullivan; that paper concluded that RyR[16] was hypomorphic instead of null, based on RyR[16]/Df comparison to RyR[16]/RyR[16]. Intuitively, I agree; a lesion that rips out the start site would likely be null. The antibody could help with classifying the allele, depending on the part of RyR used as the antigen.

      The RyR<sup>16</sup> mutants were indeed described by Sullivan et al., as hypomorphic and not null. In the Fig. 5 legend we modify the comment to: “…homozygous dRyR<sup>16</sup> mutant embryo…”

      (5) Discussion: "This also suggests that all dRyR isoforms are collectively required for larval muscle function." That sentence does not logically follow the expression information. In order to test that idea, individual isoforms would need to be eliminated or knocked down.

      We agree with this comment and modify our sentence accordingly.

      “However, whether all dRyR isoforms are collectively required for larval muscle function requires further investigation.”

      Reviewer #3 (Significance):

      The idea that RyR is expressed in many kinds of muscle is put forth as a major conclusion. It is good that the authors report this fact, and the impacts on muscle development documented in Figure 5 are some of the best data in the paper. However, in terms of opening up a new understanding of RyR biology, the impact of this information seems modest. Prior Drosophila work and the work of others studying these channels show that ryanodine receptors are ubiquitous. The fact that there is only one Drosophila RyR gene would lead most scientists to hypothesize that it would be present on the ER surfaces of all kinds of tissues, including different types of muscle. Novel phenotypic information for Drosophila RyR is reported in the study, and this is good. But in terms of the model system, the strength of Drosophila is in using genetic combinations to make refined conclusions. That toolkit is not fully used here; therefore, the paper is mostly descriptive. This study is mostly a single-gene study (dRyR), with isolated exceptions, like Cam knockdown in Figure 5.

      To improve the functional/mechanistic aspect of the manuscript, we include in the revised Fig. 5 an analysis of the myogenic role of an additional calcium regulator, the ER calcium pump SERCA.

    1. eLife Assessment

      This important study uses a tripartite transdiagnostic computational framework to distinguish depression-specific, anxiety-specific, and shared psychopathology dimensions, in their relationships to mood variability and mood reactivity to reward prediction errors across multiple large non-clinical cohorts and a clinical sample. The evidence is convincing overall because the study combines large samples, a well-characterized gambling task and in-depth computational and psychometric analyses, and it replicates the depression-specific association with blunted reward prediction error-sensitivity in a clinical sample. However, the anxiety-specific effects are less consistently supported across individual datasets, may be underpowered in the clinical cohort because of comorbidity, and some aspects of the factor-analytic, risk-attitude, and mediation analyses would benefit from clearer explanation. These findings advance a mechanistic account of how distinct symptom dimensions differentially shape reward-based mood updating and variability, providing a principled framework for future transdiagnostic modeling.

    2. Reviewer #1 (Public review):

      This is a very interesting paper. The research question is intriguing, allowing the authors to address commonly observed comorbidities between depression and anxiety and their dissociable and opposite relationship to mood fluctuations and sensitivity to reward prediction errors. The computational analyses are very in-depth, including many state-of-the-art checks and validations. Another strength is the inclusion of several large or very large samples, including a patient sample in addition to the general population sample.

      I have the following questions:

      (1) Factor analysis I found the hierarchical organization of the factors interesting. While this is a very common procedure in, for example, the field of intelligence (producing sub-scores and a general g factor), it is not yet very commonly used in the field of computational psychiatry (though it has been validated before for anxiety/depression, so it is used here with good reason). I was also impressed by the methodological depth. In particular, it was of note how thoroughly done it was (for example, repeating the EFA on the second half of the data set). I have one question though: is the sample size too small for the exploratory analyses, given the number of items? Given the stability across the half-split, I imagine it is not. Perhaps the authors could spell out how many items, what would be the recommended standard for a subject-to-item ratio, and comment on this. A very technical point, the authors should specify how they extracted the factor scores from the other data sets (is it using the Thurstone or Bartlett method)? From experience (though not doing a hierarchical factor analysis), Bartlett can be somewhat better compared to the default (Thurstone) - better as in the resulting factors more closely recapitulating the factor correlations in the original sample (and independence of responses of other participants in a sample for computing a person's factor score). Could you also comment on similarities or divergences in this hierarchical factor analysis approach from another one recently used transdiagnostically in Wise et al. (2026, Translational Psychiatry)?

      (2) Linking factors to task parameters: As I understand it, the authors relate the orthogonalized depression and anxiety factors to task parameters (sensitivity of mood to RPEs and mood variability) using correlations. To better understand how this relates to other commonly used approaches, I would pose two questions:

      (i) What are the correlations when the full (non-orthogonalized) factor scores for depression and anxiety are used? Are the signs the same? (ii) What are the results when, instead of the independent correlations, the authors perform b_RPE ~ anxiety + depression (again using the non-orthogonalized factors)?

      I'm assuming all of these analyses should give the same results if the authors' hypothesis of opposing effects of anxiety and depression holds true.
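
      The two checks requested in (i) and (ii) can be sketched on synthetic data. This is only an illustration under stated assumptions, not the authors' analysis: the function name, the effect sizes (0.3 and -0.3), and the generative structure (one shared factor plus two specific factors) are all hypothetical.

```python
import numpy as np

def simulate_and_fit(n=5000, seed=0):
    """Synthetic check: do raw correlations and a joint regression agree in sign?"""
    rng = np.random.default_rng(seed)
    general = rng.standard_normal(n)        # hypothetical shared (common) factor
    anx_specific = rng.standard_normal(n)   # hypothetical anxiety-specific factor
    dep_specific = rng.standard_normal(n)   # hypothetical depression-specific factor

    # Non-orthogonalized scores: each mixes the shared and a specific component.
    anxiety = general + anx_specific
    depression = general + dep_specific

    # Assumed opposing effects of the specific factors on RPE sensitivity.
    b_rpe = 0.3 * anx_specific - 0.3 * dep_specific + rng.standard_normal(n)

    # (i) Independent correlations with the non-orthogonalized scores.
    r_anx = np.corrcoef(anxiety, b_rpe)[0, 1]
    r_dep = np.corrcoef(depression, b_rpe)[0, 1]

    # (ii) Joint model b_RPE ~ anxiety + depression via ordinary least squares.
    X = np.column_stack([np.ones(n), anxiety, depression])
    beta, *_ = np.linalg.lstsq(X, b_rpe, rcond=None)
    return r_anx, r_dep, beta[1], beta[2]
```

      Under these assumptions the raw correlations are attenuated by the shared factor but keep their signs, and the joint regression recovers coefficients of opposite sign. Whether the real data behave this way is exactly the open question.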

      Minor comments:

      (1) The authors should write down when the data were collected for each study. This is because AI capabilities have massively increased since ~2020 in quite specific steps (with the public release of new AI models), meaning that AI is likely to have been able to do tasks and questionnaires without detection if data were collected recently.

      (2) The authors should include a statement in the methods section on whether checks for AI-generated responses were done. If none have been done yet, could any be added? Recent papers (Westwood, PNAS 2025; van der Stigchel, PNAS 2026) point to the risk since at least the release of o4-mini (used in the cited paper to create very human-like behaviour).

      (3) It would have been good to collect questionnaires on other psychiatric traits thought to be unrelated, such as compulsivity or schizophrenia symptoms, to check the specificity of the results, also under the assumption that higher scores on either of these skewed questionnaires can pick up individual differences in 'bad questionnaire completion'. The authors should comment on the absence of other questionnaires in the limitations section of the discussion.

      (4) The authors could include a more explicit sentence in the abstract stating that the anxiety result did not hold up in the clinical population.

    3. Reviewer #2 (Public review):

      Summary:

      Despite their common co-occurrence, depression and anxiety are known to alter mood fluctuations in opposite ways. Here, the authors aimed to distinguish depression-specific, anxiety-specific, and psychopathology-general effects of reward processing on mood fluctuations, focusing on reward prediction errors (RPEs), which are known to be linked to mood fluctuations. This mechanistic study aims to uncover the process through which these psychopathologies are associated with mood modulations. The authors were able to appropriately test their hypothesis and obtained results corroborating their conclusions.

      This work provides a convincing demonstration of the relevance of computational psychiatry (Huys et al, 2016) and the use of decision neuroscience to shed light on the interplay of anxiety, depression, and mood.

      Strengths:

      The authors used a tripartite model to distinguish depression vs anxiety, as well as a computational model distinguishing reward expectation (EV in the model) from outcome processing through RPE, which are two sequential cognitive processes.

      The manuscript adequately addresses the concerns one would have regarding risk-attitudes and regarding referring to trending statistical results.

      Weaknesses:

      The size of the clinical sample (N=116) may not be sufficient to detect anxiety-specific effects, given the high rate of comorbid anxious depression. It would be beneficial to report the number of MDD vs GAD vs anxious depression diagnoses in the clinical population, as this would likely shed light on the power limitations.

    4. Reviewer #3 (Public review):

      Summary:

      In this submission, Wang and colleagues jointly examine the association between depression and anxiety symptoms and individuals' affective reactivity to reward prediction errors in Rutledge et al.'s gambling paradigm. Taking a bifactor approach to anxiety and depression in several non-clinical samples (and one clinical sample), the authors find that anxiety-specific symptoms relate to over-reactivity of mood to reward prediction errors (RPEs) as well as heightened mood variability, while depression-specific symptoms relate to blunted mood sensitivity to RPEs. These depression- but not anxiety-specific relationships replicated in patient samples.

      Strengths:

      I was impressed that the data-driven, transdiagnostic approach employed by the authors uncovered specific relationships between anxiety- and depression-specific factors and RPE reactivity in a well-characterized task and computational model, especially in a non-clinical sample. This sheds new light on how these affective processes may be perturbed (and, importantly, in different ways) by anxiety and depression symptoms. Likewise, the replication of the depression-specific finding (RPE hypo-reactivity) in a clinical sample was nice to see.

      Weaknesses:

      (1) While the anxiety- and depression-specific factors had differential effects on mood variability (Figure 2A-D) and RPE reactivity (Figure 2E-G) in all samples, such that the correlations between the two factors and these mood parameters were significantly different, the anxiety factor was not consistently (significantly) associated with either mood-related parameter across samples. However, the authors resolve anxiety-specific predictive effects when they collapse across datasets. While it is intuitive that achieving a larger effective sample size would afford the power necessary to detect such individual differences, this struck me as a major caveat for this set of results.

      (2) The authors observe associations between the 'common factor' of depression and anxiety and risk-attitude tendencies, presumably the alpha (exponent) parameter in a prospect theory-type subjective value model. But where is this analysis explained? (i.e. how was this model formulated and how were risk attitude parameters estimated?) And what is the interpretation of this finding - is there precedent for looking at risk attitudes in this task? And why would these predictive effects only be observed in relation to the common, but not unique, factors of anxiety and depression?

    1. eLife Assessment

      This valuable study addressed a key question in epilepsy research: whether the recordings of very fast oscillations in the brain (>250Hz, fast ripples) reflect underlying pathology or might be a property that emerges from a neuronal network at random. The strengths of the study are the importance of the question, the multiple methods, and the solid evidence. However, there are limitations to the methods that should be addressed.

    2. Reviewer #1 (Public review):

      Summary:

      This is a study utilizing several types of analyses (computational modeling, neuronal cultures, rodent epilepsy model, and human intracranial multi-scale recordings) to address a highly relevant conceptual question: Are fast ripples (FRs) distinct pathological entities or largely emergent products of stochastic spike clustering? The results can potentially reshape current approaches to incorporating fast ripples into the epilepsy surgery evaluation.

      Strengths:

      The conceptualization of fast ripples as potentially arising by chance is highly novel and builds effectively on questions raised in prior studies that have never been satisfactorily resolved.

      The integration across biological scales and models is a major strength. The state dependency analysis provides additional, strong support. The methodology and statistical approaches used are thoughtfully presented and rigorously applied.

      In particular, this paper provides a strong response to the findings from Gliske et al, Nat Commun 2018. That study used long-term data analysis and found low rates of FRs at most recording sites, suggesting spurious detections, although FRs were concentrated within seizure onset areas.

      Weaknesses:

      The authors clearly aimed to use a statistical rather than a mechanism-based approach in this work. However, the paper's framing of true fast ripples as oscillatory events, with stochastic fast ripples considered as confounders, does not take into account prior investigations into biological mechanisms, particularly prior studies that point to an important role for stochastic fast ripples in some contexts. Acknowledging these mechanisms would strengthen the manuscript and provide a more complete and nuanced characterization.

      Some examples from the literature:

      Eissa et al, eNeuro 2016, a paper that closely parallels this manuscript but took a mechanistic rather than statistical approach, showed that fast ripples can arise from population paroxysmal depolarizations - a key feature of epileptiform discharges - as temporally clustered, jittered population firing, with FRs appearing in LFP or EEG due to summated postsynaptic potentials (which are slower than action potentials and can generate signals in the high gamma range).

      Foffani et al., 2007, Neuron, and Ibarz et al., 2010, J Neurosci, argue that FRs are pseudo-oscillations created by jittered neuronal populations in the setting of altered spike timing.

      Smith et al., 2020, Sci Rep, contrasts FR characteristics in different regimes, i.e., intact inhibition early in a seizure vs. implied collapse of inhibition after recruitment. Schlingloff et al., 2025, J Neurosci, reported analogous findings in an animal model.

      The computational model and subtraction approach provide a strong case for the random emergence of clustered activity in the high gamma band, given its assumptions. However, any such modeling effort needs to account for inhibitory activity, including impaired inhibitory function that is expected in epileptic brain regions, which has a strong modulating effect on excitatory firing and is thought to play a significant role in FR generation.

      The shuffling procedure aims to preserve the power spectrum but randomizes high frequency phase (>200 Hz). However, this procedure removes biologically meaningful spike timing correlations, as well as structured cross-frequency coupling. The subtraction method thus likely underestimates the incidence of structured "distinct" FRs, while perhaps overestimating "chance" FRs due to biologically infeasible activity, making the statement that most FRs are due to chance correlation too strong.
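
      To make the concern concrete, a generic phase-randomization surrogate of the kind described here can be sketched as follows. This is an assumption about the general technique, not the authors' implementation: the function name, the 200 Hz default cutoff, and the use of a single FFT over the whole trace are illustrative choices.

```python
import numpy as np

def randomize_high_freq_phase(signal, fs, cutoff_hz=200.0, seed=0):
    """Return a surrogate with the same amplitude spectrum as `signal`
    but with uniformly random phases above `cutoff_hz`."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    mask = freqs > cutoff_hz
    if len(signal) % 2 == 0:
        # The Nyquist bin must stay real for an even-length real signal.
        mask[-1] = False

    # Keep each bin's magnitude, replace only its phase.
    random_phase = rng.uniform(0.0, 2.0 * np.pi, size=int(mask.sum()))
    spectrum[mask] = np.abs(spectrum[mask]) * np.exp(1j * random_phase)
    return np.fft.irfft(spectrum, n=len(signal))
```

      Because every high-frequency bin receives an independent phase, any spike-timing structure or cross-frequency coupling carried by those phases is destroyed by construction, which is the property at issue in this comment.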

      The kainate findings underscore this point: the increase in the number of FR detections could be, as the authors state, an increase in chance clustering due to increased network excitability generally. However, the likelihood of a parallel increase in pathological FRs cannot be ruled out, given likely pro-epileptic alterations in spike timing and circuit function.

    3. Reviewer #2 (Public review):

      Summary:

      This paper asks an important question that has received little attention in the extensive literature on High Frequency Oscillations (HFOs) in patients with epilepsy and in experimental models of epilepsy. The question is whether Fast Ripples (FRs), the HFOs in the 250-500 Hz frequency band, represent a pathological phenomenon or a physiological phenomenon that occurs in the healthy brain but happens to be more frequent in epileptic tissue. It is an important question that has not been systematically addressed until now. The authors conclude, from very extensive simulations, from extensive experimental animal studies (the systemic kainate model of epilepsy in rats), and from a modest amount of human data, that FRs occur in healthy brains as a result of the chance occurrence of bursts of action potentials, and that in epileptic tissue, their frequency of occurrence is approximately 30% higher than what is expected by chance. They conclude that FRs are not a separate phenomenon of epileptic tissue. This finding is reinforced by the recent findings of FRs in experimental models of Alzheimer's disease.

      Strengths:

      This is a valuable study because it asks an important and original question and because it evaluates it from several angles (simulation, tissue culture, experimental animals, and human patients). The simulations and the analyses of real data are performed very carefully and with original and solidly documented approaches, using extensive simulations and extensive data sets in the cultured cell data and in the in vivo experiments. The paper is clearly written and well-illustrated.

      Weaknesses:

      I found only one serious weakness in this study, but it is an important one. Although the original work on FRs was done in an experimental model of epilepsy, the field really became prominent when ripples and fast ripples were found first in microelectrode recordings of epileptic patients and then in the intracerebral EEG of such patients. Numerous studies have been performed since then, with a valuable meta-analysis including 700 patients (Wang Z, Guo J, van 't Klooster M, Hoogteijling S, Jacobs J, Zijlmans M. Prognostic Value of Complete Resection of the High-Frequency Oscillation Area in Intracranial EEG: A Systematic Review and Meta-Analysis. Neurology. 2024 May 14;102(9)). Although the consensus at this point is that FRs are not the ideal and totally specific marker of epileptic tissue that many thought they could be, FRs are nevertheless much more frequent in epileptic tissue than in non-epileptic tissue and are a solid biomarker. It is also well established that they are much more frequent in NREM sleep than in wakefulness, as reported in the original paper of Staba et al (Staba RJ, Wilson CL, Bragin A, Jhung D, Fried I, Engel J Jr. High-frequency oscillations recorded in human medial temporal lobe during sleep. Ann Neurol. 2004 Jul;56(1):108-15; not mentioned in this paper) and in the study of Bagshaw et al (2009). In this last paper, using SEEG in various brain regions, the average rate of FRs in NREM sleep is about 6 times that in wakefulness. In the paper by Staba, with microelectrodes in mesial temporal structures, it is about twice that in wakefulness. As a separate issue, the paper of Frauscher et al (Frauscher B, von Ellenrieder N, Zelmann R, Rogers C, Nguyen DK, Kahane P, Dubeau F, Gotman J. High-Frequency Oscillations in the Normal Human Brain. Ann Neurol. 2018 Sep;84(3):374-385), which is not cited, found that, in an extensive sample, non-epileptic human tissue sampled with SEEG generated extremely rare FRs (an average rate of 0.04/min/channel, i.e. 1 every 25 min).

      The results above are mentioned because they do not fit with the data provided in the present study: FRs are much more frequent in NREM sleep than in wakefulness in human epileptic patients, and they are much more frequent (not 30% more, but many hundreds of percent more) in epileptic tissue than in non-epileptic human tissue. The fundamental phenomenon of interest is, I believe, the FRs in epileptic patients. The animal experiments, tissue studies, and simulations are models to study the human phenomenon. With respect to the modulation by sleep and the differentiation between epileptic and non-epileptic tissue, it seems that the systems studied in this paper are not good models of the human condition. The human results presented in the study only reflect wakefulness recordings, which is not the condition in which most HFO studies have been done and in which most HFOs occur. The authors refer to the study of long-term fluctuations in HFO rates by Gliske et al. (2018) to say that one has to be careful with results regarding sleep, for example those of Bagshaw et al (2009), but the clear predominance of HFOs in NREM sleep has been observed in many studies. The cautions regarding fluctuations over extended periods also apply to the awake human data analyzed in this study.

      The study's conclusions regarding the generation of FRs are therefore questionably applicable to the human condition. I do not dispute their validity for the models and situations in which they were studied.

    4. Reviewer #3 (Public review):

      Summary:

      An outstanding question in the field of high-frequency oscillations (HFOs) in the context of epilepsy is how these oscillations emerge, considering that they occur at such high frequencies, i.e., 250 Hz, well above the firing ability of single neurons. One hypothesis that has been suggested in the past is that neurons that fire in an out-of-phase fashion, or rather at random intervals, may contribute to the spectrum of HFOs ranging from 250-500 Hz that is observed in epilepsy. However, how likely it is that random action potentials could aggregate to the extent that they give rise to HFOs in the so-called fast ripple (FR) frequency range (>200 Hz according to the authors) remains unclear. To test this hypothesis, the authors used computational modeling to randomly insert action potentials in a signal, and they found that this approach is sufficient to generate FRs. Some of the predictors of whether FRs could occur were neuronal count, firing rate, and synchronization. Besides computational modeling, they used different model systems to test whether this could be observed in neuronal cultures, in epileptic rats (intrahippocampal kainic acid model), and in human data. Neuronal cultures treated with picrotoxin did not show evidence that FRs could be generated beyond chance aggregation of action potentials. They then asked whether synchronization and firing rate could play a role in the emergence of FRs. They found that changes in neural firing and synchronization, such as those occurring during different phases of the sleep-wake cycle, could affect the number of FRs occurring by chance aggregation, with more FRs seen during periods of wakefulness, a result that they replicated in human data.

      The authors largely achieve their proposed aims of demonstrating that random neuronal firing can, in principle, generate FRs. Results from this study could influence current thinking around mechanisms generating FRs in epilepsy. The use of different computational approaches and model systems could offer new analytical methodologies for the study of FRs in the context of brain disease.
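
      The core idea summarized above, that independently timed action potentials can sum into a signal with energy in the fast ripple band, can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters, not the authors' model: the neuron count, firing rate, sampling rate, 1 ms spike waveform, and both function names are hypothetical, and the band-power measure is a stand-in for a real FR detector.

```python
import numpy as np

def simulate_random_spike_lfp(n_neurons=50, rate_hz=5.0, duration_s=2.0,
                              fs=5000, seed=0):
    """Sum spike waveforms placed at independent random times."""
    rng = np.random.default_rng(seed)
    n_samples = int(duration_s * fs)

    # Bernoulli approximation of independent Poisson spiking, per sample.
    spikes = rng.random((n_neurons, n_samples)) < rate_hz / fs
    spike_train = spikes.sum(axis=0).astype(float)

    # Hypothetical 1 ms biphasic spike waveform (windowed sine cycle).
    t = np.arange(int(0.001 * fs)) / fs
    waveform = -np.sin(2.0 * np.pi * 1000.0 * t) * np.hanning(len(t))
    return np.convolve(spike_train, waveform, mode="same")

def band_power_fraction(x, fs, lo=250.0, hi=500.0):
    """Fraction of non-DC spectral power that falls within [lo, hi] Hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return power[band].sum() / power[1:].sum()
```

      Varying `n_neurons` and `rate_hz` in such a sketch is one way to see why neuronal count and firing rate would be expected to predict chance FR incidence. Adding a proper detector with duration and cycle-count criteria would be needed to address Weakness (1) below.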

      Strengths:

      (1) The authors used a multi-level approach combining computational modeling with experimental datasets, including neuronal cultures, a rat model of temporal lobe epilepsy, and human data.

      (2) Identification of key parameters such as neuronal count, firing rate, synchronization, and brain state in observed incidence of FRs generated through random aggregation of neural firing.

      (3) Cross-species validation increases the likelihood of generalizability of the findings.

      Weaknesses:

      (1) Some of the simulated FRs appear short in duration and may not meet standard detection and definition criteria, potentially influencing validity.

      (2) The neuronal culture approach does not directly test random insertion of action potentials, limiting interpretation.

      (3) Sleep is treated as a homogeneous state in the rat dataset, without accounting for stage-specific differences in synchronization, which may affect the results and interpretation.

      (4) The analyses conducted in human data lack direct comparison with sleep data.

    1. eLife Assessment

      This study uses convincing modeling methods and analyses of rich behavioral datasets to investigate the role of attention in value-based decision making; for instance, as when choosing between two snacks. The results are valuable, as they challenge existing theories that assume that paying attention to an available option biases the eventual choice toward that option. The results suggest that the correlation between attention and decision-making is formed largely after and not before the (internal) choice process has terminated, a finding that offers an intuitively appealing rethinking of how attention and decision-making processes interact during value-based choices.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines whether gaze direction actively shapes choice during food preference decisions or whether gaze and choice evolve largely independently until the moment of commitment. The established framework in this context, the aDDM, assumes that gaze causally biases the accumulation of evidence in favour of the fixated item. The authors show convincingly that this model fails to fit key behavioural patterns across several datasets, as do other published models that make the same assumption. The authors propose an alternative model (Post-Decision-Gaze or PDG) in which gaze and decision formation are decoupled: gaze does not influence the decision process, nor is it drawn toward the ultimately chosen item, until after the decision threshold is reached. Only during the motor execution period (after commitment) is gaze directed to the chosen option. They demonstrate that this model fits several observed patterns better than the aDDM and related variants.
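
      For readers less familiar with the aDDM, its central assumption (evidence for the unattended item is discounted while the other item is fixated) can be sketched as a single simulated trial. This is a simplified illustration, not the code used in the paper under review: fixations here alternate at a fixed interval rather than being sampled from empirical fixation distributions, and the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_addm_trial(r_left, r_right, d=0.0002, theta=0.3, sigma=0.02,
                        fix_ms=400, max_ms=20000, seed=0):
    """Simulate one aDDM trial in 1 ms steps; return (choice, decision time in ms)."""
    rng = np.random.default_rng(seed)
    v = 0.0                                # relative decision value
    looking_left = rng.random() < 0.5      # random first fixation
    for t in range(1, max_ms + 1):
        if looking_left:
            # Unattended (right) item is discounted by theta.
            drift = d * (r_left - theta * r_right)
        else:
            # Unattended (left) item is discounted by theta.
            drift = d * (theta * r_left - r_right)
        v += drift + sigma * rng.standard_normal()
        if abs(v) >= 1.0:                  # symmetric decision bounds at +1 / -1
            return ("left" if v > 0 else "right"), t
        if t % fix_ms == 0:                # simplification: fixed-length fixations
            looking_left = not looking_left
    return None, max_ms                    # no bound reached within max_ms
```

      In this formulation gaze enters the drift term directly, so it causally biases accumulation. The PDG alternative described above would instead leave the drift independent of gaze and direct gaze to the chosen item only after the bound is crossed.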

      Strengths:

      The work thoroughly considers multiple models and datasets. It advances an interesting alternative perspective on gaze-decision interactions and highlights meaningful shortcomings in existing models. The authors take the time to explain how modelling assumptions produce specific patterns in the data, which is certainly insightful to readers interested in the modelling of value-based decision making.

      Weaknesses:

      It is unclear to what extent the model's success relies on the way non-decision time is formalised in the model. In the proposed PDG model, non-decision time is decomposed into separate visual encoding, saccadic execution, and manual execution components. Several values (assumed or recovered) do not match known physiological or behavioural ranges. This is a common issue in the literature, and the authors may want to address it in light of broader work discussing what non-decision time consists of in both manual and saccadic actions (e.g., Bompas et al., 2024, Non decision time: the Higgs boson of decision, Psychological Review).

      In particular, the "saccadic execution" parameter appears far too long and too variable to reflect merely execution; instead, it likely includes decisional components. This would make more sense since manual and saccadic planning essentially rely on distinct brain areas, hence it seems unrealistic that crossing a single threshold would trigger both manual and saccadic execution. Similarly, recovered manual non-decision times are substantially longer (though not more variable) than expected motor execution durations for button presses. These patterns suggest that parts of what the model treats as non-decision time are likely decisional in nature, although perhaps related to "action decision" rather than the "value-based decision" of interest to the authors. To what extent these two processes neatly follow each other or overlap could be usefully considered.

    3. Reviewer #2 (Public review):

      Summary:

      Zylberberg et al. reanalyze eye-tracking and behavioral data (mostly from Krajbich et al., 2010) to test two predictions of the attentional Drift Diffusion Model, finding that these predictions are not met. Similarly, predictions of normative models (inspired by rational inattention) are not in line with the data, and the authors propose a post-choice model of attention. This model better accounts for the two effects but also does not account for all patterns, so the authors conclude that eye movements most likely reflect both pre- and post-decisional processes.

      Strengths:

      A clear strength is the systematic falsification-based approach of the paper, establishing (partially) new predictions and testing to what extent these are met by extant models and by a newly developed theory. The authors do a good job in providing intuitions behind the effects and the reasons why models such as the aDDM predict them. The paper is of substantial relevance for the field, as it shows that effects pertaining to the last fixation(s) should be interpreted with caution. Another strength is the paper's transparency as the authors clearly acknowledge that their new model does not do a perfect job either.

      Weaknesses:

      The paper focuses on analyzing the Krajbich 2010 data, but shows that the second effect replicates in many other datasets. A more principled approach, in which both effects are analyzed and presented for all datasets, would be more convincing. The results should then be shown together for clarity/readability.

      Similarly, it would be nice to show to what extent the models' predictions depend on using the best-fitting parameter values (are there any parameter settings under which the two effects are not predicted?).

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors reanalyzed choice, RT and gaze datasets collected from human subjects performing a food-choice task. They show that models that posit a causal role for attention in shaping the decision-making process fail to account for empirical observations in the data. These include the attentional drift diffusion model (aDDM) and models that derive attention-choice associations from an optimal policy. The authors show that a model that assumes that gazes are directed towards the chosen option after decision commitment captures more (but not all) empirical findings, suggesting that attention may reflect decisions once they are made instead of contributing to their formation. However, this post-decision-gaze (PDG) model failed to capture all aspects of the data, suggesting that gaze may reflect both decisional and post-decisional operations, and existing models are still missing some features of the gaze-directing process. The authors provide convincing evidence that post-decision gaze explains a number of empirical findings in this task.

      Strengths:

      (1) The analyses are generally appropriate, and the conclusions are supported by the data.

      (2) The study was rigorous, as the authors considered a number of alternative possible models for behavior, and evaluated their performance based on a wide range of qualitative predictions (as opposed to exclusively relying on model comparison).

      (3) The proposal that gaze may largely reflect post-decisional processes is interesting, and as far as I am aware, novel.

      Weaknesses:

      There was limited discussion about why one might allocate attention post-decision. I would have appreciated more discussion on the potential functional consequences or implications of post-decision gaze.

    1. eLife Assessment

      This study uses convincing modeling methods and analyses of rich behavioral datasets to investigate the role of attention in value-based decision making; for instance, as when choosing between two snacks. The results are valuable, as they challenge existing theories that assume that paying attention to an available option biases the eventual choice toward that option. The results suggest that the correlation between attention and decision-making is formed largely after and not before the (internal) choice process has terminated, a finding that offers an intuitively appealing rethinking of how attention and decision-making processes interact during value-based choices.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines whether gaze direction actively shapes choice during food preference decisions or whether gaze and choice evolve largely independently until the moment of commitment. The established framework in this context, the aDDM, assumes that gaze causally biases the accumulation of evidence in favour of the fixated item. The authors show convincingly that this model fails to fit key behavioural patterns across several datasets, as do other published models that make the same assumption. The authors propose an alternative model (Post-Decision-Gaze or PDG) in which gaze and decision formation are decoupled: gaze does not influence the decision process, nor is it drawn toward the ultimately chosen item, until after the decision threshold is reached. Only during the motor execution period (after commitment) is gaze directed to the chosen option. They demonstrate that this model fits several observed patterns better than the aDDM and related variants.

      Strengths:

      The work thoroughly considers multiple models and datasets. It advances an interesting alternative perspective on gaze-decision interactions and highlights meaningful shortcomings in existing models. The authors take the time to explain how modelling assumptions produce specific patterns in the data, which is certainly insightful to readers interested in the modelling of value-based decision making.

      Weaknesses:

      It is unclear to what extent the model's success relies on the way non-decision time is formalised in the model. In the proposed PDG model, non-decision time is decomposed into separate visual encoding, saccadic execution, and manual execution components. Several values (assumed or recovered) do not match known physiological or behavioural ranges. This is a common issue in the literature, and the authors may want to address it in light of broader work discussing what non-decision time consists of in both manual and saccadic actions (e.g., Bompas et al., 2024, Non decision time: the Higgs boson of decision, Psychological Review).

      In particular, the "saccadic execution" parameter appears far too long and too variable to reflect merely execution; instead, it likely includes decisional components. This would make more sense since manual and saccadic planning essentially rely on distinct brain areas, hence it seems unrealistic that crossing a single threshold would trigger both manual and saccadic execution. Similarly, recovered manual non-decision times are substantially longer (though not more variable) than expected motor execution durations for button presses. These patterns suggest that parts of what the model treats as non-decision time are likely decisional in nature, although perhaps related to "action decision" rather than the "value-based decision" of interest to the authors. To what extent these two processes neatly follow each other or overlap could be usefully considered.

    3. Reviewer #2 (Public review):

      Summary:

      Zylberberg et al. reanalyze eye-tracking and behavioral data (mostly from Krajbich et al., 2010) to test two predictions of the attentional Drift Diffusion Model, finding that these predictions are not met. Similarly, predictions of normative models (inspired by rational inattention) are not in line with the data, and the authors propose a post-choice model of attention. This model better accounts for the two effects but also does not account for all patterns, so the authors conclude that eye movements most likely reflect both pre- and post-decisional processes.

      Strengths:

      A clear strength is the systematic falsification-based approach of the paper, establishing (partially) new predictions and testing to what extent these are met by extant models and by a newly developed theory. The authors do a good job in providing intuitions behind the effects and the reasons why models such as the aDDM predict them. The paper is of substantial relevance for the field, as it shows that effects pertaining to the last fixation(s) should be interpreted with caution. Another strength is the paper's transparency as the authors clearly acknowledge that their new model does not do a perfect job either.

      Weaknesses:

      The paper focuses on analyzing the Krajbich 2010 data, but shows that the second effect replicates in many other datasets. A more principled approach, in which both effects are analyzed and presented for all datasets, would be more convincing. The results should then be shown together for clarity/readability.

      Similarly, it would be nice to show to what extent the models' predictions depend on using the best-fitting parameter values: are there any parameter settings under which the two effects are not predicted?

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors reanalyzed choice, RT and gaze datasets collected from human subjects performing a food-choice task. They show that models that posit a causal role for attention in shaping the decision-making process fail to account for empirical observations in the data. These include the attentional drift diffusion model (aDDM) and models that derive attention-choice associations from an optimal policy. The authors show that a model that assumes that gazes are directed towards the chosen option after decision commitment captures more (but not all) empirical findings, suggesting that attention may reflect decisions once they are made instead of contributing to their formation. However, this post-decision-gaze (PDG) model failed to capture all aspects of the data, suggesting that gaze may reflect both decisional and post-decisional operations, and existing models are still missing some features of the gaze-directing process. The authors provide convincing evidence that post-decision gaze explains a number of empirical findings in this task.

      Strengths:

      (1) The analyses are generally appropriate, and the conclusions are supported by the data.

      (2) The study was rigorous, as the authors considered a number of alternative possible models for behavior, and evaluated their performance based on a wide range of qualitative predictions (as opposed to exclusively relying on model comparison).

      (3) The proposal that gaze may largely reflect post-decisional processes is interesting, and as far as I am aware, novel.

      Weaknesses:

      There was limited discussion about why one might allocate attention post-decision. I would have appreciated more discussion on the potential functional consequences or implications of post-decision gaze.

    1. eLife Assessment

      This study provides a valuable contribution to understanding grid-to-place transformations, offering new insights into the structure and reliability of these representations and extending prior work in a meaningful way. The evidence supporting the authors' conclusions is solid, based on careful analyses and well-executed experiments, although clarity and mechanistic interpretation would be strengthened by improving sample size reporting, expanding population-level analyses, and future studies including simultaneous entorhinal-hippocampal recordings. The work will be of interest to neuroscientists studying spatial coding and hippocampal-entorhinal circuit function.

    2. Reviewer #1 (Public review):

      This manuscript investigates how chemogenetic depolarization of medial entorhinal cortex layer II stellate cells reshapes spatial coding in downstream hippocampal CA1. Building on the authors' prior work (Kanter et al., Neuron 2017), the study examines changes in grid cell subfield firing rates and CA1 place cell firing patterns after CNO administration. A central advance of the present work is the use of the same manipulation on two consecutive days. The authors show that the induced grid subfield rate changes are highly similar across days and that CA1 place field reorganization is likewise reproducible across days. In addition, they report that CA1 remapping after CNO is not arbitrary. The new main place field often emerges at a location that can be anticipated from the baseline rate map of the same cell, typically corresponding to a weak secondary peak outside the primary field. Finally, the authors demonstrate that these experimental findings can be recapitulated in a feedforward grid to place cell model by selectively redistributing grid subfield firing rates, supporting the interpretation that grid subfield rate changes are sufficient to drive predictable and reproducible place field reorganization.

      Overall, this study is positioned as a follow-up to the authors' previous report in which the main phenomenon (grid subfield rate remapping and accompanying CA1 place cell remapping following chemogenetic depolarization of MEC layer II neurons) was already established. While the conceptual novelty is therefore incremental, the present manuscript adds important and convincing evidence about two key properties of this phenomenon, including its reproducibility across days and the extent to which the direction of place field reorganization is predictable from baseline activity. The experimental approach and analyses appear generally appropriate and carefully executed, and the inclusion of modeling strengthens the mechanistic interpretation. These results provide useful new insight into stable input-output relationships within the entorhinal hippocampal system, and the work will be of interest to researchers studying remapping and the grid to place cell transformation.

    3. Reviewer #2 (Public review):

      Summary:

      Hippocampal remapping - the collective reorganization of neural tuning properties - is thought to be a crucial determinant of memory outcomes. Understanding its mechanistic bases is a fundamental goal of neuroscience and likely to be critical to understanding memory in health and disease. Here, Lykken et al. 2025 leverage a unique empirical manipulation paired with computational modeling to investigate how one mechanism - reorganization of grid cell subfield firing rates - impacts hippocampal remapping. The authors find that repeated chemogenetic excitation of MEC stellate cells induces reliable reorganization of grid cell subfield firing rates, which is in turn coupled with reliable hippocampal remapping. Notably, the authors show that this hippocampal remapping is not random but predictable, with changes in field location that can be predicted based on weak out-of-field firing observed during control sessions. These findings were well-replicated by a simple model of grid-to-place transformation.

      Strengths:

      This work has many strengths. One key strength of this work is its compelling demonstration that chemogenetic activation of stellate cells induces changes to the grid and place cell representations, which are reliable across repeated activations. This reliability means that the functional changes induced by this manipulation are not merely noise but rather contain a consistent structure that can be investigated to gain insight into the entorhinal-hippocampal transformation. Similarly, the demonstration that hippocampal remapping during this manipulation is not random, but predictable at the single-cell level, is also a strength. This predictability can help us distinguish competing mechanisms of remapping and place field formation more generally. Finally, by reproducing key experimental outcomes with a straightforward grid-to-place computational model, the authors show that this relatively simple model is sufficient to understand their results.

      Weaknesses:

      This work also has limitations that leave some relevant questions open at this time. One such set of questions, which might be addressable with the authors' data and modeling, concerns population analyses. Do grid fields at similar locations exhibit similar changes in field properties, or do these fields change independently? Are changes in field location consistent or inconsistent among simultaneously recorded place cells? Would we expect or not expect such a structure given the model? These results might help discriminate between different mechanisms possibly at play.

      Another limitation of this work is its reliance on a single measure of predictability. While this is a great start, and the various controls and modeling are appreciated, I wonder whether the modeling could be used to generate additional verifiable predictions. For example, perhaps analyzing whether there is or is not structure to unpredictable errors (are these distributed around predictions but further away, or are they random)?

      Finally, one limitation comes from the between-group nature of the recordings. Because the MEC and hippocampus are recorded in separate groups of animals, the authors lose the ability to test whether each mouse's particular grid field reorganization predicts its particular pattern of remapping. If the authors' model is correct, then one might hope to be able to predict with even higher accuracy the particular patterns of remapping in CA1 given sufficiently well-characterized grid field changes. This ambitious goal would require simultaneous recordings from the hippocampus and entorhinal cortex, which are beyond the scope of the current work, but would ultimately yield even more compelling evidence of the grid-to-place transformation underlying this form of remapping.

    1. eLife Assessment

      This work provides a map of enhancer-promoter interactions associated with genes controlling the development of a specific neuronal cell population. The study offers a valuable resource and integrates multiple complementary datasets to provide insights into regulatory mechanisms, although the conceptual advances are moderate and the central message could be clearer. The evidence supporting the conclusions is generally solid, but the lack of direct functional testing of key regulatory elements limits the strength of some claims.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      This study by Riegman & George et al. investigates the roles of the chromatin remodeling factor CHD7 and the proneural transcription factor Atoh1 at enhancers in cerebellar granule cells (GCs). Enhancers were categorized based on epigenetic marks and cross-referenced with promoter capture-HiC, ATAC-seq, and expression datasets to identify their long-range target genes, which were found to be enriched for critical neurodevelopmental processes. Differential expression and chromatin accessibility analyses in CHD7 knockout (KO) conditions suggest that this factor regulates a significant number of enhancers. These same enhancers are enriched for proneural transcription factor motifs, with Atoh1 being the most frequently present and likely the most affected. Finally, the direct interaction between CHD7 and Atoh1 was assessed via co-immunoprecipitation in co-transfected cells.

      While the paper presents an interesting aspect of enhancer regulation in neurodevelopment, several points warrant attention:

      Major Strengths:

      The use of chromatin marks increases the resolution of promoter-interacting enhancer regions when integrated with capture-HiC, refining the identification of distal enhancers. Additionally, performing promoter capture-HiC experiments for the first time in this cell type constitutes a valuable resource for the community working on 3D genome organization and neurodevelopment.

      Major Weaknesses:

      As noted by the authors, limited sequencing depth reduces confidence in the conclusions and may result in missed weaker long-range interactions. Furthermore, the absence of capture-HiC and Atoh1 ChIP-seq experiments in the KO condition prevents direct comparison, thereby limiting the strength of the conclusions.

      Additional Consideration:

      Caution should be exercised regarding the assumption that every enhancer must physically contact its target promoter. While true for many enhancers, some act in trans through eRNAs or lncRNAs without direct physical contact.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors aim to identify active, long-range regulatory interactions in cerebellar granule cell progenitors (GCps). As such, the authors perform promoter capture Hi-C to map long-range interactions for all gene promoters, using cells isolated from P7 mouse brain samples. While the resolution of these maps is limited by the relatively large fragment sizes generated from a 6-bp cutter, the authors combine these interactions with other available published datasets, including from their own previous work, (e.g. ATAC-seq and ChIP-seq) to more precisely map putative enhancers within the long-range interacting regions of captured promoters. The paper further focuses on the importance of transcription factor Atoh1 and chromatin remodeller CHD7 in regulation of these putative enhancers in GCps. The authors suggest a direct interaction between CHD7 and Atoh1 by overexpression and co-immunoprecipitation in human embryonic kidney cells.

      As stated by the authors, this study represents a valuable resource for researchers interested in the identification of enhancers in GCps cells, and their linked target genes. While broadly descriptive, the study does highlight some gene loci of interest and of biological relevance. For example, through integration of previously published datasets, the study resolves which putative regulatory elements at the Reln locus may regulate its activity.

      This manuscript will be of interest to researchers interested in analysing long-distance targets of as well as researchers trying to understand the precise gene regulation in cerebellar development. It may also be of interest to clinical geneticists to interpret novel putative non-coding disease mutations.

      Strengths:

      The strengths of this manuscript are the integrated approach to identifying cell-type-specific enhancers using available epigenomic datasets, and the use of 3D genome topology to link them directly to their target genes, for example the Reln gene, previously implicated in cerebellar phenotypes of CHD7 mutants. The pcHi-C dataset generated in this study provides a valuable reference for the community of enhancer-promoter pairs for a specific cell type of interest with human disease relevance.

      Weaknesses:

      The limitations of the study are partially addressed in the text by the authors, including the resolution of the pcHi-C using a 6-bp cutter, the limitation of sequencing depth (more interactions may have been identified with more depth), and the limited correlation between replicates (likely due to undersampling the library). Page 9: "some additional interactions with the nearest gene promoters might be identified in our pcHi-C dataset with deeper sequencing".

    4. Reviewer #3 (Public review):

      Summary:

      In this work, Riegman et al. establish the promoter interactome of cerebellar granule cell progenitors (CGPs) and identify thousands of putative enhancers regulating key genes in this cell population. The authors isolate primary CGps cells from the mouse cerebellum and perform promoter capture Hi-C in order to reanalyse previously generated epigenomic datasets (ATAC-seq, H3K4me1/3, H3K27ac) in these cells. They identify 22'797 enhancers interacting with gene promoters. The authors then use CHD7 ChIP-seq experiments to better annotate regulatory regions linked to genes deregulated upon CHD7 loss of function. After observing that CHD7 is frequently co-bound with ATOH1, they compare the binding profiles of ATOH1 and CHD7 together with genes deregulated in loss-of-function datasets, and refine the regulatory elements associated with each of these proteins.

      Strengths:

      The work is well designed and carefully executed, leading to an enhancer-promoter (E-P) interaction cartography that largely surpasses the current standard in the field. The pc-HiC dataset enables a deeper analysis of previously generated datasets (ChIP-seq and loss-of-function), which clearly improves the understanding of the mechanisms underlying CGps proliferation and differentiation. Moreover, the integration of published loss-of-function datasets for CHD7 and ATOH1 is relatively novel in this type of study and helps reduce the purely descriptive nature of the work. In particular, the analysis sheds light on genes with potential functions in CGps that had not previously been identified, as well as their regulatory connections. Overall, the study is convincing and supports the conclusions presented by the authors.

      Weaknesses:

      (1) A substantial part of the manuscript focuses on E-P interactions in CGPs, which gives the impression that this is primarily a genome organisation study. However, in this regard the manuscript does not bring major conceptual novelties. In contrast, the biological insights related to CGPs and the identification of new candidate genes likely represent the most novel aspect of the work. The authors should clarify the central message of the manuscript and reorganise the presentation of the results accordingly.

      (2) The numbers presented throughout the manuscript are sometimes confusing. For instance, the authors initially report 106'589 PIF (line 175), but later only 61'928 (line 243) when calling enhancers. The relationship between these numbers is not straightforward. More generally, simplifying the nomenclature used to describe interaction analyses would help emphasise the biological insights rather than the computational framework.

      (3) ATAC-seq alone is a relatively poor predictor of enhancers. In this context, H3K27ac would provide a more accurate marker of enhancer activity. This point is particularly important because the authors' data suggest that CHD7 does not function as a pioneer factor capable of opening chromatin. Instead, this role appears to be more closely associated with ATOH1. Therefore, alterations in CHD7 are more likely to affect enhancer activity (reflected by H3K27ac) rather than chromatin accessibility itself. If the authors do not have access to H3K27ac ChIP-seq data, this limitation should be explicitly acknowledged.

      (4) The authors do not functionally test most enhancers and instead discuss primarily putative enhancers (with the exception of VISTA-tested elements). Although the term "putative enhancer" appears in some subsections, it is not consistently applied throughout the manuscript. This limitation should be clearly stated early in the manuscript with a sentence such as: "As these regions have not been functionally validated, they should be considered putative enhancers. However, for simplicity, we will refer to them as enhancers throughout the manuscript."

      (5) Where feasible, the enhancer identified at the Reln gene should be functionally tested to demonstrate the added value of the approach.

    5. Author response:

      General Statements

      We thank the reviewers for their careful and supportive reviews of our manuscript. We have addressed all the reviewers' comments and extensively revised the manuscript accordingly.

      During our revisions, we discovered a bug in the code that calculated the linear genomic distance between the captured promoter regions (bait regions) and the promoter-interacting fragments (PIFs). The error inadvertently halved the distance measurements in the output tables. This has been corrected in the revised manuscript and has resulted in updates to Figure 1B and corrected values in the ‘interaction_distance’ and/or ‘interaction_type’ columns of Supplementary Tables 2, 3, 6 and 8. We thank the reviewers for the opportunity to correct this.
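      The kind of calculation affected by this bug can be illustrated with a minimal sketch. It assumes that the linear distance is measured between the midpoints of the bait fragment and the PIF, which is one common convention and not necessarily the one used in the manuscript; the `Fragment` type, the function names and the coordinates are all hypothetical.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    """A restriction fragment (hypothetical container, half-open coordinates)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def midpoint(frag: Fragment) -> float:
    """Centre of a fragment in base pairs."""
    return (frag.start + frag.end) / 2


def interaction_distance(bait: Fragment, pif: Fragment) -> Optional[float]:
    """Midpoint-to-midpoint distance for cis interactions.

    Returns None for trans interactions (different chromosomes),
    where a linear distance is not defined.
    """
    if bait.chrom != pif.chrom:
        return None
    return abs(midpoint(pif) - midpoint(bait))


# Made-up coordinates, for illustration only.
bait = Fragment("chr5", 1_000_000, 1_004_000)
pif = Fragment("chr5", 1_500_000, 1_506_000)
distance = interaction_distance(bait, pif)
trans_distance = interaction_distance(bait, Fragment("chr7", 10, 20))
```

      A small check with hand-computed coordinates, as in the last lines, is the sort of test that would flag a systematic factor-of-two error in the output tables.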

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      In this article, the authors conducted promoter-capture HiC experiments (pcHiC) in mouse cerebellar granule cell progenitors (GCps) and obtained a good map of 3D genome interactions for the promoters of protein-coding genes. This dataset was later integrated with ATAC-seq and ChIP-seq experiments to identify putative enhancer regions within promoter-interacting regions, at higher base-pair resolution than that obtained by pcHiC experiments alone. This set of enhancers is then compared to, and presented as being more reliable than, those present in the VISTA enhancer database. In addition, ATAC-seq sites and RNA-seq datasets, both obtained in WT and CHD7 KO conditions, are integrated to correlate expression of a set of genes with the chromatin accessibility of their distal enhancer(s), which is believed to be promoted by CHD7. The study is completed by a transcription factor motif analysis of CHD7-regulated enhancers, which shows an enrichment for proneural transcription factors, with special emphasis on Atoh1, found to be frequently co-recruited with CHD7. Data and methods are well detailed and correctly replicated and will be useful as a resource for the community. The overlap obtained between pcHiC experiments, which the authors themselves critique, is very common and expected in this kind of experiment. In general, the conclusions drawn in the article are convincing, but some aspects, such as the comparison to VISTA and the naming of 'enhancers', should be moderated.

      We thank the reviewer for their positive and constructive comments. We have amended the manuscript as indicated in detail below.

      (1) The comparison of pcHiC-identified enhancers vs. VISTA enhancers should be more balanced, as the two approaches have important conceptual differences. Although VISTA enhancers are based on functional annotation, their target genes might not necessarily be correctly assigned based on distance. On the other hand, putative enhancer regions identified by pcHiC experiments do not rely on functional testing. Both types of information are therefore useful, but should be put in perspective.

      We thank the reviewer for making this point. We have amended the text to present a more balanced view e.g. “Using VISTA-designated hindbrain enhancers as an example, we identify the genes most likely regulated directly by these enhancers and update their annotation accordingly.”

      (2) To increase the strength of the paper, it would be preferable that the authors include simple functional enhancer assays (e.g. CRISPR deletion of a contacting enhancer, luciferase assay) to support their perspective, since 3D conformation information in the KO condition is lacking in the article. Although ideally these experiments should be performed in both conditions for a full demonstration, it would be acceptable to at least include a simple functional assay in the WT context to demonstrate that the regulatory regions obtained by crossing genomic data are real enhancers. This point is even more critical given that enhancers lacking classical histone marks (H3K27ac+H3K4me1) have been described. The same comment applies to promoter-interacting fragments lacking these marks, which could be missed enhancers (i.e. enhancers without these marks).

      To address this point, we performed luciferase assays to show that putative enhancers identified with our integrated bioinformatic approach (pcHi-C + ATAC-seq + H3K4me1 + H3K27ac) do indeed exhibit enhancer activity. For these experiments, we tested these putative enhancer fragments in an immortalized cell line, SHH-NPD, a GCp-derived cell line generated by the Fults laboratory (Jenkins et al. 2014). The results of these experiments are included as Suppl. Fig. 1 in the revised manuscript.

      Minor point

      - Figure 5B is lacking labels.

      We apologise for this oversight – labels have now been added.

      Reviewer #1 (Significance):

      This article, once completed with possible revisions, will be useful for the community as a resource of experimentally determined putative enhancers in cerebellar granule cell progenitors. It also provides some insights into the association of CHD7 and Atoh1 in distal regulation in these cells.

      We thank the reviewer for acknowledging the significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, the authors aim to identify active, long-range regulatory interactions in cerebellar granule cell progenitors (GCps). As such, the authors perform promoter capture Hi-C to map long-range interactions for all gene promoters, using cells isolated from P7 mouse brain samples. While the resolution of these maps is limited by the relatively large fragment sizes generated from a 6-bp cutter, the authors combine these interactions with other available published datasets, including from their own previous work, (e.g. ATAC-seq and ChIP-seq) to more precisely map putative enhancers within the long-range interacting regions of captured promoters. The paper further focuses on the importance of transcription factor Atoh1 and chromatin remodeler CHD7 in regulation of these putative enhancers in GCps. The authors suggest a direct interaction between CHD7 and Atoh1 by overexpression and co-immunoprecipitation in human embryonic kidney cells.

      As stated by the authors, this study represents a valuable resource for researchers interested in the identification of enhancers in GCps cells, and their linked target genes. While broadly descriptive, the study does highlight some gene loci of interest and of biological relevance. For example, through integration of previously published datasets, the study resolves which putative regulatory elements at the Reln locus may regulate its activity.

      We thank the reviewer for their supportive comments.

      We provide a summary of our major and minor comments here.

      Major comments:

      (1) The main take-home messages of the manuscript could be more clearly stated in the introduction to help readers understand the main conclusions of the work.

      We have added a sentence to the Introduction to clarify the key take-home messages:

      “We report putative distal regulatory elements for >12,000 genes, identify CHD7- and Atoh1-regulated enhancer elements and show that these factors interact and likely co-regulate the expression of key genes in the GCp lineage.”

      (2) In the discussion, a previous Hi-C dataset is referred to "Reddy et al. annotated 5,175 promoter-enhancer interactions in GCps using Hi-C without enrichment (Reddy, Majidi et al. 2021)." It would be beneficial to compare the interactions identified previously with the current study (5,175 vs 46,428 interactions).

      To address this comment, we have performed an additional analysis and include text, Suppl. Figure 3 and Suppl. Table 13 to demonstrate the extent to which the two datasets overlap and diverge. We have also added text to the discussion to highlight the differences and technical considerations between the two approaches and how they complement each other.

      The 5,174 enhancer-promoter (E-P) interactions identified by Reddy et al. were downloaded and intersected with the 46,428 promoter-accessible PIF regions identified in our study. The new Supplementary Figure 3A illustrates that 82% (843/1207) of the genes for which Reddy et al. identify long-range interacting regions are represented in our pcHiC dataset. Our pcHiC data contain information on distal interacting regions and potential enhancer regions for an additional 11,511 protein-coding genes. Suppl. Figure 3B provides an overview of the Reddy et al. E-P interactions that are, and are not, identified in the pcHiC data. We replicate 38% of Reddy et al.’s E-P findings, whilst 53% of the 3229 interactions unique to the Reddy data would not be detected in the pcHiC data for technical reasons resulting from the capture design and analysis protocol. For the remaining interactions that are specific to the Reddy data, we identify other distal regions interacting with those same promoters. Suppl. Table 13 details the full comparison of Reddy et al.’s E-P interactions that are found within our dataset.

      The differences between the two datasets, and the increased number of interactions detected in the pcHiC dataset, likely result from the enrichment for the captured promoters, which enables the detection of interactions that would have been below the detection threshold of the HiC study. In addition, there are notable differences in the analysis strategies for the two datasets, which also contribute to differences in the detection of regions. Reddy et al. binned the HiC data into 10 kb regions to identify interacting regions and subsequently used chromatin marks to identify possible enhancer and promoter regions within these large regions. In contrast, we used pcHiC and the CHiCAGO algorithm to identify individual HindIII restriction fragments that are proximal to targeted promoter regions (PIFs), and prioritised those that contain accessible regions. Rather than focusing solely on enhancers, this approach can capture various types of regions that play regulatory roles, such as enhancers, CTCF sites or facilitator regions, independent of their chromatin mark composition.

      (3) The authors identify an overlap with some of their identified enhancers with those from VISTA. Is this a fair comparison seeing as the enhancer reporters were tested during early embryonic development (e.g. E11.5 and E13.5) and seen to be active in the hindbrain, would these stages be relevant to GCps from P7? Can the authors identify ATAC-seq for example from hindbrain from embryonic stages and determine if the enhancer accessibility profile looks similar to that for the P7 GCps cells?

      We thank the reviewer for this important question regarding the developmental relevance of our VISTA comparison and acknowledge that direct comparison between the time points requires careful consideration. Firstly, to address the question of how similar the chromatin accessibility profiles are between the embryonic and P7 time points, we compared the ATAC-seq data from our paper to ENCODE data from the hindbrain. Of the 140 VISTA enhancers that were intersected with the pcHiC dataset, 119 were identified from the lacZ studies as active in the hindbrain at E11.5, whilst 21 were identified as active at E12.5. We compared ENCODE ATAC-seq peaks from the E11.5 (ENCFF743IYX) and E12.5 (ENCFF198TLF) hindbrain to the GCps from P7, both across the entire genome (global accessibility) and specifically within +/- 3 Mb of the VISTA enhancer regions in the PIFs from the pcHiC, to assess the conservation of local accessibility profiles.

      When looking at the global accessibility profile of embryonic hindbrain versus P7 GCps across the whole genome there was a large degree of overlap with ~85% (E11.5) and ~88% (E12.5) of all ENCODE ATAC peaks overlapping with accessible ATAC summit regions from P7 GCps:

      Author response image 1.

      To determine whether this was consistent in the immediate chromatin environment of the VISTA enhancers themselves, we compared the accessibility profiles across time points in the local environment surrounding the VISTA enhancers. This local environment was defined as a region extending an additional 3 Mb on either side of each VISTA enhancer position found in PIFs. 3 Mb was chosen because the longest interaction found for a single VISTA element was approximately 2.7 Mb. Consistent with the global analysis, a similarly high level of overlap of accessible regions between the time points was found for the local chromatin environment surrounding the VISTA enhancers within PIFs in the pcHiC dataset, with ~87% (E11.5) and ~89% (E12.5) of ENCODE-detected peaks overlapping with accessible ATAC summit regions from P7 GCps.

      Author response image 2.

      Regions +/- 3 Mb around VISTA enhancers in PIFs

      Author response image 3.

      Regions +/- 3 Mb around VISTA enhancers in PIFs
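      The overlap percentages reported above can be thought of as the fraction of embryonic peaks that intersect at least one P7 accessible region, computed either genome-wide or only within windows around the VISTA enhancers. The following is a minimal sketch of that calculation on toy intervals. The function names and coordinates are hypothetical, the naive pairwise comparison is chosen for readability only, and the actual analysis may well have used a dedicated interval tool such as bedtools.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def window_around(region: Interval, flank: int) -> Interval:
    """Extend a region by `flank` bp on either side, clipped at position 0."""
    chrom, start, end = region
    return (chrom, max(0, start - flank), end + flank)


def fraction_overlapping(
    query: List[Interval],
    reference: List[Interval],
    windows: Optional[List[Interval]] = None,
) -> float:
    """Fraction of query peaks overlapping any reference region.

    If `windows` is given, only query peaks inside a window are considered.
    """
    if windows is not None:
        query = [q for q in query if any(overlaps(q, w) for w in windows)]
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


# Toy data, for illustration only.
embryonic_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
                   ("chr1", 10_000_000, 10_000_100), ("chr2", 100, 200)]
p7_summits = [("chr1", 150, 160), ("chr1", 550, 560), ("chr2", 150, 160)]
vista_windows = [window_around(("chr1", 300, 400), flank=3_000_000)]

global_fraction = fraction_overlapping(embryonic_peaks, p7_summits)
local_fraction = fraction_overlapping(embryonic_peaks, p7_summits, vista_windows)
```

      In this toy example three of the four embryonic peaks overlap a P7 summit genome-wide, whereas both peaks falling inside the 3 Mb window do, so the global and local fractions differ.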

      Genome browser shots at the three example VISTA loci from Figure 1 further support this approach. In addition, we note that a recent study by Chen et al (2024 https://www.nature.com/articles/s41588-024-01681-2), in which capture Hi-C was performed at E11.5 on 935 VISTA enhancers across multiple tissues, confirmed that the majority of VISTA enhancer regions (61%) bypass adjacent genes, which is consistent with our nearest gene comparison.

      (4) The co-IP experiment appears to support the conclusion that Atoh1 and CHD7 can interact, however there are bands in lanes where there should not be (i.e. Input lanes 1 and 4 for FLAG blot). It would be recommended to repeat this result at least once. [Expected time 2-4 weeks].

      This experiment has been repeated 3 times with the same result. It is normal for non-specific background bands to appear on Western blots from total cell lysates (inputs), as most antibodies have significant cross-reactivity. The anti-FLAG antibody clearly detects bands above background in lysates where FLAG-tagged CHD7 is expressed. Most critically, despite the presence of non-specific bands in input, FLAG-tagged CHD7 is only detected in immunoprecipitated samples where either FLAG-tagged proteins have been precipitated and FLAG-tagged CHD7 is expressed, or HA-tagged Atoh1 has been precipitated and both FLAG-tagged CHD7 and HA-tagged Atoh1 are expressed.

      (5) The methods section describes analysis of several datasets, however we could not access the code at the time of review. Do the authors intend to make this code available at the time of publication?

      Yes once the publication is approved all code will be made available along with conda environment yaml files to replicate the software environment in which the analysis was performed.

      (6) Page 7 "replicate one and two, respectively". Can the authors clarify the number of biological replicates performed for pcHi-C?

      Two biological replicates were performed for pcHiC, which were then bioinformatically combined into a ‘superset’ for CHiCAGO interaction calling, as is standard practice for pcHiC data (see e.g. Cairns et al, 2016). We have revised the text to make this clearer.

      Minor comments:

      (1) Page 3 "controlling the expression of 577 genes in GCps" - the authors do not provide evidence that these enhancers control gene expression directly, this should be reworded.

      Thank you. We have reworded to: “contacting the promoters of 577 genes” to indicate that these were identified using pcHi-C and not functional assays.

      (2) Page 5 "where transient amplifying divisions exponentially expand GCps" - at what stages of embryonic/postnatal development are GCps first detected, and when do they amplify and then differentiate?

      GCps that form the EGL are specified in the rhombic lip from E13.5 (Machold, 2005 and Wang, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 (Ben-Arie, 1997) of development. They amplify from this stage and differentiation, induced by neurogenic factors like NeuroD1 is visible from P0 onwards (Miyata, 1999). We have amended the text to include this additional information: “GCps that form the EGL are specified in the rhombic lip from E13.5 (Ben-Arie et al, 1997; Machold & Fishell, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 (Ben-Arie et al., 1997) of development. They amplify from this stage and differentiation, induced by neurogenic factors like NeuroD1 is visible from P0 onwards (Miyata et al, 1999).”

      (3) Page 7 "identified 164,387 unique and significant interactions" - how is an interaction defined, a single read, or evidenced by a certain number of reads. "promoter interacting fragments or PIFs" - is PIF referring to a single read evidencing an interaction?

      An interaction is defined by the CHiCAGO algorithm. The number of reads needed to score an interaction depends both on the distance of that PIF from the promoter (this is modelled using a distance-dependent component that accounts for the decay of contact frequency with genomic distance) and on a component that models how the sequence or other technical artifacts might bias the capture of some sequences compared to others. For each promoter, a background model is generated of the expected number of reads that would be captured based on the above considerations, and if the number of reads for a region exceeds this background model by a certain threshold, the interaction is deemed significant using a p-value-like score. In practice, this means that regions further from the promoter will often require fewer reads to signify a significant interaction than regions that are much closer to the promoter. The significant PIFs in the dataset are all evidenced by a minimum of 3 reads in at least one biological replicate. We have included a short explanation of this in the methods of the revised manuscript for clarity.

      The maximum reads in a single replicate library for a specific PIF was 1557, and the median number of reads per PIF was 17.
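
      The point that distal fragments need fewer reads than proximal ones to exceed background can be illustrated with a toy calculation. The sketch below is a deliberately simplified stand-in and not the CHiCAGO model: it assumes a plain Poisson background whose mean decays as a power law of distance, and the function names, the `scale`, `exponent` and `bias` parameters, and the `alpha` cutoff are all hypothetical values chosen for illustration.

```python
import math


def expected_reads(distance_bp, scale=2.0e6, exponent=1.0, bias=1.0):
    """Toy background: expected read count decays as a power law of distance.

    `bias` stands in for fragment-specific technical effects (an assumption).
    """
    return bias * scale / (distance_bp ** exponent)


def poisson_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean)."""
    cdf = sum(
        math.exp(-mean) * mean ** k / math.factorial(k) for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


def min_reads_for_call(distance_bp, alpha=1e-5, **background_kwargs):
    """Smallest read count whose tail probability under the toy background is <= alpha."""
    mean = expected_reads(distance_bp, **background_kwargs)
    n = 0
    while poisson_tail(n, mean) > alpha:
        n += 1
    return n


reads_needed_proximal = min_reads_for_call(50_000)    # 50 kb from the promoter
reads_needed_distal = min_reads_for_call(1_000_000)   # 1 Mb from the promoter
```

      Under these assumptions the expected background is much lower at 1 Mb than at 50 kb, so the read count required to pass the same cutoff is lower as well.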

      (4) Page 8. What is the distinct between PIFs and "promoter interacting regions (PIRs)"? These could be better defined in the text.

      Thank you for picking up this discrepancy; we were using PIR and PIF interchangeably. We have amended the manuscript to refer to PIFs consistently throughout.

      (5) Figure 1C-F. Labels "Random" and "PIFs" don't line up well with the two bars.

      Thank you, this has been corrected.

      (6) Page 9. Could the authors show some representative images for the "VISTA hindbrain enhancers" (e.g. for Figure 1I-K).

      We have inserted representative images showing in vivo activity of these enhancers in mouse embryos from the VISTA enhancer site.

      (7) Fig 2G, Page 11 "The 12,354 genes that were linked to a PIF containing an ATAC-seq peak were found to have a higher median expression level than the 2,049 genes that had PIFs that did not coincide with ATAC-seq peaks" - is this significant?

      Apologies for this oversight. We have performed a two-sided t-test on the log transformed TPMs between the two groups and have included the significance in the revised figure (p=1.8 e-40).

      (8) "Gene Ontology analysis of genes with accessible PIFs revealed a significant enrichment for 119 biological processes" - can you include the GO terms in a supplementary table? Is there a way to prioritise down the 12,354 genes to a shorter more significant list of genes, this seems a long list to include in GO analysis.

      We have included a supplementary table with this data in the revised manuscript (Suppl. Table 6). We included all 12,354 genes in this analysis as the point of this analysis was to demonstrate that developmental processes are enriched in the PIFs with accessible chromatin, compared to the genes where only PIFs without ATAC were identified.

      (9) Page 11 - "The chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum (Whittaker et al., 2017b) and deletion of Chd7 from GCps results in striking cerebellar hypoplasia and polymicrogyria (Feng et al., 2017; Reddy et al., 2021; Whittaker et al., 2017b). CHD7 haploinsufficiency is also sufficient to cause cerebellar hypoplasia and foliation defects both in mouse models and in the context of CHARGE syndrome in humans (Whittaker et al, 2017a; Yu et al, 2013)." - this appears more suitable for the introduction.

      Thank you, we have moved this text to the Introduction.

      (10) Page 12 "the majority of which (4,663/5,369) displayed decreased accessibility when Chd7 is depleted". This was difficult to understand initially - which are expected to be the direct effects? Increased or decreased accessibility? Perhaps it would be better to focus only on the decreased accessibility sites?

      We have previously shown that the majority of differentially accessible regions in Chd7-deficient GCps show decreased accessibility. Chromatin remodelling by CHD7 could conceptually reduce or increase accessibility of a particular locus and the only way to infer direct effects are by identifying regions to which CHD7 is recruited.

      Approximately 10% of the sites that decreased in accessibility overlapped with regions bound by CHD7 (464/4663), whilst ~2% of sites that increased in accessibility overlapped with regions of CHD7 binding (14/706). Whilst it is likely that the majority of directly regulated sites decrease in chromatin accessibility when CHD7 is removed, sites that increase in accessibility are few but are observed, and should be included for completeness.

      (11) The analysis in Fig 3A reveals that only a small number of CHD7-bound enhancers show differential accessibility and altered linked gene expression upon CHD7-knock down. This requires a little more discussion - why do so many sites change in accessibility compared to the number of sites which change accessibility or are associated with gene expression change?

      Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, the integration of this data with ATAC-seq accessibility, chromatin modification and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect. We have added the following text to the discussion to indicate this: “Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, integrating CHD7 ChIP-seq data with ATAC-seq accessibility, histone modification ChIP-seq and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect, as suggested by the data in Fig. 3A.”

      (12) Page 12 - "Over-representation analysis confirmed an enrichment of genes linked to nervous system development" - could this and the GO term analysis be included in a supplementary figure?

      We have included these results as Suppl. Table 7 in the revised manuscript.

      (13) Fig 3D - what does the arrow represent in the chromatin schematic?

      The arrow in the schematic indicates chromatin remodelling – we have clarified this in the figure legend and added headings to these panels to indicate the 3 different types of elements: Direct CHD7 targets, Indirect targets and CHD7-bound elements.

      (14) Fig 3G does not appear to be referenced in the text. The value of the Upset plots in the main figure 3 wasn't very clear, perhaps these could be moved to the supplement? Is there a clearer plot to support the conclusion "CHD7 primarily regulates enhancers".

      We apologise, the panels were mislabelled in the text. This has now been corrected. We hope that the amendments in response to point 13 above now clarify these findings, showing that direct CHD7 targets are characterised by active enhancer marks.

      (15) Page 14 "putative consensus sites for proneural bHLH TAL-family of proteins Neurog2, Neurod2, Neurod1, and, Atoh1 in elements" - HOCOMOCO motifs are only shown for Atoh1 and Nhlh1. It may be valuable to show the sites for all the listed TFs. What does white represent in the heatmap in Fig 3H? This plot is difficult to interpret, and also relatively small in the figure but appears important to conclusions. Perhaps Fig 3H could be made more prominent?

      Thank you for highlighting that the white boxes might be confusing. The white blocks indicate that these motifs do not pass the threshold for significant enrichment in the dataset based on the p and q values. This has now been clarified in the figure legend.

      We have enlarged panel H to make it more prominent.

      (16) Page 15 - "Myb was the only motif specific to CHD7 bound regions that changed in accessibility compared to those that exhibited accessibility changes without CHD7 binding or CHD7 binding without accessibility changes (Suppl. Fig. 1)." I couldn't interpret this sentence, requires clarifying.

      We agree that this description is confusing and since it is difficult to draw clear conclusions about the significance of enhancers with Myb motifs in this context, we have removed this sentence from the revised manuscript.

      (17) Page 16 and Fig 4B - a discussion of why both up and down regulated genes are detected for Atoh1 depletion? Which class of genes are expected to be directly regulated (the down-regulated genes)?

      Like most transcription factors, ATOH1 may be able to function as both a repressor and an activator depending on the context. Although the majority of genes are downregulated in Atoh1-deficient cells, suggesting that Atoh1 functions as an activator in most cases, our analyses have identified several up-regulated genes that contain Atoh1 ChIP-seq peaks in their cognate enhancers (see Suppl. Table 7), consistent with these also being direct Atoh1 targets.

      (18) Fig 5B - the genomic traces are not labelled in this figure.

      Thank you, labels have been added.

      (19) Page 17 - "Pathway enrichment analysis of the 22 genes compared to all genes that were expressed in GCps shows a significant enrichment of terms: Hypoplasia of the pons (HP:0012110 P=0.006) and Abnormal pons morphology (HP:0007361 P=0.016) from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2." - this analysis should be included in the supplementary tables.

      These results have been included as Suppl. Table 12 in the revised manuscript.

      (20) Do the authors have a suggestion for which domains of Atoh1 and CHD7 could be interacting? Could the authors design truncated constructs for overexpression in HEK cells to test this hypothesis? [Expected time 4-6 weeks, interesting but not essential to do experimental work here].

      We agree this is an interesting question. Our collaborator, Professor Peter Scambler (UCL) has performed a yeast two hybrid screen for CHD7 interacting proteins in a mouse E11.5 library using the CHD7 BRK domain (aa 2521-2708) as bait. The screen had a single hit, which encompassed the N-term 127aa of ATOH1 (personal communication). This observation supports our co-IP data and suggests that the N-terminus of ATOH1 interacts with the BRK domain of CHD7 but further validation will be needed to confirm this.

      (21) Page 28 "Differential accessibility analysis was performed using DESeq2 (v 1.22.1)" and Page 19 "Whereas chromatin accessibility at some of these enhancers were affected by Chd7-deficiency" - what were the cutoffs used for looking at differentially accessible regions? Complete loss of accessibility or a quantitative change?

      Quantitative change rather than complete loss was used. Thresholds based on adjusted p-values (padj<0.05) were used as indicated in the methods.
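
      The thresholding step can be sketched as follows. DESeq2 reports Benjamini-Hochberg adjusted p-values by default; the Python below is only an illustration of that adjustment followed by the padj<0.05 cutoff, with hypothetical function names, region labels and p-values, and it is not the R code used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_accessible(regions, pvalues, alpha=0.05):
    """Keep regions whose adjusted p-value falls below alpha."""
    padj = benjamini_hochberg(pvalues)
    return [region for region, q in zip(regions, padj) if q < alpha]


# Hypothetical regions and raw p-values.
called = differentially_accessible(
    ["region_a", "region_b", "region_c"], [0.001, 0.5, 0.9]
)
```

      A region passing this filter shows a statistically supported quantitative change in accessibility; the filter by itself says nothing about whether accessibility is lost completely.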

      Requested comments on referencing:

      - "Long-range" - how do the authors define long-range? Can this be referenced. CO? good reference here.- look to CHiCAGO paper

      - "When chromatin conformation or 3D organisation data is not available, studies typically assign regulatory elements to the nearest gene promoter" - needs referencing.

      - "Many of these 22 genes regulated by CHD7 and Atoh1 have established critical roles in cerebellar development, including Neurod2, Pax6 and Gli2 (Fig. 5B)" - needs referencing. "from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2" - needs referencing.

      Thank you, references have been added.

      - "active enhancers (H3K27ac+, H3K4me1+), promoters (H3K27ac+, H3K4me3+), regulatory elements (H3K27ac+, H3K4me1+, H3K4me3+), or poised enhancers (H3K4me1+)" - needs referencing.

      Thank you, references have been added.

      - Reference required in main text for VISTA (e.g. Visel et al., 2007)

      Thank you, reference added.

      Reviewer #2 (Significance):

      The strengths of this manuscript are the integrated approach to identify cell-type specific enhancers utilizing available epigenomic datasets, and leveraging 3D genome topology to directly link them to their target genes. For example for the Reln gene previously implicated in cerebellar phenotypes for CHD7 mutants. The pcHi-C dataset generated in this study provides a valuable reference for the community of enhancer-promoter pairs for a specific cell-type of interest with human disease relevance.

      We thank the reviewer for recognising the potential value of our work to the community.

      The limitations of the study are partially addressed in the text by the authors, including the resolution from the pcHi-C using a 6-bp cutter, the limitation of sequencing depth (more interactions may have been identified with more depth), and the limited correlation between replicates (likely due to undersampling the library). Page 9 "some additional interactions with the nearest gene promoters might be identified in our pcHi-C dataset with deeper sequencing".

      We thank the reviewer for highlighting our acknowledgements of the potential limitations of our work.

      Additional limitations include the use of the VISTA browser mouse LacZ embryos to validate some of their enhancers, the limitation here being that the VISTA browser tests enhancers at embryonic stages (focused at E11.5 and E13.5) while the GCps cells were collected at P7. The LacZ images from VISTA are also not shown. The HEK cells used for the co-IP could be seen as a limitation as these are not relevant cells for the cell state studied, the authors could clarify their use of these cells.

      We thank the reviewer for their careful assessment of the limitations of our study. We have now included images of the VISTA enhancers in Fig. 1I,J,K. Rather than a limitation, using irrelevant cells for co-IP might be seen as a better approach, as the chance that an interaction between the two proteins being tested is mediated indirectly by a bridging complex is conceivably lower in an irrelevant cell type that might not contain such complexes. Either way, HEK293T cells are the standard laboratory model for co-IP studies as they can be transfected with ease.

      The study reported here is largely based on previous work from the authors (Whittaker et al 2017b). This study reported that the chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum and deletion of CHD7 from GCps resulted in the phenotype of cerebellar hypoplasia. This study also largely leverages previously published datasets from the Whittaker et al 2017b (e.g. CHD7 deletion data) and reanalyses it in the light of the new pcHi-C datasets.

      This manuscript will be of interest to researchers interested in analysing long-distance targets of as well as researchers trying to understand the precise gene regulation in cerebellar development. It may also be of interest to clinical geneticists to interpret novel putative non-coding disease mutations.

      We thank the reviewer for highlighting the wide interest of our manuscript.

      In assessing this manuscript, my expertise lies in models of human development and gene regulation, with a focus on enhancer function.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Riegman et al have explored the gene regulatory landscapes of cerebellar granule cell progenitors (GCps). They have generated promoter capture Hi-C data to identify regions that interact with promoters in these cells. In addition, they generate ATAC-seq data in wild-type and CHD7 knock-out cells. They integrate these data to identify enhancers that potentially regulate genes in GCps. In addition, the authors identify an interaction between CHD7 and ATOH1, whose binding sites also overlap in the genome.

      The dataset can be potentially interesting for people studying cerebellar development.

      I have a few concerns regarding the paper. The most pressing one is that the authors seem to equate interactions in pcHi-C with regulation. This is problematic for two reasons. First whether interaction equates regulation is still debated and whether this can be detected with a low-resolution C-method (i.e. using HindIII) is a further point of contention.

      We thank the reviewer for pointing this out. We agree and apologise for not being clear in our manuscript. We have made the necessary amendments to indicate that pcHi-C by itself only assesses proximity in the nucleus, not function.

      We acknowledge the limitations of the pcHi-C method, including that resolution is limited by the use of a restriction enzyme. However, we (see e.g. Suppl. Fig. 1) and others (see e.g. Freire-Pritchett et al (2017) and Mifsud et al (2015)) have used this approach successfully to identify functional enhancer elements.

      The second issue has to do with the way the pcHi-C data is interpreted. What is detected as a significant interaction by CHiCAGO are regions that have a contact frequency above background. This means that local regions with a (much) higher contact frequency may not be called as significant. When we follow the logic that contact frequency is related to gene activation (which may not necessarily be true), whether a fragment is more frequently contacted than the background should not matter (relative contact frequency); rather, it should be interpreted based on the absolute contact frequency.

      The reviewer is right that local regions will have a higher contact frequency and that local contacts aren’t always captured by the CHiCAGO model. However, the purpose of this study was to prioritise the identification of distal elements that are not captured by existing methods including nearest gene annotation.

      There are a number of reasons why absolute contact frequency might not be an appropriate measure to infer gene regulation: 1) Many factors can affect the absolute contact frequency, including the proportion of cells in a population that exhibit active transcription at that time, especially if expression is limited to a small fraction of the population. 2) Absolute contact frequency assumes that more contact results in more regulation, which is not necessarily true and would depend on the combination of factors that are associated with that regulatory element. Figure 1 (Micro Capture-C) from https://www.nature.com/articles/s41596-023-00817-8 shows that regions with low absolute contact frequency compared to adjacent regions have the potential to regulate gene expression, as have other studies that have used CHiCAGO to identify regulatory elements. 3) The sequence of some fragments makes them more likely to be captured or enriched in the Hi-C protocol, which the relative contact frequency above background controls for.

      This becomes relevant because the authors claim that 80% of enhancers are wrongly annotated based on their metrics. The only way to correctly annotate an enhancer is to knock it out and check the effect on genes in the vicinity. Therefore, to claim that their method can correctly annotate enhancers is grossly overstated, particularly when considering the issues with contact frequency stated above. Therefore, claims like 80% of enhancers are wrongly annotated should be removed from the paper. The authors should discuss how to annotate enhancers in the Discussion, and what the proper method is for annotations.

      We have amended the text to indicate that we do not suggest that VISTA enhancers are wrongly annotated but incompletely assigned. We apologise for making this suggestion in the first draft. There is, however, complementary evidence from Chen et al (2024), now referenced in the revised manuscript, which also finds that 60% of the VISTA enhancers skip their adjacent gene. It is also well established in the literature that nearest genes are not always regulated.

      Other points:

      - The authors claim that PIFs have 2.14 and 2.69 fold enrichment of H3K4me1 and H3K27ac sites. Did the authors use the whole genome as background? If so, they should take into account that promoters are more likely in regions of high gene density, which are more dense in active marks. It would be better to perform local, circular permutation of the PIFs around the promoter.

      The reviewer is correct that a whole genome background is not an appropriate background for testing enrichment of active marks within PIFs. Fortunately, this is taken into account in the CHiCAGO enrichment test which selects the background from fragments that are matched to the same distance of the PIFs to account for the observation that promoters are more likely in regions of high gene density and are therefore more enriched for active chromatin modifications.
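
      The idea of a distance-matched background can be sketched as follows. This is an illustration of the general principle only, not the CHiCAGO implementation: fragments are grouped into distance bins, one background fragment is drawn (with replacement) from the same bin as each PIF, and the observed number of marked PIFs is compared with the mean over repeated draws. The function name, the bin size, the number of draws and the input layout are all assumptions.

```python
import random
from collections import defaultdict


def distance_matched_enrichment(pifs, pool, bin_size=50_000, n_draws=100, seed=0):
    """Fold enrichment of marked PIFs over a distance-matched random background.

    pifs, pool: lists of (distance_bp, has_mark) tuples, where has_mark is a bool.
    """
    rng = random.Random(seed)

    # Group background fragments by distance bin.
    pool_by_bin = defaultdict(list)
    for distance, has_mark in pool:
        pool_by_bin[abs(distance) // bin_size].append(has_mark)

    pif_bins = [abs(distance) // bin_size for distance, _ in pifs]
    missing = [b for b in pif_bins if b not in pool_by_bin]
    if missing:
        raise ValueError(f"no background fragments in distance bins {sorted(set(missing))}")

    observed = sum(1 for _, has_mark in pifs if has_mark)

    # For each draw, sample one background fragment per PIF from the matching bin.
    totals = [
        sum(rng.choice(pool_by_bin[b]) for b in pif_bins) for _ in range(n_draws)
    ]
    expected = sum(totals) / n_draws
    return observed / expected if expected else float("inf")


# Hypothetical toy input: near fragments are always marked, far ones never.
toy_pool = [(10_000, True), (20_000, True), (60_000, False), (70_000, False)]
toy_pifs = [(15_000, True), (65_000, True)]
toy_enrichment = distance_matched_enrichment(toy_pifs, toy_pool)
```

      Because the background is drawn from the same distance bins as the PIFs, a general excess of active marks near promoters inflates the expected count as well, so it does not by itself produce an apparent enrichment.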

      - The authors talk about "lead PIF", which is the fragment with the "most significant CHICAGO score". What does this mean? Something is significant or not, despite common misuse of the term there is no gradient of significance.

      The reviewer makes a good point here. We apologise for the oversight in wording and have corrected the text to specify that the lead PIF is the one with the highest CHiCAGO score.

      - In the GO analysis the categories with the lowest p-value are presented, but this biases for large categories. It would be more relevant to also select for and show the enrichment scores.

      We agree with the reviewer that a drawback of GO analysis is that it biases for large categories. If by ‘enrichment score’ the reviewer means the –log10(p-value), we have included that in the supplementary tables, which also include the size of each category and the number of genes detected in it.

      Reviewer #3 (Significance):

      The study provides a dataset that may be interesting for people studying cerebellar development. In that sense the data is mostly interesting from a fundamental viewpoint. The data seem of good quality.

      The authors claim that a very sizeable fraction of enhancers are misannotated, but I do not believe that this is correct.

      We thank the reviewer for pointing this out. We apologise for creating the impression that VISTA enhancers are incorrectly annotated. We have amended the text to reflect that these are incompletely annotated.

      My expertise is 3D genome, bioinformatics.

    1. eLife Assessment

      This important study concerns the propagation of waves in bacterial biofilms, bridging active matter physics and bacterial biophysics. The experimental observations are solid, and the theoretical interpretation and model validation have been refined with revisions. This work will be of interest to microbiologists, biophysicists, and researchers studying collective behavior in biological systems.

    2. Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting paper. The authors identify several experimental knobs that can perturb mechanical wave behavior driven by pili feedback. They frame these effects in terms of nonreciprocal interactions. While nonreciprocity could indeed play a role, it raises the question of whether mechanical feedback might also contribute. Phenomenological models can be useful, but the model currently lacks direct mechanistic insight. It would be more compelling to formulate the model around potential mechanochemical feedback, which could help clarify the underlying microscopic mechanisms.

      Strengths:

      Report of mechanical waves in bacterial collectives, mechanism has potential application in multicellular context such as morphogenesis.

      Weaknesses:

      A minor concern about the language of 'left-right asymmetry.' I believe the correct term is simply 'radial asymmetry' which is a distinct concept. Left-right is not well defined in the current context.

    3. Reviewer #3 (Public review):

      Summary:

      The revised manuscript presents a compelling study of radially propagating metachronal waves on the surface of Pseudomonas nitroreducens biofilms, combining experiments with two theoretical descriptions (a local phase-oscillator model and an active solid/active gel model). The central experimental findings-spiral/target/planar wave patterns, their controllability via water/PEG/temperature perturbations, and the correlation between frequency gradients and propagation direction-remain highly interesting and relevant to both bacterial biophysics and active-matter physics. The revised manuscript also adds substantial new material, including additional analyses of defect dynamics and clearer discussion of the relationship between the two models. The study continues to have a strong interdisciplinary appeal and the potential to stimulate further work on collective oscillations in biological active media.

      Strengths:

      The authors have substantially addressed the major conceptual issue raised in the previous round by clearly distinguishing between nonreciprocity and frequency gradients / global asymmetry. This clarification significantly improves the theoretical interpretation and resolves an important source of confusion in the original version.

      The revised manuscript also improves the connection between the phase-oscillator and active-solid descriptions. In particular, the authors now explain more explicitly how the phase variable is defined in the reduced oscillatory dynamics of confined biofilm motion, and they state that they added a schematic illustration and simulation details (including parameter values and the elastic-force definition) to improve reproducibility. This directly addresses one of my previous major concerns.

      A notable improvement is the newly added defect-based analysis of waveform transitions (spiral -> target -> planar). The revised text argues that defect motility is a key control parameter, linked experimentally to moisture-dependent elasticity and theoretically to nonreciprocity / defect-pair stability. This provides a more concrete mechanistic bridge between experimental perturbations and the modeling framework than in the previous version.

      The manuscript now gives a clearer experimental-theoretical narrative for how environmental manipulations (drying, water addition, PEG, heating) affect wave patterns through changes in effective elasticity and activity, including a useful distinction between short-timescale and long-timescale temperature effects. This added discussion strengthens the biological interpretation and makes the modeling assumptions easier to follow.

      Weaknesses:

      The main remaining limitation is the level of quantitative correspondence between theory and experiment. The revised manuscript now provides a stronger qualitative/mechanistic link, but the mapping between model parameters (e.g., effective coupling terms / elasto-active parameters) and directly measurable biofilm properties is still limited. The authors acknowledge this point, and I agree that it is technically challenging in the present system. However, this means the theoretical framework is currently most convincing as an effective mechanistic model rather than a quantitatively predictive one.

      Relatedly, some conclusions about parameter-level control (especially in connecting moisture/temperature manipulations to specific model parameters) remain qualitative. I do not view this as fatal, but I recommend that the manuscript clearly state this scope and avoid overstating the quantitative predictive power of the theory.

      Although the terminology has improved compared with the original version, the revised manuscript still uses "left-right asymmetry" in places where the underlying geometry and symmetry are more general (e.g., radial inward propagation in circular colonies). Since this wording was one of the original points of confusion, I suggest one final pass to ensure the symmetry language is consistently precise throughout the main text and figure captions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study concerns the propagation of waves in bacterial biofilms, bridging active matter physics and bacterial biophysics. While the experimental observations are solid, the theoretical interpretation and model validation are currently incomplete and require further refinement. This work will be of interest to microbiologists, biophysicists, and researchers studying collective behavior in biological systems.

      In the revised manuscript, we have added new experimental results that strengthen the connection between our observations and the modeling framework used to interpret the collective oscillations. We have not introduced a new theoretical model; rather, we employed established active matter models and sought to link the observed phenomena to these frameworks. In particular, our new data demonstrate that the transition between the motile and biofilm-forming states specifically modulates the elasticity and elasto-active coupling of the bacterial structure. This behavior is in excellent agreement with the predictions of the active solid model. All the experimental details are given below. We believe that the revised version of the manuscript now establishes this connection more clearly and convincingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting paper. The authors have found multiple experimental knobs to perturb a mechanical wave behavior driven by pili feedback. The authors framed this as nonreciprocal interactions. While I can see how nonreciprocity could play a role, what about mechanical feedback? Phenomenological models are fine, but a lack of mechanistic understanding is a weakness. I think it will be more interesting to frame the model based on potential mechanochemical feedback to understand microscopic mechanisms. Regardless, more can be done to better constrain the model through finding knobs to explain experimental observations (in Figures 3, 4, 5, and 7).

      We thank the reviewer for the positive assessment and for highlighting this important point. The reviewer is correct that the phenomenological Kuramoto-based model does not explicitly show the detailed cell–cell interactions. However, the active solid model is formulated on detailed elastic couplings and active forces, which inherently represent mechanical feedback within the biofilm structure. In this framework, nonreciprocity emerges naturally from the tensorial nature of active forces between bacteria—a concept already well established in the active matter literature. Importantly, this mechanism is purely mechanical and closely parallels nonreciprocal hydrodynamic interactions among active particles, which also arise from tensorial couplings.

      In our system, elastic interactions within the biofilm matrix, combined with pilus-generated active forces, provide a natural origin for nonreciprocal interactions. To further validate this, we improved our imaging to record single-cell dynamics both at the colony edge and on the biofilm surface (new supplementary Video). These experiments show that motile bacteria at the leading edge of the biofilm structure do not generate waves, whereas stationary bacteria within the biofilm display local oscillations within the elastic network. This observation supports the view that collective oscillations are a property of the elastic biofilm state rather than of freely motile cells.

      Moreover, the main control parameter for these oscillations is the ratio between the elastic strength and the active force generated by pili. In the active solid model, this ratio is captured by the π and alpha terms. Experimentally, we can tune this ratio simply by adding or removing water from the biofilm, thereby modulating its elasto-active coupling. We further tested the controllability of this feature experimentally: we let the plate dry nonuniformly and observed that transitions between spiral, target, and planar waves could emerge spontaneously across the plate (see Figure 3a). This observation also underscores the importance of moisture in the biofilm. Starting from this point, we established the connection between experimental observation and modelling. In our new simulations, we also noticed that the transition from spiral to target waves is driven in particular by the merging of spiral pairs with opposite topological charges (+/- 1). This critical point was also confirmed by modelling, which links the process to elasto-active coupling. We further supported our claim by imaging the edge and the biofilm structure. These new results clarify that the elastic structure of the biofilm is critically important (Supplementary Figure 3). We have clarified this mechanistic link in the revised manuscript and rewritten the relevant sections to make this connection explicit.

      Modification in the manuscript:

      “To gain deeper insight into the mechanisms underlying wave formation, we imaged the dynamics of individual bacteria from the fingering regions toward the center of the biofilm. This distinction is critical because, unlike the biofilm center, the edges do not generate waves. We observed that bacteria near the fingering regions remain motile and exhibit collective flow. In contrast, bacteria at the biofilm center are surface-attached and undergo periodic lifting motions. This behavior strongly resembles Mexican-wave dynamics.”

      “We further found that the central region of the biofilm is mechanically more elastic, whereas the edge regions—where wave formation is absent—are motile. These observations suggest that gradual biofilm maturation is a key factor that transforms motile bacteria into a periodically moving but spatially constrained state. Consistent with this picture, the PAO1 strain, which has a strong biofilm-forming capability, completely suppresses surface oscillations. In contrast, the PA14 strain exhibits intermediate behavior, sustaining a partial transition between motile and locally constrained dynamics. Remarkably, signatures of this transition and wave generation are already detectable at the earliest stages of finger formation.”

      Strengths:

      The report of mechanical waves in bacterial collectives. The mechanism has potential application in a multicellular context, such as morphogenesis.

      We thank the reviewer for the positive assessment and for highlighting this potential broad impact of our findings.

      Weaknesses:

      My most serious concern is about left-right symmetry breaking. I fail to see how the data in Figure 6 shows LR symmetry breaking. All they show is in-out directionality, which is a boundary condition. LR SM means breaking of mirror symmetry - the pattern cannot be superimposed on its mirror image using only rigid body transformations (translation and rotation) - as far as I am aware, this condition is not satisfied in this pattern-forming system.

      We thank the reviewer for pointing out this critical issue. We acknowledge that we overlooked the distinction between biological and physical definitions of left–right symmetry in our initial submission, and we agree that our terminology was confusing.

      In developmental biology, the term “left–right symmetry breaking” is often used to describe asymmetric flows generated by nodal cilia, which subsequently establish developmental asymmetry. This usage differs fundamentally from the physical definition of mirror symmetry breaking, which refers to chirality switching upon mirror reflection. As the reviewer correctly noted, our system does not exhibit mirror symmetry breaking in this strict physical sense.

      To avoid confusion, we have revised the manuscript and replaced the term left–right symmetry breaking with left–right asymmetry between the edge and the center of the biofilm. This asymmetry arises from frequency gradients across the biofilm and is not a trivial boundary effect. For circular colonies, this phenomenon is more accurately described as radial asymmetry. We have rewritten the relevant sections of the manuscript to clarify this distinction and prevent misinterpretation.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Altin et al. examines the dynamics of bacterial assemblies, building on previously published work documenting mechanical spiral waves. The authors show that the emergent dynamics can be influenced by various factors, including the strain of bacteria and water content in the sample. While the topic of this paper would be of broad interest, and the preliminary results are certainly interesting, various aspects of this paper are underdeveloped and require further exploration.

      Strengths:

      One of the nice features of this system is the ability to transition between the different states based on the addition or withdrawal of water. The authors use a similar experimental model system and mathematical model to previously published work (Reference 49), but extend by showing that the behaviour can be modified through simple interventions. Specifically, the authors show that adding water droplets or drying the sample through heating can result in changes in the observed wave structure. This represents a possible way of controlling active matter.

      The mathematical model proposed in this paper involves a phase-oscillator model of Kuramoto-style coupling (similar to previously reported models). A non-reciprocal phase lag is introduced in order to facilitate the patterns seen in experiments. The qualitative agreement in the behaviour is quite striking, showing both spiral waves and travelling waves.

      We thank the reviewer for the positive assessment and for pointing out areas that required further development. The reviewer is correct that our work builds on previously reported bacterial spiral wave systems; however, there are several significant differences that we now emphasize more clearly in the revised manuscript.

      First, our study involves a different bacterial species and reveals a distinct dynamical process: the waves we report are strictly localized on the surface of the biofilm, in contrast to the bulk oscillations detected through density fluctuations in the earlier work (Ref. 49). The surface waves in our system resemble “Mexican wave”-like motions, in which surface bacteria periodically lift upward. To highlight this key distinction, we performed new imaging experiments that directly visualize this process (New Videos 5 and 6, Author response image 1).

      Second, we systematically compared different bacterial strains, including pathogenic species such as P. aeruginosa PA14 and PAO1, alongside our BSL-1 strain. This comparative approach demonstrates that the observed phenomenon spans strains with different pathogenicity levels and genetic variations, while also showing that our strain provides a safer and more broadly usable model system for laboratory investigations.

      Third, the modeling frameworks differ. Whereas the referred study relied primarily on phase models similar to those used in cilia systems, we combine a delayed Kuramoto-style oscillator model with an active solid model. This combination provides both a phenomenological description and a physical interpretation of the collective dynamics. We acknowledge that, in the original submission, the physical interpretation of the model in relation to our experimental system was underdeveloped. In the revision, we have now established this link explicitly through the elasticity and elasto-active coupling of the biofilm. Specifically, we show that the transition from motile to biofilm states is accompanied by changes in elasticity, which directly influence the observed transitions between different types of wave defects. This connection is consistent with prior theoretical works and has so far been studied only in robotic active matter systems.

      Together, these clarifications and new results reinforce the novelty of our findings and establish a stronger connection between the experiments and the modeling framework.

      Author response image 1.

      Comparison between the elastic biofilm core and the motile colony edge. High-resolution video recordings revealing individual bacterial motion highlight the key physical differences driving wave generation. Time-lapse snapshots show that bacteria at the colony edge move freely and form fingering structures, whereas bacteria in the elastic central biofilm periodically lift vertically, producing a Mexican-wave–like collective motion across the surface. See new Video.

      Weaknesses:

      The principal observation of the paper - that spiral waves emerge in these systems and can be controlled in various ways - is not linked to microscale dynamics at the cell level. It is recognised that hydrodynamics can introduce non-reciprocity, an essential ingredient of this model. However, in this work the authors have not identified a physical mechanism for the lag, e.g., either through steric interactions or hydrodynamic disturbances. This is also relevant in the phase oscillator modelling section. In low Reynolds number flows, dynamics are instantaneously determined. In this light, what does the phase lag term represent?

      The reviewer is correct that, at low Reynolds numbers, fluid dynamics are instantaneous and do not generate real temporal delays. However, nonreciprocity in hydrodynamic interactions can still emerge from the tensorial structure of the Blake–Oseen Green’s function. In this formalism, the effective asymmetry can be represented mathematically as a phase-lag–like term, as has been demonstrated theoretically in Ref. 40. While this is not a literal time delay, it functions analogously by breaking odd symmetry in the coupling.

      In our system, strong long-range hydrodynamic interactions are absent, as the bacteria are embedded in an elastic biofilm matrix. Instead, the dominant interactions are active elastic couplings mediated by pili and biofilm structure. The elastic solid model behaves in a way that is conceptually similar to the hydrodynamic case: pili-induced deformations of the elastic medium produce anisotropic stresses that play a role analogous to the tensorial hydrodynamic Green’s function. Thus, the phase-lag term in our Kuramoto-based model can be interpreted as an effective representation of these nonreciprocal elastic interactions.

      We have clarified this point in the revised manuscript by explicitly connecting the phenomenological phase-lag term to the underlying elastic coupling in biofilms.
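
      The sense in which a phase-lag term makes a Kuramoto-style coupling nonreciprocal can be illustrated with a minimal sketch. It assumes the standard Sakaguchi–Kuramoto pair coupling sin(θ_j − θ_i − α); this functional form, the function names, and the numerical values are illustrative assumptions, not the authors' implementation. For α = 0 the influence of j on i is exactly minus the influence of i on j, whereas for α ≠ 0 the two no longer cancel.

```python
import numpy as np

def pair_coupling(theta_i, theta_j, alpha):
    """Coupling felt by oscillator i due to oscillator j (assumed Sakaguchi-Kuramoto form)."""
    return np.sin(theta_j - theta_i - alpha)

def reciprocity_defect(theta_i, theta_j, alpha):
    """Sum of the two pair terms; zero means action equals minus reaction."""
    return pair_coupling(theta_i, theta_j, alpha) + pair_coupling(theta_j, theta_i, alpha)

# Hypothetical phases and lag, chosen only for illustration.
theta_a, theta_b = 0.2, 0.5
no_lag = reciprocity_defect(theta_a, theta_b, alpha=0.0)    # reciprocal case
with_lag = reciprocity_defect(theta_a, theta_b, alpha=0.5)  # nonreciprocal case
print(no_lag, with_lag)
```

      Analytically, the sum of the two pair terms equals −2 sin(α) cos(θ_j − θ_i), so in this toy form the lag α directly sets the strength of the nonreciprocity.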

      What is the origin of the coupling term, b? Can this be varied systematically or derived from experimental measurements or parameters?

      The term b represents the enhanced elasto-active coupling of the pili process. The length of the pili varies, and elongated pili have more potential to modulate the coupling between bacteria, which is known to depend on a critical threshold. This process resembles pinning dynamics and is driven by the activity of molecular motors within the pili machinery. However, the detailed mechanisms that set the effective coupling strength remain highly complex and are not yet fully understood.

      At present, we do not have a direct way to systematically manipulate b in experiments. A major technical limitation is the nanoscale nature of type IV pili: these protein assemblies are extremely small and difficult to monitor or manipulate directly. Even basic tools such as GFP-based labeling have proven challenging to implement, which restricts our ability to track the detailed dynamics of these structures in live biofilms.

      While we cannot currently derive b directly from experimental parameters, we emphasize in the revised manuscript that b should be understood as an effective parameter capturing the excitability of pili retractions. We also highlight this limitation and note that future advances in molecular imaging and manipulation of pili will be essential for quantitatively linking b to microscopic processes.

      Classification of wave properties is an important aspect of this paper, but is not accomplished in a quantitative sense. What is the method for distinguishing between travelling and spiral waves? There is a range of quantitative tools that could be used to investigate these dynamics (and also compare quantitatively with the models). For example, examining the correlation functions and order parameters could assist with the extraction of wave features (see extensive literature on oscillator models).

      We thank the reviewer for emphasizing this important point. In the revised manuscript, we have incorporated the classic Kuramoto order parameter (S) to characterize the dynamics in our model simulations. However, this metric is not directly applicable to our experimental system, because we cannot resolve the phase of individual bacteria at large scales.

      Instead, we have focused on a flux-based parameter, as previously used in Ref. 40, which can be measured experimentally from collective surface dynamics. Interestingly, we find that the directional flux extracted from our experimental movies closely matches the trends predicted by the model order parameter. We suspect that this similarity arises from the combination of our optical illumination method and the characteristic surface modulations of the biofilm. Because we currently lack a rigorous theoretical justification for this correspondence, we prefer to keep this discussion in the review document.

      In summary, we now use the classic Kuramoto order parameter in simulations and rely on the experimentally accessible flux measure for our experimental data. This dual approach allows us to compare model and experiment in a consistent manner.
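
      For reference, the classic Kuramoto global order parameter can be computed from a set of simulated phases as the modulus of the mean complex phase factor. The sketch below is a generic illustration under that standard definition; the function name and the test phase fields are assumptions and do not come from the authors' code. It also shows why a propagating plane wave yields a low global value even though the dynamics are highly ordered.

```python
import numpy as np

def kuramoto_order_parameter(theta):
    """Return |<exp(i*theta)>|: 1 for full synchrony, near 0 for evenly spread phases."""
    theta = np.asarray(theta, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * theta))))

n = 64
synchronized = np.zeros((n, n))                      # every oscillator at the same phase
x = np.arange(n)
plane_wave = np.tile(2 * np.pi * 4 * x / n, (n, 1))  # four full wavelengths across the lattice

r_sync = kuramoto_order_parameter(synchronized)
r_wave = kuramoto_order_parameter(plane_wave)
print(r_sync, r_wave)
```

      Because a travelling wave spreads the phases evenly over the lattice, a global measure of this kind cannot by itself separate wave types, which is consistent with the use of a complementary flux-based measure for the experimental data.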

      Author response image 2.

      Critical order parameters of the coupled biofilm system. (a) The Kuramoto global order parameter increases continuously as the system becomes globally synchronized. In contrast, in the nonreciprocally coupled system, the order parameter saturates at a critical level. (b) In the experimentally observed biofilm, however, the flux generated by the coupled oscillations provides a more appropriate measure of synchronization. Blue curves indicate directionally propagating planar waves, red curves correspond to spiral wave formation, and green curves represent the globally synchronized reciprocal system.

      Author response image 3.

      Comparison of flux profiles of the simulations with experimental measurements. Directional optical illumination enhances the flux term on the surface of the biofilm.

      The methodology of changing the dynamics through moisture content appears to be slightly underdeveloped, e.g., adding water involves a droplet, and removing water is accomplished by heating (which presumably could cause other effects). Could the dynamics not be controlled more directly by varying the humidity?

      We thank the reviewer for this valuable suggestion. Our results indicate that water content in the biofilm plays a key role in driving the transition to the biofilm state by modulating its elasticity. During the initial submission, we did not know how to systematically vary humidity without simultaneously altering temperature. Standard approaches typically involve water evaporation in controlled chambers, which inherently changes both parameters.

      Following the reviewer’s recommendation, we first measured the ambient moisture levels inside closed culture plates. To our surprise, the relative humidity was already ~98%, leaving virtually no room to increase it further. We then attempted to decrease humidity by flowing dry synthetic air, but even under these conditions we could not reduce it below ~85%, and achieving this required unrealistically high flow rates. Moreover, we noticed that in closed-lid NGM plates, evaporation is already substantial, and when the lid is left open the evaporation rate reaches ~1 µm/s. This rapid surface thinning severely limits the quality of long-term time-lapse imaging.

      Taken together, these technical constraints explain why we have to rely on localized perturbations such as water droplets and heating rather than global humidity control. We have clarified this point in the revised manuscript and now explicitly discuss both the challenges and limitations of humidity-based approaches.

      At the same time, the authors also mention that temperature itself plays a role in shaping the behaviour. What is the mechanism for this? Is it just through evaporation? Since the frequency increases with temperature, could it just be that activity increases with temperature?

      We thank the reviewer for raising this critical point. We believe that temperature has two distinct impacts operating on different timescales.

      Short timescale (~minutes): We observed that biofilm oscillations respond to temperature changes very rapidly and in a reversible manner. This timescale is too short to be explained by modulation of water content or bulk elasticity of the biofilm. Instead, we attribute the immediate frequency increase to enhanced biological activity of the bacteria at elevated temperatures.

      Long timescale (~tens of minutes to hours): During processes such as the transition from planar to spiral waves, prolonged heating can significantly alter the biofilm structure. These changes are not reversible and likely involve modifications of elasticity and other structural properties.

      In the modeling framework, the short-timescale effect is represented as an increase in the active force term, while the long-timescale effect is captured by concurrent changes in both the active force and the elastic properties of the biofilm. We have clarified this mechanism and its representation in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a novel investigation into unidirectionally propagating waves observed on the surface of Pseudomonas nitroreducens bacterial biofilms. The authors explore how these waves, initially spiral in form, transition into combinations of spiral, target, and planar patterns. The study identifies the periodic extension-retraction cycles of type IV pili as the driving mechanism for wave propagation, which preferentially moves from the colony's edge to its center. Furthermore, the manuscript proposes two theoretical models (a phase-oscillator model and a continuum active solid model) to reproduce these phenomena, and demonstrates how external manipulations (e.g., water droplets, temperature, PEG) can control wave patterns and direction, often correlating with oscillation frequency gradients. The work aims to bridge the fields of active-matter physics and bacterial biophysics by providing both experimental observations and theoretical frameworks for understanding these complex biological wave phenomena.

      We thank the reviewer for the positive assessment of our work and for highlighting both the novelty and the key contributions of our study.

      Strengths:

      The experimental discovery of unidirectionally propagating waves on bacterial biofilms is highly intriguing and represents a significant contribution to both microbiology and active-matter physics.

      The detailed observations of wave pattern transitions (spiral to target to planar) and their response to various environmental perturbations (water, temperature, PEG) provide valuable empirical data. The identification of type IV pili as the driving force offers a concrete biological mechanism. The observed correlation between frequency gradients and wave direction is a compelling finding with potential for broader implications in understanding biological pattern formation. This work has the potential to stimulate further research in the collective behavior of living systems and the physical principles underlying biological organization.

      We thank the reviewer once again for emphasizing the importance of wave directionality. We also believe that this phenomenon may provide insight into early symmetry-breaking processes observed in developmental biology, where oxygen or nutrient gradients in dense environments could play a similar role.

      Weaknesses:

      The manuscript attempts to link unidirectional wave propagation to non-reciprocal couplings but ultimately shows that the wave direction is determined by the gradient of the oscillation frequency. The couplings in the two theoretical models are both isotropic and thus cannot dictate the wave direction. A clear distinction should be made between non-reciprocity as a source of wave generation and non-uniformity as a controlling factor of wave direction.

      We greatly appreciate the reviewer’s careful evaluation, particularly for highlighting this important and often confusing distinction. The relationship between nonreciprocity, spontaneous symmetry breaking, and frequency gradients has also been a challenging concept for us and required significant effort to clarify.

      Recent theoretical studies have established that traveling wave formation requires nonreciprocity, which provides a framework for understanding phenomena ranging from spiral to target and planar waves. In our system, nonreciprocity arises between the displacement field (U) and the pili force vector (P): as a result, in the broken phase, U effectively “chases” P, breaking PT symmetry locally and thereby enabling the generation of local directional flux and traveling waves. In this sense, nonreciprocity is essential for travelling wave generation and spontaneous symmetry breaking in either direction.

      However, we now agree that global directionality (always from right to left, or edge to center) is set by an independent factor, namely the oscillation frequency gradient across the biofilm. Thus, while nonreciprocity determines whether waves can travel, frequency gradients determine the large-scale direction in which they propagate. Put differently, PT symmetry is already broken in spiral waves due to nonreciprocity, but global asymmetry (frequency gradients) is required to align the overall propagation in one direction.

      We have clarified this distinction in the revised manuscript, emphasizing that nonreciprocity is a necessary ingredient for travelling wave generation, whereas global asymmetry controls global wave direction.
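
      The role of a frequency gradient in setting a global direction can be illustrated with a deliberately simplified toy: a one-dimensional chain of phase oscillators with nearest-neighbour sine coupling and a linear gradient in natural frequency. This sketch is an assumption-laden illustration, not the authors' model. It omits the nonreciprocal phase lag entirely, and the chain length, gradient, coupling strength, and time step are hypothetical values. It shows only that, once the chain frequency-locks, the high-frequency end leads in phase, so lines of constant phase travel toward the low-frequency side.

```python
import numpy as np

def simulate_chain(omega, coupling=1.0, dt=0.01, steps=40000):
    """Euler-integrate a chain of nearest-neighbour coupled phase oscillators."""
    omega = np.asarray(omega, dtype=float)
    theta = np.zeros_like(omega)
    for _ in range(steps):
        diff = np.diff(theta)               # theta[i+1] - theta[i]
        pull = np.zeros_like(theta)
        pull[:-1] += np.sin(diff)           # pull from the right-hand neighbour
        pull[1:] += np.sin(-diff)           # pull from the left-hand neighbour
        theta += dt * (omega + coupling * pull)
    return theta

# Hypothetical gradient: index 0 is the fast end, index 19 the slow end.
omega = np.linspace(1.2, 1.0, 20)
theta = simulate_chain(omega)
phase_steps = np.diff(theta)                # negative everywhere if the fast end leads
print(phase_steps)
```

      In this toy, every neighbouring phase difference is negative along the chain, meaning each oscillator lags the one on its higher-frequency side.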

      Modification in the manuscript:

      “We should note that traveling waves indicate broken PT symmetry between these fields triggered by nonreciprocity, with spiral waves serving as a classic signature of this phenomenon. A further transition from spiral to planar waves reflects an overall asymmetry in the frequency profile, which is not directly related to PT-symmetry breaking.”

      The relationship between the phase oscillator model and the active solid model is unclear. Given that U and P are both dynamical variables evolving in three-dimensional space, defining the phase Φ precisely in the phase space spanned by U and P could be challenging. A graphical illustration of the definition of Φ would be beneficial. To ensure reproducibility of the numerical results, the parameter values used in the numerical simulations and an explicit definition of the elastic force in the active solid model should be provided.

      We agree with the reviewer that the relationship between the phase oscillator model and the active solid model can be confusing, but establishing this link is essential to connect different modeling approaches in the literature. As the reviewer notes, in a fully three-dimensional setting with freely moving bacteria, defining the oscillation phase (Φ) in the phase space spanned by U and P is indeed complicated.

      However, our recent imaging results show that bacteria within the biofilm do not undergo large translational motions but instead exhibit periodic “Mexican wave”-like oscillations. These oscillations are confined to a restricted phase space, which allows us to define Φ in a straightforward way. In this context, the phase oscillator model becomes a natural reduction of the dynamics.

      Similarly, in the active solid (or active gel) model, we can plot not only the displacement and force vectors but also the local phase, which shows strong agreement with the phenomenological Kuramoto-style model. To make this connection clearer, we have now included a schematic illustration in the revised manuscript that explicitly shows how Φ is defined in the reduced phase space, and we provide the parameter values used in the simulations as well as the explicit definition of the elastic force in the active solid model to ensure reproducibility.

      The link between the theoretical models and experimental results is weak. For example, the propagation of the kink from the lower to the higher part of the surface (Figure 1e) could be addressed within the framework of the active solid model. The mechanism of transition from spiral to target waves (Figure 3a, b) requires clarification, identifying which model parameter is crucial for inducing this transition. The wave propagation toward the lower frequency side is numerically demonstrated using the phase oscillator model, but a physical or intuitive explanation for this phenomenon is missing. Also, the wave transitions induced by the addition of water droplets and temperature rise are not linked to specific parameters in the theoretical models.

      We thank the reviewer for highlighting this important weakness, which was also consistently noted by the other reviewers. We fully agree that the link between our theoretical models and experimental results required significant strengthening.

      With improved imaging in the revised study, we were able to uncover additional connections that help establish this link more clearly. We acknowledge that our ability to measure detailed biofilm parameters is limited, which restricts us from providing fully quantitative mappings. Nonetheless, based on the reviewers’ suggestions, we carried out additional imaging and simulations to compare bacterial dynamics at the colony edge and within the biofilm surface. These data confirm that cells within the biofilm undergo restricted, “Mexican wave”-like oscillations, emphasizing the critical role of elasticity in governing the collective dynamics.

      Experimentally, we found that adding water or PEG, or alternatively inducing drying, strongly modulates the effective elasticity of the biofilm. Within the active solid framework, elasticity and the elasto-active coupling are the key parameters controlling the system. By tuning these parameters in simulations, we could reproduce the qualitative transitions observed experimentally. Specifically, we observed that:

      At low elasticity, topological defects are mobile and can move, merge, or annihilate, leading to the emergence of planar waves.

      At high elasticity, defects remain pinned across the biofilm surface and dominate the dynamics.

      These observations suggest that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves. Although we cannot independently manipulate each parameter in experiments, varying the moisture content provides an effective and experimentally accessible control.

      Finally, our simulations and new analyses reveal that spiral defect cores can move and merge to form target waves or annihilate entirely—processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions, and we believe it warrants further theoretical exploration. We have clarified this connection and its implications in the revised manuscript.

      First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in the review, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11).

      This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs.

      Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation. To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Figure 12). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Author response image 4) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs. We have updated the details of the defect dynamics in the revised manuscript to clarify the transition between these waves.

      Author response image 4.

      Experimental observation showing that small nonuniformities on the biofilm surface trigger the formation of closely separated defect pairs. Arrows indicate the positions of the nonuniformities.

      Modification in the manuscript:

      Defect dynamics controlling the transition between spiral and target waves

      “To better understand the dynamics of the transitions between the different wave forms, we focused on numerical simulations. We noticed that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves; varying the moisture content provides an effective and experimentally accessible way to control this motility. Our analyses revealed that spiral defect cores can move and merge to form target waves or annihilate entirely, processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions. First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in Supplementary Figure 10, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11). This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs. Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation (Supplementary Video 9). To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Videos 12-13). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Supplementary Video 9) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs.”

      All the recommended points have been addressed in the revised manuscript.

    1. eLife Assessment

      This important study combines a two-person joint hand-reaching paradigm with game-theoretical modeling to examine whether, and how, reflexive visuomotor responses are modulated by a partner's control policy and cost structure. The study provides a convincing set of behavioral findings suggesting that involuntary visuomotor feedback is indeed modulated in the context of interpersonal coordination. The work will be of interest to cognitive scientists studying the motor and social aspects of action control.

    2. Reviewer #1 (Public review):

      Summary:

      Sullivan and colleagues examined the modulation of reflexive visuomotor responses during collaboration between pairs of participants performing a joint reaching movement to a target. In their experiments, the players jointly controlled a cursor that they had to move towards narrow or wide targets. In each experimental block, each participant had a different type of target they had to move the joint cursor to. During the experiment, the authors used lateral perturbation of the cursor to test participants' fast feedback responses to the different target types. The authors suggest participants integrate the target type and related cost of their partner into their own movements, which suggests that visuomotor gains are affected by the partner's task.

      Strengths:

      The topic of the manuscript is very interesting, and the authors are using well-established methodology to test their hypothesis. They combine experimental studies with optimal control models to further support their work. Overall, the manuscript is very timely and shows important findings - that the feedback responses reflect both our own and our partner's tasks.

    3. Reviewer #2 (Public review):

      Summary:

      Sullivan and colleagues studied the fast, involuntary, sensorimotor feedback control in interpersonal coordination. Using a cleverly designed joint-reaching experiment that separately manipulated the accuracy demands for a pair of participants, they demonstrated that the rapid visuomotor feedback response of a human participant to a sudden visual perturbation is modulated by his/her partner's control policy and cost. The behavioral results are well matched with the predictions of the optimal feedback control framework implemented with the dynamic game theory model. Overall, the study provides an important and novel set of results on the fast, involuntary feedback response in human motor control in the context of interpersonal coordination.

      Review:

      Sullivan and colleagues investigated whether fast, involuntary sensorimotor feedback control is modulated by the partner's state (e.g., cost and control policy) during interpersonal coordination. They asked a pair of participants to make a reaching movement to control a cursor and hit a target, where the cursor's position was a combination of each participant's hand position. To examine fast visuomotor feedback response, the authors applied a sudden shift in either the cursor (experiment 1) or the target (experiment 2) position in the middle of movement. To test the involvement of partner's information in the feedback response, they independently manipulated the accuracy demand for each participant by varying the lateral length of the target (i.e., a wider/narrower target has a lower/higher demand for correction when movement is perturbed). Because participants could also see their partner's target, they could theoretically take this information (e.g., whether their partner would correct, whether their correction would help their partner, etc.) into account when responding to the sudden visual shift. Computationally, the task structure can be handled using dynamic game theory, and the partner's feedback control policy and cost function are integrated into the optimal feedback control framework. As predicted by the model, the authors demonstrated that the rapid visuomotor feedback response to a sudden visual perturbation is modulated by the partner's control policy and cost. When their partner's target was narrow, they made rapid feedback corrections even when their own target was wide (no need for correction), suggesting integration of their partner's cost function. Similarly, they made corrections to a lesser degree when both targets were narrower than when the partner's target was wider, suggesting that the feedback correction takes the partner's correction (i.e., feedback control policy) into account.

      The strength of the current paper lies in the combination of clever behavioral experiments that independently manipulate each participant's accuracy demand and a sophisticated computational approach that integrates optimal feedback control and dynamic game theory. Both the experimental design and the data analysis are sound, and the main claim is well supported by the results.

      A future direction would be to investigate how this mechanism is implemented in the CNS and to examine whether the same cooperative mechanism also applies to human-AI interactions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      Sullivan and colleagues examined the modulation of reflexive visuomotor responses during collaboration between pairs of participants performing a joint reaching movement to a target. In their experiments, the players jointly controlled a cursor that they had to move towards narrow or wide targets. In each experimental block, each participant had a different type of target they had to move the joint cursor to. During the experiment, the authors used lateral perturbation of the cursor to test participants’ fast feedback responses to the different target types. The authors suggest participants integrate the target type and related cost of their partner into their own movements, which suggests that visuomotor gains are affected by the partner’s task.

      Strengths

      The topic of the manuscript is very interesting, and the authors are using well-established methodology to test their hypothesis. They combine experimental studies with optimal control models to further support their work. Overall, the manuscript is very timely and shows important findings - that the feedback responses reflect both our own and our partner’s tasks.

      We thank the reviewer for the positive comments regarding our work.

      Weaknesses

      However, in the current version of the manuscript, I believe the results could also be interpreted differently, which suggests that the authors should provide further support for their hypothesis and conclusions.

      Major Comments

      (1) Results of the relevant conditions:

      In addition to the authors’ explanation regarding the results, it is also possible that the results represent a simple modulation of the reflexive response to a scaled version of cursor movement. That is, when the cursor is partially controlled by a partner, which also contributes to reducing movement error, it can also be interpreted by the sensorimotor system as a scaling of hand-to-cursor movement. In this case, the reflexes are modulated according to a scaling factor (how much do I need to move to bring the cursor to the target). I believe that a single-agent simulation of an OFC model with a scaling factor in the lateral direction can generate the same predictions as those presented by the authors in this study. In other words, maybe the controller has learned about the nature of the perturbation in each specific context, that in some conditions I need to control strongly, whereas in others I do not (without having any model of the partner). I suggest that the authors demonstrate how they can distinguish their interpretation of the results from other explanations.

      We thank the reviewer for the thoughtful comment. While it is possible that the change in the visuomotor feedback responses arises solely from a scaling factor, this hypothesis could explain the difference between two conditions but would fail to explain the differences between two other conditions. Specifically, this hypothesis could explain a decrease in involuntary visuomotor feedback responses between partner-irrelevant/self-relevant and partner-relevant/self-relevant. Critically, this hypothesis could not explain the difference between partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant. That is, there is no reason to scale a response to correct for a partner’s relevant target when your own target is irrelevant. However, our finding that there is a greater involuntary visuomotor feedback response in partner-relevant/self-irrelevant compared to partner-irrelevant/self-irrelevant is predicted by the notion that humans form a representation of others and consider their movement costs.

      We have added a paragraph in the discussion to justify our hypothesis over the scaling factor hypothesis.

      “Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses can parsimoniously explain all of our experimental findings. There are a few alternative hypotheses that could explain a subset of results. One alternative hypothesis is that participants simply learned the hand to center cursor mapping in each experimental condition. That is, instead of using a model of their partner, participants simply adapted to the dynamics of the center cursor. However, this hypothesis would not predict an increased involuntary visuomotor feedback response in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition. If participants did not form a model of their partner nor consider their partner’s costs, then they would not display an increased feedback response when they had an irrelevant target and their partner’s target was relevant. An increased feedback response to help a partner achieve their goal is captured by our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses.”

      (2) The effect of the partner target:

      The authors presented both self and partner targets together. While the effect of each target type, presented separately, is known, it is unclear how presenting both simultaneously affects individual response. That is, does a small target with a background of the wide target affect the reflexive response in the case of a single participant moving? The results of Experiment 2, comparing the case of partner- and self-relevant targets versus partner-irrelevant and self-relevant targets, may suggest that the system acted based on the relevant target, regardless of the presence and instructions regarding the self-target.

      We thank the reviewer for bringing up another valid point, which we discussed at length as a group when designing the experiment. The reviewer is correct in pointing out that the lack of difference in the involuntary epoch between the partner-relevant/self-relevant and partner-irrelevant/self-relevant conditions could potentially suggest that the sensorimotor system acted based only on relevant targets, irrespective of whether it was a self- or partner-relevant target. While the effect of the simultaneous presentation of a narrow and a wide target on an individual’s response when acting alone is unknown, comparing the differences between our other experimental conditions controls for this potential confound. Participants viewed a wide target and a narrow target on the screen in both the partner-irrelevant/self-relevant condition and the partner-relevant/self-irrelevant condition. Crucially, we found that the visuomotor feedback responses were greater in the partner-irrelevant/self-relevant condition compared to the partner-relevant/self-irrelevant condition in both Experiments 1 and 2. That is, participants were able to distinguish between the self-target and the partner target and appropriately modify their feedback responses in both experiments, despite there being both a wide and a narrow target on the screen in both conditions. Given that we found different visuomotor feedback responses between the two conditions that had both a narrow and a wide target, this rules out the alternative hypothesis that the sensorimotor system acted based just on a relevant target being present. We have added to our discussion to clarify this point.

      “Another alternative hypothesis would be that the sensorimotor system was responding only to the relevant target displayed on the screen. Again, this hypothesis would only explain a subset of our results. In particular, this relevant target hypothesis cannot explain the observed feedback response differences between the partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions in both Experiments 1 and 2.”

      (3) Experiment instructions:

      It is unclear what the general instructions were for the participants and whether the instructions provided set the proposed weighted cost, which could be altered with different instructions.

      Our instructions explicitly informed participants that their performance bonus was only based on them stabilizing within their own self-target within the time constraint. We have added the following in the methods to emphasize this instruction.

      “In other words, we ensured participants had a clear understanding that their performance in the task was only based on stabilizing the center cursor in their own self-target within the time constraint. Therefore, the instructions and timing constraints did not require participants to work together.”

      (4) Some work has shown that the gain of visuomotor feedback responses reflects the time to target and that this is updated online after a perturbation (Cesonis & Franklin, 2020, eNeuro; Cesonis and Franklin, 2021, NBDT; also related to Crevecoeur et al., 2013, J Neurophysiol). These models would predict different feedback gains depending on the distance remaining to the target for the participant and the time to correct for the jump, which is directly affected by the small or large targets. Could this time to target be used instead to explain the results? I don’t believe that this is the case, but the authors should try to rule out other interpretations. This is maybe a minor point, but perhaps more important is the location (and time remaining) for each participant at the time of the jump. It appears from the figures that this might be affected by the condition (given the change in movement lengths - see Figure 3 B & C). If this is the case, then could some of the feedback gain be related to these parameters and not the model of the partner, as suggested? Some evidence to rule this out would be a good addition to the paper - perhaps the distance of each partner at the time of the perturbation, for example. In addition, please analyze the synchrony of the two partners’ movements.

      (1) Time to target and forward position

      The reviewer raises an interesting point. In our task, the cursor/target jump occurs once the center cursor crosses 6.25 cm from the start. We analyzed the time it took for the center cursor to intercept the targets from perturbation onset (Supplementary D). In Experiment 1, an ANOVA with center cursor time-to-target as the dependent variable showed no main effect of self-target (F[1,47] = 2.45, p = 0.124) or partner target (F[1,47] = 2.50, p = 0.120), nor any interaction (F[1,47] = 1.97, p = 0.166). In Experiment 2, an ANOVA with center cursor time-to-target as the dependent variable showed a significant interaction (F[1,47] = 5.87, p = 0.019). Post-hoc mean comparisons showed that only the difference between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant conditions was significant (p = 0.006). Given that only one comparison in Experiment 2 showed a difference in time-to-target, we do not believe that time-to-target was a significant driver of the change in involuntary visuomotor feedback responses observed between conditions. While time-to-target is likely a metric the nervous system modifies feedback gains around, our results suggest that the nervous system can also use a partner model to modify feedback gains. We have added a supplemental analysis on time-to-target:

      “Previous work by Česonis and Franklin (2020) showed that time-to-target is a key variable the sensorimotor system uses to modify feedback responses. In their experiment, they manipulated the time-to-target of the participant’s cursor, while controlling for other movement parameters (e.g., distance from goal) [1]. When compared to classical optimal feedback control models, they showed that a model that modifies feedback responses based on time-to-target best predicted their results. In our task, it’s possible that the time-to-target could have influenced visuomotor feedback responses, since the distance to the center of the target is greater for a narrow target than a wide target on perturbation trials.”

      “We calculated the time from perturbation onset to the center cursor reaching the forward position of the targets (Supplementary Fig. S5). In Experiment 1, an ANOVA with center cursor time-to-target as the dependent variable showed no main effect of self-target (F[1,47] = 2.45, p = 0.124) or partner target (F[1,47] = 2.50, p = 0.120), nor any interaction (F[1,47] = 1.97, p = 0.166). In Experiment 2, an ANOVA with center cursor time-to-target as the dependent variable showed a significant interaction (F[1,47] = 5.87, p = 0.019). Post-hoc mean comparisons showed that only the difference between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant conditions was significant (p = 0.006). Although time-to-target and hand position are important variables for the control of movement [1,2,3], they are likely not driving factors of the different involuntary visuomotor feedback responses between our experimental conditions.”
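
      To make the time-to-target measure concrete, the following is a minimal sketch of how it could be extracted from a single trial. It is not the authors' code: the function names, the sampled-trajectory input format and the target forward distance passed in by the caller are assumptions for illustration. Only the 6.25 cm perturbation threshold is taken from the response above.

```python
PERTURBATION_DISTANCE_CM = 6.25  # forward distance at which the jump occurs (from the text)

def first_crossing_time(times, forward_positions, threshold_cm):
    """Return the first sampled time at which the forward position reaches the threshold."""
    for t, y in zip(times, forward_positions):
        if y >= threshold_cm:
            return t
    return None  # threshold never reached on this trial

def time_to_target(times, forward_positions, target_forward_cm,
                   perturbation_cm=PERTURBATION_DISTANCE_CM):
    """Time from perturbation onset until the center cursor reaches the target's forward position.

    target_forward_cm is a hypothetical parameter; the actual target distance
    is not stated in this response.
    """
    t_perturbation = first_crossing_time(times, forward_positions, perturbation_cm)
    t_target = first_crossing_time(times, forward_positions, target_forward_cm)
    if t_perturbation is None or t_target is None:
        return None
    return t_target - t_perturbation

# Illustrative trajectory only (seconds, centimetres), not data from the study.
example_ttt = time_to_target([0.0, 0.1, 0.2, 0.3, 0.4], [0.0, 3.0, 7.0, 15.0, 26.0], 25.0)
```

      Per-trial values like these would then be averaged within each condition before entering the ANOVA.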

      However, it is possible that the participant forward position at perturbation onset could also influence the involuntary feedback response. We show the forward positions at perturbation onset in Supplementary D. Statistical analysis of the forward positions in Experiment 1 showed a main effect of self-target (F[1,47] = 12.72, p < 0.001), a main effect of partner target (F[1,47] = 12.82, p < 0.001), and no interaction (F[1,47] = 0.00, p = 0.991). We see the same trend in Experiment 2, showing a main effect of self-target (F[1,47] = 12.11, p < 0.001), a main effect of partner target (F[1,47] = 12.04, p < 0.001), and no interaction (F[1,47] = 0.00, p = 0.986). The fact that there was no interaction implies that the results could not solely be due to forward position. Nevertheless, given there were main effects, we proceeded to run an ANCOVA on the involuntary visuomotor feedback responses with forward position as a covariate. For Experiment 1, we still observed a significant interaction between self and partner target (F[1,47] = 43.14, p < 0.001). Further, we also observed no significant main effect of forward position on the involuntary visuomotor feedback responses. The ANCOVA for Experiment 2 also showed that there was still a significant interaction of self and partner target on the involuntary visuomotor feedback responses (F[1,47] = 9.80, p = 0.002). However, here we did find a significant main effect of the forward position (F[1,47] = 5.06, p = 0.026). Therefore, we ran follow-up mean comparisons with the covariate-adjusted means. We found the same statistical trend as reported in the main results. We found significant differences between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant conditions (p = 0.003), the partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions (p < 0.001), and the partner-relevant/self-irrelevant and partner-relevant/self-relevant conditions (p < 0.001). We found no significant difference between the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions (p = 0.381). Given that there was no main effect of forward position in Experiment 1, and that our adjusted mean comparisons in Experiment 2 showed the same trends as the unadjusted mean comparisons in the main manuscript, our results show that the forward position of the participants is not a significant factor in explaining the differences in involuntary visuomotor feedback responses between conditions.

      “Supplementary Fig. 6 shows the participant hand forward position at perturbation onset time for Experiment 1 (A) and Experiment 2 (B). It is possible that the participant forward hand position at perturbation onset time could influence their visuomotor feedback responses. Therefore, we ran an ANCOVA with self-target and partner target as factors, and participant forward hand position at perturbation onset time as a covariate. In Experiment 1, we found no main effect of participant forward hand position on involuntary visuomotor feedback responses (F[1,47] = 1.466, p = 0.228). Further, when including the covariate, we still found a significant interaction between self-target and partner target on involuntary visuomotor feedback responses (F[1,47] = 43.2, p < 0.001).”

      “In Experiment 2, we found a significant main effect of participant forward hand position on involuntary visuomotor feedback responses (F[1,47] = 6.73, p = 0.010). We still found a significant interaction between self-target and partner target (F[1,47] = 9.78, p = 0.002). Since we found a main effect of participant forward hand position, we calculated the adjusted means of the involuntary visuomotor feedback responses. We then performed follow-up mean comparisons on the adjusted means of the involuntary visuomotor feedback responses (using emmeans in R). We found the same significant trends as the unadjusted means in the main manuscript. Specifically, we found involuntary visuomotor feedback responses to be: significantly greater in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition (p = 0.003), significantly greater in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-relevant condition (p < 0.001), significantly greater in the partner-relevant/self-relevant condition compared to the partner-relevant/self-irrelevant condition (p < 0.001), and not different between the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions (p = 0.824).”

      We have also included in the discussion how time-to-target and participant forward hand position are important control variables to consider, and their potential relationship to our findings.

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G-H) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation. Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses parsimoniously accounts for the differences observed between all conditions.”

      (2) Synchrony

      In our task, participants’ movements were not self-initiated. We had them begin the movement as soon as they heard an audible tone so that they would begin their movements at as similar a time as possible. We have analyzed the movement onset synchrony between participants within a pair, shown in Supplementary F.

      Supplementary: “We calculated movement onset times at the time that the participants left the start target [8]. We then took the absolute value of the difference between the participants within a pair as a measure of movement onset synchrony. For Experiment 1, an ANOVA with movement onset synchrony as the dependent variable showed no main effect of self-target (F[1,47] = 1.38, p = 0.252), no main effect of partner target (F[1,47] = 0.057, p = 0.813), and no interaction (F[1,47] = 0.45, p = 0.508). For Experiment 2, an ANOVA with movement onset synchrony as the dependent variable showed no main effect of self-target (F[1,47] = 0.07, p = 0.788), no main effect of partner target (F[1,47] = 2.75, p = 0.111), and no interaction (F[1,47] = 2.31, p = 0.142).”
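
      The synchrony measure described above amounts to an absolute difference of two onset times. A minimal sketch follows, assuming each participant's hand trajectory is available as sampled times and (x, y) positions and that the start target is a circle; the function names, the circular start target and the example values are hypothetical and not taken from the manuscript.

```python
import math

def movement_onset_time(times, xs, ys, start_center, start_radius):
    """Return the first sampled time at which the hand is outside the start target."""
    cx, cy = start_center
    for t, x, y in zip(times, xs, ys):
        if math.hypot(x - cx, y - cy) > start_radius:
            return t
    return None  # the hand never left the start target

def onset_synchrony(onset_a, onset_b):
    """Absolute difference between the two partners' movement onset times."""
    return abs(onset_a - onset_b)

# Illustrative trajectories only (seconds, centimetres), not data from the study.
times = [0.0, 0.1, 0.2, 0.3]
onset_a = movement_onset_time(times, [0, 0, 0, 0], [0.0, 0.5, 1.5, 3.0], (0.0, 0.0), 1.0)
onset_b = movement_onset_time(times, [0, 0, 0, 0], [0.0, 0.2, 0.6, 1.2], (0.0, 0.0), 1.0)
synchrony = onset_synchrony(onset_a, onset_b)
```

      Smaller values indicate that the two partners left their start targets closer together in time.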

      Further, we have modified our methods to emphasize that participants within a pair generally began their movement at the same time.

      “Rather than having participants self-initiate their movements, we specifically had them move at the sound of a tone so that the movement onset between participants in a pair was as synchronous as possible (see Supplementary F for movement onset synchrony analysis).”

      Reviewer #1 (Recommendations for the authors):

      (1) Lines 291-292: One study extensively examined cursor and target jump visuomotor onset times and found no difference (Franklin et al., 2016; J Neuroscience), which strongly argues against this interpretation.

      We thank the reviewer for pointing out this work. We have modified the following lines:

      “However, other work by Franklin and colleagues (2016) found no difference in visuomotor feedback response latencies between cursor and target jumps [6].”

      (2) Line 411: What were the instructions regarding partner performance in terms of the reward? Did you explain that individual performance alone will determine the reward?

      As addressed above, we have made the following changes to emphasize the instructions given to participants.

      “In other words, we ensured participants had a clear understanding that their performance in the task was only based on stabilizing the center cursor in their own self-target within the time constraint. Therefore, the instructions and timing constraints did not require participants to work together.”

      (3) Line 506: Ten probe trials in each direction is very low. Can this still be in the transition state of the feedback response, rather than at steady state? There are many studies done looking at the learning of visuomotor responses in which changes are still occurring after several hundred trials (e.g., Franklin et al., 2017 J Neurophysiol; Franklin et al., 2008; J Neuroscience). In this experiment, each block only lasts 151 trials total if my calculations are correct. How certain are you that the results are at a steady state and not continuously changing? Perhaps with further experimental experience, the feedback responses would approach the predictions of a different model.

      The reviewer raises an important point. We had run these analyses prior to submitting the manuscript and did not observe any learning effects. However, we believe this information is important to include, since both we and the reviewer asked the same question. Specifically, we have analyzed the visuomotor feedback responses over the trials (Supplementary G), which show little to no learning over time. Additionally, we found no difference in the visuomotor feedback response trends between the first and second half of trials in each condition (Supplementary H). Therefore, it appears that the sensorimotor system reached steady-state behaviour very quickly, and we do not believe that the feedback responses would approach the predictions of a different model if participants performed more trials. We have added the following:

      Supplementary: “Given there were 151 trials and 10 left/right probe trials for each experimental condition, it is possible that completing more trials may have led to different involuntary visuomotor feedback responses. Therefore, we analysed the involuntary visuomotor feedback responses over the course of each experimental condition. Visually, involuntary visuomotor feedback responses in neither Experiment 1 (Fig. S8) nor Experiment 2 (Fig. S9) show any consistent learning (see Fig. S10 for statistical analysis). Therefore, it appears participants rapidly formed a partner model based on knowledge of their movement goal to modify their involuntary visuomotor feedback responses.”

      Supplementary: “Supplementary Fig. S10 shows the involuntary visuomotor feedback responses in the first half (A,C) and second half (B,D) for each experimental condition. In Experiment 1, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 37.09, p < 0.001) and second half (F[1,47] = 48.68, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (see Fig. S10A-B).”

      Supplementary: “In Experiment 2, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 9.42, p = 0.004) and second half (F[1,47] = 17.40, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (Fig. S10C-D).”

      Supplementary: “Showing the same involuntary visuomotor feedback response trends across the experimental conditions for the first half, second half, and all trials suggests that the sensorimotor system quickly formed a model of a partner and considered their costs to modify rapid motor responses.”
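
      The first-half versus second-half comparison described above can be illustrated with a minimal sketch. This is not the authors' analysis (they report an ANOVA on each half); it only shows how a per-condition sequence of probe-trial responses might be split before any statistics are run. The function name and the example numbers are hypothetical.

```python
from statistics import mean


def split_half_means(responses):
    """Mean response in the first and second half of a trial sequence.

    `responses` lists per-trial feedback responses in the order the trials
    were performed. With an odd count, the middle trial goes to the second
    half.
    """
    if len(responses) < 2:
        raise ValueError("need at least two trials to split")
    midpoint = len(responses) // 2
    return mean(responses[:midpoint]), mean(responses[midpoint:])


# Made-up numbers for illustration only, not data from the study.
example = [1.0, 1.2, 0.9, 1.1, 1.0, 1.2, 0.9, 1.1]
first_half, second_half = split_half_means(example)
print(first_half, second_half)
```

      Similar means in the two halves would be consistent with the absence of learning that the authors report, although a formal test is still needed to support that conclusion.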

      We have also added to the discussion:

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation.”

      (4) The authors should also discuss some of the prior work which is very relevant to the tasks studied: (Knill, Bondada & Chhabra, 2011, J Neuroscience). There may also be other papers that use this task for visuomotor feedback responses and therefore should be included.

      We have included the Knill 2011 paper and also Cross 2019 in our discussion:

      “This modification of feedback responses based on a relevant/irrelevant task goal has also been shown in response to visual perturbations [7,8].”

      (5) Lines 301-303: The terms ’relevant’ and ’irrelevant’ here describe different concepts than the ones used in this study. I suggest making a distinction to avoid confusion for the reader.

      We thank the reviewer for pointing out that this is confusing. We’ve made the following changes to improve the clarity:

      “Further, Franklin and colleagues (2008) designed a visual perturbation to be relevant or irrelevant when reaching to the same target, showing greater involuntary visuomotor feedback responses to a relevant visual perturbation compared to an irrelevant visual perturbation [9].”

      (6) Line 459: The reaching movement was quite slow (25cm in about 1.2 seconds). Is this needed to ensure that both participants can complete the movements, given potentially very different start times? Please comment as this is different than many previous studies.

      Participants needed to stabilize the cursor for 500ms in their target within a time constraint of 1400 - 1600 ms. Therefore, they had to reach the target between 900 - 1100 ms (before stabilizing). Additionally, participants did not perform self-initiated movements, but were required to begin their movement as soon as they heard an audible tone. Given that reaction times are ~200ms, participants had ~700 - 900 ms to reach the target, which aligns with previous research (Franklin et al. (2008), Franklin et al. (2012), Nashed et al. (2012)). We have clarified the time constraints of the task in our Methods:

      “They therefore had 700 - 900 ms to first reach the target, since humans generally have response times ~200 ms, and they needed to stabilize within the target for 500 ms (i.e., 1400 - 200 - 500 = 700 ms and 1600 - 200 - 500 = 900 ms). Movement times of 700 - 900 ms are thus consistent with previous human reaching studies [4,9,10].”
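
      The arithmetic behind the 700 - 900 ms movement window can be checked with a short sketch. The constants are the values stated in the response above; the function and variable names are hypothetical.

```python
# Values taken from the authors' response; names are illustrative only.
TRIAL_WINDOW_MS = (1400, 1600)   # allowed total trial duration
REACTION_TIME_MS = 200           # approximate reaction time to the tone
HOLD_TIME_MS = 500               # required stabilization inside the target


def movement_time_window(trial_window, reaction_time, hold_time):
    """Return the (min, max) time available to first reach the target."""
    earliest, latest = trial_window
    return (earliest - reaction_time - hold_time,
            latest - reaction_time - hold_time)


print(movement_time_window(TRIAL_WINDOW_MS, REACTION_TIME_MS, HOLD_TIME_MS))
# prints (700, 900)
```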

      (7) Reference [25] is incomplete

      Thank you for catching this.

      And thank you for the thoughtful and clear review. We feel it has greatly improved the quality and clarity of our manuscript!

      Reviewer #2 (Public review):

      Summary

      Sullivan and colleagues studied the fast, involuntary, sensorimotor feedback control in interpersonal coordination. Using a cleverly designed joint-reaching experiment that separately manipulated the accuracy demands for a pair of participants, they demonstrated that the rapid visuomotor feedback response of a human participant to a sudden visual perturbation is modulated by his/her partner’s control policy and cost. The behavioral results are well-matched with the predictions of the optimal feedback control framework implemented with the dynamic game theory model. Overall, the study provides an important and novel set of results on the fast, involuntary feedback response in human motor control, in the context of interpersonal coordination.

      We thank the reviewer for the kind words!

      Review:

      Sullivan and colleagues investigated whether fast, involuntary sensorimotor feedback control is modulated by the partner’s state (e.g., cost and control policy) during interpersonal coordination. They asked a pair of participants to make a reaching movement to control a cursor and hit a target, where the cursor’s position was a combination of each participant’s hand position. To examine fast visuomotor feedback response, the authors applied a sudden shift in either the cursor (experiment 1) or the target (experiment 2) position in the middle of movement. To test the involvement of partner’s information in the feedback response, they independently manipulated the accuracy demand for each participant by varying the lateral length of the target (i.e., a wider/narrower target has a lower/higher demand for correction when movement is perturbed). Because participants could also see their partner’s target, they could theoretically take this information (e.g., whether their partner would correct, whether their correction would help their partner, etc.) into account when responding to the sudden visual shift. Computationally, the task structure can be handled using dynamic game theory, and the partner’s feedback control policy and cost function are integrated into the optimal feedback control framework. As predicted by the model, the authors demonstrated that the rapid visuomotor feedback response to a sudden visual perturbation is modulated by the partner’s control policy and cost. When their partner’s target was narrow, they made rapid feedback corrections even when their own target was wide (no need for correction), suggesting integration of their partner’s cost function. Similarly, they made corrections to a lesser degree when both targets were narrower than when the partner’s target was wider, suggesting that the feedback correction takes the partner’s correction (i.e., feedback control policy) into account.

      The strength of the current paper lies in the combination of clever behavioral experiments that independently manipulate each participant’s accuracy demand and a sophisticated computational approach that integrates optimal feedback control and dynamic game theory. Both the experimental design and data analysis sound good. While the main claim is well-supported by the results, the only current weakness is the lack of discussion of limitations and an alternative explanation. Adding these points will further strengthen the paper.

      Reviewer #2 (Recommendations for the authors):

      (1) While the current version is already well-written, it would be helpful for readers to further discuss the relationship between the current study and some potentially relevant studies, such as Braun et al. (2009), Ganesh et al. (2014), and Takagi et al. (2017) (2019).

      Thank you for pointing out these papers that we missed, which we now cite appropriately in light of our own work. In particular, we have added the following to our discussion, including Braun et al. (2009) and Takagi et al. (2017) (2019). However, Beckers et al. (2020) showed conflicting results from Ganesh et al. (2014), and since these works are about learning, we feel it is outside the scope of our work.

      “Further, others have shown that the sensorimotor system modifies movement selection according to game-theoretic predictions [11], and that the sensorimotor system modifies movements using an estimate of the joint goal during human-human interactions [12,13].”

      (2) For an alternative interpretation of the results, one could consider, for instance, that the target’s visual appearance could have served as a contextual cue for learning different movement gains in the lateral direction (e.g., whether the partner corrects the shift might be approximated as a gain change). Although less likely, this alternative account could be tested by simulation and would strengthen the argument.

      This is a thoughtful comment, also brought up by Reviewer 1. Here we provide our previous response that addresses this concern. It is possible that the change in the visuomotor feedback responses could arise simply from a scaling factor. This hypothesis could explain the difference between two conditions, but would fail to explain differences between two other conditions. Specifically, this hypothesis could explain a decrease in involuntary visuomotor feedback responses between partner-irrelevant/self-relevant and partner-relevant/self-relevant. Critically, this hypothesis could not explain the difference between partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant. That is, there is no reason to scale a response to correct for a partner’s relevant target when your own target is irrelevant. However, our finding that there is a greater involuntary visuomotor feedback response in partner-relevant/self-irrelevant compared to partner-irrelevant/self-irrelevant is predicted by the notion that humans form a representation of others and consider their movement costs.

      We have added a paragraph in the discussion to justify our hypothesis over the scaling factor hypothesis.

      “Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses can parsimoniously explain all of our experimental findings. There are a few alternative hypotheses that could explain a subset of results. One alternative hypothesis is that participants simply learned the hand to center cursor mapping in each experimental condition. That is, instead of using a model of their partner, participants simply adapted to the dynamics of the center cursor. However, this hypothesis would not predict an increased involuntary visuomotor feedback response in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition. If participants did not form a model of their partner nor consider their partner’s costs, then they would not display an increased feedback response when they had an irrelevant target and their partner’s target was relevant. An increased feedback response to help a partner achieve their goal is captured by our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses.”

      (3) Another (maybe unlikely) alternative interpretation is that the targets’ visual appearances might have been confusing. One might find that the closed square is common to both targets for the “Partner Relevant Self Irrelevant” and the “Partner Relevant Self Relevant”, and that this might have elicited the response to perturbation in “Partner Relevant Self Irrelevant”. Related to this point, it would be informative to describe how the “cooperative” fast feedback response developed over the course of the experiment, for instance, by comparing behaviors across experimental blocks.

      We have partitioned this question into two responses, relating to visual appearance of the targets and the development (i.e., learning) of visuomotor feedback responses over the course of the experiments.

      (1) Participants confused by visual appearance of the targets.

      We were also concerned that participants might be confused by the targets, and therefore confirmed with participants after the experiment that they correctly understood that the light grey filled rectangle was their own target and the dark grey hollow rectangle was their partner’s. Furthermore, in the partner-relevant/self-irrelevant, partner-irrelevant/self-relevant, and partner-relevant/self-relevant conditions, there is a small square target in each of the conditions. However, we found that the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions both elicited significantly greater involuntary visuomotor feedback responses than the partner-relevant/self-irrelevant condition. Thus, participants’ involuntary visuomotor feedback responses suggest that they correctly formed different representations based on an accurate understanding of the self vs partner target. The other reviewer had related comments about the visual stimuli, which we also address within the discussion.

      “Another alternative hypothesis would be that the sensorimotor system was responding only to the relevant target displayed on the screen. Again, this hypothesis would only explain a subset of our results. In particular, this relevant target hypothesis cannot explain the observed differences between the partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions in both Experiments 1 and 2.”

      (2) Comparing feedback responses over time

      We have included the visuomotor feedback responses over each experimental condition in Supplementary G. Notably, we did not find any learning effect, suggesting that the sensorimotor system quickly developed a model of a partner’s behaviour and used that model to modify feedback responses. We have also added a paragraph on learning to our discussion.

      We’ve addressed how learning did not play a role in this study:

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G-H) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation.”

      Supplementary: “Given there were 151 trials and 10 left/right probe trials for each experimental condition, it is possible that completing more trials may have led to different involuntary visuomotor feedback responses. Therefore, we analysed the involuntary visuomotor feedback responses over the course of each experimental condition. Visually, involuntary visuomotor feedback responses in neither Experiment 1 (Fig. S8) nor Experiment 2 (Fig. S9) show any consistent learning (see Fig. S10 for statistical analysis). Therefore, it appears participants rapidly formed a partner model based on knowledge of their movement goal to modify their involuntary visuomotor feedback responses.”

      Supplementary: “Supplementary Fig. S10 shows the involuntary visuomotor feedback responses in the first half (A,C) and second half (B,D) for each experimental condition. In Experiment 1, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 37.09, p < 0.001) and second half (F[1,47] = 48.68, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (see Fig. S10A-B).”

      Supplementary: “Showing the same involuntary visuomotor feedback response trends across the experimental conditions for the first half, second half, and all trials suggests that the sensorimotor system used a model of a partner based on their goals and considered their costs to modify rapid motor responses.”

      (4) It looks slightly counter intuitive (and therefore interesting) that the participant shows some amount of fast feedback responses in the “Partner Relevant Self Irrelevant” condition, since they were instructed to only consider the self-target. Based on the results, the authors suggest an altruistic feature of the motor system (lines 333-340). It would be helpful to clarify the basis for this interpretation, whether it is formally derived from the game-theoretic framework or represents a more conceptual interpretation. Providing additional explanation that translates the game-theoretic reasoning into more accessible, intuitive terms would help readers better understand and evaluate this claim.

      We are glad the reviewer also finds this result interesting. The reviewer raises an important point that there needs to be a more clear explanation for why we believe this result was found. We have made the following changes to the discussion:

      “Furthermore, this result is predicted by our dynamic game theory models that include the partner’s costs in the self cost function. In other words, a dynamic game theory model that selects feedback gains to minimize both the self and partner cost reflects an altruistic control policy.”
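
      The idea that adding the partner's cost to the self cost yields an altruistic correction can be illustrated with a deliberately simplified, static sketch. It is not the authors' dynamic game theory model: it assumes a single time step, a cursor that sits midway between the two hands, a quadratic accuracy cost, and a quadratic effort cost. All names and parameter values are hypothetical.

```python
def best_response(perturbation, partner_correction, q_self, q_partner,
                  partner_weight, effort_cost=0.1):
    """Lateral correction that minimizes a one-step quadratic cost.

    Toy assumptions (not the authors' model):
      * the cursor sits midway between the two hands, so the remaining error
        is  e = perturbation - 0.5 * (own + partner_correction)
      * cost = (q_self + partner_weight * q_partner) * e**2
               + effort_cost * own**2, with effort_cost > 0
    Setting the derivative with respect to `own` to zero gives the closed
    form below.
    """
    accuracy = q_self + partner_weight * q_partner
    return accuracy * (perturbation - 0.5 * partner_correction) / (
        0.5 * accuracy + 2.0 * effort_cost)


# Own target wide (q_self = 0), partner's target narrow (q_partner = 1).
selfish = best_response(1.0, 0.0, q_self=0.0, q_partner=1.0,
                        partner_weight=0.0)
altruistic = best_response(1.0, 0.0, q_self=0.0, q_partner=1.0,
                           partner_weight=1.0)
print(selfish, altruistic)
```

      In this sketch, a controller that ignores the partner's cost makes no correction when its own target is wide, whereas one that weights the partner's cost corrects even though its own target does not require it. The correction also shrinks as the partner's own correction grows, which mirrors the qualitative pattern described in the review.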

      (5) Please check whether all references are displayed correctly. Some of them (e.g., 25, 65) seemed not correctly shown in the References section.

      We have fixed the citation.

      We thank the reviewer for providing a clear and insightful review. Their comments have significantly improved the manuscript.

      References

      (1) Česonis, J., & Franklin, D. W. (2020). Time-to-Target Simplifies Optimal Control of Visuomotor Feedback Responses. eneuro, 7 (2), ENEURO.0514–19.2020.

      (2) Česonis, J., & Franklin, D. W. (2022). Contextual Cues Are Not Unique for Motor Learning: Task-dependant Switching of Feedback Controllers. PLOS Computational Biology, 18 (6), ed. by Haith, A. M.: e1010192.

      (3) Crevecoeur, F., Kurtzer, I., Bourke, T., & Scott, S. H. (2013). Feedback Responses Rapidly Scale with the Urgency to Correct for External Perturbations. Journal of Neurophysiology, 110 (6), 1323–1332.

      (4) Franklin, S., Wolpert, D. M., & Franklin, D. W. (2012). Visuomotor Feedback Gains Upregulate during the Learning of Novel Dynamics. Journal of Neurophysiology, 108 (2), 467–478.

      (5) Liu, Y., Leib, R., Dudley, W., Shafti, A., Faisal, A. A., & Franklin, D. W. (2025). Partner-Sourced Haptic Feedback Rather than Environmental Inputs Drives Coordination Improvement in Human Dyadic Collaboration. Scientific Reports, 15 (1), 40347.

      (6) Franklin, D. W., Reichenbach, A., Franklin, S., & Diedrichsen, J. (2016). Temporal Evolution of Spatial Computations for Visuomotor Control. The Journal of Neuroscience, 36 (8), 2329–2341.

      (7) Knill, D. C., Bondada, A., & Chhabra, M. (2011). Flexible, Task-Dependent Use of Sensory Feedback to Control Hand Movements. The Journal of Neuroscience, 31 (4), 1219–1237.

      (8) Cross, K. P., Cluff, T., Takei, T., & Scott, S. H. (2019). Visual Feedback Processing of the Limb Involves Two Distinct Phases. The Journal of Neuroscience, 39 (34), 6751–6765.

      (9) Franklin, D. W., & Wolpert, D. M. (2008). Specificity of Reflex Adaptation for Task-Relevant Variability. The Journal of Neuroscience, 28 (52), 14165–14175.

      (10) Nashed, J. Y., Crevecoeur, F., & Scott, S. H. (2012). Influence of the Behavioral Goal and Environmental Obstacles on Rapid Feedback Responses. Journal of Neurophysiology, 108 (4), 999–1009.

      (11) Braun, D. A., Ortega, P. A., & Wolpert, D. M. (2009). Nash Equilibria in Multi-Agent Motor Interactions. PLoS Computational Biology, 5 (8), ed. by Friston, K. J.: e1000468.

      (12) Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M., & Burdet, E. (2017). Physically Interacting Individuals Estimate the Partner’s Goal to Enhance Their Movements. Nature Human Behaviour, 1 (3), 0054.

      (13) Takagi, A., Hirashima, M., Nozaki, D., & Burdet, E. (2019). Individuals Physically Interacting in a Group Rapidly Coordinate Their Movement by Estimating the Collective Goal. eLife, 8 , e41328.

    1. eLife Assessment

      This study addresses an important question and shows how social navigation in homing pigeons can be explained by simple averaging, without requiring any complex cognitive abilities. The evidence, based on a rigorous and systematic comparison of seven models and data on how social routes can be generated from solitary routes, is compelling. The authors should be commended for their willingness to critically re-examine established interpretations.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.
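
      As a rough illustration of what equal route averaging means at the trajectory level, the sketch below averages two routes point by point. It assumes both routes have already been resampled to the same number of points; the function names and the example routes are hypothetical and are not taken from the manuscript.

```python
def average_routes(route_a, route_b, weight_a=0.5):
    """Pointwise weighted average of two routes.

    Each route is a list of (x, y) positions sampled at matching points along
    the journey (assumed to be already resampled to equal length).
    weight_a = 0.5 corresponds to equal averaging.
    """
    if len(route_a) != len(route_b):
        raise ValueError("routes must have the same number of points")
    weight_b = 1.0 - weight_a
    return [(weight_a * xa + weight_b * xb, weight_a * ya + weight_b * yb)
            for (xa, ya), (xb, yb) in zip(route_a, route_b)]


def route_length(route):
    """Total path length of a route given as (x, y) points."""
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(route, route[1:]))


# Two made-up routes between the same release site and home loft,
# detouring in opposite directions.
north = [(0.0, 0.0), (2.0, 2.0), (4.0, 0.0)]
south = [(0.0, 0.0), (2.0, -2.0), (4.0, 0.0)]
joint = average_routes(north, south)
print(route_length(north), route_length(joint))
```

      When the two birds' detours lie on opposite sides of the direct line, the averaged route is shorter than either solitary route, so efficiency can improve without either bird evaluating which route is better.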

      Strengths:

      The manuscript provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain the findings of a recent study on improvements to pigeon routes, through a rigorous, systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      Weaknesses:

      The authors' method focuses on trajectory-level average behaviour rather than the fine-scale decision-making processes of organisms. This is acknowledged in the manuscript by the authors.

      Comments on revision:

      The authors have addressed most of the comments by me as well as the other reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals but which differ in their cognitive demands.

      The manuscript is well written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.
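
      The dilution of earlier individuals' contributions in a transmission chain can be sketched as follows. The sketch assumes that each generation's route is the equal average of the route carried over from the previous generation and the new bird's own route. It also uses the inverse Simpson index as one possible definition of effective group size, which may differ from the definition used in the manuscript. All names are hypothetical.

```python
def chain_weights(n_generations):
    """Contribution of each bird to the final route under equal averaging.

    Assumes route_k = 0.5 * route_(k-1) + 0.5 * naive_k at every generation.
    Returns weights ordered from the founding bird to the newest bird.
    """
    if n_generations < 1:
        raise ValueError("need at least one generation")
    weights = [1.0]
    for _ in range(n_generations - 1):
        weights = [0.5 * w for w in weights] + [0.5]
    return weights


def effective_group_size(weights):
    """Inverse Simpson index, 1 / sum(w**2): an assumed definition."""
    return 1.0 / sum(w * w for w in weights)


weights = chain_weights(5)
print(weights)
print(effective_group_size(weights))
```

      Under these assumptions the founding bird's weight halves with every new generation, so a five-generation chain behaves like a group of fewer than three equally weighted birds rather than five.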

      Comments on revision:

      The authors made substantial revisions to the manuscript, addressing my comments. While I do think that regarding my second comment on CCE the authors could be a bit more bold, I am overall satisfied with the revisions made.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The lack of availability of codes and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. Our code and data are available on figshare (https://doi.org/10.6084/m9.figshare.28950032.v1). We have now revised the manuscript to include a link to our dataset.

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works: they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages, such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms, it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences). Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and the biological basis of these mechanisms cannot be determined through the navigational data alone. For instance, it remains unexplored whether pigeons adapt their behavior based only on social cues from their partners or also use other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues, or previously learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ where the bird pairs behave as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with confidence only in a meta sense. Considering this, we believe that our analysis is a more top-down approach towards describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer (2024) [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. We have now added a paragraph: “It is also important to clarify that we use the terms…… that lead to these meta-mechanisms arising remain an open question.” found in lines 120-129 in our Introduction to make this clarification.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised, and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used for providing evidence of CCE in non-human animals. In our literature review, we found that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), with regard to individuals being able to compare strategies and select the ones resulting in higher individual fitness. This preferential selection of strategies (termed innovations) allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more of a statistical outcome than a deliberate individual judgement. We believe that this strategy is also amenable to certain task types (which in our study was homing route choice) and may change for others (for example, solving a puzzle box), and the task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which ones don’t. Keeping this framework in mind, studies should clearly state what definition of CCE they are using and should be critically evaluated for their underlying task type and cognitive mechanisms before being deemed CCE. Considering these points, we have now expanded our Discussion to include a paragraph: “Our results highlight the need for more…..range of task types and cognitive abilities.” found in lines 420-433 to highlight these key questions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not have any major objections, but I am clarifying my points as major or minor depending on the effort required to address (mostly via rewriting and clarifications).

      Major comments:

      (1) A schematic summary of the original study: Since the current manuscript builds directly on Sasaki & Biro (2017), it would greatly help readers if you included a concise schematic figure summarizing the original experiment. For instance, a simple panel could depict the chain design (experienced + naïve replacements), the control treatments, and the key empirical findings (improvements in route efficiency across generations, and route similarity within vs. between chains). Presenting this visually would save readers the effort of reconstructing the design and main results from text alone, especially for those unfamiliar with the original paper. It would also clarify exactly what empirical patterns your simulations are intended to reproduce.

      We thank the reviewer for this comment. We have now revised the manuscript with a schematic illustration adapted from the original study by Sasaki and Biro (2017). We hope this clarifies the experimental design and results we aimed to highlight in our work.

      (2) Reproducibility: Code and data are only "available on request." I believe eLife has strong policies on open science; a lack of immediate open access to analysis would be a barrier. I find it jarring that a paper intending to reproduce and improvise a previously published paper does not make the codes and data available for peer review or to readers without an explicit request.

      We have taken the feedback into consideration and updated the Data Availability section with a link to our Figshare dataset.

      (3) One huge drawback of the current format of the manuscript, where Methods come after Results, is that one has to really struggle to understand and appreciate Figures 2 and 3. I would strongly urge authors to have a shorter methods section embedded either as a subsection before the Results, or within the results section, as described in each figure. Perhaps a lot of my confusion also comes from not having known the previous paper, but it may be true for other readers, too. More specifically, for Figure 3, how is social weight for the experiments inferred? Figure 3 caption talks of mean difference, but one has to check the manuscript at multiple places throughout to really understand what this difference is (the definition) and how it is computed.

      While we agree that our manuscript includes the Methods section at the end, we tried to structure our text to tell a story (as stated in our manuscript title). To this end, we organized the text into short titled subsections that briefly convey the relevant background, identify the knowledge gap and outline our approach. We chose this structure to reserve the in-depth details about model implementation and statistical analysis for the Methods.

      Additionally, we made sure to include references to methodological details in relevant segments of the Introduction and Results section so as to not bog down the reader by model complexities and keep a coherent narrative that delivers the message of our study. To further address the background of our work, we have now added a schematic of the original study in response to a previous comment by the reviewer, which we hope helps the reader better understand our work. We hope this explanation clarifies the intention behind our writing choice and decision to retain the current structure.

      (4) The introduction of the 'effective group size' concept is a potentially valuable and intuitive way to interpret chain dynamics, but the explanation is somewhat buried in the Results/Methods; I suggest highlighting it more prominently (e.g., in the Discussion or with a schematic in the Results) so readers can readily grasp this useful idea.

      We thank the reviewer for finding our concept of ‘effective group size’ useful. However, we do believe that we introduced the idea and rationale behind using this method in the Results: “We asked to what extent……to an equivalent group size” found in lines 305-314. We reserved a detailed description of this method for the Methods section. However, to further emphasize the importance of the concept, we have now added the text: “This is further supported….. slightly better than two individuals.” found in lines 389-394 in the Discussion.

      Minor comments:

      (1) Line 12: "what is the navigation mechanism(s)" - the (s) is a bit awkward. Either remove (s) or ask what the mechanisms are.

      We have fixed the typo to clarify the statement.

      (2) Line 78: "Such 'ratchet'-like improvements is referred to..." → "are referred to."

      We have fixed the typo to clarify the statement.

      (3) Figure 3 caption: "color scheme in the plots are same" → should be "is the same."

      We have fixed the typo to clarify the statement.

      (4) Clarification on reporting confidence intervals: The manuscript reports confidence intervals (CIs) for the model-based comparisons (e.g., Figures 2-3). This might seem unnecessary for simulation studies, since running more iterations can arbitrarily shrink uncertainty. However, in your case, the CIs are justified because the simulations are anchored to a finite empirical dataset (only 9 solo trajectories), sampled with replacement, and analyzed with mixed-effects models that incorporate bird identity as a random effect. Thus, the intervals reflect biological sample variability rather than simulation noise. This must be clarified.

      We have added a clarifying statement: “...and reflect the biological uncertainty in the empirical dataset, not simulation noise” found in lines 241 and 293 in the captions of Figures 2 and 3 in accordance with the reviewer’s comment. 
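
      The reviewer's point about where the uncertainty comes from can be illustrated with a minimal sketch. This is not the authors' analysis, which uses mixed-effects models with bird identity as a random effect. It is a plain percentile bootstrap, and the nine efficiency values and the function name `bootstrap_ci` are hypothetical. Because each resample draws from the same nine birds, adding resamples makes the estimated interval endpoints more stable but does not shrink the interval itself.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a small sample."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample the birds with replacement, keeping the sample size fixed.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical route-efficiency values for nine solo birds (illustrative only).
solo_efficiency = [0.71, 0.78, 0.80, 0.83, 0.85, 0.86, 0.88, 0.90, 0.93]
ci_low, ci_high = bootstrap_ci(solo_efficiency)
```

      Under these assumptions the interval width is set mainly by the spread across the nine birds, which is the sense in which the intervals reflect biological sample variability rather than simulation noise.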

      (5) One part of the issue is that details of methods come much later in the manuscript, perhaps following journal style. Therefore, I recommend explicitly highlighting this rationale in the Results, so readers do not misinterpret the CIs as simply reflecting simulation error.

      We believe that the clarifying statements we have now added in the captions of Figures 2 and 3 should convey this interpretation of CIs and further changes in the Results may not be required.

      With these proposed changes we hope that we improved upon the clarity of our manuscript.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. eLife Assessment

      This important manuscript reports a very interesting view of how pesticides can be toxic to beneficial insects like the honeybee. The study uses machine learning for the discovery of new honeybee-repellent odorants. The solid evidence predicts compounds that were validated in the lab and in the field. This work will be of great interest to researchers in ecology, pest control and sensory biology.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Original review:

      Summary:

      This manuscript reports a very interesting, novel and important research angle to add to the now enormous interest in how pesticides can be toxic to beneficial insects like the honey bee. Many studies have reported on how pesticides in standard use formulations show both lethality as well as sublethal negative effects on behavior and reproduction. The authors propose to use machine learning algorithms to identify new volatile compounds that can be tested for repellency. They use as input chemical structures that are derived from chemicals that have known repellent effects as identified in their initial behavioral assays.

      Strengths:

      The conclusion is that such chemicals specific to repelling bees and not pest insects (using the fruit fly as a model for the latter) can be identified using the ML approach. Having a list of such chemicals that can be rotated among in any field application would be a benefit because of the honey bee's ability to learn its way around any kind of stimulus designed to keep it from nectar and pollen, even when they may be tainted by pesticide.

      Weaknesses:

      The use of machine learning seems well-executed and legitimate. But this is beyond my expertise. So other reviewers can maybe comment more on that.

      The behavioral data report on the use of a two-choice assay for bees in small Petri plates. Bees can feed from two small wells placed on filter paper impregnated with control or the control containing a chemical. The primary behavior, for example in Fig. 2C, is the first choice by one of the five bees in the plate of which well to feed from. For some chemical compounds, there seems to be a 50:50 choice, indicating no repellent effects. In other cases the first bee making the choice chose the control, indicating possible repellent effects of the test chemical. Choices in this assay were validated in a free flying assay.

      Concerns with the choice assay:

      - 50-70 microliters amounts to what one hungry bee will drink. Did the first bee drink most of it, such that measures of bait consumed reflect a single bee or multiple bees?
      - How many bees were repelled to the control side? Was it just the one bee? Were other measures considered? E.g. time to first approach; the number of bees feeding at different time points; the total number of bees observed feeding per unit time.

    3. Reviewer #2 (Public review):

      Original review:

      Summary:

      The search for new repellent odors for honey bees has significant practical implications. The authors developed an iterative pipeline through machine learning to predict honey bee-repellent odors based on molecular structures. By screening a large number of candidate compounds, they identified a series of novel repellents. Behavioral tests were then conducted to validate the effectiveness of these repellents. Both the discovery and the methodological approach hold value for related fields.

      Strengths:

      * The study demonstrates that using molecular structures and a relatively small training dataset, the model could predict repellents with a reasonably high success rate. If the iterative approach works as described, it could benefit a wide range of olfaction-related fields.
      * The effectiveness of the predicted repellents was validated through both laboratory and field behavioral tests.

      Weaknesses:

      The small size of the training dataset poses a common challenge for machine learning applications. However, the authors did not clearly explain how their iterative approach addresses this limitation in this study. Quantitative evidence demonstrating improvements achieved in the second round of training would strengthen their claims. For instance, details on whether the success rate of predictions or the identification of higher-affinity components would be helpful. Furthermore, given that only 15 new components were added for the second round of training, it is surprising that such a small dataset could result in significant improvements.

    4. Reviewer #3 (Public review):

      Original summary:

      The manuscript of Kowalewski et al. titled "Machine learning of honey bee olfactory behavior identifies repellent odorants in free flying bees in the field" uses machine learning to predict potential candidates for honeybee repellents, which may keep foraging bees away from pesticides. This is pilot research with strong significance for research on olfactory behavior and for pest control.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript reports a very interesting, novel and important research angle to add to the now enormous interest in how pesticides can be toxic to beneficial insects like the honey bee. Many studies have reported on how pesticides in standard use formulations show both lethality as well as sublethal negative effects on behavior and reproduction. The authors propose to use machine learning algorithms to identify new volatile compounds that can be tested for repellency. They use as input chemical structures that are derived from chemicals that have known repellent effects as identified in their initial behavioral assays.

      Strengths:

      The conclusion is that such chemicals specific to repelling bees and not pest insects (using the fruit fly as a model for the latter) can be identified using the ML approach. Having a list of such chemicals that can be rotated among in any field application would be a benefit because of the honey bee's ability to learn its way around any kind of stimulus designed to keep it from nectar and pollen, even when they may be tainted by pesticide.

      Weaknesses:

      The use of machine learning seems well-executed and legitimate. But this is beyond my expertise. So other reviewers can maybe comment more on that.

      The behavioral data report on the use of a two-choice assay for bees in small Petri plates. Bees can feed from two small wells placed on filter paper impregnated with control or the control containing a chemical. The primary behavior, for example in Fig. 2C, is the first choice by one of the five bees in the plate of which well to feed from. For some chemical compounds, there seems to be a 50:50 choice, indicating no repellent effects. In other cases the first bee making the choice chose the control, indicating possible repellent effects of the test chemical. Choices in this assay were validated in a free flying assay.

      Concerns with the choice assay:

      50-70 microliters amounts to what one hungry bee will drink. Did the first bee drink most of it, such that measures of bait consumed reflect a single bee or multiple bees?

      The measure of lure consumed reflects multiple bees. We observed that the first bee did not empty the 70 ul of honey, allowing us to estimate honey consumption by several bees.

      How many bees were repelled to the control side? Was it just the one bee?

      All the bees in a group were repelled to the control side for repellents. Evaluating the lack of honey consumption also allowed us to assess repellency. As an example: if 100% of the honey was consumed on the control side, this meant that the bees were hungry, but if 0% of the honey was consumed on the repellent side, this meant that the bees were not hungry enough to drink from the honey on the repellent side.

      Were other measures considered? E.g. time to first approach; the number of bees feeding at different time points; the total number of bees observed feeding per unit time.

      Bees were cooled down to place them in the plates for the experiments. Therefore, time to first approach could also depend on how long it took the bees to warm up, which was not as relevant for our research question. Because bees can communicate to each other where to find food sources, we restricted ourselves to first choice only, to get independent data points for each plate. However, we investigated whether the first cup the first bee chose was also the one it drank from, which was the case.

      Reviewer #2 (Public review):

      Summary:

      The search for new repellent odors for honey bees has significant practical implications. The authors developed an iterative pipeline through machine learning to predict honey bee-repellent odors based on molecular structures. By screening a large number of candidate compounds, they identified a series of novel repellents. Behavioral tests were then conducted to validate the effectiveness of these repellents. Both the discovery and the methodological approach hold value for related fields.

      Strengths:

      The study demonstrates that using molecular structures and a relatively small training dataset, the model could predict repellents with a reasonably high success rate. If the iterative approach works as described, it could benefit a wide range of olfaction-related fields.

      The effectiveness of the predicted repellents was validated through both laboratory and field behavioral tests.

      Weaknesses:

      The small size of the training dataset poses a common challenge for machine learning applications. However, the authors did not clearly explain how their iterative approach addresses this limitation in this study. Quantitative evidence demonstrating improvements achieved in the second round of training would strengthen their claims. For instance, details on whether the success rate of predictions or the identification of higher-affinity components would be helpful. Furthermore, given that only 15 new components were added for the second round of training, it is surprising that such a small dataset could result in significant improvements.

      The original repellency dataset was collected from multiple older studies, each with differences in assays for bee behavior, and using differing delivery and chemical concentrations. Moreover, strong repellents were limited in number, and because they varied structurally from non-repellents in the dataset, the AUC appeared high. A smaller dataset results in unusual AI/ML model performance trends, as any algorithm is just a reflection of its training data. As a result, we found that the Round 1 predictions had a low success rate in behavior assays (~20%). Subsequently, even small amounts of data collected using one standard concentration and assay could dramatically change the quality of the dataset, not just for structures of repellents, but also related structures that were not repellent. What we observe is a more complete representation of how repellents and non-repellents are distributed when adding just 15 chemicals. And the prediction success of Round 2 is more than doubled in repellent behavior assays at >50%. The initially observed performance gains with even small additions to the training dataset will stabilize and ultimately plateau due to the limits of the ML algorithm and/or chemical featurization technique. A more complex model, trained on a large dataset, may not be expected to benefit from a handful of additional examples, because its chemical feature distributions are already better approximations of the real world. Put simply, smaller datasets imply there is more to learn.

      It is also true that the size of the training dataset is important for AI/ML algorithms. Artificial neural networks, for instance, are highly sensitive to noise and generalize poorly with limited data; the noise is amplified in these cases, and the solution (reducing the complexity of the model) impedes learning. Many algorithms, like the decision trees and support vector machines featured in our paper, can handle noise more efficiently and are suitable for smaller datasets in that they can still make reasonably successful predictions.

      Reviewer #3 (Public review):

      The manuscript of Kowalewski et al. titled "Machine learning of honey bee olfactory behavior identifies repellent odorants in free flying bees in the field" uses machine learning to predict potential candidates for honeybee repellents, which may keep foraging bees away from pesticides. This is pilot research with strong significance for research on olfactory behavior and for pest control. However, some major issues need to be addressed to enhance the manuscript's clarity, strength, and overall coherence.

      (1) Drosophila melanogaster is not considered as a true agricultural pest. The manuscript would be more compelling if using true pests, for example, Drosophila suzukii or others.

      Honeybees face a critical risk of lethal pesticide exposure when they drift from their designated orchards into adjacent blooming crops or honeydew-coated fields, where they encounter chemical treatments intended for insects like Citrus Thrips, Asian Citrus Psyllid, Alfalfa Weevil, Peach Twig Borer, Oriental Fruit Moth, Lygus Bugs, Cotton Aphids, Whiteflies, Corn Rootworm, Sunflower Head Moth, Vine Mealybug, Cucumber Beetles, and Sugarcane Aphids. Unfortunately, testing such pest species is outside the scope of this paper, but deserves further research.

      (2) For the repellency test, the result relies on dosage. An attractant may become a repellent at high concentration. Test a range of concentrations for each chemical and compare responses between honeybees and pests.

      Testing freely flying honey bees in the field is an extremely challenging undertaking. Nevertheless, we added extra tests for two strong repellents, BR4.5 and BR3.81, at a half dose of 0.05 mg/cm². As expected, we found that there was a reduction in repellency. Testing more concentrations was not within the scope of this paper.

      (3) Be more clear about bee behavior data and their scores (as in Page 4 Results "184 training chemicals and later for 203 chemicals" and Page 10 Methods). I suggest that authors add a supplemental table with each chemical and its behavioral score, feature and reference - which ones were used for training, and which ones for testing. Also add your own behavioral test data (second input) to this table

      We have added the training chemical lists as Supplemental Tables S3 and S4.

      (4) The AUC in the first validation was 0.88 (Page 4), and in Page 5, "As expected, the computational validation results based on the AUC values, show an improvement." However, there were no other AUC values to show improvement.

      (5) Show plots of ROC AUC curves from Round 1 and Round 2.

      The round one ROC curve is shown in Figure 1. The round two ROC curves, obtained from 3 different approaches, are shown in Author response image 1. The manuscript shows direct behavioral validation of the chemicals identified, which is more important.

      Author response image 1.
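
      For readers comparing the Round 1 and Round 2 values, the AUC itself can be computed directly from labels and model scores. The sketch below is a generic rank-based definition, not the authors' code. The function name `roc_auc` and the toy inputs are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for repellent, 0 for non-repellent (hypothetical encoding).
    scores: model outputs, where higher means "more likely repellent".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is ranked the wrong way.
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

      With few strong repellents in the training set, the number of positive/negative pairs is small, so a handful of added chemicals can move this value noticeably.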

      (6) In the Discussion, the authors mentioned olfactory receptors in honeybees. It would be useful to provide a general review of the current understanding of these receptors and their (potential) functions.

      We have expanded the discussion and pointed to a review on honey bee olfaction.

      (7) I suggest combining Fig. 1 and Fig. 3A as one pipeline for this work.

      (8) Figure 2C, some sample sizes are very small, such as 2-piperidone: 1 first-choice control vs 0 first-choice repellent? Increase sample size and do statistical analysis.

      Most compounds except the one pointed out have small sample sizes because of the low percentage of bees participating in the trials. Consequently, we improved methods in round 2 and were able to increase participation from 68% to 81%, as described in the methods. However, since the compound was included in the second round of training, we would like to report it anyway. This compound had the highest rate of non-participating plates compared to the others, and there is a possibility that it may affect both the stimuli.

      (9) In general, to assist reviewers, include line numbers to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Other factors about the newly identified chemicals:

      Is there a toxicity index for these chemicals that can be listed? This would be important obviously for any humans around the repellents

      While toxicity index determination is outside the scope of this manuscript, it is possible to predict Rat LD50 values using the EPA Suite’s toxicity prediction tool. In a pilot test, the software predicted an average oral toxicity of ~3064 mg/kg for the 18 repellents in Round 2, which is considered “Practically non-toxic” by the EPA.

      Was there any indication of bees being behaviorally impaired or dying when exposed to the chemicals in a confined space? Even exposure to intense floral perfumes in a confined space can be toxic over a longer period.

      Fewer than 5% of the 2225 honey bees died after the experiments, and none of the compounds showed a significantly higher level of mortality, suggesting that the minor effect was not due to the chemicals, but possibly due to handling steps (starving, chilling, recovery, etc.).

      The 'plates not participating' measure indicates plates in which no bees fed on either choice. Is that correlated to the choice index? That is, when bees showed some repellency was it the case that often that led to no choice?

      Yes, non-participating plates were those in which the bees did not drink any honey at all. The reason for this could have been that the bees were too cold and unable to heat up enough to participate in the trials, or that the chemical was so repellent that the bees did not want to drink any honey at all. Because we were not able to distinguish between these two reasons, we excluded plates in which the bees did not drink any honey at all from our dataset.

      It is unclear why the McNemar test was used.

      The McNemar test is used for hypothesis testing for paired dichotomous data. In our data file, we created two columns to report our first-choice results: “Control side first” and “Repellent side first”. When the first bee in a plate drank from the control side first, we added a “1” to the “Control side first” column and a “0” to the “Repellent side first” column. Because one control and one repellent-side honey pot were in the same Petri dish, the first bee could only choose one side first, which meant it could not choose the other side at the same time. Consequently, our dataset consisted of paired samples, which were dependent on each other. We therefore split the dataset by repellent candidate and used paired-sample McNemar tests for non-parametric data. (Lachenbruch P.A. McNemar Test, Wiley StatsRef: Statistics Reference Online)
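
      As a rough illustration of the test named above, the exact form of the McNemar test depends only on the two discordant counts. The sketch below is not the authors' script. It assumes that plates choosing the control side first and plates choosing the repellent side first are the two discordant counts, and the function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(control_first, repellent_first):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant plate is equally likely to fall
    on either side, so the smaller count follows Binomial(n, 0.5).
    """
    n = control_first + repellent_first
    if n == 0:
        return 1.0  # no informative plates, so no evidence either way
    k = min(control_first, repellent_first)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Ten plates all choosing the control side first.
p_strong = mcnemar_exact(10, 0)
# A single participating plate, as in the 1-versus-0 case raised by Reviewer #3.
p_single = mcnemar_exact(1, 0)
```

      Under these assumptions a compound with a single participating plate cannot reach significance, whatever the direction of the choice, which is why small sample sizes limit what the first-choice data can show.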

      The statistical result is not discussed in the text, only shown in the figure. And it looks to be significant only for one chemical and DEET. Yet on page 4 the end of the second paragraph, the authors write "For many of the tested compounds the bees preferred to visit the honey-water pots on the control side versus the repellent side,". That implies that they are not really using the test as a meaningful means for showing differences. If they are arguing only from trends, then that should be clearer in the text.

      We reported the p-values for each test we had used in tables in Figure 2C and S2. In the methods section we report which statistical tests were used to evaluate the data.

      There is no mention of attractant chemicals:

      Slessor and Winston used queen pheromone to attract bees to fields and improve pollination. Honey bees use the Nasonov pheromone to attract other bees to feeding locations. Could the addition of their chemical features change ML outcomes? This should be at least discussed.

      We thank the referee for the suggestion; however, the focus of this manuscript is repellents, and therefore we restricted the background to that area of knowledge.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Releasing the dataset and code will benefit the readers interested in this study.

      The behavioral data are reported within the figures, tables, and supplementary. The computational code will be available upon request from the communicating author for non-commercial use.

      Figure 1, AUC curve, "AUC = 0.XX", should there be an actual value from the experiment?

      Added

      Page 4, "(Talbe S1)" should be placed in the next sentence, as "From the initial training set we identified 45 features that were considered important for predicting aversive valence (Table S1)."

      We have added this in the appropriate spot.

      Page 5, "As expected, the computational validation results based on the AUC values, show an improvement.". Please list the AUC values.

      Author response image 2.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Page 3: "they sense using a sophisticated olfactory system of >180 odorant receptor genes in the genome". In the cited Robertson & Wanner's paper, there are around 160 receptors, and 170 if pseudogenes are included.

      We thank the referee and have updated the numbers.

      (2) Page 4: "initially for 184 training chemicals and later for 203 chemicals (Table S1)." Table S1 is about features, not chemicals?

      We have moved the reference to an appropriate location.

      (3) Figure 2A: What is the control? Acetone or another solvent?

      Acetone, but it rapidly evaporates before the time of experiment.

      (4) Figure 2A: What do the asterisks mean?

      Statistically significant.

      (5) Figure 3: When you added your own testing data as a second input for Round 2, put details about these data: chemical names, preference scores... Also, were the Round 2 data (Round 1 plus your own) also split 90:10 into training and testing partitions?

      Yes, the validation was performed on the updated data set including the new chemicals.

      (6) Figure 3D: Is asterisk at correct location? What does it mean?

      It means that BR3.15 was significantly different from BR4.5.

      (7) Figure 4D: "4D" in legend is missing. Also, "... tested at the regular dose (0.1mg/cm2) and half dose (0.05mg/cm2)". In the panel, it is only 0.05mg/cm2.

      Added

      (8) Table S2 is the same as Fig. 2C? Remove one.

      We have deleted Table S2.

    1. eLife Assessment

      This meta-analysis provides a fundamental synthesis of evidence demonstrating that transcranial magnetic stimulation targeting the hippocampal-cortical network reliably enhances episodic memory performance across diverse study designs. The evidence is convincing, with rigorous methodology and consistent effects observed despite modest sample sizes and some heterogeneity in stimulation approaches. The work highlights the specificity of memory improvements to hippocampal-dependent memories and identifies key methodological factors, such as individualized targeting, that influence efficacy. Overall, this study offers a timely and integrative framework that will inform both basic memory research and the design of future clinical trials for cognitive enhancement.

    2. Reviewer #1 (Public review):

      Summary:

      Goicoechea et al. conducted a timely and thorough meta-analysis on the potential for indirect hippocampal targeted transcranial magnetic stimulation (TMS) to improve episodic memory. The authors included additional factors of interest in their meta-analysis which can be used to inform the next generation of studies using this intervention. Their analysis revealed critical factors for consideration: TMS should be applied pre-encoding, individualized spatial targeting improves efficacy, and improvement of recollection was stronger than recognition.

      Strengths:

      As mentioned previously, the meta-analysis is timely and summarizes an emerging set of studies (over the past decade since Wang et al., Science 2014). Those outside of the field may not be aware of the robustness in improvements in episodic memory from hippocampal targeted TMS. The authors were quite thorough in including additional factors which are important for the interpretation of these findings. These factors also address the differences in approach across studies. The evidence that individualized spatial targeting improves TMS efficacy is consistent with recent advances in TMS for major depressive disorder. The specificity of the cognitive improvements to recollection of episodic memory and not for other cognitive domains is consistent with hippocampal targeting. The authors also plan to post the complete dataset on an open-source repository which enables additional analysis by other researchers.

      Weaknesses:

      The write-up is succinct and emphasizes the scientific decisions that underlie key differences in the various experimental designs. While the manuscript is written for a scientific audience, the authors are likely aware that findings like this will be of broad appeal to the field of neurology where treatments for memory loss are desperately needed. For this reason, the authors could consider including a statement regarding an interpretation of this meta-analysis from a clinical standpoint. Statements such as 'safe and effective' imply a clinical indication and yet the manuscript does not engage with clinical trials terminology such as blinding, parallel arm versus crossover design, and trial phase. While the authors might prefer not to engage with this terminology, it can be confusing when studies delivering intervention-like five days of consecutive TMS (e.g., Wang et al., 2014) are clustered with studies that delivered online rhythmic TMS which tests target engagement (e.g., Hermiller et al., 2020). While the 'sessions' variable somewhat addresses the basic-science versus intervention-like approach, adding an explicit statement regarding this in the discussion might help the reader to navigate the broad scope of approaches that are utilized in the meta-analysis.

      Following revision: The authors have adequately addressed my concerns.

    3. Reviewer #2 (Public review):

      Parietal lobe TMS, targeted to the episodic memory network via connections with the structures in the medial temporal lobe, improves episodic memory. This is one of very few robustly reproduced cognitive findings in noninvasive brain stimulation. The comprehensive review and detailed meta-analysis by Goicoechea et al. makes a convincing case for efficacy in healthy people and will be important for neuroscientists and clinical researchers in memory and dementia.

      In 2014, Wang et al. showed that noninvasive stimulation of a parietal site, connected functionally to the hippocampus, increased resting state functional connectivity throughout a canonical network associated with episodic memory. It also caused a memory boost which was proportional to the connectivity increase within subjects. Their discovery that an imaging biomarker could (1) be used to target a functional network with critical nodes too deep to reach directly with TMS, (2) enable individualized, functionally confirmed, targeting, and (3) provide a scaling measure of target engagement, is one of the signal advances in noninvasive brain stimulation.

      The meta-analytical methodology used by these authors is rigorous, and the central finding, viz. that high-frequency, network-targeted stimulation reproducibly improves event recall, is amply supported. The question of whether to stimulate before or after memory encoding is also answered. While there is a hint that individualized anatomical or functional MRI-based targeting may be superior to atlas or group average-based techniques, the finding did not survive correction for multiple comparisons. Additional studies will be needed to resolve this issue, optimize the stimulation delivery parameters, and further define the behavioral effect.

      While the authors appropriately emphasize the associated network rather than the hippocampus itself, naming the target after a single node could suggest a primary role for the hippocampus in the observed outcomes, a conclusion not supported by the data reviewed here. Other nodes in the network may be equally important to aspects of episodic memory and could be useful targets for stimulation.

      Despite encouraging results from small clinical samples, the question of efficacy in patients with static lesions and ongoing neurodegeneration remains open. The information gathered here, including the absence of reported adverse events, should spur Phase 2 clinical trials in patients with disorders of memory.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Goicoechea et al. assesses the influence of hippocampal-network targeted TMS to parietal cortex on episodic memory using a meta-analytic approach. This is an important contribution to the literature, as the number of studies using this approach to modulate memory/hippocampal function has clearly increased since the initial publication by Wang et al. 2014. In general, the analysis is straightforward and the conclusions are well-supported by the results.

      Strengths:

      (1) A meta-analysis across published work is used to evaluate the influence of hippocampal-network-targeted TMS in parietal cortex on episodic memory. By pooling results across studies, the meta-analytic effects demonstrate an influence of TMS on memory across the diversity of many details in study design (specific tasks, stimuli, TMS protocols, study populations).

      (2) Selectivity with regard to episodic memory vs. non-episodic memory tasks is evaluated directly in the meta-analysis.

      (3) Supplemental factors were tested as predictors of TMS's influence on memory. This is helpful given the diversity of study designs in the literature. This analysis helps to shed light on which study designs, e.g., TMS protocols, etc., are most effective in memory modulation.

      Weaknesses:

      The authors thoroughly addressed and responded to the prior comments in the revision. The only minor weakness I see is acknowledged in terms of how null effects for particular design or TMS features should be interpreted (i.e., with caution given the regression approach used).

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) While the manuscript is written for a scientific audience, the authors are likely aware that findings like this will be of broad appeal to the field of neurology, where treatments for memory loss are desperately needed. For this reason, the authors could consider including a statement regarding an interpretation of this meta-analysis from a clinical standpoint. Statements such as 'safe and effective' imply a clinical indication, and yet the manuscript does not engage with clinical trials terminology such as blinding, parallel arm versus crossover design, and trial phase. While the authors might prefer not to engage with this terminology, it can be confusing when studies delivering intervention-like five days of consecutive TMS (e.g., Wang et al., 2014) are clustered with studies that delivered online rhythmic TMS, which tests target engagement (e.g., Hermiller et al., 2020). While the 'sessions' variable somewhat addresses the basic-science versus intervention-like approach, adding an explicit statement regarding this in the discussion might help the reader navigate the broad scope of approaches that are utilized in the meta-analysis.

      We appreciate the suggestion to enhance interpretability of our report by broader audiences. First, to avoid confusion, we have eliminated “safe” and “effective” descriptors from the main summary of findings in the Abstract (pg. 1) and Discussion (pg. 6). Second, we now describe that reviewed studies included those categorized as traditional clinical trials, as well as non-clinical studies that generally follow clinical trial designs (i.e., multi-day intervention-like studies), in addition to more basic-oriented studies that are geared towards target engagement (Introduction, pg. 2). Third, we now clarify that the Design and Control factors (Figure 3) correspond to fairly standard distinctions in the clinical trials literature and were intended to capture major study design choices that are used in both clinical-trial and non-trial studies (Methods, pg. 9; Table S1). Finally, we now clarify that future clinical trials would be needed to evaluate HITS for any specific indication, and that our findings motivate such investigations but do not conclusively indicate efficacy for any given indication (Abstract, pg. 1; Discussion, pg. 7).

      Reviewer #1 (Recommendations for the authors):

      (1) The color scheme of Figure 1 was a bit confusing. All of the colors used for the flagged regions were incredibly similar. At first glance, it looks like the hippocampus was targeted directly due to the subtle color difference. Could the authors use colors that are more different? Similarly, zooming into the specific locations shows blue dots encompassed by teal. I am not sure what I am looking at here.

      We have updated the figure for clarity.

      (2) Given the broad appeal of the current study, I would encourage the authors to include a brief visual depiction of "HITS." This could help the more casual reader to understand the general approach.

      We have included this in Figure 1A.

      Reviewer #2 (Public review):

      (1) While the introduction centers on the role of the hippocampus in episodic memory and posits hippocampal neuromodulation by TMS as causative, the true mechanism may be more complex. Clean hippocampal lesions in primates cause focal loss of spatial and place memory, and I am aware of no specific evidence that the hippocampus does more than this in humans. Moreover, there is evidence that lateral parietal TMS also reaches neighboring temporal lobe regions, which contribute to episodic memory. The hippocampus may, therefore, be a reliable deep seed for connectivity-based targeting of the episodic memory network, but might not be the true or only functional target.

      We regret having implied that we think the hippocampus is the true or only functional target. We agree with the reviewer that the hippocampus is “a reliable deep seed for connectivity-based targeting of the episodic memory network” and that the specific locus/loci of the HITS effects and mechanisms are not yet clear. We now emphasize that although the hippocampus is used to define the targeted network, effects of TMS are likely distributed throughout the network, citing relevant studies that have shown that brain activity changes due to HITS are certainly not restricted to the hippocampus (Introduction, pg. 2).

      (2) The meta-analysis combines studies with confirmation of targeting and target-network engagement from fMRI and studies without independent evidence of having stimulated the putative target (e.g., Koch et al). That seems like a more important methodological distinction than merely the use of any individual targeting method. In my experience, atlas-based estimates are at least as accurate as eyeballing cortical areas in individuals. Hence, entering individual functional targeting as a factor might reveal an effect on efficacy.

      Our current definition of the “Targeting” factor appears to satisfy this concern. That is, we distinguish studies that used “individual functional targeting” (i.e., resting-state fMRI or DTI connectivity in each individual to select the target) from those that did not (i.e., atlas or other group-average approach). Notably, the Targeting factor modulation effect failed to survive correction for multiple comparisons. We think this satisfies the reviewer criticism, unless the reviewer is suggesting that we categorize studies based on whether they included evaluation of target engagement (e.g., tested for change in fMRI activity or connectivity of the network due to HITS) versus those that measured only behavioral outcomes. We did not include this distinction as a factor, as our analysis focuses on behavioral effects of HITS, and it is not clear what the neural effects would have been in studies in which they were not measured. Notably, we are providing the full raw dataset of effect sizes in a public repository with our final version of record, such that any other categorization schemes could be assessed by others.

      (3) The funnel plot and Egger's regression for episodic memory outcomes suggested possible bias, and the average sample size of 23 is small, contributing to the likelihood of false positive results. It would be informative, therefore, to know how many or which studies had formal power estimates and what the predicted effect sizes were.

      Regarding the average sample size of 23, we note that we used Hedges’ g for the effect size measure because it corrects for bias associated with small samples (pg. 10). Further, small sample sizes contribute to noisy estimates of true effects, allowing outliers to contribute to false positives and low power to contribute to false negatives, but without any reason to systematically yield bias towards false positives. Regarding potential publication bias, although we cannot rule this out based only on the statistics, we think that bias against publication of negative results is unlikely. First, HITS experiments are time consuming and expensive, and most in the field seem to be motivated to publish, whatever the outcome. Second, the notion of memory enhancement via brain stimulation is controversial, and groups have certainly been motivated, if not overly eager, to publish “failure to replicate” studies for HITS (e.g., the failure-to-replicate publication by Hendrikse et al. 2020, which was then re-analyzed by many of the original authors to arrive at different conclusions in Cash et al. 2022). Given these considerations, we think that it is very unlikely that publication bias had any major impact on our conclusions, but of course it cannot be conclusively excluded. Finally, we note that our finding of HITS selectivity for recollection enhancement is likely not affected by publication bias, as this selectivity versus other memory and non-memory outcomes was found only within published studies (i.e., it is very unlikely that publication bias would have led researchers to withhold publication of studies that found effects of HITS on recognition but not on recollection).
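
      The small-sample correction mentioned above can be illustrated with a minimal sketch. It assumes two independent groups and the commonly used approximate correction factor J = 1 - 3/(4·df - 1); the response does not state which exact form was applied, and many of the reviewed studies are within-subject designs that would need a different variance formula. The function names and the example numbers are hypothetical and are not taken from the dataset.

```python
import math


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference using the pooled standard deviation.

    Assumes two independent groups (an illustrative simplification).
    """
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def hedges_g(d, n_a, n_b):
    """Apply the approximate small-sample correction J = 1 - 3 / (4 * df - 1)."""
    df = n_a + n_b - 2
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical example: 23 participants split 12 / 11, equal spread in both groups.
d = cohens_d(mean_a=0.70, mean_b=0.60, sd_a=0.20, sd_b=0.20, n_a=12, n_b=11)
g = hedges_g(d, n_a=12, n_b=11)
print(f"Cohen's d = {d:.3f}, Hedges' g = {g:.3f}")
```

      With 23 participants the correction shrinks the estimate by roughly 4%, and the shrinkage vanishes as the sample grows, which is why it matters mainly for small studies.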

      (4) In the Discussion, the authors might provide a comparison between the effect size for memory improvement found here with those reported for other brain-targeted interventions and behavioral strategies. It may also be worthwhile pointing out that HITS/memory is one of the very few, or perhaps the only, neuromodulatory effects on cognition that has been extensively reproduced and survived rigorous meta-analysis.

      We now emphasize that this is, to our knowledge, the only neuromodulatory effect on cognition that is selective, has been extensively reproduced, and survived rigorous meta-analysis (Discussion, pg. 6). However, we wish to avoid the clinical overinterpretation of our findings that might result if we were to compare directly to effect size estimates for other current therapies, which have been evaluated for specific clinical indications. For example, antibody and pharmacological interventions for Alzheimer’s dementia typically have been associated with similar effect sizes to our estimate for HITS. However, those estimates derive from systematic review of randomized controlled trials measuring clinically relevant outcomes at relatively long delays, whereas the HITS studies we review include a mix of controlled and uncontrolled trials, vary in whether clinical outcomes were assessed, and mostly assessed outcomes at shorter delays. Thus, it could be misleading to directly compare the effect sizes. We instead continue to highlight that the HITS effects are promising and warrant rigorous testing for any given clinical indication.

      (5) The section of the Discussion on specificity compares HITS to transcranial electrical stimulation without specifying an anatomical target or intended outcome. A better contrast might be the enormous variety of cognitive and emotional effects claimed for TMS of the dorsolateral prefrontal cortex.

      We now also note that TMS of lateral frontal cortex has not been associated with similarly high specificity (Discussion, pg. 6). Note however that we cannot exclude anti-depressant or other psychological effects of HITS, as such outcomes were not consistently assessed in HITS studies and so were not included in our analyses.

      (6) With reference to why other nodes in the episodic memory network have not been tested, current flow modeling shows TMS of the medial prefrontal cortex is unlikely to be achievable without stronger stimulation of the convexity under the coil, in addition to being uncomfortable. The lateral temporal lobe has been stimulated without undue discomfort.

      We now additionally indicate that medial prefrontal stimulation may be ineffective given conventional TMS (Discussion, pg. 7). However, we are aware of no studies that have stimulated the portion of middle temporal gyrus that shows strong connectivity with hippocampus. We have tried this location, which positions the coil on or slightly above the ear and bordering on the temple area, which is very sensitive for most people. We were not able to minimize pain/discomfort for most subjects in pilot experiments, and so had to abandon it. Perhaps others have succeeded? If the reviewer has any specific references that could be included, we would be happy to add them and update this section accordingly.

      (7) Finally, a critical question hanging over the clinical applicability of HITS and other neuromodulation techniques is how well they will work on a damaged substrate. Functional and/or anatomical imaging might answer this question and help screen for likely responders. The authors' opinion on this would be informative.

      We appreciate this point but don’t think there are enough data to assess the level of substrate damage needed to frustrate any stimulation benefits. The only thing we can say is that HITS was equally effective for mild to moderate Alzheimer’s dementia as it was for other non-neurodegenerative groups (nonsignificant effect of the Population factor, Figure 3B), suggesting that whatever degree of damage is present in that group is insufficient to prevent the stimulation effects. We now highlight this point and raise the issue that, presumably, some level of damage would render HITS ineffective (Discussion, pg. 8).

      Reviewer #3 (Public review):

      (1) My only significant concern is how studies are categorized in the 'Timing' factor (when stimulation is applied). Currently, protocols in which TMS is administered across days are categorized as 'pre-encoding' in the Timing factor. This has the potential to be misleading and may lead to inaccurate conclusions. When TMS is administered across multiple days, followed by memory encoding and retrieval (often on a subsequent day), it is not possible to attribute the influence of TMS to a specific memory phase (i.e., encoding or retrieval) per se. Thus, labeling multi-day TMS studies as 'pre-encoding' may be misleading to readers, as it may imply that the influence of TMS is due to modulation of encoding mechanisms per se, which cannot be concluded. For example, multi-day TMS protocols could be labeled as 'pre-retrieval' and be similarly accurate. This approach also pools results from TMS protocols with temporal specificity (i.e., those applied immediately during encoding and not on board during memory testing) and without temporal specificity (i.e., the case of multi-day TMS) regarding TMS timing. Given the variety of paradigms employed in the literature, and to maximize the utility/accuracy of this analysis, one suggestion is to modify the categories within the Timing factor, e.g., using labels like 'Temporally-Specific' and 'Temporally Non-specific'. The 'Temporally-Specific' category could be subdivided based on the specific memory process affected: 'encoding', 'retrieval', or 'consolidation' (if possible). I think this would improve the accuracy of the approach and help to reach more meaningful conclusions, given the variety of protocols employed in the literature.

      We agree in principle with this criticism and think that the most straightforward way to address it is to relabel the “Pre-Encoding” category as “Pre-Task”. The issue with labeling/considering single-session stimulation delivered immediately before encoding as “Pre-encoding” is that this makes the assumption that this stimulation doesn’t also affect retrieval (i.e., is temporally specific). We do not have certainty about the timecourse of how a single session of stimulation affects brain activity. We think the “Pre-Task” label and interpretation is the best way to address this, to avoid suggesting that we are confident about the timecourse/selectivity of stimulation effects. Notably, the “Sessions” factor directly compares among designs that delivered stimulation in a single session versus in multiple consecutive sessions, and was a nonsignificant modulator. Thus, our analyses already compare studies that are relatively temporally specific versus those that, likely, are less so. In addition to relabeling, we have also added clear caveats to address the interpretive constraint imposed by the unknown timecourse of stimulation effects (Discussion, pg. 6-7) and revised the Abstract to reflect this change.

      (2) As the scope of the meta-analysis is limited to TMS applied to parietal or superior occipital cortex, it is important to highlight this in the Introduction/Abstract. The 'HITS' terminology suggests a general approach that would not necessarily be restricted to parietal/nearby cortical sites.

      This was previously highlighted only in the Methods and Discussion (with a Discussion paragraph dedicated to the issue of target selection; see also Comment 6 from Reviewer 2). We now also note this in the Introduction (pg. 2) and Abstract.

      Minor:

      (1) To reduce the number of study factors tested, data reduction was performed via Lasso regression to remove factors that were not unique predictors of the influence of TMS on memory. This approach is reasonable; however, one limitation is that factors strongly correlated with others (and predict less unique variance) will be dropped. This may result in a misrepresentation, i.e., if readers interpret factors left out of this analysis as not being strongly related to the influence of TMS on memory. I do see and appreciate the paragraph in the Discussion which appropriately addresses this issue. However, it may be worth also considering an alternative analysis approach, if the authors have not already done so, which explicitly captures the correlation structure in the data (i.e., shown in Figure S2) using a tool like PCA or an appropriate factor analysis. Then, this shared covariance amongst factors can be tested as predictors of the influence of TMS - e.g., by testing whether component scores for dominant PCs are indeed predictive of the influence of TMS. This complementary approach would capture rather than obfuscate the extent to which different factors are correlated and assess their joint (rather than independent) influence on memory, potentially resulting in more descriptive conclusions. For example, TMS intensity and protocol may jointly influence memory.

      We argue that feature selection via Lasso regression is a better approach for our research question than PCA, factor analysis, or other latent variable methods. The main reason is that PCA would sacrifice the interpretability of our findings with respect to the design of future experiments using or testing HITS. That is, because PCA creates composite components that are linear combinations of multiple variables, we would lose the ability to provide clear, actionable guidance to researchers about which specific study design choices (e.g., stimulation intensity, protocol type, timing) influence memory outcomes. Given that a major goal of our meta-analysis is to inform future experimental design, we believe that it is essential to maintain interpretability of the individual factors that must be decided when designing a study. Regarding factor analysis, this approach would require making a priori theoretical decisions about how to group individual moderators, which could introduce subjective bias into the analysis and would introduce other complications such as a need for validation of the resulting factor scores. We believe that the exploratory nature of our investigation, examining which among many possible study design factors substantially determine TMS efficacy, is better suited to a data-driven selection approach like Lasso. While the reviewer correctly notes that Lasso may drop factors that are correlated with stronger predictors, this feature can be considered advantageous in terms of identifying factors for inclusion in future study designs. That is, this can help identify the most parsimonious set of independent predictors, such that researchers can focus on the study design elements that matter most when controlling for other factors. Notably, we provide the table of factor relationships (Figure S2) so that interested readers can inspect how dropped factors were related to those that were retained.

      It is also important to note that we have provided the full dataset with our resubmission, which has been deposited in Dryad with a link in the Data Availability section (pg. 15). Thus, others are free to explore alternative analytical approaches should they wish to examine the data from different perspectives or to answer different questions.
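
      The trade-off discussed in this exchange, namely that Lasso can drop a factor that is correlated with a stronger predictor, can be sketched with a toy example. This is an illustration only: it uses a plain coordinate-descent solver written for this note (not the authors' code or any particular library), assumes centered predictors and outcome with no intercept, and uses made-up data in which the hypothetical factor `x2` is correlated with `x1` and with the outcome.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1 / (2n)) * ||y - X b||^2 + penalty * ||b||_1 by cyclic updates.

    X is a list of rows; columns and y are assumed to be centered (no intercept).
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # squared norm of predictor j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (z / n)
    return beta


# Toy data (hypothetical): the outcome depends only on x1, and x2 is correlated with x1.
x1 = [-1.0, -1.0, 1.0, 1.0]
x2 = [-1.0, 0.0, 0.0, 1.0]
y = [2.0 * v for v in x1]

both = lasso_coordinate_descent([[a, b] for a, b in zip(x1, x2)], y, penalty=0.5)
alone = lasso_coordinate_descent([[b] for b in x2], y, penalty=0.5)
print(both)   # [1.5, 0.0]: x2 is dropped once x1 is in the model
print(alone)  # [1.0]: x2 is retained when it is the only candidate
```

      In this toy case `x2` predicts the outcome on its own but receives a zero coefficient in the joint fit, which is the situation in which a dropped factor should not be read as unrelated to the outcome.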

      (2) Given the specific focus on TMS applied to parietal cortex to modulate hippocampal and related network function, it would be fruitful if the authors could consider adding discussion/speculation regarding whether this approach may be effectively broadened using other stimulation methods (e.g., tACS, tDCS), how it may compare to other non-invasive brain stimulation methods with depth penetration to target hippocampal function directly (transcranial temporal interference, or transcranial focused ultrasound), and/or how or whether other stimulation sites may or may not be effective.

      We briefly discuss a meta-analysis of tACS studies which reported nonspecific effects, including for parietal targets overlapping those used for HITS (Discussion, pg. 6). We briefly speculate about how tES effects remain mechanistically uncertain. We are afraid that further speculation about other stimulation modalities and targets would be beyond the scope of this focused meta-analysis, given especially the few datapoints for newer approaches such as TI or tFUS.

      (3) Studies were only included in the meta-analysis if they contained objective episodic memory tests. How were studies handled that included both objective and subjective memory, or other non-episodic memory measures? For example, Yazar et al. 2014 showed no influence of TMS on objective recall, but an impairment in subjective confidence. I assume confidence was not included in the meta-analysis. Similarly, Webler et al. 2024 report results from both the mnemonic similarity task (presumably included) and a fear conditioning paradigm (presumably excluded). Please clarify in the methods how these distinctions were handled.

      Studies were included in our meta-analysis if they included at least one objectively scorable test of episodic memory. We only included objectively scorable test performance in our analysis, excluding scores from any other subjective measures if they were also reported. This is now clarified in Methods (pg. 9).

      (4) The analysis comparing memory to non-memory measures is important, showing the specificity of stimulation. Did the authors consider further categorizing the non-memory tasks into distinct domains (i.e., language, working memory, etc.)? If possible, this could provide a finer detail regarding the selectivity of influences on memory vs. other aspects of cognition. It is likely that other aspects of cognition dependent on hippocampal function may be modulated as well, i.e., tasks with high relational/associative processing demands.

      This is an interesting idea, but it is beyond our expertise to categorize these other tasks based on the nature of processing demands that they capture. Note that the task names are provided in the data table that we are making available online with our submission of record (via Dryad), such that other groups could address this question if interested.

      (5) In the analysis of the Intensity factor, how were studies using Active (rather than resting) MT categorized? Only resting MT is mentioned in Table S1. This is important as the original theta-burst TMS protocol from Huang et al. 2005 determines intensity based on Active Motor Threshold.

      MT was resting/passive in all reviewed studies except for one (Tambini et al. 2018), which used 80% of active MT. We categorized this as <100% MT for the Intensity factor, as it was <100% of MT as defined in that study. Although one could make the argument that 80% AMT might instead correspond to 100+% RMT, this change would have very little influence on our results or conclusions. We now clarify this in Table S1.

      (6) Is there a reason why the study by Koen et al. 2018 (Cognitive Neuroscience) was not included? TMS was performed during encoding to the left AG, and objective memory was assessed, so it would seemingly meet the inclusion criterion.

      The failure to include Koen et al. 2018 was our error. Koen et al. 2018 is the only study that used “online” stimulation, delivered during the trials when memoranda were displayed for encoding in the task. In contrast, all other reviewed studies delivered “offline” stimulation either before the memoranda were presented (“Pre-Task”) or after the encoding period but before retrieval (“Post-Encoding”). Therefore, categorization for the “Timing” factor would be problematic for its inclusion in the main analysis. We therefore now include Koen et al. 2018 in the “Supplementary Results” section as well as the corresponding main Results section on “Similar outcomes in studies that were excluded from meta-analysis”. We also note in the relevant discussion that “online” stimulation, as done in Koen et al. 2018, is typically considered disruptive (e.g., Beynel et al. 2019 Neuroscience & Biobehavioral Reviews; Yeh & Rose 2019 Frontiers in Psychology), which should be taken into account when considering the findings of Koen et al. 2018 relative to other reviewed studies that used “offline” designs.

      (7) It would be helpful to briefly differentiate the current meta-analysis from that performed by Yeh & Rose (How can transcranial magnetic stimulation be used to modulate episodic memory?: A systematic review and meta-analysis, 2019, Frontiers in Psychology) (other than being more current).

      Beyond being more current and therefore including many more studies in which stimulation targets were based on hippocampal connectivity (which tend to have been published more recently), the differences with Yeh & Rose 2019 are subtle. Our review focuses on assessment of network targeting and whether effects were specific to episodic memory versus other tasks, which differs somewhat from the focus of Yeh & Rose 2019. The main difference in conclusions likely derives from there being more network-focused memory TMS experiments now than were available for Yeh & Rose’s review. We also differentiate episodic memory into recollection versus other components to test specificity and analyze modulation by many study design factors relevant to HITS studies that were not emphasized in Yeh & Rose’s review. Note that we now cite Yeh & Rose for those interested in potential differences.

      (8) For transparency and to facilitate further understanding of the literature and potential data re-use, it would be great if the authors consider sharing a supplementary table or file that describes how individual studies/memory measures were categorized under the factors listed in Table S1.

      As promised in our original submission, we are providing the full data table, including how individual studies and memory measures were categorized, as an open dataset in Dryad. The Dryad dataset is cited in “Data availability” (pg. 15).

      Reviewer #3 (Recommendations for the authors):

      Please explicitly state in the Methods (Meta-analysis of effect modifiers section) that the criteria used for categorizing each measure into a factor (e.g., probing Recollection, Recognition, etc.) are fully described in Table S1; this will help readers to find these details (it took me a while!).

      This is now emphasized (pg. 10).

    1. eLife Assessment

      In this important study, the authors conducted atomistic molecular dynamics simulations to probe the interactions between IRE and unfolded peptides. The results help reconcile contradicting experimental findings in the literature and offer mechanistic insights into the activation of the unfolded protein response. The atomistic molecular dynamics simulations performed are solid, leading to convincing conclusions that are partly supported by experimental validations. The use of unbiased molecular dynamics simulations, while appropriate for the current system due to its complexity, limits the time scale of events that can be observed and therefore the proposed mechanism of recognition merits further confirmation by future studies.

    2. Reviewer #1 (Public review):

      Summary:

      This work provides structural and mechanistic insights into the disordered protein recognition process inside the endoplasmic reticulum by the inositol-requiring enzyme 1. Using state-of-the-art molecular dynamics simulation tools, the authors propose a mechanism of disordered protein recognition that reconciles contradictory findings of biochemical and structural biology experiments.

      Strengths:

      (1) All MD simulations have been carried out in triplicate, and several different folded conformations were generated using AlphaFold2. This provides adequate statistics to draw meaningful conclusions from the simulations.

      (2) Potential limitations of the disordered protein force fields and water models have been taken into consideration. Particularly, performing the simulation in both TIP3P and TIP4PD water models ensures that the conclusions drawn are not influenced by the force field choice.

      (3) The binding of a large number of disordered peptides was investigated, ensuring that the conclusions drawn about disordered peptide recognition are sufficiently general.

      Weaknesses:

      (1) The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within one microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place.

      (2) Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength or free energy difference.

      Comments on revisions:

      The authors have adequately addressed my comments. I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated the interactions between IRE and unfolded peptides using all-atom molecular dynamics simulations. The interactions between a couple of unfolded peptides and IRE provide mechanistic insight on the activation of the UPR.

      Strengths:

      - Well-written manuscript accessible for a broad biological audience

      - State-of-the-art structural predictions and all-atom simulations

      - Validation with existing experimental data

      - Clear schematic diagram summarizing mechanisms learned from simulations

      - Error estimate included

      - Shared simulation data and code in public repository

      Weakness:

      No major concerns remain after revision.

      Comments on revisions:

      The authors have addressed all my questions from the previous assessment. I do not have more suggestions.

    4. Reviewer #3 (Public review):

      Summary:

      In this important work, the authors use extensive MD simulations to study how the IRE1 protein can detect unfolded peptides. Their study consolidates contradictory experimental results and offers a unique view of the different sensing models proposed in the literature. Overall, it is an excellent study that is quite extensive. The research is solid, meticulous, and carefully performed, leading to convincing conclusions.

      Strengths:

      The strength of this work is the extensive and meticulous molecular dynamics simulations. The authors use and investigate different structural models, for example carefully comparing a model based on a PDB structure with reconstructed loops with an AlphaFold 2 Multimer model. The authors also investigate a wide range of different protein structural models that probe different aspects of the peptide-sensing process. Additionally, the authors experimentally validate a part of the simulation results. These solid and meticulous MD simulations allow the authors to obtain convincing conclusions concerning the peptide-sensing process of the IRE1 protein.

      Weaknesses:

      A potential weakness of the study is the use of equilibrium (unbiased) molecular dynamics simulations, which means only processes and conformational changes on the microsecond timescale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions. Furthermore, in the revised version, the authors partly address this weakness by employing orthogonal simulation methods and experimental techniques.

      Comments on revisions:

      The authors have addressed all the issues that I raised in my previous report.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1 (Public review):

      (1) "The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within a one-microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place."

      We thank the Reviewer for this valuable feedback and we agree with the Reviewer. Our work on the IRE1 cLD activation mechanism is focused on generating a hypothesis of the binding mechanism driven by MD simulations. We recognize the limitations in defining a stable binder due to the time scales sampled. However, our primary focus was to sample and characterize a possible binding pose in the center of the cLD dimer. We contextualized our statements about stable binders and limited our claims to stating that the protein-peptide complex is stable within 1 µs-long simulations. However, we believe that our finding that the cLD dimer groove is not able to accommodate peptides is solid, as the steric impediment described is present in all our replicas, both with and without peptides, in a cumulative sampling time of 24 µs without peptides and 66 µs with peptides. Additionally, we included a plot showing the distribution of groove width across all replicas.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) The title was changed from “Unfolded polypeptides can stably bind to hIRE1α cLD dimer” to “Unfolded polypeptides bind to hIRE1α cLD dimer surface”

      Addition to the text. (Figure 15 A legend) “(A) Distributions of the groove width of peptide-bound cLD dimers throughout all simulations performed. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      (2) Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength of the free energy difference.

      We thank the Reviewer for the insightful comment. As explained in the previous point, we believe that our simulations provide useful hypotheses. We are aware of the limitations due to the timescale and agree that these limitations cannot be overcome with standard equilibrium simulations. To address these limitations, we used orthogonal methods, specifically MM/PB(GB)SA calculations, to calculate binding free energies from existing trajectories. We added predictions of all the peptides using AlphaFold 3, to confirm the binding region. Importantly, we now provide experimental results to assess the binding affinity of cLD dimer mutants E102R and Y161R.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. 16A). We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Figure 16 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (plDDT). (B) Difference in enthalpy (enthalpy of binding, ∆H) as an estimate of the binding free energies of unfolded polypeptides to hIRE1α cLD dimer derived from MM/PBSA calculations of our peptide simulations.”

      Addition to the text. (Figure 4 G legend) “(G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Results section: Point mutations destabilize unfolded peptide binding to cLD) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub>= 6.35 µM and Y161R K<sub>1/2</sub>= 5.4 µM, Supplementary Table 3) compared to the wild type (K<sub>1/2</sub>= 2.14 µM, Supplementary Table 3), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Supplementary Table 3)

      Reviewer 2 (Public review):

      (1) Improving presentation to include more computational details.

      We thank the Reviewer for raising this critical point. We agree that the manuscript is tailored for a biology audience, as the data are particularly relevant for that community. Nevertheless, we also understand the importance of providing sufficient methodological detail for computational readers. We added more references to the methods for computational information in the main text.

      (2) More quantitative analysis in addition to visual structures.

      We added an uncertainty estimate for the HDX calculations using bootstrapping and included additional information on bond distances for E102 and Y161. We also incorporated time-series data showing the distance of the peptide from the groove across all replicas.

      Addition to the text. (Figure 1C legend) “(C) The deuterated fraction obtained from experimental results (dashed line, shaded area indicates the error we calculated from bootstrapping) published by Amin-Wetzel et al. and the fraction computed from MD simulations (solid lines, blue for TIP3P water and orange for TIP4PD water) for the PDB and AF model at incubation time point 0.5 min. This time point corresponds to experimental incubation times, not MD simulation time. Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping. Below each absolute value plot, we report the discrepancy, which is defined as the difference between the simulated and experimental deuterated fractions, with the shaded area indicating the corresponding error.”
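
      The bootstrapped error bars mentioned in this legend can be illustrated with a minimal sketch. It assumes the simplest scheme, resampling the six per-monomer values (three replicas, two monomers each) with replacement and taking the spread of the resampled means; the response does not specify the exact resampling unit, and the numbers and the function name `bootstrap_sem` are hypothetical.

```python
import random
import statistics


def bootstrap_sem(values, n_resamples=2000, seed=0):
    """Return (mean, bootstrap standard error of the mean) for a small sample.

    Assumption: each entry in `values` is treated as an independent observation
    and resampled with replacement; this is an illustration, not the authors' script.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    resampled_means = []
    for _ in range(n_resamples):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(statistics.fmean(sample))
    return statistics.fmean(values), statistics.stdev(resampled_means)


# Hypothetical deuterated fractions for one peptide segment: 3 replicas x 2 monomers.
fractions = [0.42, 0.47, 0.39, 0.45, 0.44, 0.41]
mean_fraction, error = bootstrap_sem(fractions)
print(f"deuterated fraction = {mean_fraction:.3f} +/- {error:.3f}")
```

      With only six values per point, the bootstrap error is itself noisy, which is one reason to read such error bars as rough indications rather than tight confidence limits.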

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      Reviewer 3 (Public review):

      A potential weakness of the study is the usage of equilibrium (unbiased) molecular dynamics simulations, so that only processes and conformational changes on the microsecond time scale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions.

      We appreciate the Reviewer’s thoughtful comment. As noted in our response to Reviewer 1, we addressed the concern about sampling by applying orthogonal methods and experimental techniques. We agree with the Reviewer that some form of enhanced sampling is necessary if we want to assess binding in a more quantitative way, e.g., via free energy calculations. However, we also realize that applying any enhanced sampling scheme to our system is very challenging, given its large size and the complex peptide-protein interactions, which are not easily captured in a few collective variables. After a careful assessment and some preliminary tests, we decided that estimating free energies using enhanced sampling would necessitate a separate paper due to both the conceptual complexity of the project and the size of the necessary sampling campaign.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Some enhanced sampling or path sampling simulations may be carried out to identify the peptides’ binding and unbinding mechanisms to the protein. This can show whether the disordered peptides studied in this work do indeed bind to the protein.

      We thank the Reviewer for this constructive criticism. We acknowledge the limitations associated with investigating binding and unbinding mechanisms of disordered peptides within the time scales accessible to our equilibrium simulations. However, the primary objective of our study was to sample and characterize a plausible binding pose at the center of the cLD dimer. We wanted to understand if unfolded model peptides require an open groove able to contain them to bind to IRE1’s core luminal domain or if binding also occurs in the absence of an open groove.

      Enhanced sampling is, of course, an important strategy to overcome the limits of equilibrium simulations. However, we note that implementing enhanced sampling approaches in this system poses significant challenges due to its large size and the complexity of peptide–protein interactions, which cannot be easily captured using a limited set of collective variables. We decided that a thorough application of enhanced sampling would therefore constitute a separate study. Instead, we decided to validate our simulations in two ways: 1) we ran a new set of free energy calculations, and 2) we tested key predictions in experiments, adding significant new data to strengthen the conclusions of our manuscript.

      To evaluate whether the binding free energies of MPZ-derived peptides to human IRE1α cLD dimers are consistent with experimentally reported binding constants, we employed the MM/PBSA (Molecular Mechanics/Poisson–Boltzmann Surface Area) method. Calculations were performed over the final 250 ns of each simulation replica using the Single Trajectory Protocol (STP), which avoids the need for additional simulations. This approach provides an estimate of the effective binding free energy (i.e., enthalpy of binding) by accounting for bonded and non-bonded interactions, as well as solvation contributions. The entropic contribution, being computationally more demanding and subject to additional approximations, was not included. Binding enthalpies were obtained for MPZ1-N (in different initial orientations), MPZ1-C, MPZ1-N-2X, and MPZ1-N-2X-RD. The results indicated small differences in effective binding energies between the shorter peptides (MPZ1-N and MPZ1-C), whereas MPZ1-N-2X exhibited the lowest binding energy and MPZ1-N-2X-RD the highest, consistent with experimental trends. These findings support the reliability of our model and sampling strategy as a framework for analyzing peptide binding conformations to cLD.
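
      The single-trajectory estimate described above can be sketched as follows. In this protocol, complex, receptor, and ligand energies are all evaluated on the same frames, and the effective binding energy is the frame average of their difference with the entropic term omitted. The energy values and the function name `effective_binding_energy` are hypothetical; the actual calculations were run with gmx_MMPBSA, not with this code.

```python
import statistics


def effective_binding_energy(complex_e, receptor_e, ligand_e):
    """Single Trajectory Protocol: per-frame dE = E(complex) - E(receptor) - E(ligand).

    Each energy is assumed to already contain the molecular-mechanics and
    solvation terms for that frame (kcal/mol). The entropic contribution is
    not included, so the result is an effective (enthalpy-like) value.
    """
    per_frame = [c - r - l for c, r, l in zip(complex_e, receptor_e, ligand_e)]
    return statistics.fmean(per_frame), statistics.stdev(per_frame)


# Hypothetical per-frame energies (kcal/mol) taken from one trajectory.
complex_e = [-5120.0, -5131.5, -5125.0]
receptor_e = [-4800.0, -4809.0, -4803.5]
ligand_e = [-290.0, -291.5, -290.5]

delta_h, spread = effective_binding_energy(complex_e, receptor_e, ligand_e)
print(f"effective binding energy = {delta_h:.2f} kcal/mol (frame s.d. {spread:.2f})")
```

      Because all three terms come from the same frames, intramolecular contributions largely cancel. This keeps the estimate cheap but ignores any conformational change of the peptide or protein upon binding.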

      We identified residues E102 and Y161 as key contributors to the binding of unfolded peptides in our simulations. Contact analysis revealed these residues as binding hotspots, centrally located within the observed interaction regions. To probe their relevance, we conducted simulations of cLD dimers with single arginine mutations in these residues, aimed at disrupting these hotspots through charge repulsion. These simulations revealed increased instability of MPZ1N-2X on the cLD dimer surface. We further validated these findings experimentally using fluorescence anisotropy assays. Fluorescently labeled MPZ1N-2X was titrated with purified cLD mutants (E102R and Y161R), and anisotropy measurements were fitted to derive K<sub>1/2</sub> values. Both mutations resulted in approximately a two-fold reduction in binding affinity relative to the wild-type cLD, confirming the importance of these residues in stabilizing peptide binding.

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Thus, we investigated how the point mutations of two key residues, E102R and Y161R, would affect peptide binding by simulating the cLD mutant in complex with MPZ1N-2X (Fig. 4C-E). We initialized the systems in the pose described for the other peptide-cLD systems described earlier (Fig. 3B, t = 0 µs). In simulations of the wild-type (WT) cLD dimer, the peptide generally remained near the center (Fig. 4C,F). By contrast, MPZ1N-2X displayed reduced binding to E102R, fully dissociating in one TIP4P-D replica (Fig. 4E,F). A similar trend was observed for Y161R, where one partial dissociation event occurred (Fig. 4D,F). Comparative analysis of MPZ1N-2X contact sites on the WT and mutant cLD dimers (Supplementary Fig. 17B-D) revealed that, in the presence of mutations, the peptide engages a broader surface region rather than remaining centrally localized, while forming fewer contacts with the specific residues (Supplementary Fig. 18A-B).”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub>= 6.35 µM and Y161R K<sub>1/2</sub>= 5.4 µM, Supplementary Table 1) compared to the wild type (K<sub>1/2</sub>= 2.14 µM, Supplementary Table 1), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Figure 4 legend) “(E) Side view snapshot after 1 µs of simulation of E102R hIRE1α cLD dimer (gray) in complex with MPZ1N-2X (orange). The amino acid R102 on both monomers is represented in magenta sticks. (F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”
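
      The 25-frame rolling average used for the darker lines in panel F can be illustrated with a short sketch. It assumes a trailing window that only emits values once 25 frames are available; the legend does not state whether the window is trailing or centered, and the distance series below is synthetic.

```python
def rolling_mean(values, window=25):
    """Rolling average over `window` consecutive frames.

    Returns len(values) - window + 1 points, one per full window.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one frame: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Synthetic minimum groove-peptide distances (arbitrary units), 100 frames.
distances = [float(i % 10) for i in range(100)]
smoothed = rolling_mean(distances, window=25)
print(len(distances), len(smoothed))
```

      Smoothing of this kind suppresses frame-to-frame noise but also shortens and delays sharp events such as a dissociation, so the raw trace remains the reference for timing.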

      Addition to the text. (Methods section: Binding free energy calculations (MM/PBSA)) “The binding free energy of noncovalently bound complexes of human IRE1 cLD and peptides was calculated with the MM/PBSA (Molecular Mechanics/Poisson-Boltzmann Surface Area) method via gmx_MMPBSA (version 1.6.4)[1, 2]. The Poisson-Boltzmann method was used to estimate the electrostatic contribution to solvation free energy as recommended for data obtained with the CHARMM force field. The contribution of the entropic term was omitted, obtaining effective binding free energy values, or enthalpy of binding (∆H). We used the Single Trajectory Protocol (STP), using the cLD-peptide simulations as input. The calculations were performed on the last 250 ns of each replica. Single-term total non-polar solvation free energy (inp = 1) was used. The charmm_radii (PBRadii = 7) were used to build Amber topology files [3]. The default parameters were applied for other terms.”

      Addition to the text. (Methods section: Protein purification) “To express hIRE1α LD (24-443), human cDNA sequences were cloned into pET47b(+) to create a coding sequence with N-terminal His6-tag. Mutations of hIRE1α LD were introduced by overlap extension PCR and restriction cloning into pET47b(+). For expression of the proteins, the plasmid of interest was transformed into Escherichia coli strain BL21DE3* RIPL (Agilent Technologies). Cells were grown in Luria Broth until OD600=0.6-0.8. Protein expression was induced with 0.6 mM IPTG, and cells were grown at 20°C overnight. For purification, harvested cells were resuspended in Lysis Buffer (50 mM HEPES pH 7.2, 400 mM NaCl, 20 mM imidazole, 5% glycerol, 5 mM β-mercaptoethanol) and were lysed in a Constant Systems cell disruptor at 25 000 psi. The supernatant was collected after centrifugation for 45 minutes at 48000×g at 4°C. Supernatant was loaded onto a Ni-NTA column (Cytiva) and the protein eluted with a linear gradient of imidazole from 20 to 500 mM. Fractions containing the protein were diluted 1:8 with anion exchange wash buffer (50 mM HEPES pH 7.2, 5 mM β-mercaptoethanol), loaded onto a HiTRAP-Q ion exchange column (Cytiva) and eluted with a linear gradient from 50 mM to 1 M NaCl. Afterwards, the His6-tag was removed by cleavage with PreScission protease (GE Healthcare, 1 µg of enzyme per 100 µg of protein). The cleavage was performed overnight at 4°C. The protein sample after cleavage was loaded onto a Ni-NTA column, and the flow-through containing protein without the tag was collected. The protein was further purified on a Superdex 200 10/300 gel filtration column equilibrated with Buffer A (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT). Protein concentrations were determined using the extinction coefficient at 280 nm predicted by the Expasy ProtParam tool (http://web.expasy.org/protparam/).”

      Addition to the text. (Methods section: Fluorescence anisotropy) “For fluorescence anisotropy measurements, the MPZ1-N-2X peptide attached to 5-carboxyfluorescein (5-FAM) at its N-terminus was obtained from GenScript at >95% purity. Binding affinities of hIRE1α LD mutants to FAM-labeled peptides were determined by measuring the change in fluorescence anisotropy on a Tecan CM Spark Micro Plate Reader with excitation at 485 nm and emission at 525 nm with increasing concentrations of hIRE1α LD variants. Measurements were performed in Buffer A supplemented with Tween 20 (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT, 0.025% Tween 20). Fluorescently labeled peptides were used at a concentration of 90 nM. The reaction volume of each data point was 25 µL and the measurements were performed in 384-well, black flat-bottomed plates (Corning) after incubation of peptide with hIRE1α LD variants for 30 min at 25°C. Binding curves were fitted using Prism Software (GraphPad) using the following equation: F<sub>bound</sub> = r<sub>free</sub> + (r<sub>max</sub> − r<sub>free</sub>)/(1 + 10<sup>((Log K<sub>1/2</sub> − x)·n<sub>H</sub>)</sup>), where F<sub>bound</sub> is the fraction of peptide bound, and r<sub>max</sub> and r<sub>free</sub> are the anisotropy values at the maximum and minimum plateaus, respectively. n<sub>H</sub> is the Hill coefficient and x is the concentration of the protein in log scale. Curve-fitting was performed with minimal constraints to obtain K<sub>1/2</sub> values with high R<sup>2</sup> values. However, as this equation does not consider the equilibria between hIRE1α LD dimers/oligomers, these apparent K<sub>1/2</sub> values do not reflect the dissociation constant.”
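
      The four-parameter Hill-type curve used for these fits can be written out as a short sketch, which makes the meaning of K<sub>1/2</sub> explicit: it is the protein concentration at which the signal sits halfway between the two plateaus. The plateau values and Hill coefficient below are hypothetical; only the wild-type K<sub>1/2</sub> of 2.14 µM is taken from the response. The fitting itself was done in Prism and is not reproduced here.

```python
import math


def hill_anisotropy(log_conc, r_free, r_max, log_k_half, n_hill):
    """Signal as a function of log10(protein concentration).

    r_free and r_max are the lower and upper plateaus, log_k_half is
    log10(K_1/2), and n_hill is the Hill coefficient.
    """
    exponent = (log_k_half - log_conc) * n_hill
    return r_free + (r_max - r_free) / (1.0 + 10.0 ** exponent)


# Hypothetical plateaus and Hill coefficient; K_1/2 = 2.14 uM as reported for wild type.
r_free, r_max, n_hill = 0.05, 0.15, 1.0
log_k_half = math.log10(2.14)

for conc_um in (0.1, 2.14, 50.0):
    signal = hill_anisotropy(math.log10(conc_um), r_free, r_max, log_k_half, n_hill)
    print(f"{conc_um:6.2f} uM -> {signal:.4f}")
```

      At a concentration equal to K<sub>1/2</sub> the exponent is zero, so the curve returns the midpoint of the two plateaus regardless of the Hill coefficient.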

      (2) Wherever possible, conclusions related to binding affinity should not be drawn from single unbinding events. For example, the title of Figure 4, "Single point mutation of cLD alters the binding affinity of unfolded peptide," should be softened. Similar changes should be made throughout the manuscript where such claims have been presented.

      We thank the Reviewer for highlighting this important point. In the revised manuscript, we have adjusted the text to remove or soften conclusions related to binding affinity that were based on single unbinding events in the MD simulations.

      Addition to the text. (Figure 4 title) “Single point mutations of cLD alter the binding of unfolded peptide MPZ1N-2X.”

      Addition to the text. (Results section title: Unfolded polypeptides can stably bind to hIRE1α cLD dimer) “Unfolded polypeptides bind to hIRE1α cLD dimer surface.”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Our goal was to elucidate a potential binding pose and identify the relevant features of unfolded proteins and the cLD that affect the binding.”

      Reviewer #2 (Recommendations for the authors):

      (1) A table of all simulated trajectories, including simulation conditions, number of replicas, box size, number of atoms, equilibration length, recording time step, number of frames for further analysis.

      We thank the Reviewer for this helpful suggestion. We have added a summary table of all simulations, including the requested details, to the Supplementary Information (Table 1).

      Addition to the text. (Supplementary figures and tables: Table 2)

      (2) The current NVT equilibration time was 0.125ns, and then no productive NPT simulations were mentioned as equilibration. Even though this is a simulation of mostly folded structures, it still takes some time for these amino acids to relax within the force field.

      We thank the Reviewer for this constructive comment and acknowledge the validity of the concern. However, our simulations were extensively sampled, and equilibration was achieved within the first 50 ns of the production runs. Therefore, the segments of the trajectories from which we draw conclusions correspond to equilibrated states (see RMSD analysis, Figure 1). Additionally, binding free energy calculations (MM/PBSA) were carried out on the last 250 ns of the simulation replicas.

      (3) At least three histograms were presented in Figure 2C, which I guess is from multiple simulations, and does not seem to be discussed.

      We thank the Reviewer for pointing out the lack of reference to Figure 2C. We added the correct reference to the text where the groove width of luminal domains of human and yeast is discussed.

      Author response image 1.

      RMSD analysis of human IRE1α cLD dimer simulated in complex with unfolded peptides.

      Addition to the text. (Results section: The putative groove of human IRE1α cLD is dynamic but unable to contain peptides) In simulations of the dimeric structures, the average groove width was 7.3 ± 0.1 Å for the human cLD and 8.9 ± 0.1 Å for the yeast cLD, averaged over three TIP3P and three TIP4P-D replicas per system (Fig. 2C).

      (4) The comment regarding the CHARMM force field on Page 6 is not justified. Actually the force field the authors used (CHARMM36m, Jing et al Nat Methods 2016) did include scaling of TIP3P LJ parameters to correctly capture the dimensions of the intrinsically disordered proteins (IDPs). However, the authors cited a couple of examples of literature of previous versions of CHARMM force fields and commented that it cannot capture IDP dimensions with TIP3P.

      We thank the Reviewer for pointing out this source of confusion. We cited the main papers of CHARMM as [4, 5], which were misleading, and following the Reviewer’s advice, we removed these citations.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “Current all-atom force fields used in MD simulations are mainly designed to reproduce the dynamics of folded and globular proteins [6].”

      (5) I am fine that the authors used TIP4PD with CHARMM36m, but caution should be taken for such a combination of protein and water force fields. Note that when optimizing force fields for IDPs, one often has to balance protein-water interactions by either enhancing protein-water interactions, enhancing water dispersions, or reducing protein-protein interactions. So, all such optimization is dependent on both protein and water force fields. TIP4PD was designed to pair with Amber99sb-ildn or, most recently, Amber99sb-disp instead of CHARMM36m. This could result in rescaling of LJ parameters.

      We thank the Reviewer for raising this issue. We note that the TIP4P-D water model has previously been used in combination with the CHARMM36m force field [7], where it was shown to yield satisfactory results for disordered regions.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “The TIP4P-D water model was developed to address limitations of existing force fields in reproducing the structural ensembles of intrinsically disordered proteins and regions. It incorporates enhanced dispersion and moderately stronger electrostatic interactions to improve the balance between water dispersion and electrostatics [8]. Zapletal et al. [7] showed that for proteins containing both folded and disordered regions, the CHARMM36m force field [9] in combination with the TIP4P-D water model provides a robust framework, preventing collapse of disordered regions while preserving folded regions. Acknowledging that the behavior of disordered regions can be case-specific, we conducted molecular dynamics simulations of the two cLD dimer models using the CHARMM36m force field with both TIP3P and TIP4P-D water models.”

      (6) I suggest referring to the methodology part for simulation details as much as possible when presenting the story.

      We thank the Reviewer for this suggestion. In the revised manuscript, we now refer the reader to the Methodology section for detailed descriptions of the HDX-MS data analysis and the MM/PBSA free energy calculations.

      Addition to the text. (Results section: Hydrogen-deuterium exchange experimental data validate the cLD dimer structure) “From our simulations, we calculated the theoretical deuterated fraction using the method by Bradshaw et al.[10] and compared it to the experimental data (Fig. 1C-D and Supplementary Fig. 10) (see Methods).”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      (7) Error bars and methodology of error analysis should be provided for all cases of all-atom simulations if possible, since convergence is always an issue when considering these conformational changes within microseconds of all-atom simulations.

      We thank the Reviewer for this important observation. We agree and have added a description of the error estimation for the theoretical deuterated fractions (Fig. 1C).

      Addition to the text. (Figure C legend) “Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping.”

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To reproduce the time points after incubation in deuterium (D<sub>2</sub>O), we computed deuterated fractions separately for each of the two monomers constituting a dimer for the time points 0.5 min (30 s) and 5 min (300 s). Then, we computed the mean and standard deviation over the data coming from replicas of the same cLD dimer model (AF or PDB model) and the same water model (TIP3P or TIP4P-D). To estimate the uncertainty of the mean values obtained from our datasets and the dataset from Amin-Wetzel et al. ([11] Figure 3—source data 1), we applied a non-parametric bootstrap resampling procedure. For each sequence range from HDX-MS analysis, we treated the measurements from the N=6 independent datasets as independent samples, accounting for 3 replicas each with two monomers (6 monomers total). We then generated 10,000 bootstrap replicates by sampling the datasets with replacement, maintaining the same number of samples N in each resample. For each replicate, we calculated the mean at each sequence position. The resulting distribution of bootstrap means was used to compute the standard deviation as an estimate of the standard error. We computed the difference between simulation and experimental data (deuterated fraction discrepancy), and for each residue, we selected as the ‘best structure’ the model with the discrepancy closest to zero among PDB-TIP3P, PDB-TIP4P-D, AF-TIP3P, and AF-TIP4P-D systems.”
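
      The bootstrap procedure and the "best structure" selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis script itself: the function names `bootstrap_standard_error` and `best_model` are hypothetical, the input is assumed to be an array with one row per independent dataset (N = 6 for three replicas with two monomers each) and one column per HDX-MS sequence range, and the random seed is arbitrary.

```python
import numpy as np


def bootstrap_standard_error(samples, n_boot=10_000, seed=0):
    """Non-parametric bootstrap estimate of the standard error of the mean.

    samples: array of shape (N, n_segments), one row per independent dataset.
    Returns (mean, standard_error), each of shape (n_segments,).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    rng = np.random.default_rng(seed)
    # Draw n_boot resamples of N row indices, with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    # Mean of each resample for every segment: shape (n_boot, n_segments).
    boot_means = samples[idx].mean(axis=1)
    # The spread of the bootstrap means estimates the standard error.
    return samples.mean(axis=0), boot_means.std(axis=0, ddof=1)


def best_model(simulated, experimental):
    """For each segment, name the model whose discrepancy is closest to zero.

    simulated: dict mapping a model label to an array of shape (n_segments,).
    experimental: array of shape (n_segments,).
    """
    names = list(simulated)
    discrepancy = np.array([np.asarray(simulated[k]) - experimental for k in names])
    winners = np.abs(discrepancy).argmin(axis=0)
    return [names[i] for i in winners]
```

      With labels such as "PDB-TIP3P" or "AF-TIP4P-D" as dictionary keys, `best_model` returns one label per sequence range.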

      (8) Technically I would call DR1 and DR2 linker regions within a folded structure. Their motions are quite restrained by the folded part. I am therefore not sure how much TIP4PD really helps in contrast to a scaled TIP3P. A plot of structures colored by pLDDT score or B-factor within the PDB should be provided. Quantitative metrics of these regions (e.g. chi-squared) might help justify the choice of the AF model against the PDB model. Currently, the two models look very similar in Figures 1c and 1d. Similarly, quantitative metrics as a function of different simulation time windows will help justify the convergence of the simulation and indicate the flexibility of these regions.

      We thank the Reviewer for this thoughtful comment. In response, we analyzed the AlphaFold2 and AlphaFold3 predictions, which consistently assign very low pLDDT values (<50) to the DR2 region, while DR1 is predicted with higher but still low confidence (50 < pLDDT < 70). These scores indicate intrinsic uncertainty in the structural definition of both regions, supporting their flexibility despite being located within a folded context.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “All five AlphaFold 2 predictions closely resembled the top-ranked model used for our simulations (Supplementary Fig. 7C). In contrast, the five AlphaFold 3 predictions yielded greater variability in DR2 organization and longer helices in DR2, but still consistently maintained low pLDDT scores in this region, indicating disorder (Supplementary Fig. 7D).”

      Addition to the text. (Figure 7 C-D legend) “(C) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT). (D) Superposition of the 5 structures predicted by AlphaFold 3 for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (9) Fluorescence anisotropy seems to be an important set of experimental data to justify the binding of multiple unfolded peptides to IRE. I suggest the authors include a bar plot of binding affinity of different variants in Figure 3. The raw titration curves should also be included in SI.

      We thank the Reviewer for this valuable suggestion. The binding affinities reported in previous studies are summarized in Table 2; the reader is referred to those works for the corresponding raw titration curves. The binding affinities for the cLD mutants analyzed in the present study are provided in Table 3, and the associated titration curves are shown in Figure 4G.

      Addition to the text. (Figure 4G legend) “Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Supplementary figures and tables: Table 3) See Tab. 1

      (10) The authors should discuss the dependence of initial orientations of unfolded peptides on the final results. The authors claimed that after 1 microsecond simulations, the orientation of these peptides to IRE changed. Quantitative metrics showing both the binding (e.g., number of contacts) and binding orientation (contact region or angles) should be provided to tell whether the simulation is converged. The comparison to the experimental data lacks quantitative metrics. The authors mentioned the dissociation of MPZ1N-2X-RD in half of the simulations; they might want to provide such a metric for all peptides. Technically, 1 microsecond brute-force simulation is quite short for observing such a binding event, and enhanced sampling methods (e.g. metadynamics) might be necessary for investigating binding. However, at least the presentation and interpretation of the current results should be improved for comparing simulations and experiments.

      We thank the Reviewer for the insight. We expanded the discussion of the peptide orientation and added an analysis of the peptide angle with respect to the cLD central groove and contacts. Additionally, we inserted AlphaFold 3 predictions of all the simulated complexes.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Addition to the text. (Supplementary Figures and Tables Fig. 14) “(A) Peptide orientation with respect to the central groove principal axis. The angle was computed as the dihedral angle described by the Cα atoms of Y161 residues (groove principal axis) and the Cα atoms of residues L1 and A12 of the MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 10 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between hIRE1α cLD dimer and MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 50 frames, while the shaded lines indicate the value per frame. The analyses were performed on three sets of simulations: "90 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis; "270 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis but flipped 180 degrees with respect to the "90 degrees" orientation; "0 degrees" orientation, the peptide is placed parallel to the groove principal axis.”
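
      As an illustration of how such an orientation angle and its smoothed time series can be obtained, the sketch below computes a dihedral angle from four points and a simple rolling average. It is not the authors' script: the function names are hypothetical, the atom ordering (the two Y161 Cα atoms followed by the L1 and A12 Cα atoms) and the sign convention are assumptions, and the angle is mapped to the range 0 to 360 degrees only to match the 0°/90°/270° labels used in the text.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle (degrees, in [0, 360)) defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)) % 360.0)


def rolling_mean(values, window):
    """Rolling average over `window` frames (only fully covered windows are returned)."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(window) / window, mode="valid")
```

      Note that a plain rolling mean is only meaningful for angles that stay away from the 0°/360° wrap-around; a circular mean would be needed otherwise.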

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. ??A).”

      Addition to the text. (Supplementary Figures and Tables Fig. 16A) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (pLDDT).”

      (11) I also have a couple of questions regarding the point mutant Y161R. a) The motivation of mutating Y161 to R is more speculative (Figures 4a,b) than quantitative. The authors might want to show an intermolecular contact map between IRE and unfolded peptides or IRE contact probability along residue indexes to show the interaction hotspots. Figure S11 only showed the structure instead of any metrics for such a purpose. b) It might be better to also show a histogram of the distances of Figure 4e and 4f. Figure 4f actually suggested 1 microsecond simulation is quite short to observe the dissociation event. c) Testing the mutation within the experiment, if possible, would clearly strengthen this part of the manuscript.

      We thank the Reviewer for these constructive suggestions. We have added an analysis of intermolecular contacts for the Y161R and E102R mutants (Fig. 18A–B), which highlights the interaction hotspots between IRE1 residues and the unfolded peptides. To further characterize peptide–groove interactions, we now provide minimum peptide–groove distance time series for all peptides (Fig. 15B). Moreover, to experimentally support our simulations, we performed fluorescence anisotropy measurements on the MPZ1N-2X peptide with cLD WT and mutant constructs. These experiments confirm our computational observations (Fig. 4F–G and Fig. 18C).

      Addition to the text. (Figure 18 legend) “(A) Number of contacts between residues 102 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between residues 161 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (C) Protein purification of WT hIRE1α LD and mutants E102R and Y161R.”

      Addition to the text. (Figure 4F-G legend) “(F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”
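
      A minimum groove-peptide distance of the kind plotted in these panels can be computed per frame as in the following sketch. The function names are hypothetical, the atom selections for "groove" and "peptide" are assumed to have been made beforehand, and periodic boundary conditions are deliberately ignored here, which is an assumption that only holds for molecules made whole in the simulation box.

```python
import numpy as np


def min_distance(coords_a, coords_b):
    """Smallest pairwise distance between two sets of atoms (arrays of shape (n, 3))."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # All pairwise displacement vectors: shape (n_a, n_b, 3).
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff**2).sum(axis=-1)).min())


def min_distance_series(traj_a, traj_b):
    """Minimum distance for each frame of two aligned per-frame coordinate sequences."""
    return np.array([min_distance(a, b) for a, b in zip(traj_a, traj_b)])
```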

      (12) Similar comments of quantitative analysis (e.g. contact map as a function of simulation time) apply to the last part of results when discussing the intermolecular interactions. Observations such as "the interface predicted by AlphaFold showed stability across MD simulation replicas lasting 200 ns" were provided, but there is no quantitative analysis. How consistent was this observation across multiple replicas of simulations, and how many replicas were used?

      We thank the Reviewer for this valuable suggestion. To provide a quantitative assessment, we performed new triplicate simulations of the BiP–cLD monomer complex and plotted the fraction of native contacts over time. These results, which demonstrate the consistency of the interface across replicas, are now included in the Supplementary Material.

      Addition to the text. (Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Figure 20 legend) “Fraction of native contacts between BiP and cLD monomer in simulations of the structures predicted by AlphaFold 3 without ligands or in complex with ADP or ATP. The dark lines indicate the rolling average of the fraction of native contacts over 100 frames, while the shaded lines indicate the value per frame. The fraction of native contacts (Q) was calculated according to the definition of Best et al. [12]:

      $$Q(X) = \frac{1}{N} \sum_{(i,j)} \frac{1}{1 + \exp\left[\beta \left(r_{(i,j)}(X) - \lambda\, r^{0}_{(i,j)}\right)\right]}$$

      for N pairs of native contacts (i, j), where r<sup>0</sup><sub>(i,j)</sub> is the distance of the pair in the initial configuration (here the AlphaFold 3 prediction), r<sub>(i,j)</sub>(X) is the distance at frame X, β is a smoothing parameter (β = 50 nm<sup>−1</sup>), λ is the tolerance of the reference distance (λ = 1.8) and the cutoff used to define a contact between heavy atoms was 0.45 nm.”
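
      For reference, a soft-cutoff fraction of native contacts with these parameters can be sketched as below. This is a minimal illustration and not the authors' implementation: it assumes the commonly used form Q = mean over native pairs of 1 / (1 + exp[β (r − λ r0)]), distances in nm, and hypothetical function names; pair distances are assumed to have been computed beforehand for each frame.

```python
import numpy as np


def native_pairs(coords_a, coords_b, cutoff=0.45):
    """Indices and reference distances of heavy-atom pairs closer than `cutoff` (nm)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    i, j = np.nonzero(d < cutoff)
    return i, j, d[i, j]


def fraction_native_contacts(r, r0, beta=50.0, lam=1.8):
    """Soft-cutoff fraction of native contacts per frame.

    r:  (n_frames, n_pairs) distances of the native pairs in each frame (nm).
    r0: (n_pairs,) distances of the same pairs in the reference structure (nm).
    """
    r = np.atleast_2d(np.asarray(r, dtype=float))
    r0 = np.asarray(r0, dtype=float)
    # Each pair contributes ~1 when r << lam * r0 and ~0 when r >> lam * r0.
    return (1.0 / (1.0 + np.exp(beta * (r - lam * r0)))).mean(axis=1)
```

      A pair sitting exactly at λ times its reference distance contributes 0.5, which is why Q decays smoothly rather than stepwise as an interface loosens.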

      (13) The figure legends are noted using lowercase letters but are described using uppercase.

      We thank the Reviewer for pointing that out, and we changed everything to capital letters.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1: I am confused about the HDX-MS results shown in Figure 1. Here, I must also mention that I am not familiar with comparing HDX-MS experiments with MD simulations. The authors mention that they show the deuterated fraction computed from MD simulations for the PDB and AF model at time points 0.5 min and 5 min. However, this time certainly does not correspond to the MD simulation time, thus, it is unclear to me where the difference between the results comes from. Are the two time points some input parameters to the script used to calculate the deuterated fraction? Thus, I would ask the authors to better explain what is the difference in the results between the two time points. Especially, since the general reader might not be familiar with comparing HDX-MS experimental results to MD simulations. Furthermore, I would ask the authors to clarify in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      We thank the Reviewer for pointing us to this possible source of confusion. The time points are effectively input parameters to the calculations of theoretical deuterated fractions from MD simulations. We expanded the explanation of the method in the method section and clarified in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To determine the deuterated fraction of a peptide segment from simulations, the protection factor for each residue i, P<sub>i</sub>, must be computed from the simulation snapshots, following the approach of Best and Vendruscolo [13]:

      $$\ln P_i = \beta_C N_{C,i} + \beta_H N_{H,i}$$

      Here, N<sub>C,i</sub> and N<sub>H,i</sub> are the numbers of heavy-atom contacts and H-bonds of the backbone amide of residue i, and the scaling factors β<sub>C</sub> and β<sub>H</sub> are set to 0.35 and 2.0, respectively. The simulated deuterated fraction of a peptide segment, D<sub>j</sub><sup>sim</sup>(t), defined by residues m<sub>j</sub> +1 to n<sub>j</sub>, was then calculated at any exchange time point t as:

      $$D_j^{\mathrm{sim}}(t) = \frac{1}{n_j - m_j} \sum_{i = m_j + 1}^{n_j} \left[1 - \exp\left(-\frac{k_{\mathrm{int},i}}{P_i}\, t\right)\right]$$

      where m<sub>j</sub> and n<sub>j</sub> are the first and last residue numbers of the j-th protein fragment, respectively. The intrinsic exchange rate constants for each residue type (k<sub>int,i</sub>) were obtained from Bai et al. with updated acidic residues and glycine [14, 15].”
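
      To show how the exchange time enters purely as an input parameter, the sketch below evaluates a segment's deuterated fraction from protection factors. It is an illustration under stated assumptions, not the code used in the study: it assumes ln P = β<sub>C</sub> N<sub>C</sub> + β<sub>H</sub> N<sub>H</sub> averaged over frames, a segment average of 1 − exp(−k<sub>int</sub> t / P) over the exchanging residues, hypothetical function names, and intrinsic rates supplied by the user in units matching t.

```python
import numpy as np


def protection_factors(n_contacts, n_hbonds, beta_c=0.35, beta_h=2.0):
    """Per-residue protection factors from per-frame contact and H-bond counts.

    n_contacts, n_hbonds: arrays of shape (n_frames, n_residues).
    Assumption: ln P is averaged over frames before exponentiating.
    """
    ln_p = beta_c * np.asarray(n_contacts, dtype=float) \
        + beta_h * np.asarray(n_hbonds, dtype=float)
    return np.exp(ln_p.mean(axis=0))


def deuterated_fraction(p, k_int, t):
    """Deuterated fraction of one segment at exchange time t.

    p, k_int: arrays over the exchanging residues of the segment
              (protection factors and intrinsic rates, rates in 1/time-unit of t).
    """
    p = np.asarray(p, dtype=float)
    k_int = np.asarray(k_int, dtype=float)
    # Each residue exchanges with an effective rate k_int / P.
    return float(np.mean(1.0 - np.exp(-k_int * t / p)))
```

      Calling `deuterated_fraction` with t = 30 and t = 300 (seconds) on the same protection factors reproduces the two experimental incubation times from a single simulated ensemble.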

      Addition to the text. (Figure 1 legend: ) “This time point corresponds to experimental incubation times, not MD simulation time.”

      Addition to the text. (Figure 10 legend: ) “Time points correspond to experimental incubation times, not MD simulation time.”

      (2) For AlphaFold 2 Multimer prediction, the authors only considered the top predicted structure. However, AF2-M, one generally obtains 5 structures, and it is also possible to obtain more structures by using an additional random seed. Thus, it would be interesting if the authors would consider the difference between the 5 structures they obtained from the AF2-M prediction. Are they all very similar? (Especially considering the DR1 and DR2 segments, that is the main difference between the PDB and AF2 structures). Analyzing the different predicted AF2 structures would give more insight into the accuracy of the AF2-M predicted model.

      We thank the Reviewer for this insightful suggestion. All AF2-M predicted structures were found to be highly similar, and we now include them in Figure 7E for comparison.

      Addition to the text. (Figure 7E legend) “(E) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (3) On Page 6, the authors talk about a "an early PDB model". First, I find the nomenclature "early" confusing here; perhaps it would be better to talk about "an initial PDB model", but I leave it up to the authors to think about if they want to change that. More importantly, reading the Comp. detail on Page 23, it is not so clear what the difference is between the "early" and "final" PDB models, and how the difference in their setups leads to different results. The information is somewhat there on Page 6 and Page 23, but it can be made much clearer. Thus, I would ask the authors to better explain the difference between the early and final PDB models.

      We thank the Reviewer for this helpful comment. In the revised manuscript, we have clarified the terminology and provided a more explicit explanation of the differences between the two IRE1 models, both in the Results section and in the Methods.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “An initial PDB model with modified side chain orientations in residues L116 and Y166, due to the modelling of the neighbouring missing DR1, caused the dimer to dissociate in one-third of the replicas. [...] The final PDB model, with correctly oriented L116 and Y166 (Supplementary Fig. 9B), was stable in simulations in both TIP3P and TIP4P-D water (Supplementary Fig. 7B).”

      Addition to the text. (Methods section: IRE1α core Luminal Domain (cLD) structural models - Human PDB dimer) “An initial PDB model was briefly equilibrated in NPT, and a conformation with a groove width of approximately 0.6 nm was selected. This snapshot was used as the initial structure for the initial “PDB model” simulations, in which the dimer dissociates.”

      (4) Page 12: "In early simulations", again, I find the nomenclature "early" confusing here. Perhaps it would be better to talk about "In initial simulations" or "In preliminary simulations", but I leave it up to authors to think about this.

      We thank the Reviewer for pointing out this possible source of confusion. We improved the text by referring to these simulations based on the different orientations of the peptide on the cLD dimer in the modeled complex.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Here, we provide a detailed description of the additional changes made to the manuscript.

      Additional edits to the manuscript

      Following discussions with Prof. Dr. David Ron, we refined our BiP model by removing the signal peptide (residues 1–18). Using AlphaFold 3, we predicted BiP–cLD heterodimeric complexes in the presence of ADP, ATP, or without nucleotide. Each of the three complexes was simulated in TIP3P water, in three independent replicas of 1 µs each.

      Addition to the text. (Results section: hIRE1α cLD intermolecular interactions guide the activation process) “We used AlphaFold 3 to model the interaction between a cLD monomer and BiP (residues E19–L654) in the presence of ATP and ADP (Fig. 5B, Supplementary Fig. 19A). Prediction quality was limited in the apo and ADP-bound states (pTM = 0.48, ipTM = 0.59; pTM = 0.49, ipTM = 0.61, respectively), whereas ATP binding improved accuracy (pTM = 0.66, ipTM = 0.72). The predicted interfaces involved DR2, particularly residues 314PLLEG-318, forming a short parallel β-sheet with the substrate-binding domain (SBD) of BiP through two hydrogen bonds. All AlphaFold 3 models were stable across three 1-µs simulations (Supplementary Fig. 19B), with cLD–BiP interfaces retaining 60–80% of initial contacts (Supplementary Fig. 20). In the apo and ADP-bound states, the nucleotide-binding domain (NBD) showed high Predicted Aligned Error (PAE) relative to the cLD, indicating uncertain positioning of the two domains relative to each other. Notably, in the ADP-bound state, which is thought to interact with hIRE1α cLD, the NBD remained mobile but proximal to the αB-helices, thereby restricting access to this region. Together, the AlphaFold 3 predictions suggest that BiP engages hIRE1α cLD by sterically hindering the oligomerization interface defined by DR2 and the αB-helices [16].”

      Addition to the text. (Figure 5 legend) “(B) BiP-cLD monomer complex as predicted by AlphaFold (BiP in shades of purple, cLD in orange) before the simulation (t = 0 µs) and at the end of the simulation (t = 1 µs). The SBD (residues E19-D408) is colored in light purple, the NBD (residues C420-E650) in dark purple, and the interdomain linker (residues D409-V419) and KDEL motif (residues K651-L654) in light purple.”

      Addition to the text. (Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Methods section: cLD monomer in complex with BiP) “The BiP-cLD heterodimer systems were predicted with AlphaFold 3 using the AlphaFold server [17] at https://alphafoldserver.com/. The hIRE1α cLD sequence used is the same as that used for predicting the dimer: the PDB 2HZ6 sequence, UniProt identifier O75460 with mutations C127S and C311S, and residues P29-P368. The BiP sequence used is taken from UniProt identifier P11021, residues E19–L654. We predicted three complexes: one without any nucleotide, one containing ADP, and another containing ATP. Simulations of the BiP-cLD complex were run in TIP3P water.”

      We have updated the Zenodo repository with additional data and calculations, and the corresponding link is provided in the manuscript.

      References

      (1) Mario S. Valdés-Tresanco, Mario E. Valdés-Tresanco, Pedro A. Valiente, and Ernesto Moreno. gmx_mmpbsa: A New Tool to Perform End-State Free Energy Calculations with GROMACS. Journal of Chemical Theory and Computation, 17(10):6281–6291, October 2021.

      (2) Bill R. Miller III, T. Dwight McGee Jr., Jason M. Swails, Nadine Homeyer, Holger Gohlke, and Adrian E. Roitberg. MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. Journal of Chemical Theory and Computation, 8(9):3314–3321, September 2012.

      (3) Fanhao Wang, Yuzhe Wang, Laiyi Feng, Changsheng Zhang, and Luhua Lai. Target-Specific De Novo Peptide Binder Design with DiffPepBuilder. Journal of Chemical Information and Modeling, 64(24):9135–9149, December 2024.

      (4) Alexander D. MacKerell Jr., Bernard Brooks, Charles L. Brooks III, Lennart Nilsson, Benoit Roux, Youngdo Won, and Martin Karplus. CHARMM: The Energy Function and Its Parameterization. In Encyclopedia of Computational Chemistry. 2002.

      (5) Bernard R. Brooks, Robert E. Bruccoleri, Barry D. Olafson, David J. States, S. Swaminathan, and Martin Karplus. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4(2):187–217, 1983.

      (6) Junxi Mu, Hao Liu, Jian Zhang, Ray Luo, and Hai-Feng Chen. Recent Force Field Strategies for Intrinsically Disordered Proteins. Journal of Chemical Information and Modeling, 61(3):1037–1047, March 2021.

      (7) Vojtech Zapletal, Arnošt Mládek, Kateřina Melková, Petr Louša, Erik Nomilner, Zuzana Jasenáková, Vojtěch Kubáň, Markéta Makovická, Alice Laníková, Lukáš Žídek, and Jozef Hritz. Choice of Force Field for Proteins Containing Structured and Intrinsically Disordered Regions. Biophysical Journal, 118(7):1621–1633, April 2020.

      (8) Stefano Piana, Alexander G. Donchev, Paul Robustelli, and David E. Shaw. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. Journal of Physical Chemistry B, 119(16):5113–5123, April 2015.

      (9) Jing Huang, Sarah Rauscher, Grzegorz Nawrocki, Ting Ran, Michael Feig, Bert L. de Groot, Helmut Grubmüller, and Alexander D. MacKerell. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nature Methods, 14(1):71–73, January 2017.

      (10) Richard T. Bradshaw, Fabrizio Marinelli, José D. Faraldo-Gómez, and Lucy R. Forrest. Interpretation of HDX Data by Maximum-Entropy Reweighting of Simulated Structural Ensembles. Biophysical Journal, 118(7):1649–1664, April 2020.

      (11) Niko Amin-Wetzel, Lisa Neidhardt, Yahui Yan, Matthias P. Mayer, and David Ron. Unstructured regions in IRE1 specify BiP-mediated destabilisation of the luminal domain dimer and repression of the UPR. eLife, 8, December 2019.

      (12) Robert B. Best, Gerhard Hummer, and William A. Eaton. Native contacts determine protein folding mechanisms in atomistic simulations. Proceedings of the National Academy of Sciences, 110(44):17874–17879, October 2013.

      (13) Robert B. Best and Michele Vendruscolo. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins: Characterization of the Native State Fluctuations of CI2. Structure, 14(1):97–106, January 2006.

      (14) Yawen Bai, John S. Milne, Leland Mayne, and S. Walter Englander. Primary structure effects on peptide group hydrogen exchange. Proteins: Structure, Function, and Bioinformatics, 17(1):75–86, 1993.

      (15) David Nguyen, Leland Mayne, Michael C. Phillips, and S. Walter Englander. Reference Parameters for Protein Hydrogen Exchange Rates. Journal of the American Society for Mass Spectrometry, 29(9):1936–1939, September 2018.

      (16) G Elif Karagöz, Diego Acosta-Alvear, Hieu T Nguyen, Crystal P Lee, Feixia Chu, and Peter Walter. An unfolded protein-induced conformational switch activates mammalian IRE1. eLife, 6:e30700, 2017.

      (17) Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, and John M. Jumper. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, May 2024.

    1. eLife Assessment

      This manuscript reports high-resolution cryo-EM structures of a trimethylamine N-oxide demethylase and advances the hypothesis that the enzyme is bifunctional, coupling TMAO demethylation to formaldehyde capture via an enclosed intramolecular tunnel. The structural findings remain valuable, particularly the unusual oligomeric architecture and proposed conduit for a reactive intermediate. While the revision improves clarity and addresses several technical concerns, the central mechanistic framework remains incomplete, with persistent concerns regarding the proposed catalytic mechanism and metal dependence.

    2. Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn<sup>2+</sup>-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy. Experimental data that show an involvement of TDM in the reaction of HCHO with THF are less convincing.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channelling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO; however, it is well established that HCHO and THF can react spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic involvement. I appreciate the authors' clarification that the data in Figure 1 were not intended to demonstrate enzymatic channelling or catalytic involvement in the HCHO-THF reaction, and that the assay does not distinguish between changes in HCHO production and downstream consumption. However, the statement "these findings show that TDM carries out two linked reactions: TMAO demethylation at one active site, and the HCHO produced can condense with THF at the C-terminal domain, connecting TMAO breakdown to one-carbon metabolism" (page 2) still implies a mechanistic and functional coupling that is not supported by the presented data and appears inconsistent with the authors' clarification. In light of this, I recommend revising this statement to avoid implying mechanistic or functional coupling between the two reactions unless additional experimental evidence is provided.

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channelling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channelling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn2+ cofactor, there was a dependence upon non-heme Fe2+, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead just say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts on the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense, for several reasons:

      i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which (a) is unprecedented, (b) even if it were possible, would generate methanol, not formaldehyde.

      ii) The amine oxide is proposed to deoxygenate, with hydroxide appearing on the Zn - unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn2+ is not a redox active metal ion;

      iii) The authors say "forming a tetrahedral intermediate, as described for metalloprotease" but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism there is no carbonyl to attack, so this statement is just wrong.

      So on several counts the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn2+ cannot fulfil that role. Fe2+ could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands the proposed catalytic mechanism is unacceptable.

      Revised version. The authors have essentially not changed the proposed mechanism. They have removed the reference to zinc metalloproteases, but still propose a mechanism mediated only by Zn2+. As explained above, attack by zinc (II) hydroxide is unprecedented and would generate methanol, not formaldehyde, and amine deoxygenation is a reductive process that cannot be fulfilled by Zn2+. So the proposed mechanism is still not feasible at all. The authors now say that "oxidative chemistry....remains unresolved". I'm sorry, but that is not acceptable.

      I have urged the authors to re-examine the metal content of their enzyme. In the Supporting Information (Figure S5) they give ICPMS data that indicate a Zn stoichiometry of 0.5 mol Zn/mol protein, and Fe is not detected. Have the authors analysed for other redox-active metals? The authors say that there is no evidence for any other metal binding site, but there is only 50% occupancy of Zn in their protein, so could there be a different metal ion present in place of Zn in the other 50% of the protein that accounts for the observed activity?

      Since there is clearly a major discrepancy here, the onus is on the authors to explain the discrepancy, rather than just returning with the same data. For example, they could treat the enzyme with EDTA to remove all metals (and check the treated enzyme by ICPMS), and then add different metal ions to test activity with different metals (could even titrate with different molar equivalents of metal ions). They could then test a range of different redox-active metal ions.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors have now done this in the revised version.

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Not yet addressed.

      Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %. This point has been addressed by the authors in the revised version.

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme, is that the strain used? Details about the strain are needed, and the accession for the protein sequence. Addressed in the revised version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn<sup>2+</sup>-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a plausible hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channeling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO, but the concentrations used for THF greatly exceed its reported KD (enzyme concentration used in this assay is not reported). It has previously been shown that HCHO and THF can couple spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic channeling. An additional control that can rule out this possibility would help to strengthen the evidence. For example, mutating the THF binding site to prevent THF binding to the protein complex could clarify whether the observed decrease in HCHO depends on enzyme-mediated proximity effects. A mutation which would specifically disable channeling could be even more convincing (maybe at the narrowest bottleneck).

      We agree with the reviewer that HCHO and THF can react spontaneously in a non-enzymatic manner, and our experiments were not intended to demonstrate enzymatic channeling. The linear regression analysis in Figure 1C was designed solely to confirm that HCHO reacts with THF under our assay conditions. Accordingly, THF was titrated over a broad concentration range starting from zero, and the observed THF concentration–dependent decrease in HCHO reflects this chemical reactivity.

      We do not interpret these data as evidence that the enzyme catalyzes or is required for the HCHO–THF coupling reaction. Instead, the structural observation of an enclosed channel is presented as a separate finding. We have clarified this point in the revised text to avoid overinterpretation of the biochemical data (page 2, line 16).

      Another concern is that the observed decrease in HCHO could alternatively arise from a reduced production of HCHO due to a negative allosteric effect of THF binding on the active site. From this perspective, the interpretation would be more convincing if a clear coupled effect could be demonstrated, specifically, that removal of the product (HCHO) from the reaction equilibrium leads to an increase in the catalytic efficiency of the demethylation reaction.

      We agree that, in principle, a decrease in detectable HCHO could also arise from an indirect effect of THF binding on enzyme activity. However, in our study the experiment was not designed to assess catalytic coupling or allosteric regulation. The assay in question monitors HCHO levels under defined conditions and does not distinguish between changes in HCHO production and downstream consumption.

      Additionally, we do not interpret the observed decrease in HCHO as evidence that THF binding enhances catalytic efficiency, or that removal of HCHO shifts the reaction equilibrium. Instead, the data are presented to establish that HCHO can react with THF under the assay conditions. Any potential allosteric effects of THF on the demethylation reaction, or kinetic coupling between HCHO removal and catalysis, are beyond the scope of the current study, and are not claimed.

      While the enzyme kinetics appear to have been performed thoroughly, the description of the kinetic assays in the Methods section is very brief. Important details such as reaction buffer composition, cofactor identity and concentration (Zn<sup>2+</sup>), enzyme concentration, defined temperature, and precise pH are not clearly stated. Moreover, a detailed methodological description could not be found in the cited reference (6), if I am not mistaken.

      Thank you for the suggestion. We have added reference [24] to the methodological description on page 8. The Methods section has been revised accordingly on page 8 under “TDM Activity Assay,” without altering the Zn<sup>2+</sup> concentration.

      The composition of the complex is intriguing but raises some questions. Based on SDS-PAGE analysis, the purified protein appears to be predominantly full-length TDM, and size-exclusion chromatography suggests an apparent molecular weight below 100 kDa. However, the cryo-EM structure reveals a substantially larger complex composed of two full-length monomers and two half-domains.

      We appreciate the reviewer’s careful analysis of the apparent discrepancy between the biochemical characterization and the cryo-EM structure. This issue is addressed in Figure S1, which may have been overlooked.

      As shown in Figure S1, the stability of TDM is highly dependent on protein and salt conditions. At 150 mM NaCl, SEC reveals a dominant peak eluting between 10.5 and 12 mL, corresponding to an estimated molecular weight of ~170–305 kDa (blue dot, Author response image 1). This fraction was explicitly selected for cryo-EM analysis and yields the larger complex observed in the reconstruction. At lower (50 mM) or higher (>150 mM) NaCl concentrations, the protein either aggregates or elutes near the void volume (~8 mL).

      SDS–PAGE analysis detects full-length TDM together with smaller fragments (~40–50 kDa and ~22–25 kDa). The apparent predominance of full-length protein on SDS–PAGE likely reflects its greater staining intensity per molecule and/or a higher population, rather than the absence of truncated species.

      Author response image 1.

      Given the lack of clear evidence for proteolytic fragments on the SDS-PAGE gel, it is unclear how the observed stoichiometry arises. This raises the possibility of higher-order assemblies or alternative oligomeric states. Did the authors attempt to pick or analyze larger particles during cryo-EM processing? Additional biophysical characterization of particle size distribution - for example, using interferometric scattering microscopy (iSCAT) - could help clarify the oligomeric state of the complex in solution.

      Cryo-EM data were collected exclusively from the size-exclusion chromatography fraction eluting between 10.5 and 12 mL. This fraction was selected to isolate the dominant assembly in solution. Extensive 2D and 3D particle classification did not reveal distinct classes corresponding to smaller species or higher-order oligomeric assemblies. Instead, the vast majority of particles converged to a single, well-defined structure consistent with the 2 full-length + 2 half-domain stoichiometry.

      A minor subpopulation (~2%) exhibited increased flexibility in the N-terminal region of the two full-length subunits, but these particles did not form a separate oligomeric class, indicating conformational heterogeneity rather than alternative assembly states (Author response image 2). Together, these data support the 2+2½ architecture as the predominant and stable complex under the conditions used for cryo-EM. Additional techniques, such as iSCAT, would provide complementary information, but are not required to support the conclusions drawn from the SEC and cryo-EM analyses presented here.

      Author response image 2.

      The authors mention strict symmetry in the complex, yet C2 symmetry was enforced during refinement. While this is reasonable as an initial approach, it would strengthen the structural interpretation to relax the symmetry to C1 using the C2-refined map as a reference. This could reveal subtle asymmetries or domain-specific differences without sacrificing the overall quality of the reconstruction.

      We thank the reviewer for this thoughtful suggestion. In standard cryo-EM data processing, symmetry is typically not imposed initially to minimize potential model bias; accordingly, we first performed C1 refinement before applying C2 symmetry. The resulting C1 reconstructions revealed no detectable asymmetry or domain-specific differences relative to the C2 map. In addition, relaxing the symmetry consistently reduced overall resolution, indicating lower alignment accuracy and further supporting the presence of a predominantly symmetric assembly.

      In this context, the proposed catalytic role of Zn<sup>2+</sup> raises additional questions. Why is a 2:1 enzyme-to-metal stoichiometry observed, and how does this reconcile with previous reports? This point warrants discussion. Does this imply asymmetric catalysis within the complex? Would the stoichiometry change under Zn<sup>2+</sup>-saturating conditions, as no Zn<sup>2+</sup> appears to be added to the buffers? It would be helpful to clarify whether Zn<sup>2+</sup> occupancy is equivalent in both active sites when symmetry is not imposed, or whether partial occupancy is observed.

      The observed ~2:1 enzyme-to-Zn<sup>2+</sup> stoichiometry likely reflects the composition of the 2 full-length + 2 half-domain (2+2½) complex. In this assembly, only the core domains that are fully present in the complex contribute to metal binding. The truncated or half-domains lack the Zn<sup>2+</sup> binding domain. As a result, only two metal-binding sites are occupied per assembled complex, consistent with the measured stoichiometry.

      We note that Zn<sup>2+</sup> was not deliberately added to the buffers, so occupancy may not reflect full saturation. Based on our cryo-EM and biochemical data, both metal-binding sites in the full-length subunits appear to be occupied to an equivalent extent, and no clear evidence of asymmetric catalysis is observed under these current experimental conditions. Full Zn<sup>2+</sup> saturation could potentially increase occupancy, but was not explored in these experiments.
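
      The arithmetic behind this stoichiometry can be sketched as follows. This is an illustration only: it assumes that each polypeptide chain, full-length or truncated, counts as one "mol protein", that each full-length subunit carries one Zn<sup>2+</sup> site, and that the half-domains carry none, as stated above. The function name and the `occupancy` parameter are hypothetical and do not come from the manuscript.

```python
# Illustrative arithmetic only. Chain counts follow the 2 full-length +
# 2 half-domain assembly described in the text; counting every chain as
# one "mol protein" is an assumption made for this sketch.

def zn_per_chain(full_length: int, half_domains: int,
                 sites_per_full_length: int = 1,
                 occupancy: float = 1.0) -> float:
    """Return mol Zn per mol polypeptide chain for a mixed assembly."""
    total_chains = full_length + half_domains
    # Only full-length subunits contribute metal-binding sites.
    bound_zn = full_length * sites_per_full_length * occupancy
    return bound_zn / total_chains

# 2 full-length + 2 half-domain complex with both sites fully occupied.
ratio = zn_per_chain(full_length=2, half_domains=2)
print(f"Zn per chain: {ratio:.2f} mol/mol")
```

      Under these assumptions, two occupied sites across four chains would give a ratio close to the reported value; a mass-based protein quantification would shift the number, so the sketch does not by itself distinguish full from partial site occupancy.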

      The divalent ion Zn<sup>2+</sup> is suggested to activate water for the catalytic reaction. I am not sure if there is a need for a water molecule to explain this catalytic mechanism. Can you please elaborate on this more? As one aspect, it might be helpful to explain in more detail how Zn-OH and D220 are recovered in the last step before a new water molecule comes in.

      Thank you for your suggestion. We revised the text on page 2 as below.

      Based on our structural and biochemical data, we propose a structurally informed working model for TMAO turnover by TDM (Scheme 1). In this model, Zn<sup>2+</sup> plays a non-redox role by polarizing the O–H bond of the bound hydroxyl, thereby lowering its pK<sub>a</sub>. The D220 carboxylate functions as a general base, abstracting the proton to generate a hydroxide nucleophile. This hydroxide then attacks the electrophilic N-methyl carbon of TMAO, forming a tetrahedral carbinolamine (hemiaminal) intermediate. Subsequent heterolytic cleavage of the C–N bond leads to the release of HCHO. D220 then switches roles to act as a general acid, donating a proton to the departing nitrogen, which facilitates product release and regenerates the active site. This sequence allows a new water molecule to rebind Zn<sup>2+</sup>, enabling subsequent catalytic turnovers. This proposed pathway is consistent with prior mechanistic studies, in which water addition to the azomethine carbon of a cationic Schiff base generates a carbinolamine intermediate, followed by a rate-limiting breakdown to yield an amino alcohol and a carbonyl compound, in the published case, an aldehyde (Pihlaja et al., J. Chem. Soc. Perkin Trans. 2, 1983, 8, 1223–1226).

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channeling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channeling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn2+ cofactor, there was a dependence upon non-heme Fe2+, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense to me, for several reasons:

      (i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which is unprecedented, and even if it were possible, would generate methanol, not formaldehyde.

      (ii) The amine oxide is then proposed to deoxygenate, with hydroxide appearing on the Zn - unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn2+ is not a redox-active metal ion;

      (iii) The authors say "forming a tetrahedral intermediate, as described for metalloproteinase", but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism, there is no carbonyl to attack, so this statement is just wrong.

      So on several counts, the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn2+ cannot fulfil that role. Fe2+ could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so, then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands, the proposed catalytic mechanism is unacceptable.

      We thank the reviewer for the detailed and thoughtful mechanistic critique. We fully agree that Zn<sup>2+</sup> is not redox-active, and cannot directly mediate oxidative demethylation or amine oxide deoxygenation. We acknowledge that the oxidative step required for the conversion of TMAO to HCHO is not explicitly resolved in the present study. Accordingly, we have revised the manuscript to remove any implication of Zn<sup>2+</sup>-mediated redox chemistry, and have eliminated the previously imprecise analogy to zinc metalloproteases.

      We recognize and now discuss prior biochemical work on TMAO demethylase from Methylocella silvestris (MsTDM), which proposed an iron-dependent oxidative mechanism (Zhu et al., FEBS 2016, 3979–3993). That study reported approximately one Zn<sup>2+</sup> and one non-heme Fe<sup>2+</sup> per active enzyme, implicated iron in catalysis through homology modeling and mutagenesis, and used crossover experiments suggesting a trimethylamine-like intermediate and oxygen transfer from TMAO, consistent with an Fe-dependent redox process. However, that system lacked experimental structural information, and did not define discrete metal-binding sites.

      In contrast,

      (1) Our high-resolution cryo-EM structures and metal analyses of TDM consistently reveal only a single, well-defined Zn<sup>2+</sup>-binding site, with no structural evidence for an additional iron-binding site as in the previous report (Zhu et al., FEBS 2016, 3979–3993).

      (2) To investigate the potential involvement of iron, we expressed TDM in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>(SO<sub>4</sub>)<sub>2</sub> and determined its cryo-EM structure. This structure is identical to the original one, and no EM density corresponding to an additional iron ion was observed. Moreover, the previously proposed Fe<sup>2+</sup>-binding residues are spatially distant (Figure S6).

      (3) ICP-MS analysis detects zinc, whereas iron is undetectable (Figure S5).

      (4) The enzyme kinetics of our iron-free TDM are comparable to those of MsTDM (Figure 1A). We attribute the differences in Km and Vmax to differences in the overall sequences of the two enzymes. Please also see our comment at the end regarding a newly published paper on MsTDM.

      While we cannot comment on the MsTDM results, our ‘experimental’ results do not support the presence of an iron-binding site. Our data indicate that this chemistry is unlikely to be mediated by a canonical non-heme iron center as proposed for MsTDM. We therefore revised our model as a structural framework that rationalizes substrate binding, metal coordination, and product stabilization, while clearly delineating the limits of mechanistic inference supported by the current data.

      Scheme 1 and the proposed mechanism section were revised on page 4. Figure S6 was added.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors do quote a Vmax of 16.52 µM/min/mg; however, these are incorrect units for Vmax, they should be µmol/min/mg. There is a further inconsistency between the text saying µM/min/mg and the Figure saying µM/min/µg.

      Thank you for the correction. We converted the V<sub>max</sub> unit to nmol/min/mg and revised the text on page 2. We also compared our value with that previously reported for the TDM enzyme, revising the text on page 2. See also the note on a newly published manuscript and its comparison.

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %.

      We appreciate this important point. We have revised Figure 1C to present HCHO levels in absolute concentration units. While the current data demonstrate reduced detectable HCHO in the presence of THF, we agree that distinguishing between HCHO consumption and potential THF-mediated enzyme inhibition would require dedicated time-course and protein-dependence experiments. We have therefore revised the description to avoid overinterpretation and limit our conclusions to the observed changes in HCHO concentration (page 2, lines 18-19).

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme; is that the strain used? Details about the strain are needed, and the accession for the protein sequence.

      Thank you for this comment. We now indicate that the enzyme is derived from Paracoccus sp. DMF and provide the accession number for the protein sequence (WP_263566861) in the Experimental Section (page 8, line 4).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The ITC experiment requires a ligand-into-buffer titration as an additional control. Also, maybe I misunderstood the molar ratio or the concentrations you used, but if you indeed added a total of 4.75 μL of 20 μM THF into 250 μL of 5 μM TDM, it is not clear to me how this leads to a final molar ratio of 3.

      We thank the reviewer for this suggestion. A ligand-into-buffer control ITC experiment was performed and is now included in Figure S8C; it shows no appreciable signal.

      Regarding the molar ratio, this was our mistake. The experiment used 2.45 μL injections of 80 μM THF into 250 μL of 5 μM TDM. This corresponds to a final ligand concentration of ~12.8 μM, giving a ligand-to-protein molar ratio of ~2.6. We have revised the ITC section on page 9.
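
      The arithmetic behind this exchange can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes simple additive dilution of the injected ligand into the cell volume, which may differ from the displacement correction applied by a given ITC instrument. It evaluates the reviewer's original figures (4.75 μL of 20 μM THF into 250 μL of 5 μM TDM) and the total injected volume implied by the stated final ligand concentration of ~12.8 μM.

```python
def molar_ratio(ligand_uM, injected_uL, protein_uM, cell_uL):
    """Moles of ligand injected divided by moles of protein in the cell."""
    return (ligand_uM * injected_uL) / (protein_uM * cell_uL)


def final_ligand_conc(ligand_uM, injected_uL, cell_uL):
    """Ligand concentration after mixing, assuming simple additive dilution."""
    return ligand_uM * injected_uL / (cell_uL + injected_uL)


def injected_volume_for(ligand_uM, target_uM, cell_uL):
    """Total injected volume needed to reach target_uM under the same assumption."""
    return target_uM * cell_uL / (ligand_uM - target_uM)


# Reviewer's reading of the original text: far below a molar ratio of 3.
reviewer_case = molar_ratio(20.0, 4.75, 5.0, 250.0)

# Corrected figures: total volume implied by a final concentration of 12.8 uM.
implied_volume = injected_volume_for(80.0, 12.8, 250.0)

# Ratio of final ligand concentration to the nominal 5 uM protein concentration.
nominal_ratio = 12.8 / 5.0

print(f"reviewer case molar ratio: {reviewer_case:.3f}")
print(f"implied injected volume:   {implied_volume:.1f} uL")
print(f"ratio vs nominal protein:  {nominal_ratio:.2f}")
```

      Under these assumptions, the original figures give a molar ratio well below 1, which is the discrepancy the reviewer flagged, while the corrected stock concentration and injection volume are consistent with a ratio near 2.6 relative to the nominal protein concentration.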

      (2) Characterization/quality check of all mutant enzymes should be performed by NanoDSF, CD spectroscopy or similar techniques to confirm that proteins are properly folded and fit for kinetic testing.

      We appreciate the reviewer’s suggestion. All mutant proteins, including D220A, D367A, and F327A, were purified with yields similar to the wild-type enzyme. Additionally, cryo-EM maps of the mutants show well-defined density and overall structural integrity consistent with the wild-type. These findings indicate that the introduced mutations do not significantly affect protein folding, supporting their use for kinetic analysis. While NanoDSF might reveal differences in thermal stability due to mutations, it does not provide structural information. Our conclusions are not based on minor differences in thermostability. Our cryo-EM structures of the mutants offer much more reliable structural data than CD spectroscopy.

      (3) Best practice would suggest overlapping pH ranges between different buffer systems in the pH-dependence experiments to rule out buffer-specific effects independent of pH.

      We thank the reviewer for this helpful suggestion. We agree that overlapping pH ranges between different buffer systems can be valuable for excluding buffer-specific effects. In this study, the pH-dependence experiments were intended to provide a qualitative assessment of pH sensitivity rather than a detailed analysis of buffer-independent pKa values. While we cannot fully exclude minor buffer-specific contributions, the overall trends observed were reproducible and sufficient to support the conclusions drawn. We have added a clarifying statement to the revised manuscript to reflect this consideration, page 2, line 12.

      (4) "Structural comparison revealed high similarity to a THF-binding protein, with superposition onto a T protein.": It would be nice to show this as an additional figure, as resolution and occupancy for THF are low.

      We thank the reviewer for this suggestion. To address this point, we have added a panel to the former Figure S6 (now Figure S7C) showing the structural superposition of TDM with the THF-binding T protein. This comparison is included to better illustrate the structural similarity, despite the limited resolution and partial occupancy of THF density in our map.

      (5) Editing could have been done more thoroughly. Some spelling mistakes, e.g. "RESEULTS", "redius", "complec"; kinetic rate constants should be written in italic (not uniform between text and figures); Prism version is missing; Vmax of 16.52 µM/min/mg - doublecheck units; Figure S1B: The "arrow on the right" might have gone missing.

      We corrected the spelling errors on page 2 (~line 10), page 5 (~line 34), and page 6 (~line 40); all changes are highlighted in blue. The Prism version was added, and the arrow was added to Figure S1B. The Vmax unit was corrected to nmol/min/mg.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must re-examine the metal content of their purified enzyme, looking in particular for Fe or another redox-active metal ion, which could be involved in a reasonable catalytic mechanism.

      We thank the reviewer for this suggestion and have carefully re-examined the metal content of TDM. Elemental analyses by EDX and ICP-MS consistently detected Zn<sup>2+</sup> in purified TDM (Zn:protein ≈ 1:2), whereas Fe was below the detection limit across multiple independent preparations (Fig. S5A,B). To assess whether iron could be incorporated or play a functional role, we expressed TDM in E. coli grown in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>(SO<sub>4</sub>)<sub>2</sub> and performed activity assays in the presence of exogenous Fe<sup>2+</sup>. Neither condition resulted in enhanced enzymatic activity.

      Consistent with these biochemical data, all cryo-EM structures reveal a single, well-defined metal-binding site coordinated by three conserved cysteine residues and occupied by Zn<sup>2+</sup>, with no evidence for an additional iron species or other redox-active metal site.

      (2) The specific activity of the enzyme should be quoted in the same units as other literature papers, so that the enzyme activity can be compared. It could be, for example, that the content of Fe (or other redox-active metal) is low, and that could then give rise to a low specific activity.

      Thank you for the suggestion. We now quote the enzyme activity in the same units as the previous report and have revised the text on page 2.

      Since the submission of our paper, a new report on MsTDM has been published (Cappa et al., Protein Science 33(11), e70364), which further supports our findings. First, the kinetic parameters reported using ITC (Vmax = 0.309 μmol/s, approximately 240 nmol/min/mg; Km = 0.866 mM) are comparable to those we observed (156 nmol/min/mg and 1.33 mM, respectively) in the absence of exogenous iron. Second, the optimal pH for enzymatic activity is similar to that observed for our paraTDM. Third, the reported two-state unfolding behavior is consistent with our cryo-EM structural observations, in which the more dynamic subunits appear to destabilize prior to unfolding of the core domains. Based on these findings, we now propose that Zn<sup>2+</sup> functions primarily as an organizational cofactor at the core catalytic domain (revised Scheme 1).

    1. eLife Assessment

      This manuscript reports a valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. The authors model context-dependent decision making, and suggest that psychiatric disorders can be interpreted in terms of over- or under-representation of context information. The presentation is solid, and the work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

    2. Reviewer #2 (Public review):

      [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments both from the rodent and the human literature such as splitter cells, lap cells, the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over/under representation of context information.

      My general assessment of the work is unchanged, and I still have some questions requesting methodological clarification.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: sequences, contexts, action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g. learning a value function from hippocampal activity.

    3. Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments both from the rodent and the human literature such as splitter cells, lap cells, the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over/under representation of context information.

      My general assessment of the work is unchanged, and I still have some questions requesting methodological clarification.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: sequences, contexts, action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g. learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be heavily improved. Judgment of generality and plausibility of the results is severely hampered but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is impossible to judge whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work on the field.

      Thank you for pointing this out.

      In the revised text, we clarified the definition of “time step” and how hippocampal neurons behaved in each time step (see individual comments below). Also, we clarified the implementation of disorder conditions in our model by indicating the exact neuron numbers of the stimulus domain in H module as below. (Other parameters were common in all conditions.)

      “𝑋 consists of two domains: stimulus domain 𝑋 and context domain 𝑋. The neuron ratio in the stimulus domain over the whole neurons dim 𝑋/𝑁 is 16.7% (200 neurons) for the control condition, 2.5% (30 neurons) for the SZ condition, and 50% (600 neurons) for the ASD condition.”
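
      As a quick consistency check on the quoted neuron counts, the sketch below recomputes the three percentages. The total population size of 1,200 is an inference from the quoted figures (30 neurons corresponding to 2.5%), not a value stated in the response, and the names used are illustrative.

```python
# Stimulus-domain sizes quoted in the revised text.
STIMULUS_NEURONS = {"control": 200, "SZ": 30, "ASD": 600}

# Assumption: total population size inferred from "2.5% (30 neurons)".
N_TOTAL = 1200


def stimulus_fraction(n_stimulus, n_total=N_TOTAL):
    """Percentage of the whole population assigned to the stimulus domain."""
    return 100.0 * n_stimulus / n_total


for condition, n_stimulus in STIMULUS_NEURONS.items():
    print(f"{condition}: {stimulus_fraction(n_stimulus):.1f}%")
```

      With this inferred total, all three quoted ratios are reproduced by a single shared population size, which is in line with the statement that the other parameters were common to all conditions.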

      Comments:

      The authors have made strong efforts to improve on their description of the methods, however, it is still very hard to understand. As a result of some of their clarifications, new issues appeared that I was not able to extract in the previous version.

      (1) Particularly I had problems figuring out how the individual dynamical systems are interrelated (sequences, attractor, action, learning). As I understand it now (and I still might be wrong) there is one discrete time dynamics, where in each time step one action takes place and the attractor and sequence dynamics are moved one step forward. Also, synaptic updates happen in every one of those time steps. The authors may verify or correct my interpretations and further improve on their description in the manuscript. It is also confusing that time in the figure panels is given in units of trials, where each trial may consist of (maybe different amounts of) multiple time steps. Are the thin horizontal red and blue lines time steps?

      Thank you for raising the confusing point.

      The reviewer’s understanding is correct. In our model, at each time step the agents transition to the next environmental state (which also corresponds to the contextual state). During this step, the processing stages proceed in order: the Context selector performs attractor selection, the Sequence composer performs sequence selection, and these are followed by action selection and synaptic updates. As learning progresses, hippocampal sequences begin to predict longer futures, reducing the need for step-by-step planning. However, at least at the beginning of each task, all processes are conducted at each time step (see Fig. 1G).

      In all tasks, trials are reset when the agents visit the reward sites (i.e., S4 or S5). In Fig. 2C, for example, one trial consists of three time steps (i.e., three state transitions), and the red and blue shaded regions indicate individual trials. During each time step, two types of hippocampal neurons are activated: a state-coding neuron and a transition-coding neuron. (In contrast, in X, one contextual state is active during one time step.) Therefore, in Fig. 2E, two neuronal activities correspond to a single time step.

      For clarification, we have revised Fig. 2 and related descriptions in the manuscript as follows.

      “Here, we simplified this task by using an environment with five discrete states (S1-S5), i.e., five discrete external stimuli (Figure 2A), where agents transition to the next state at each time step.”

      “Figure 2C illustrates an example of both the environmental state transition and the corresponding contextual state transition of an agent, with each trial resetting upon visiting the reward sites (S4 or S5).”

      “At each time step, one state-coding neuron and one transition-coding neuron are active in this order.”

      “At each time step, the agents transition between environmental states.”

      “The model’s computational dynamics are fundamentally synchronized with the environmental (behavioral) time step, and at each time step, the agents transition to the next environmental state. Upon a state transition, the agents first perform contextual state estimation by Context selector and activate a corresponding hippocampal neuron.”

      (2) As a consequence of my new understanding of the model dynamics, I have developed doubts about the interpretation of the attractor network as context encoding. Since the X population mainly serves to disambiguate sequence continuation right before the action has to be taken (active for only two time steps in Figure 1C?), it could also be considered to encode task space (El-Gaby et al. 2024; doi: 10.1038/s41586-024-08145-x).

      We thank the reviewer for this insightful comment.

      First of all, we would like to clarify that Figure 1C shows the following process: the activity of H at time step t−1 and the external stimulus at time step t jointly provide input to the X module, and the activity of X settles into a contextual state at time step t. As explained in our response to comment (1), the activity of X remains constant during each time step.

      The primary function of X module in our model is to disambiguate the environmental states defined by the external stimuli based on the history information. It is true that, in practice, whether an ongoing sequence is maintained or remapped depends on whether the observed stimulus is consistent with the predicted stimulus. However, this is a consequence of the predictive sequence obtained from scratch rather than the primary computational role of X module. In contrast, X module becomes particularly important when past experience does not uniquely determine the next state. In this situation, the agent must infer the contextual state by associating the current situation with previously experienced contexts, rather than relying solely on temporal continuity.

      We also add that, in most successful cases, the contextual states learned by the agent often correspond to the hidden states of each task as a result of disambiguation. In this sense, the resulting representation may resemble a “task space” encoding, as suggested by the reviewer. However, an important aspect of our model is that the agent does not assume the existence or number of hidden states a priori. Instead, we considered the situation where the agent initially underestimates the number of contextual states, and through remapping it incrementally increases the number of contextual representations. When the number of contextual states matches the number of hidden task states, the task is typically solved.

      (3) Also technically, I wonder why the authors introduce the criterion of 50(!) time steps to allow the attractor to converge, if the state of the attractor network is only relevant in one time step to choose the appropriate continuation of the sequence of actions. Is attractor dynamics important at all? What would happen if just the input and output weights to the X population are kept and the recurrent weights are set to 0?

      We thank the reviewer for raising this confusing point.

      First, we would like to clarify that the “50 iterations” mentioned in the manuscript do not refer to 50 environmental time steps. Within each behavioral time step, the Context selector performs multiple iterations of attractor updates (typically until convergence).

      We have clarified this point in the Methods section as follows.

      “After history-based or landmark-based initialization, X iteratively updates its contextual state at the beginning of each time step according to the associative memory dynamics:”

      The recurrent connectivity within the X population is essential for attractor updates. If the recurrent weights were removed (i.e., set to zero), the network would lose the ability to retrieve distinct contextual states for the same stimulus. In that case, the model would be unable to solve the context-dependent tasks shown in this manuscript.
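
      The role of recurrent weights in retrieving a stored state can be illustrated with a generic Hopfield-style associative memory. This is a minimal sketch under stated assumptions, not the authors' implementation: the ±1 coding, the Hebbian learning rule, the synchronous update, the toy patterns, and the names `train_hebbian` and `settle` are all illustrative. Only the cap of 50 iterations within one behavioral time step is taken from the response above.

```python
def train_hebbian(patterns):
    """Build a symmetric weight matrix from +/-1 patterns (zero diagonal)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def settle(weights, state, max_iters=50):
    """Iterate synchronous updates until the state stops changing.

    Returns the final state and the number of iterations used. A unit with
    zero net input keeps its previous value.
    """
    state = list(state)
    for iteration in range(1, max_iters + 1):
        new_state = []
        for i, row in enumerate(weights):
            net_input = sum(w * s for w, s in zip(row, state))
            if net_input > 0:
                new_state.append(1)
            elif net_input < 0:
                new_state.append(-1)
            else:
                new_state.append(state[i])
        if new_state == state:
            return state, iteration
        state = new_state
    return state, max_iters


# Two orthogonal toy patterns standing in for stored contextual states.
P1 = [1, 1, 1, 1, -1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1, 1, -1]

# A noisy cue: P1 with its last unit flipped.
CUE = [1, 1, 1, 1, -1, -1, -1, 1]

recurrent = train_hebbian([P1, P2])
retrieved, iterations = settle(recurrent, CUE)

# Same cue with the recurrent weights set to zero.
no_recurrence = [[0] * len(CUE) for _ in CUE]
unchanged, _ = settle(no_recurrence, CUE)

print("with recurrence:   ", retrieved, "after", iterations, "iterations")
print("without recurrence:", unchanged)
```

      In this toy setting the recurrent weights pull the noisy cue back to a stored pattern within a few iterations, whereas with zero recurrent weights the state simply stays at the cue. Whether the same holds quantitatively in the full model depends on details of the Context selector that are not reproduced here.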

      (4) Figure 3E: For how many time steps are the H cells active (red bars)? Figure 4J: What are the units of the time axis?

      Thank you for pointing this out.

      In Figure 3E, each time step is indicated in the X-axis ticks (i.e., each environmental state). As explained in our response to comment (1), the activity of two hippocampal neurons (red bars) corresponds to each time step.

      Similarly, in Figure 4J, each time step is indicated in the X-axis ticks. To better represent the results, we added descriptions of the environmental states in our model to the X-axis tick labels in Figure 4J.

      We added the following text to the figure captions.

      “The x-axis represents each time step (corresponding to environmental states), and the y-axis shows the sorted activity of H module.”

      “The x-axis represents each time step (corresponding to environmental states), and the y-axis shows the decoding accuracy of each context based on hippocampal activity.”

    1. eLife Assessment

      This important study examines how chronic pain and opioid exposure interact at the cellular and molecular levels in a reward-related brain region. Using single-nucleus RNA sequencing, the authors map transcriptional changes in the rat ventral tegmental area following chronic inflammatory pain and acute morphine exposure. Notably, their convincing data support that acute morphine, not chronic pain, elicits a stress-related transcriptional response primarily in glial cells rather than neurons, challenging prevailing views of opioid action and supporting growing evidence for glucocorticoid signaling in glial responses. A limitation is the use of a single opioid dose and time point, and further discussion of these constraints would help clarify the broader implications of the findings.

    2. Reviewer #1 (Public review):

      Studies investigating global gene expression changes induced by a single morphine administration have previously been conducted in several rodent brain regions. In this work, the authors focused on the ventral tegmental area (VTA), a key structure of the reward system that has not been extensively characterized in this context. To examine genome-wide transcriptional responses, they employed single-nucleus RNA sequencing (snRNA seq), a method well-suited for profiling gene expression in VTA cells, which are otherwise difficult to isolate.

      The effects of morphine on gene expression in VTA cells were assessed in naive animals, in rats exposed to chronic inflammatory pain induced by local CFA injection into the paw, and in animals subjected to both conditions. The study revealed widespread transcriptional changes following morphine administration, whereas inflammation alone produced only limited alterations, an outcome that may reflect the sensitivity or resolution of the sequencing approach used.

      Further in vitro experiments conducted in multiple astrocyte models demonstrated that the increase in Fkbp5 expression observed in the VTA is unlikely to result from opioid receptor activation. Instead, the data indicate that this effect is mediated by glucocorticoid receptor stimulation. These findings suggest that the elevated Fkbp5 expression in the rat VTA represents a secondary response rather than a direct consequence of morphine exposure. Comparable transcriptional changes, as well as similar mechanistic interpretations, have been reported in previous studies examining the nucleus accumbens (NAc), reinforcing the view that glucocorticoid-dependent regulation of Fkbp5 may be a broader feature of opioid-related neuroadaptations.

      The present paper showed largely similar morphine-induced gene changes in both male and female VTA samples. On the other hand, several studies indicate that males and females exhibit differences in dopaminergic activation and distinct gene expression profiles in response to opioids in the reward system. Preclinical studies have found marked sex differences in Fkbp5 expression in the dorsal striatum. This issue should be better addressed both experimentally and theoretically.

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses an important gap in our understanding of how pain‑related neuroadaptations interact with opioid exposure at the cellular and molecular levels, particularly in terms of cell‑type-specific responses within reward‑related brain regions. By applying single‑nucleus RNA sequencing, the authors generate a comprehensive atlas of transcriptional changes in the rat VTA associated with chronic inflammatory pain and acute morphine administration.

      Strengths:

      Overall, the study is important, and the experiments are carefully designed and executed. The manuscript is logically structured and well written. The sample size is appropriate: nuclei were collected from 14 male and 14 female Sprague‑Dawley rats, with 6-8 animals per experimental group. The inclusion of both sexes further strengthens the study by enhancing the generalizability of the findings.

      To increase translational relevance, the authors also employ a human‑derived astrocyte culture model, which helps bridge findings from rodent tissue to human‑related cellular mechanisms.

      Weaknesses:

      A limitation is that the study examines only a single time point after morphine administration. However, this is balanced by the use of state‑of‑the‑art, and inherently expensive, molecular tools that allow deep transcriptional profiling.

      One area requiring clarification is compliance with methodological standards. The manuscript does not specify whether ARRIVE guidelines were followed, whether a power analysis was performed to justify the number of animals used, or how randomization and blinding procedures were implemented.

    4. Reviewer #3 (Public review):

      Summary:

      This work examined the transcriptional response to pain induction by CFA and/or morphine treatment in rat VTA at the level of single cells. This builds on prior work using bulk-tissue RNA-seq to evaluate response to SNI pain and/or oxycodone treatment. Here, authors find few lasting gene expression changes with CFA, but a robust transcriptional response to acute morphine, particularly in non-neuronal cells, where an increase in Fkbp5 stood out. The authors validated corticosterone-induced elevations in Fkbp5 in rat glial cell culture and human astrocyte cell culture, which are blocked by the GR antagonist mifepristone and inhibition of Nr3c1, but are not independently induced by the µOR agonist DAMGO.

      Strengths:

      The authors started with somewhat surprising transcriptional observations and followed the science appropriately to investigate the functional relevance of one particular finding. This work is well-powered and uses state-of-the-art snRNA-seq and CRISPR-based manipulations in both rat glia and human astrocyte cell preparations to determine the functional relevance of Fkbp5-regulated transcriptional activity.

      Weaknesses:

      (1) It was somewhat surprising that the CFA-Morphine group was not taken at a time point when the morphine treatment was found to be behaviorally effective.

      (2) The final conclusion that Nr3c1 repression reduces the response to cort is not novel or surprising, even if it is within human astrocyte culture (which is cool).

      (3) This work falls short of bringing the research full circle by applying their Nr3c1-CRISPRi approach in vivo to alter behavioral response to morphine and/or pain.

    1. eLife Assessment

      This manuscript presents a useful computational framework for systematically characterising how heterogeneity in initial conditions or biophysical parameters shapes the dynamic behaviour of protein signalling networks, with potential relevance to understanding adaptive drug resistance. While the approach represents a significant methodological contribution, the extent to which its conclusions are biologically informative remains debated, as the model is only qualitatively compared with experimental data and lacks quantitative validation. As a result, the strength of evidence supporting the mechanistic claims is viewed as incomplete.

    2. Joint Public Review:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors study the Early Cell Cycle (ECC) network as a proof of concept, focusing on pathways involving PI3K, EGFR, and CDK4/6 with the aim of identifying mechanisms that may underlie resistance to CDK4/6 inhibition in cancer. The biochemical reaction model comprises 50 state variables and 94 kinetic parameters, implemented in SBML and simulated in Matlab. A central component of the study is the generation of large ensembles of model instances, including 100,000 randomly sampled parameter sets intended to represent intra-tumour heterogeneity. On the basis of these simulations, the authors conclude that heterogeneity in kinetic rate parameters plays a stronger role in driving adaptive resistance than variation in baseline protein expression levels, and that resistance emerges as a network-level property rather than from individual components alone. The revised manuscript provides additional clarification regarding aspects of the simulation and filtering procedures and frames the comparison with experimental data as qualitative. Nonetheless, the study is best interpreted as a theoretical and exploratory analysis of the model's behaviour under heterogeneous conditions. Consequently, questions remain regarding the biological grounding of the sampled parameter regimes and the extent to which the reported frequencies of resistance-associated behaviours can be directly interpreted in physiological terms.

      While the authors propose a potentially useful computational framework to explore how heterogeneity shapes dynamic responses to drug perturbation, a number of important conceptual and methodological concerns remain to be addressed:

      (1) The sampling of kinetic parameters constitutes the backbone of the manuscript, yet important concerns remain regarding its biological grounding and transparency. Although the revised version provides additional clarification on the exploration of "model instances", it is still not sufficiently clear how parameter values and initial conditions are generated, nor how the chosen ranges relate to biological measurements. The kinetic rates are sampled over broad intervals without explicit justification in terms of experimentally measured bounds or inferred distributions. As a consequence, it remains uncertain whether the ensemble of simulated behaviours reflects physiologically plausible cellular regimes or primarily the properties of the assumed parameter space. In this context, the large-scale sampling (100,000 parameter sets) resembles a Monte Carlo exploration of the model rather than a biologically calibrated representation of tumour heterogeneity.

      Furthermore, the adequacy of the sampling strategy in such a high-dimensional space (94 free parameters) remains open to question. In the absence of biologically informed constraints, the combinatorial space of possible parameter configurations is vast, and it is unclear to what extent the sampled ensembles can be considered representative. This issue is particularly relevant because the manuscript interprets the frequency of resistance-associated behaviours as indicative of their likelihood.

      The validation presented in Figure 7 does not fully resolve these concerns. The comparison with experimental data is qualitative, and the simulations are performed in arbitrary time units, which complicates direct interpretation alongside time-resolved experimental measurements. Moreover, certain qualitative discrepancies between simulated and experimental trends (e.g., persistent versus decreasing CDK4/6 activity) are not thoroughly discussed. As this figure represents the primary empirical reference point in the manuscript, the extent to which the model captures experimentally observed dynamics remains uncertain.

      Finally, aspects of presentation continue to limit transparency. Parameter ranges are described at different points in the manuscript but are not consolidated clearly in the Methods, and the definition of initial conditions remains ambiguous - particularly whether these correspond to conserved quantities or to the dynamic variables used to initialise simulations. In addition, the exact number of model instances underlying specific analyses and figures is not always explicit. Greater clarity on these issues is essential for assessing reproducibility and for interpreting the quantitative claims of the study.

      (2) A central conclusion of the manuscript is that heterogeneity in protein-protein interaction kinetics is a stronger driver of adaptive resistance than heterogeneity in protein expression levels. To assess the latter, the authors fix a nominal set of kinetic parameters and generate 100,000 random initial concentrations for the 50 model species. However, according to the simulation protocol described in the manuscript, each trajectory includes three phases: (i) simulation under starvation conditions to equilibrium, (ii) mitogenic stimulation to a second ("fed") equilibrium, and (iii) application of drug treatment. The equilibrium concentrations reached in phases (i) and (ii) are determined by the kinetic parameters of the model and are independent of the initial concentrations, provided the system converges to a stable steady state. In dynamical systems terms, stable equilibria are defined by the parameter set and attract all initial conditions within their basin of attraction. Since the kinetic parameters are fixed in this experiment, the pre-treatment equilibrium that serves as the starting point for drug application should likewise be fixed. Under these conditions, it is therefore not unexpected that sampling a large number of initial concentrations has limited influence on the treated dynamics.

      This raises conceptual questions about the interpretation of the comparison between kinetic and expression heterogeneity. If the system converges to a unique stable steady state prior to treatment, then variability in initial concentrations does not propagate into variability in drug response, and the observed dominance of kinetic heterogeneity may partly reflect this structural property of the model rather than a biological principle. Clarification is needed regarding whether multiple steady states exist under the nominal parameter set, and if so, how basins of attraction are explored.

      More broadly, it remains unclear why initial protein concentrations can be sampled independently of the kinetic parameters. In biological systems, steady-state expression levels are typically determined by the underlying kinetic rates. A more consistent approach might require constraining initial concentrations to correspond to equilibrium states of the chosen parameter set, thereby introducing relationships between at least some of the 50 initial conditions and the 94 kinetic parameters. Finally, the manuscript employs a non-standard terminology regarding "initial conditions," which may further obscure interpretation of these results and would benefit from clarification.

      (3) The technical implementation of the modelling and simulation framework remains difficult to evaluate due to insufficient methodological detail. Although the authors state that kinetic parameters are randomly sampled, the manuscript does not specify the distributions from which parameters are drawn, nor whether potential correlations between parameters are considered or explicitly ignored. Without this information, it is not possible to assess how implicit modelling assumptions shape the ensemble of simulated behaviours. Given that the conclusions rely on frequency-based interpretations across sampled parameter sets, greater transparency regarding the sampling procedure is essential.

      A further concern relates to the parameter filtering step. The authors report that the "vast majority" of sampled parameter sets produced systems that were "too stiff," and that these were excluded on the grounds that stiff dynamics are not biologically plausible. However, the manuscript does not clearly define how stiffness is assessed, nor why stiffness is interpreted as biologically unrealistic rather than as a numerical property of the formulation. In standard practice, stiff systems are typically handled using appropriate implicit solvers rather than being discarded. Similarly, parameter sets that produce negative state values are excluded, yet such behaviour may arise from numerical artefacts rather than from intrinsic model inconsistency. The rationale for excluding these parameter sets, rather than adapting the numerical scheme, is not sufficiently justified.
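
      The point about implicit solvers can be illustrated on the standard linear test equation dy/dt = -k·y with a fast decay rate. This is a minimal sketch with hypothetical function names and values, not the ECC model or the authors' solver configuration; production work would rely on an established implicit integrator (for example, the BDF method in SUNDIALS CVODE) rather than the hand-written backward Euler step shown here.

```python
def forward_euler(y0, rate, dt, n_steps):
    """Explicit update: y_{n+1} = y_n + dt * (-rate * y_n)."""
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-rate * y)
    return y


def backward_euler(y0, rate, dt, n_steps):
    """Implicit update: y_{n+1} = y_n + dt * (-rate * y_{n+1}), solved for y_{n+1}."""
    y = y0
    for _ in range(n_steps):
        y = y / (1.0 + dt * rate)
    return y


# Illustrative values only: dt * rate = 10, far outside the explicit stability limit.
RATE, DT, N_STEPS = 1000.0, 0.01, 50

explicit_result = forward_euler(1.0, RATE, DT, N_STEPS)    # grows without bound
implicit_result = backward_euler(1.0, RATE, DT, N_STEPS)   # decays towards zero
```

      With the same step size, the explicit scheme diverges while the implicit one decays towards the true solution, which is why a failed explicit integration need not imply that a parameter set is unusable.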

      The reported rejection rate - approximately 90% of sampled parameter sets - is substantial and raises questions regarding the interplay between model structure, parameter ranges, and numerical methods. As currently described, the filtering step appears to select parameter sets based primarily on computational tractability rather than on experimentally motivated biological criteria. The manuscript would be strengthened by clarifying whether the retained parameter sets are representative of biologically meaningful regimes, and by distinguishing clearly between exclusions based on biological plausibility and those arising from numerical considerations.

      Finally, important aspects of the simulation protocol require clarification. The model is simulated under "fasted" and "fed" conditions until equilibrium is reached, yet the criterion used to determine convergence is not specified. It would be important to describe how equilibrium is assessed (e.g., based on the norm of the time derivatives). Additionally, it remains unclear whether the mitogenic stimulus applied in the "fed" phase is assumed to be constant over time and, if so, how this assumption relates to biological experimental conditions. Greater detail on these implementation choices is necessary to ensure interpretability and reproducibility.
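
      One way such a convergence criterion could be stated, following the suggestion of using the norm of the time derivatives, is sketched below. The toy two-state system, the function names (`rhs`, `integrate_to_equilibrium`), the forward-Euler stepping, and the tolerance are all illustrative assumptions and are not taken from the manuscript.

```python
import math


def rhs(state, k_on=2.0, k_off=1.0):
    """Toy reversible conversion X <-> pX (hypothetical, not the ECC model)."""
    x, px = state
    flux = k_on * x - k_off * px
    return (-flux, flux)


def derivative_norm(deriv):
    """Euclidean norm of the time-derivative vector."""
    return math.sqrt(sum(d * d for d in deriv))


def integrate_to_equilibrium(state, dt=1e-3, tol=1e-8, max_steps=1_000_000):
    """Step forward until ||dx/dt|| < tol; also report whether that was reached."""
    for step in range(max_steps):
        deriv = rhs(state)
        if derivative_norm(deriv) < tol:
            return state, step, True
        state = tuple(s + dt * d for s, d in zip(state, deriv))
    return state, max_steps, False


final_state, n_steps, converged = integrate_to_equilibrium((1.0, 0.0))
```

      Returning an explicit `converged` flag, rather than stopping silently at a fixed end time, makes it possible to report how many model instances actually met the criterion.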

      (4) The manuscript states that the modelling conclusions are strongly supported by existing literature; however, the validation presented does not fully substantiate this claim. As noted above, the comparison with CDK2 and CDK4/6 experimental data remains qualitative, and the use of arbitrary simulation time units complicates interpretation of temporal agreement. The extent to which the model quantitatively or mechanistically recapitulates experimentally observed dynamics therefore remains uncertain.

      The claim that the model reproduces known resistance mechanisms is also difficult to assess in light of Figure S10, where a large fraction of network nodes (~80%) appear implicated in resistance under some conditions. If most components of the network can, in at least some parameter regimes, be associated with resistance phenotypes, the resulting lack of selectivity weakens the strength of model-based validation. It becomes challenging to distinguish specific mechanistic insights from generic consequences of network connectivity.

      In addition, the Supplementary Information notes that certain components of the mitogenic and cell-cycle pathways were abstracted or excluded in order to maintain computational tractability. While such abstraction is understandable in a large ODE framework, it raises interpretative questions. Proteins identified as potential resistance drivers within the model may, in some cases, represent aggregated or simplified pathway effects. Clarifying in the main text how such abstractions may influence the attribution of resistance mechanisms would strengthen the biological interpretation of the results.

      Drug inhibition is central to the manuscript's conclusions. The revised version clarifies that inhibition is implemented as a fixed fractional modification of specific kinetic rate laws. This abstraction is appropriate for exploring network-level responses, but it represents a stylised perturbation rather than a pharmacologically calibrated model of drug action. For full interpretability and reproducibility, the mathematical form of the modified rate laws, as well as the timing of inhibition relative to network equilibration, should be specified unambiguously. The biological implications of the findings depend critically on understanding this modelling choice.

      The one-at-a-time perturbation analysis presented in Figure 5 provides an interpretable ranking of first-order control points across the ensemble and offers mechanistic insight into primary sensitivities of the network. However, many targeted therapies act on multiple components, and resistance frequently arises through combinatorial mechanisms. The reported rankings should therefore be interpreted as identifying primary influences under isolated perturbations, rather than as a comprehensive account of multi-target drug behaviour.

      Overall, the manuscript succeeds in presenting a conceptual and exploratory framework for analysing how signalling network topology can shape the qualitative landscape of adaptive responses under heterogeneous kinetic conditions. Its principal contribution lies in establishing a systematic platform for large-scale in silico exploration. At the same time, the current limitations in biological calibration, parameter grounding, and validation constrain the extent to which the conclusions can be interpreted as predictive or quantitatively representative of specific tumour contexts. Addressing these issues would further strengthen the connection between the theoretical landscape described here and experimentally observed resistance dynamics.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Reviews:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.

      While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:

      (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. Particularly, it is not evident how the kinetic rates are related to any biological data, nor if the parameter distributions used in this study have any biological relevance.

      (2) Related to this problem, it is not clear whether the considered 100,000 random parameter samples sufficiently explore parameter space due to the combinatorial explosion that arises from having 94 free parameters, nor 100,000 random initial conditions for a system with 50 species (variables).

      (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience, rather than biological plausibility.

      (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.

      (5) On the same line, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated on the simulation dynamics are critical to interpret the results.

      (6) The presented validation of the model results (Fig. 7) is only qualitative, and the interpretation is not carefully discussed in the manuscript, particularly considering the comparison between fold-change responses without specifying the baseline states.

      We thank the reviewers for their thoughtful and constructive comments. In response, we have undertaken a substantial revision to address all the comments and to improve clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.

      Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.

      We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised - consistent with our aims - that comparisons are qualitative.

      Finally, we have rewritten the Discussion to centre on interpretation and implications, and to connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.

      Joint Recommendations for the Authors:

      (1) It is unclear exactly which sets are evaluated in each case, e.g., "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or a clearer explanation is needed.

      In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.

      We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.

      (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.

      We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.

      We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).

      (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.
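
      The arithmetic behind this thought experiment can be checked directly. The short sketch below assumes, as the comment does, exactly two levels (high/low) per parameter; the variable names are illustrative.

```python
from fractions import Fraction
import math

n_parameters = 94        # free kinetic parameters in the model
n_samples = 100_000      # sampled parameter sets

# Two levels per parameter gives 2**94 corner combinations.
n_combinations = 2 ** n_parameters

# Exact fraction of the binary grid covered by the samples.
coverage = Fraction(n_samples, n_combinations)

# Order of magnitude of the coverage (base-10 logarithm).
log10_coverage = math.log10(n_samples) - n_parameters * math.log10(2)
```

      Under these assumptions the coverage is roughly 5 × 10^(-24), which is the same order as the ~1/10^(23) estimate quoted in the comment.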

      We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.

      To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
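
      A sample-size-doubling check of the kind described can be sketched as follows. The outcome classes, their frequencies, and the function names are hypothetical stand-ins: in the study the outcomes would come from classifying simulated trajectories, whereas here they are drawn from a fixed categorical distribution purely to show the bookkeeping.

```python
import random

# Hypothetical qualitative outcome classes and frequencies (illustrative only).
CLASSES = ["increasing", "decreasing", "rebound", "no_change"]
TRUE_FREQS = [0.4, 0.3, 0.2, 0.1]


def class_frequencies(outcomes):
    """Fraction of outcomes falling in each class."""
    n = len(outcomes)
    return [outcomes.count(c) / n for c in CLASSES]


def mse(p, q):
    """Mean squared error between two frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def doubling_errors(sizes, seed=0):
    """MSE between the class distributions at N and 2N for each N in sizes."""
    rng = random.Random(seed)
    errors = []
    for n in sizes:
        outcomes = rng.choices(CLASSES, weights=TRUE_FREQS, k=2 * n)
        first_n = class_frequencies(outcomes[:n])
        all_2n = class_frequencies(outcomes)
        errors.append(mse(first_n, all_2n))
    return errors


errors = doubling_errors([100, 1_000, 10_000, 100_000])
```

      Plotting `errors` against N on logarithmic axes would show how quickly the estimated distribution stabilises. Note that this diagnoses convergence of the output distribution under the chosen sampling scheme only, not coverage of the parameter space itself.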

      We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.

      Regarding the parameter range, we intentionally chose a broad, unbiased range (10<sup>-5</sup> to 10<sup>4</sup>) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.
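
      A range spanning nine orders of magnitude is commonly sampled on a logarithmic scale. The sketch below assumes a log-uniform distribution with independent parameters; the text here does not state which distribution was used, so this choice, the seed, and the function name are illustrative assumptions only.

```python
import random

LOG10_MIN, LOG10_MAX = -5.0, 4.0   # exponents of the bounds quoted in the response
N_PARAMETERS = 94                  # kinetic parameters in the model


def sample_parameter_set(rng):
    """Draw one parameter set, log-uniform and independent (assumed, not confirmed)."""
    return [10.0 ** rng.uniform(LOG10_MIN, LOG10_MAX)
            for _ in range(N_PARAMETERS)]


rng = random.Random(42)            # fixed seed so the draw is repeatable
parameter_set = sample_parameter_set(rng)
```

      Sampling the exponent uniformly gives each decade equal weight, whereas sampling the rate itself uniformly over the same bounds would place almost all draws in the top decade. Stating which of these was used matters for any frequency-based interpretation.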

      (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.

      We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
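
      The distinction between the initial split and the conserved total can be illustrated with a toy phosphorylation cycle X <-> pX. This is a hypothetical two-state linear system with made-up rate constants, not the ECC model; it only shows that, for fixed kinetics, the steady state is insensitive to how a given total is split initially but does move when the total itself changes.

```python
def simulate(x0, px0, k_phos=2.0, k_dephos=1.0, dt=1e-3, n_steps=20_000):
    """Forward-Euler integration of the toy cycle X <-> pX (X + pX is conserved)."""
    x, px = x0, px0
    for _ in range(n_steps):
        flux = k_phos * x - k_dephos * px
        x, px = x - dt * flux, px + dt * flux
    return x, px


# Same total (1.0), different initial split: both reach the same steady state.
split_a = simulate(1.0, 0.0)
split_b = simulate(0.2, 0.8)

# Larger total (3.0), same kinetics: the steady state shifts with the total.
larger_total = simulate(3.0, 0.0)
```

      In this linear toy case the steady-state pX is simply the total multiplied by k_phos / (k_phos + k_dephos). In a nonlinear network the dependence on totals need not be proportional, and multistability could additionally make the initial split matter.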

      We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416).

      (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.

      Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.

      (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.

      We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.

      Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered it was not an intrinsic property of the parameter sets themselves, but rather an artifact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This large discrepancy made the system so stiff that it could not be solved with the ODE solver settings we had implemented. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first couple of time points, enabling the solver to get through the excessively stiff initial portion of the integration. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.

      We have revised the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving ODEs is reliant on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.

      Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.

      (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.

      We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
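
      To illustrate why an implicit method matters for stiff problems, the sketch below compares forward (explicit) Euler with backward (implicit) Euler, which is the first-order member of the BDF family. It is not CVODE and not the authors' model: the single relaxation equation dy/dt = -k (y - target) and all numerical values are assumptions chosen to make the contrast visible.

```python
def forward_euler(k, target, y0, h, n_steps):
    """Explicit update: unstable when h * k is much larger than 2."""
    y = y0
    for _ in range(n_steps):
        y = y + h * (-k * (y - target))
    return y

def backward_euler(k, target, y0, h, n_steps):
    """Implicit update: y_next = y + h * (-k * (y_next - target)), solved in closed form."""
    y = y0
    for _ in range(n_steps):
        y = (y + h * k * target) / (1.0 + h * k)
    return y

# Hypothetical stiff setting: fast relaxation rate with a comparatively large step.
k, target, y0, h, n_steps = 1.0e4, 1.0, 0.0, 0.01, 50

explicit_result = forward_euler(k, target, y0, h, n_steps)
implicit_result = backward_euler(k, target, y0, h, n_steps)

print("explicit:", explicit_result)   # diverges far from the target
print("implicit:", implicit_result)   # settles at the target
```

      With the same step size, the explicit scheme amplifies the error at every step while the implicit scheme damps it, which is why production solvers for stiff systems rely on implicit formulas with adaptive order and step size.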

      Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
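
      The rejection-sampling loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `draw_log_uniform` and `toy_filter` are hypothetical stand-ins for drawing a parameter set and for the combined steady-state and drug-response filters, and the log-uniform range simply mirrors the rate-constant range quoted elsewhere in this response.

```python
import random

def rejection_sample(draw, accept, n_accepted, seed=0, max_draws=10_000_000):
    """Keep drawing candidates until n_accepted of them pass the filter."""
    rng = random.Random(seed)
    accepted = []
    draws = 0
    while len(accepted) < n_accepted:
        if draws >= max_draws:
            raise RuntimeError("draw budget exhausted before reaching n_accepted")
        candidate = draw(rng)
        draws += 1
        if accept(candidate):
            accepted.append(candidate)
    return accepted, draws

def draw_log_uniform(rng, low_exp=-5.0, high_exp=4.0):
    """Hypothetical draw: one rate constant, log-uniform between 1e-5 and 1e4."""
    return 10.0 ** rng.uniform(low_exp, high_exp)

def toy_filter(k):
    """Placeholder for 'reaches steady state AND meets the drug-response criterion'."""
    return 1e-1 <= k <= 1e0

accepted, draws = rejection_sample(draw_log_uniform, toy_filter, n_accepted=1000)
acceptance_rate = len(accepted) / draws
print(f"accepted {len(accepted)} of {draws} draws ({acceptance_rate:.1%})")
```

      Recording the total number of draws alongside the accepted set is what makes the acceptance fraction reportable, which is the quantity the reviewer asked for.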

      (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.

      We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
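
      One way to express this criterion in code is sketched below. The interpretation is an assumption: "changed by < 1% over the final 100 time steps" is read here as the relative difference between each species' value 100 steps before the end and its final value. The `floor` argument, which guards against division by zero, is also an addition for illustration.

```python
def reached_equilibrium(trajectory, window=100, rel_tol=0.01, floor=1e-12):
    """Return True if every species changed by less than rel_tol over the last `window` steps.

    trajectory: list of time points, each a list of species concentrations.
    """
    if len(trajectory) <= window:
        return False  # not enough time points to evaluate the window
    start, end = trajectory[-window - 1], trajectory[-1]
    for c_start, c_end in zip(start, end):
        scale = max(abs(c_start), floor)
        if abs(c_end - c_start) / scale >= rel_tol:
            return False
    return True

# Hypothetical trajectories with two species each.
settled = [[1.0, 2.0] for _ in range(200)]
drifting = [[1.0 + 0.001 * t, 2.0] for t in range(200)]

print(reached_equilibrium(settled))
print(reached_equilibrium(drifting))
```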

      Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.

      (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).

      The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
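
      For readers who prefer code to notation, the rate expression quoted in the reviewer's comment can be written as a small function. The symbol names are taken from that expression; the numerical values in the example are arbitrary. Note that, in the expression as quoted, the pS6K-dependent denominator applies to the pIR-driven term only.

```python
def pi3k_phosphorylation_rate(pIR, pFGFR, PI3K, pS6K, kc2f1, kc2f2, Ki2):
    """Rate of PI3K phosphorylation as written in the quoted expression."""
    # pIR-driven activation, attenuated by pS6K (negative feedback).
    ir_term = (kc2f1 * pIR * PI3K) / (1.0 + pS6K / Ki2)
    # pFGFR-driven activation, plain mass action in the quoted form.
    fgfr_term = kc2f2 * pFGFR * PI3K
    return ir_term + fgfr_term

# Arbitrary illustrative values (not taken from the model).
common = dict(pIR=1.0, pFGFR=1.0, PI3K=2.0, kc2f1=3.0, kc2f2=0.5, Ki2=4.0)

no_feedback = pi3k_phosphorylation_rate(pS6K=0.0, **common)
with_feedback = pi3k_phosphorylation_rate(pS6K=4.0, **common)

print(no_feedback, with_feedback)
```

      When pS6K equals Ki2, the pIR-driven term is halved, so Ki2 plays the role of the pS6K level at which the feedback reduces that branch to half its uninhibited rate.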

      (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.

      The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The intended reference was Supplementary Figure 2A, which shows the heatmap of the frequencies of each protein dynamic for all the active protein forms. CDK4/6cycD shows a sustained decreasing dynamic in 59.93% of model instances, which is where this number was derived. We have also now explicitly referenced this number in the Supplementary Figure 2A legend.

      We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.

      (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.

      We appreciate the suggestion and the opportunity to clarify. Figure 5A shows that multiple pathways were interrogated, but in the analysis, parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” in the Methods section to make this explicit. Moreover, some text in the main text has been slightly modified to make this clearer (page 11: lines 462-463, page 24: lines 856-857).

      We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
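
      A one-at-a-time scan of this kind can be sketched in a few lines. The example is purely illustrative: `toy_output`, the parameter names, and the inhibition factor of 0.1 are hypothetical and do not come from the model or the Methods.

```python
def one_at_a_time_scan(model, params, inhibition=0.1):
    """Scale each parameter in turn by `inhibition`, leaving all others at baseline."""
    baseline = model(params)
    effects = {}
    for name in params:
        perturbed = dict(params)          # copy, so the baseline set is never mutated
        perturbed[name] = params[name] * inhibition
        effects[name] = model(perturbed) - baseline
    return effects

def toy_output(p):
    """Hypothetical readout: production over degradation, plus a weak bypass route."""
    return p["k_prod"] / p["k_deg"] + 0.01 * p["k_bypass"]

params = {"k_prod": 2.0, "k_deg": 0.5, "k_bypass": 1.0}
effects = one_at_a_time_scan(toy_output, params)

# Rank parameters by the magnitude of their first-order effect on the readout.
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
print(ranking)
```

      Because only one parameter differs from baseline in each run, every effect can be attributed to a single perturbation, at the cost of missing interactions that only appear under co-inhibition.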

      (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.

      While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.

      The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.

      Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the meaning/units of the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig 7C) directly parallels the rapid suppression seen in the experimental data (Fig 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.

      We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.

      (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.

      Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.

      As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.

      The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.

      The most likely explanation for the discrepancy between experimental dynamics and our simulation dynamics is that the experimental data comes from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high correlation between the experimental data and simulations, indicating that perhaps the heterogeneity of even isogenic cell lines is significantly greater than might be intuited; a point we now mention in the revised Discussion (page 17: lines 716-727).

      It is worth noting again, that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.

      The y-axis label has also been updated accordingly. As mentioned in (12) this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.

      With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.

      (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.

      To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.

      CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.

      Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.

      It is however important to reiterate that our goal in Figure 7 is qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.

      (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.

      The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.

      Figure S10 is not a binary map where every implicated node is equal; instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e. small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely occurs due to resistance hubs, not individual proteins; and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency of that hub emerging as a resistance mechanism in a population of cells and patients.

      Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.

      To address these concerns, we have significantly revised the Discussion (page 16: lines 681 – 694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given model abstraction, our findings implicate critical processes (modules), not just single proteins, as the drivers.

      Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.

      (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.

      We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.

      (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.

      Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.

      (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.

      We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenized the language throughout the manuscript. We now use the following three terms in their specific contexts:

      We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.

      We use ‘initial conditions’ to refer to initial values of the state variables in a model simulation. This term is related to protein abundance as the setting of initial conditions for conserved species sets the protein abundance. This is explicitly stated in the text (page 3: lines 87 - 91).

      We use “state variables” to refer to the time-dependent model species.

      We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.

      We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.

      (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?

      The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.

      This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) Our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.). Mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10<sup>-5</sup> to 10<sup>4</sup>), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate-constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate-constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.

      For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).
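
      The trade-off between mass-action and saturable kinetics discussed above can be made concrete with a short numerical comparison. This is a generic textbook illustration, not part of the model: the Vmax and Km values and the substrate levels are arbitrary. It uses the standard result that a Michaelis-Menten rate Vmax·S/(Km + S) reduces to a first-order rate with constant Vmax/Km when S is much smaller than Km, and plateaus at Vmax when S is much larger than Km.

```python
def michaelis_menten_rate(s, vmax, km):
    """Saturable rate law: linear in s at low s, plateaus at vmax at high s."""
    return vmax * s / (km + s)

def mass_action_rate(s, k):
    """First-order mass-action rate law: always linear in s."""
    return k * s

def relative_error(approx, exact):
    return abs(approx - exact) / exact

# Arbitrary illustrative values.
vmax, km = 10.0, 100.0
k_eff = vmax / km                 # effective first-order constant in the low-s limit
low_s, high_s = 0.1, 1.0e4        # far below and far above km

rel_err_low = relative_error(mass_action_rate(low_s, k_eff),
                             michaelis_menten_rate(low_s, vmax, km))
rel_err_high = relative_error(mass_action_rate(high_s, k_eff),
                              michaelis_menten_rate(high_s, vmax, km))

print(f"relative error at s << Km: {rel_err_low:.4f}")
print(f"relative error at s >> Km: {rel_err_high:.1f}")
```

      The two rate laws agree closely in the unsaturated regime and diverge strongly once the enzyme saturates, which is the regime where zero-order behaviour would appear and where a mass-action model with a single fixed rate constant cannot follow.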

      (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.

      Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using chemical Langevin or SSA, is a possible extension in future work, but it is expected to be very computationally expensive. We have added a brief discussion of this as a future direction in the revised Discussion.

      (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.

      We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.

      The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.

      (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.

      We have now included a typeset list of state variable equations and ODEs, along with the original model files.

      (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.

      The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.

      Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.

      This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
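
      As a rough illustration of the kind of filter described above, the following minimal sketch integrates a first-order decay model, which can never become negative analytically, and discards any trajectory in which the numerical solution dips below zero. The function names, the explicit Euler scheme, the tolerance, and the rate constants are all assumptions made for illustration; the response does not state which solver or threshold was used. Here the undershoot is provoked by an overly large step size so that it is easy to see, rather than by floating-point limits.

```python
def euler_decay(x0, k, dt, n_steps):
    """Explicit Euler for dx/dt = -k*x; the exact solution stays >= 0 for x0 >= 0."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * (-k * xs[-1]))
    return xs


def is_valid(trajectory, tol=0.0):
    """Accept a trajectory only if no value falls below -tol (hypothetical rule)."""
    return all(x >= -tol for x in trajectory)


# Hypothetical ensemble: the same model with different rate constants.
rate_constants = [0.1, 0.5, 1.5]
ensemble = {k: euler_decay(1.0, k, dt=1.0, n_steps=5) for k in rate_constants}

# Keep only the instances whose numerical solution never undershoots zero.
kept = {k: xs for k, xs in ensemble.items() if is_valid(xs)}
print(sorted(kept))  # [0.1, 0.5]
```

      With `k = 1.5` and `dt = 1.0` the first step already gives a negative value, so that instance is dropped while the other two are retained. Reporting how many instances such a filter removes would let readers judge whether the discarded set is negligible.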

      (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.

      The text has been updated to match the citation.

      (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.

      Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.

      (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.

      We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).

      The reason for this finite limit is mechanistic, as the reviewer's comment prompted us to investigate further. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. As we pushed parameter values to even greater extremes in exploratory simulations, we found they do not generate novel, complex dynamic shapes. Instead, they tend to drive network nodes into saturated states, either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.

      Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.
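
      The saturation argument can be made concrete with a toy example. The sketch below is not the authors' model: it assumes a single node that switches between inactive and active forms with first-order kinetics, and asks how much its steady-state activity changes when the upstream input doubles. The names `active_fraction` and `response_to_input_doubling`, the rate law, and the swept range are illustrative assumptions.

```python
def active_fraction(u, k_act, k_deact):
    """Steady state of da/dt = k_act*u*(1 - a) - k_deact*a for input u."""
    return k_act * u / (k_act * u + k_deact)


def response_to_input_doubling(k_act, k_deact):
    """Change in steady-state activity when the input goes from 1 to 2."""
    return active_fraction(2.0, k_act, k_deact) - active_fraction(1.0, k_act, k_deact)


# Sweep the activation/deactivation ratio over many orders of magnitude.
for exponent in range(-6, 7, 3):
    k_act = 10.0 ** exponent
    baseline = active_fraction(1.0, k_act, 1.0)
    delta = response_to_input_doubling(k_act, 1.0)
    print(f"k_act/k_deact = 1e{exponent:+d}: active = {baseline:.6f}, response = {delta:.2e}")
```

      At both ends of the sweep the node sits near 0 or near 1 and barely responds to the input, whereas intermediate ratios give a clear response. In this toy setting, widening the range only adds more unresponsive instances, which matches the "no-response" behaviour described above.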

    1. eLife Assessment

      This important study demonstrates the power of the UniDesign computational framework in prospectively engineering a PAM-relaxed Staphylococcus aureus Cas9 variant with editing performance comparable to evolution-derived counterparts. The authors responded promptly and thoroughly to reviewer concerns and strengthened the manuscript with additional experimental validation, providing compelling evidence through expanded biochemical characterization across multiple human cell types, comprehensive deep-sequencing analyses, and direct comparisons with established variants that illuminate the mechanistic basis of PAM specificity remodeling and Cas9 optimization. By establishing computational design as a rigorous and viable alternative to directed evolution for CRISPR systems, this work will be of broad interest to the protein engineering, genome engineering, synthetic biology, and computational protein design communities.

    2. Reviewer #1 (Public review):

      [Editors' note: The Reviewing Editor has assessed the work without involving the previous reviewers, updating the eLife Assessment accordingly. The authors did an excellent job of addressing the reviewers' comments and suggestions. The manuscript is now in line with the minor suggestions from the original reviewers, who were already enthusiastic about the first version.]

      Summary:

      This manuscript by Xiong and colleagues presents a compelling validation of UniDesign, a fully computational protein design framework, by using it to engineer a novel, PAM-relaxed variant of Staphylococcus aureus Cas9 (SaCas9) named KRH. The core achievement is the successful de novo generation of a high-performance nuclease (E782K/N968R/R1015H) solely through in silico modeling, without any subsequent experimental optimization or directed evolution. The authors demonstrate that KRH expands the SaCas9 PAM specificity from NNGRRT to NNNRRT, achieving genome editing and base editing efficiencies across multiple human cell types that are comparable to, and sometimes exceed, the well-known evolution-derived KKH variant. The work positions UniDesign not merely as an analytical tool, but as a powerful engine for the generative design of complex molecular functions, offering a scalable and mechanistically insightful alternative to traditional experimental screening.
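
      To give a sense of what the change from NNGRRT to NNNRRT means at the motif level, the following minimal sketch expands both IUPAC patterns into concrete six-base sequences and counts them. It only counts motifs; how often they occur in a real genome depends on sequence composition, which is not modelled here. The helper name `expand_pam` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes needed for these two PAM patterns.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def expand_pam(pattern):
    """Return every concrete DNA sequence matched by an IUPAC PAM pattern."""
    return {"".join(bases) for bases in product(*(IUPAC[c] for c in pattern))}


wild_type = expand_pam("NNGRRT")
relaxed = expand_pam("NNNRRT")

print(len(wild_type), len(relaxed))  # 64 256
print(wild_type <= relaxed)          # True: every canonical PAM is still covered
```

      Relaxing the third position therefore quadruples the number of distinct PAM motifs while retaining all canonical ones.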

      Strengths:

      This is an outstanding manuscript that serves as a powerful proof-of-concept for the next generation of computational protein design. The primary selling point, the raw predictive and generative power of UniDesign, is convincingly demonstrated throughout.

      The manuscript shows that the tool can:

      (1) successfully navigate a complex sequence landscape to identify a minimal set of three mutations (KRH) that remodel a critical protein-DNA interface;

      (2) accurately model and balance the delicate interplay between specific base contacts and non-specific backbone interactions to achieve relaxed PAM specificity;

      (3) deliver a final product whose performance is indistinguishable from, and in some cases superior to, a variant that required extensive wet-lab evolution.

      The experimental validation is rigorous, thorough, and directly supports the computational predictions. This work will stand as a landmark study for the field, illustrating that computational design has matured to the point where it can reliably generate sophisticated tools for genome engineering.

      (1) Demonstration of Generative Power:

      The most significant finding is that UniDesign, without any experimental feedback, generated a variant (KRH) that matches the performance of the evolution-derived KKH. This is a remarkable achievement. The iterative design strategy, which first reduces PAM bias (R1015H) and then restores binding through non-specific interactions (e.g., N968R, E782K), is a textbook example of rational design, but it is executed entirely by the algorithm. This validates UniDesign's energy function and search algorithm as capable of capturing the subtle biophysical principles governing PAM recognition.

      (2) Mechanistic Insight as a Built-in Feature:

      A key advantage of UniDesign highlighted by this work is its inherent ability to provide mechanistic explanations. The computational models not only predicted which mutations would work (e.g., N968R over N968K in the KRH variant) but also why they work. The structural and energetic analyses showing the bidentate salt bridge formed by Arg968 versus the single bond formed by Lys968 (Figure 4A) are a perfect example of how the tool's output can rationalize functional differences, a level of insight that is rarely attainable from directed evolution campaigns alone.

      (3) Scalability and Accessibility for Engineering:

      The authors explicitly contrast UniDesign's efficiency (minutes to hours per design run) with the computational expense of methods like COMET and the experimental overhead of directed evolution. The improvements to UniDesign v1.2, specifically the mutation-count and sequence-uniqueness penalties, directly address a key challenge in computational design (generating diverse, low-energy point-mutant libraries). This positions the tool as a highly accessible and scalable platform for engineering other CRISPR systems, a point that will be of immense interest to the community.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript describes the fully in silico design of a new variant of Staphylococcus aureus Cas9 (SaCas9) using an improved UniDesign workflow.

      The design strategy consists of three sequential steps:

      (1) Reducing positional bias at PAM position 3;

      (2) Restoring DNA binding through nonspecific interactions;

      (3) Combining individually favorable substitutions.

      The overall pipeline is conceptually elegant and logically structured, and the genome-editing activity of the designed variants is comprehensively characterized. The resulting KRH variant exhibits relaxed PAM specificity, expanding the targeting range of SaCas9 across diverse cell types. Notably, the KRH variant demonstrates performance comparable to that of the evolution-derived KKH variant, underscoring the effectiveness of the proposed computational design framework.

    4. Reviewer #3 (Public review):

      Summary:

      This study reports KRH, a SaCas9 variant computationally engineered via UniDesign to recognize an expanded NNNRRT PAM with substantially enhanced editing efficiency at non-canonical sites. KRH achieves genome- and base-editing efficiencies comparable to or exceeding the evolution-derived KKH variant across multiple human cell types, demonstrating that computational design can effectively remodel PAM specificity while preserving nuclease activity.

      Strengths:

      The research follows a clear line of reasoning, and the results appear sound. The computational design strategy presented offers a valuable alternative to directed evolution, with potential applicability beyond Cas9 engineering.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Xiong and colleagues presents a compelling validation of UniDesign, a fully computational protein design framework, by using it to engineer a novel, PAM-relaxed variant of Staphylococcus aureus Cas9 (SaCas9) named KRH. The core achievement is the successful de novo generation of a high-performance nuclease (E782K/N968R/R1015H) solely through in silico modeling, without any subsequent experimental optimization or directed evolution. The authors demonstrate that KRH expands the SaCas9 PAM specificity from NNGRRT to NNNRRT, achieving genome editing and base editing efficiencies across multiple human cell types that are comparable to, and sometimes exceed, the well-known evolution-derived KKH variant. The work positions UniDesign not merely as an analytical tool, but as a powerful engine for the generative design of complex molecular functions, offering a scalable and mechanistically insightful alternative to traditional experimental screening.

      Strengths:

      This is an outstanding manuscript that serves as a powerful proof-of-concept for the next generation of computational protein design. The primary selling point, the raw predictive and generative power of UniDesign, is convincingly demonstrated throughout.

      The manuscript shows that the tool can:

      (1) successfully navigate a complex sequence landscape to identify a minimal set of three mutations (KRH) that remodel a critical protein-DNA interface;

      (2) accurately model and balance the delicate interplay between specific base contacts and non-specific backbone interactions to achieve relaxed PAM specificity;

      (3) deliver a final product whose performance is indistinguishable from, and in some cases superior to, a variant that required extensive wet-lab evolution.

      The experimental validation is rigorous, thorough, and directly supports the computational predictions. This work will stand as a landmark study for the field, illustrating that computational design has matured to the point where it can reliably generate sophisticated tools for genome engineering.

      (1) Demonstration of Generative Power:

      The most significant finding is that UniDesign, without any experimental feedback, generated a variant (KRH) that matches the performance of the evolution-derived KKH. This is a remarkable achievement. The iterative design strategy, which first reduces PAM bias (R1015H) and then restores binding through non-specific interactions (e.g., N968R, E782K), is a textbook example of rational design, but it is executed entirely by the algorithm. This validates UniDesign's energy function and search algorithm as capable of capturing the subtle biophysical principles governing PAM recognition.

      (2) Mechanistic Insight as a Built-in Feature:

      A key advantage of UniDesign highlighted by this work is its inherent ability to provide mechanistic explanations. The computational models not only predicted which mutations would work (e.g., N968R over N968K in the KRH variant) but also why they work. The structural and energetic analyses showing the bidentate salt bridge formed by Arg968 versus the single bond formed by Lys968 (Figure 4A) are a perfect example of how the tool's output can rationalize functional differences, a level of insight that is rarely attainable from directed evolution campaigns alone.

      (3) Scalability and Accessibility for Engineering:

      The authors explicitly contrast UniDesign's efficiency (minutes to hours per design run) with the computational expense of methods like COMET and the experimental overhead of directed evolution. The improvements to UniDesign v1.2, specifically the mutation-count and sequence-uniqueness penalties, directly address a key challenge in computational design (generating diverse, low-energy point-mutant libraries). This positions the tool as a highly accessible and scalable platform for engineering other CRISPR systems, a point that will be of immense interest to the community.

      We sincerely thank the reviewer for the comprehensive summary and the highly positive and encouraging comments on our manuscript.

      Weaknesses:

      (1) Title and Abstract Emphasis:

      The title and abstract are effective but could be slightly sharpened to emphasize the primary message. Consider a title like "Fully computational design of a PAM-relaxed SaCas9 variant with UniDesign demonstrates power to match directed evolution." The abstract could more explicitly state upfront that the design was achieved without any experimental iteration.

      Thank you for this valuable suggestion. We have revised the title and abstract accordingly to better reflect your feedback.

      (2) Figure 1, Panel M:

      The data points in panel M are currently presented at a font size that makes them difficult to read, particularly the labels for the many triple-mutant variants. This density obscures the clear identification of the top-performing designs, such as the KRH variant selected for experimental validation. I recommend that the authors increase the font size of all text elements within this panel, including axis labels, tick marks, and data point labels, to improve legibility. If necessary, the panel dimensions can be adjusted or the layout reorganized to accommodate the larger text without compromising clarity. Ensuring this figure is readable is important, as it visually communicates the energetic convergence that led to the selection of KRH.

      Thank you for this helpful suggestion. We have increased the font size in Figure 1M, as well as in Figure 1C and Figure 1E, to improve readability in the revised manuscript.

      (3) Generality of the Design Strategy for Other PAM Positions:

      The design strategy focused on relaxing specificity at the highly constrained third position of the PAM (the guanine in NNGRRT). How transferable is this specific strategy (i.e., disrupting a key specific contact and compensating with non-specific backbone binders) to relaxing other positions in the PAM or to other Cas enzymes with different PAM-interaction architectures? A short discussion on this point would help readers understand the broader applicability of the "fine-tuning the balance" principle.

      Thank you for this insightful question and suggestion. The current study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which eight Cas9 proteins and two Cas12 proteins (each has a different PAM) were investigated. Our computational results demonstrated that UniDesign can effectively capture the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs). For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform similar PAM relaxation designs for other Cas9 or Cas12 proteins, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We have included additional discussion to clarify this point and highlight the broader applicability of our design strategy.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the fully in silico design of a new variant of Staphylococcus aureus Cas9 (SaCas9) using an improved UniDesign workflow.

      The design strategy consists of three sequential steps:

      (1) reducing positional bias at PAM position 3;

      (2) restoring DNA binding through nonspecific interactions;

      (3) combining individually favorable substitutions.

      The overall pipeline is conceptually elegant and logically structured, and the genome-editing activity of the designed variants is comprehensively characterized. The resulting KRH variant exhibits relaxed PAM specificity, expanding the targeting range of SaCas9 across diverse cell types. Notably, the KRH variant demonstrates performance comparable to that of the evolution-derived KKH variant, underscoring the effectiveness of the proposed computational design framework.

      Strengths:

      The design pipeline is entirely computational and does not rely on experimental data for pretraining or iterative optimization.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, which may reflect insufficient exploration of the available sequence space.

      Thank you for this insightful critique. In the present study, our strategy was not to allow UniDesign to freely explore all 27 mutable positions simultaneously, but rather to constrain the search to point mutants (e.g., double or triple mutants) within the full sequence space (approximately 20^27). Even with this constraint, UniDesign effectively samples a substantially larger design space than traditional protein engineering approaches.

      Through iterative design, we observed that only certain residue types became enriched at a subset of positions when identifying effective double mutants. These enriched residues were then systematically combined to generate performance-enhancing triple mutants in an automated manner. Although we ultimately selected the KRH mutant for experimental validation due to its high similarity to the known KKH variant, UniDesign also proposed additional multi-mutants that are distinct from KKH (see Figure 1M).
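
      The sizes involved can be checked with a short calculation. The sketch below compares the unconstrained space over 27 mutable positions with the number of single, double, and triple point mutants. It assumes 20 standard amino acids per position, and therefore 19 possible substitutions at each mutated site; the constant and function names are illustrative.

```python
from math import comb

N_POSITIONS = 27     # mutable positions, as stated in the response
N_ALTERNATIVES = 19  # non-native residues per position (assumes 20 standard amino acids)


def n_point_mutants(k):
    """Number of variants with exactly k substituted positions."""
    return comb(N_POSITIONS, k) * N_ALTERNATIVES ** k


full_space = 20 ** N_POSITIONS
for k in (1, 2, 3):
    print(k, n_point_mutants(k))
# 1 513
# 2 126711
# 3 20062575
print(f"{full_space:.2e}")  # 1.34e+35
```

      Under these assumptions there are roughly 2 × 10^7 variants with up to three mutations. That is a vanishingly small fraction of the full space, yet far more than a typical experimental screen would cover.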

      Reviewer #3 (Public review):

      Summary:

      This study reports KRH, a SaCas9 variant computationally engineered via UniDesign to recognize an expanded NNNRRT PAM with substantially enhanced editing efficiency at non-canonical sites. KRH achieves genome- and base-editing efficiencies comparable to or exceeding the evolution-derived KKH variant across multiple human cell types, demonstrating that computational design can effectively remodel PAM specificity while preserving nuclease activity.

      Strengths:

      The research follows a clear line of reasoning, and the results appear sound. The computational design strategy presented offers a valuable alternative to directed evolution, with potential applicability beyond Cas9 engineering.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The benchmarking of the UniDesign method is insufficient. It is unclear how its performance compares to other protein design algorithms, whether the energy function parameters were systematically optimized, and whether the design strategy can be generalized to other Cas9 orthologs or genome engineering tasks.

      Thank you for this valuable critique. The present study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which many of these concerns were systematically addressed. In that study, UniDesign was benchmarked against Rosetta, a well-established protein design platform, across eight Cas9 proteins and two Cas12 proteins, each recognizing distinct PAM sequences.

      Our results demonstrated that UniDesign effectively captures the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs) across these CRISPR–Cas systems. For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform analogous PAM relaxation designs for other Cas9 or Cas12 proteins in this work, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We have incorporated additional discussion in the revised manuscript to address these points and clarify the broader applicability of our approach.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) SaCas9 is highlighted for its AAV compatibility, but the manuscript does not further discuss how the KRH variant may benefit AAV-based genome editing applications. A brief discussion on how expanded PAM compatibility could facilitate target selection in AAV-constrained therapeutic settings would strengthen the translational relevance of the work, potentially reducing the need for split-Cas9 or dual-vector strategies.

      Thank you for your helpful suggestion. We have added a brief discussion in the revised manuscript highlighting how the KRH variant’s expanded PAM compatibility may enhance AAV-based genome editing applications. Specifically, this property can broaden the range of targetable genomic sites and may reduce the need for split-Cas9 or dual-vector delivery strategies in size-constrained AAV therapeutic contexts.

      (2) The study shows that a fully computational workflow can recapitulate the performance of an evolution-derived variant. A short discussion comparing the scalability and practical advantages of computational design versus directed evolution for future PAM engineering would help emphasize the broader methodological significance of UniDesign.

      Thank you for your valuable suggestion. We have added a brief discussion in the revised manuscript comparing the scalability and practical advantages of computational design with directed evolution for PAM engineering. Specifically, we highlight that UniDesign enables rapid and scalable exploration of sequence space without requiring iterative experimental screening, thereby offering a complementary—and in some cases more efficient—approach to directed evolution for future protein engineering applications.

      (3) There is noticeable variation in editing efficiency across cell types, particularly the lower activity in A549 cells. Could the authors explain why the differences in editing efficiency are so large?

      Thank you for this insightful comment. We agree that the variation in editing efficiency across cell types, particularly the lower activity observed in A549 cells, warrants clarification, and we have added a corresponding discussion in the revised manuscript. We attribute this observation to two main factors. First, transfection efficiency varies substantially across cell lines; in our experiments, A549 cells exhibited lower transfection efficiency compared to HEK293T, HeLa, and U2OS cells, which likely contributes to the reduced editing efficiency. Second, the intrinsic performance of genome editing systems can differ across cellular contexts due to variations in chromatin accessibility, DNA repair pathway activity, and the expression levels of key repair-related genes. Importantly, despite this cell-type-dependent variability in absolute editing efficiency, the KRH variant consistently outperformed wild-type SaCas9 across all tested cell lines, underscoring the robustness and general applicability of our design.

      (4) Given that the computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, it would be valuable to discuss whether R968 (or saturation mutations at this site) has previously been explored experimentally, and to elaborate on strategies for further expanding the diversity of mutations identified through the computational design framework.

      Thank you for your suggestion. We have added a brief discussion in the manuscript noting that, to the best of our knowledge, R968 has not been experimentally characterized prior to this study. It was identified solely through our computational design workflow, highlighting the strength of our approach.

      Reviewer #3 (Recommendations for the authors):

      (1) During the protein amino acid conformational sampling process in UniDesign, were nucleic acid conformational changes taken into consideration?

      Thank you for this question. Nucleic acid conformational changes were not explicitly considered during the protein sequence design stage in UniDesign after the four specific PAM variants (e.g., TTAGGT, TTCGGT, TTGGGT, and TTTGGT) were defined. We consider this assumption reasonable, as the base conformations in these PAM sequences are expected to remain largely stable, with minimal structural variation due to preserved base-stacking interactions.

      (2) The authors used a mutation-count penalty to control the number of mutations generated during the design process, which appears to occasionally yield results that exceed the intended limit. Is this an efficient approach? Could the count be controlled more directly by imposing constraints within the design procedure itself?

      Thank you for these insightful questions. You are correct that the design process may occasionally yield variants exceeding the intended mutation limit. This occurs because the mutation-count penalty is implemented as a soft constraint, where violations incur a penalty rather than being strictly excluded. Based on our benchmarking, this strategy—combined with the duplicate-design penalty—has been effective in generating multimutant variants with mutation counts close to the desired range. However, we acknowledge that this approach may not achieve optimal efficiency. We are currently developing improved strategies in UniDesign to more directly control mutation counts by incorporating explicit constraints during the sequence simulation process, which we expect will further enhance design precision and efficiency.
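
      The difference between a soft penalty and a hard limit can be sketched in a few lines. The scoring form below (energy plus a linear penalty per mutation above the limit), the weight, and the candidate values are all hypothetical and are not taken from UniDesign; they only show why a soft constraint can let an over-limit variant win.

```python
def soft_score(energy, n_mutations, limit=3, weight=1.0):
    """Energy plus a linear penalty for each mutation beyond the limit (assumed form)."""
    return energy + weight * max(0, n_mutations - limit)


# Hypothetical candidates: (name, energy, mutation count). Lower scores are better.
candidates = [("triple", -10.0, 3), ("quadruple", -12.5, 4), ("quintuple", -11.0, 5)]

# Soft constraint: over-limit designs are penalised but still compete.
best_soft = min(candidates, key=lambda c: soft_score(c[1], c[2]))

# Hard constraint: over-limit designs are excluded before ranking.
best_hard = min((c for c in candidates if c[2] <= 3), key=lambda c: c[1])

print(best_soft[0], best_hard[0])  # quadruple triple
```

      In this toy case the quadruple mutant gains more energy than it loses to the penalty, so the soft scheme selects it even though the intended limit is three. Raising `weight` makes the soft scheme behave more like the hard filter.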

      (3) Is the new version of UniDesign developed specifically for the Cas9 design task in this study? What are its advantages and disadvantages compared to other state-of-the-art protein design algorithms?

      Thank you for this important question. The new version of UniDesign (v1.2) was not developed specifically for Cas9 engineering. Rather, it is intended as a general framework for protein engineering tasks that focus on introducing point mutations to improve protein properties, as opposed to de novo design. Compared to current state-of-the-art protein design methods—many of which are deep learning–based—UniDesign offers distinct advantages and limitations. Deep learning approaches are often highly efficient and powerful but may lack interpretability in their predictions. In contrast, UniDesign is a well-benchmarked, lightweight, physics-based method that provides greater interpretability, allowing users to better understand the underlying basis of the design decisions. On the other hand, a limitation of UniDesign is that it is less straightforward to incorporate experimental feedback for iterative refinement, such as fine-tuning the scoring function for specific design tasks.

      (4) The study employed a three-round design process to obtain the mutants. Is there a conformational correlation between the mutation sites identified in these three rounds? Could this have been accomplished in a single computational run instead of three separate calculations?

      Thank you for these insightful questions. We adopted a multi-round design strategy for SaCas9 PAM relaxation because this task inherently involves multi-objective optimization: enhancing PAM compatibility—particularly relaxing base recognition at the third PAM position—while preserving editing activity comparable to wild-type SaCas9. In our view, identifying the key mutations (e.g., E782K, N968R, and R1015H) in a single UniDesign run would be highly challenging due to competing energetic requirements. In the first round, R1015H emerged from single-site mutational scanning as the most favorable PAM-relaxing mutation based on its minimal MAD score. However, this mutation also significantly increased the binding energy relative to wild-type SaCas9 with its native PAM, suggesting a likely reduction in editing activity due to weakened binding. To address this, the second round focused on compensatory mutations. Variants such as E782K and N968R (along with several additional candidates) were identified in the context of R1015H to reduce binding energy and partially restore affinity. In the third round, we further combined compatible mutations from the second round, resulting in variants that more effectively lowered binding energy and restored it to levels comparable to wild-type SaCas9 with its native PAM. Notably, the design objectives in rounds one and two drive binding energy in opposite directions, making it unlikely that all key mutations could be identified simultaneously in a single run. During the design process, we also observed conformational correlations among mutation sites. For example, R1015H can form hydrogen-bonding interactions with residue E993, and we observed multiple alternative mutations at position 993 (e.g., E993S, E993P, E993A, E993G, E993K, and E993R), suggesting local structural coupling between these positions.

      (5) In Figure 4D, for the FANCF-1 site, there appears to be a noticeable difference in editing efficiency between KKH-ABE and KRH-ABE. Is this difference statistically significant? If so, please provide an explanation for this observation.

      Thank you for this question. For the FANCF-1 site shown in Figure 4D, we performed statistical analyses and found that the differences in editing efficiency between KKH-ABE and KRH-ABE are not statistically significant: P(A4) = 0.1239, P(A10) = 0.0671, P(A12) = 0.0942, and P(A13) = 0.1349 (two-tailed unpaired Student’s t-test). These results indicate that KRH-ABE and KKH-ABE exhibit comparable editing efficiencies at this site, supporting our overall conclusion that the computationally designed KRH variant achieves performance on par with the KKH variant.

      (6) Does the evolutionary term within the UniDesign scoring function bias the designed sequences towards pre-existing protein features?

      Thank you for this question. In this study, as well as in our previous work on Cas9 PAM recognition modeling (PMID: 37078688), the evolutionary term in the UniDesign scoring function was completely disabled. Therefore, it does not introduce any bias toward pre-existing protein features in the designed sequences.

    1. eLife Assessment

      This paper presents an important theory and analysis of the role of neurogenesis and inhibitory plasticity in the drift of neural representations in the olfactory system. For one of the findings, regarding the impact of neurogenesis on the drift, the evidence remains incomplete. The reason lies in the differences in variability/drift of the mitral/tufted cell responses observed in the model compared to experimental observations, where these responses remain stable over extended time scales.

    2. Reviewer #1 (Public review):

      Summary:

      The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.

      The model is clearly explained and relies on several assumptions and observations: 1) random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity and random piriform to bulb connectivity; 2) higher dimensionality of piriform cortex representations compared to M/T responses which enables superior decoding of odor identity in the piriform cortex; 3) spike time dependent plasticity (STDP) at synapses involving the abGCs.

      The authors address an open topical problem and the model is elegant in its simplicity. The authors addressed many of my concerns by plotting new analyses and by adding clarifying statements and discussion points, as well as testable predictions to the revised manuscript. In the revised manuscript, a few points remain unclear and I am listing them below for further potential discussion.

      (1) Given the large variability in response across trials reported by Shani-Narkiss, Kay & Laurent, the question remains open: what fraction of the variability in response across days can really be accounted for by adult-born neurogenesis (the main topic of this study) vs. other mechanisms? I think the answer to this question is key for interpreting the results presented by the authors on the impact of adult neurogenesis on changes of mitral cell responses. Unfortunately, I could not find the answer in the revised version of the manuscript.

      (2) Yamada indeed reported a "drastic reorganization of ensemble odor representation" in their manuscript (Figure 3D), but my understanding is that this was observed in the context of passive exposure to the same odor across several days in a row. This does not appear to contradict the findings of Kato et al., 2012 that when an odor is presented seldom, across days the mitral cell responses are stable. Also, data from Yamada et al. appears to show some degree of overall sparsening of odor responses in mitral cells at least at the level of a decrease in response amplitude between day 1 to day 7 of repeated passive exposure (Figure 3A, Yamada et al., 2017).

      (3) There was a mistake on my part regarding one of the papers referenced with respect to random vs. structured projections from the olfactory bulb to the piriform cortex. The one I was referring to is Chen et al., Cell, 2022 (not Chae et al., Neuron, 2022). The authors discussed the implications of the latter, while I was in fact commenting on the findings from Chen et al., 2022. This study identified structured projections of individual mitral cells along the A-P axis of the piriform cortex in conjunction with collaterals to specific subsets of extra-piriform target regions.

    3. Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights). Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.
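
      The idea of modeling neurogenesis as a weight shuffle that preserves the weight distribution can be sketched as follows. This is an illustrative reading of the summary above, not the authors' code: the function name, the per-cell permutation, the matrix layout (one row per granule cell), and the 10% fraction are assumptions.

```python
import random

def shuffle_granule_weights(weights, fraction, rng):
    """Return a copy of `weights` in which a random `fraction` of cells (rows)
    have their synaptic weights permuted.

    Permuting within a row reuses the same weight values, so the overall
    distribution of weights is unchanged while the connectivity pattern of
    the selected cells is reassigned.
    """
    new_weights = [row[:] for row in weights]
    n_replace = round(fraction * len(new_weights))
    for cell in rng.sample(range(len(new_weights)), n_replace):
        rng.shuffle(new_weights[cell])
    return new_weights

def flatten(matrix):
    """Sorted list of all weights, used to compare distributions."""
    return sorted(w for row in matrix for w in row)

rng = random.Random(0)  # fixed seed for reproducibility
weights = [[rng.random() for _ in range(5)] for _ in range(20)]  # toy network
updated = shuffle_granule_weights(weights, fraction=0.10, rng=rng)

print(flatten(updated) == flatten(weights))  # True
```

      Whether the actual model permutes existing weights or redraws them from a fitted distribution is not stated in this summary; either choice keeps the weight statistics fixed while changing which cells are connected.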

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field and to my knowledge is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding differential roles of mitral and tufted cells in drift in piriform and AON and potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      Comments on revisions:

      I appreciate the substantial revisions the authors have made to the manuscript. The paper is clearly improved and addresses an important and timely question: the relationship between adult neurogenesis and drift. In particular, the effort to link adult neurogenesis in the olfactory bulb to the long-term stability of odor representations downstream is valuable, and the modeling provides useful mechanistic intuition about how inhibitory circuit remodeling could influence representational drift across layers.

      That said, I remain concerned that the manuscript, as currently framed, risks giving readers the incorrect impression that experimental work has established progressive, time-dependent drift in the odor tuning of olfactory bulb neurons. Experimental studies do show that ongoing experience with a set of odors can profoundly alter bulbar responses to those odors, but longitudinal measurements in which the tested odors are not repeatedly presented between sessions have instead emphasized remarkable stability of mitral/tufted tuning over days to months across multiple groups. I also think it would strengthen the manuscript to avoid anchoring the empirical comparison too heavily on a single paradigm (Yamada et al., 2017). The experimental literature spans multiple regimes, including daily odor exposure with ongoing experience and longitudinal measurements in which the tested odors are not repeatedly presented between sessions, and these regimes can yield qualitatively different degrees of reorganization. Situating the model explicitly within this broader landscape, rather than emphasizing one dataset, would make the interpretation clearer and prevent readers from overgeneralizing the Yamada findings to baseline bulbar stability. This distinction is especially important because it contrasts with what has been reported in piriform cortex, where representational drift is observed even in the absence of ongoing experience with a given odor set, and where repeated daily encounters with the same odors can slow or arrest that drift.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their thoughtful and constructive feedback. We appreciate that all reviewers recognized the value of our study in linking adult neurogenesis and synaptic plasticity to representational drift in the olfactory system. They described the model as elegant and well-motivated, and agreed that it provides new theoretical insight into how stability and adaptability can coexist in sensory representations. The reviewers also identified areas where our manuscript could be strengthened, and as outlined in our revision plan we have:

      (1) Refined our description of mitral/tufted cell stability and expanded on within-session and across-day variability.

      (2) Substantially expanded the Discussion to compare our modeling assumptions with experimental findings and recent anatomical evidence. Additionally, we have included the limitations of the study and areas for future investigation.

      (3) Included a clearer description of the STDP implementation, plastic synapses, and their functional effects.

      (4) Added a short section outlining model-based predictions that can guide future experiments. We also made minor textual edits to improve precision and flow, including citing prior conceptual work and clarifying model procedures.

      These changes have strengthened both the conceptual framing and technical clarity of the paper. We are grateful for the reviewers’ careful reading and valuable suggestions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower-dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.

      The model is clearly explained and relies on several assumptions and observations:

      (1) Random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity.

      (2) Higher dimensionality of piriform cortex representations compared to M/T responses, which enables superior decoding of odor identity in the piriform cortex.

      (3) Spike time-dependent plasticity (STDP) at synapses involving the abGCs.

      The authors address an open topical problem, and the model is elegant in its simplicity. I have, however, several major concerns with the hypotheses underlying the model and with its biological plausibility.

      Concerns:

      (1) In their model, the authors propose that MTC remain stable at the population level, despite changes in individual MTC responses.

      The authors cite several experimental studies to support their claims that individual MTC responses to the same odors change (some increase, some decrease) across days. Interpreting the results of these studies must, however, take into account the variability of M/T responses across odor presentation repeats within the same session vs. across sessions. In the Shani-Narkiss et al., Frontiers in Neural Circuits, 2023 study referenced, a large fraction of the variability across days in M/T responses is also observed across repeats to the same odorant in the same session (Shani-Narkiss et al., Figure 4), while the authors have M/T responses in the same session that are highly reproducible. This is an important point to consider and address, since it constrains how much of the variability in M/T responses can be attributed to adult neurogenesis in the olfactory bulb versus to other networks' inhibitory mechanisms, which do not rely on neurogenesis. In the authors' model, the variability in M/T responses observed across days emerges as a result of adult-born neurogenesis, which does not need to be the main source of variability observed in imaging experiments (Shani-Narkiss et al., Figure 4).

      We agree with the reviewer and believe this is a critical discussion point. Indeed, in Shani-Narkiss et al., in Kay and Laurent, 1999, and in our lab, we observe trial-to-trial variability that occurs within the same recording session; as the reviewer correctly points out, this cannot be due to neurogenesis. These fluctuations may be trial-to-trial noise, or may reflect dynamics associated with other behaviors such as running (Chockanathan et al., 2021) and decision making (Kay and Laurent, 1999). There is a growing body of literature showing that neural variability in early sensory coding depends on behavioral fluctuations and internal states (e.g., Niell and Stryker). The within-session variability in the Shani-Narkiss et al. work may reflect some of these behaviorally relevant features of early olfactory coding, something that our model cannot account for. This is an excellent discussion point, and we have included text (lines 153-157 and lines 321-330) in the manuscript to note this aspect of the data and how one can think of it in the context of our results.

      Another study (Kato et al., Neuron, 2012, Figure 4) reported that mitral cell responses to odors experienced repeatedly across 7 days tend to sparsen and decrease in amplitude systematically, while mitral cell responses to the same odor on day 1 vs. day 7 when the odor is not presented repeatedly in between seem less affected (although the authors also reported a decrease in the CI for this condition). As such, Kato et al. mostly report decreases in mitral cell odor responses with repeated odor exposure at both the individual and population level, and not so much increases and decreases in the individual mitral cell responses, and stability at the population level.

      Thank you for raising this important point regarding the findings of Kato et al. (2012). We agree that their results suggest increased sparsening and stability in M/T cell odor responses with repeated exposure. However, as noted in Yamada et al. (2017), the experimental literature on this question remains mixed. Yamada and colleagues reported a “drastic reorganization of ensemble odor representation” across days and emphasized that “sensory experience does not necessarily cause a major sparsening of the odor response,” explicitly contrasting their findings with those of Kato et al. (2012).

      Our model captures the dynamics observed in Yamada et al. (2017), providing a mechanistic explanation for how significant reorganization can emerge in M/T ensembles despite stable low-dimensional population structure. The Yamada et al. (2017) and Kato et al. (2012) studies differ in nuanced aspects of experimental design (method of head fixation, behavioral paradigm, training, etc.), all of which are known to affect olfactory responses and therefore the degree of sparsity and overlap in population codes. Our model does not include any of these behavioral features, which may differentially engage the olfactory circuit and thus affect population responses. Notably, in previous work we showed that even a simple change to top-down feedback, a single phenomenological manipulation of functional connectivity in the olfactory circuit that could be engaged broadly by behavior, can have disparate effects on the degree of sparsity in neural representations over time. Our current model includes no behavior that would allow us to study the critical features of the neural activity code in the M/T cells. Instead, we focus on one specific aspect, adult neurogenesis, which we can explicitly manipulate in a biologically meaningful way. The reviewer's point, however, is well taken and important, and we have added text to the Discussion (lines 336-344) to highlight the differing experimental outcomes and to clarify how our model aligns with the Yamada et al. results.

      (2) In Figure 1, a set of GCs is killed off, and new GCs are integrated in the network as abGC. Following the elimination of 10% of GCs in the network, new cells are added and randomly assigned synaptic weights between these abGCs and MTC, GCs, SACs, and top-down projections from PCx. This is done for 11 days, during which time all GCs have gone through adult neurogenesis.

      Is the authors' assumption here that across the 11 days, all GCs are being replaced? This seems to depart from the known biology of the olfactory bulb granule cells, i.e., GCs survive for a large fraction of the animal's life.

      Thank you for raising this important point regarding the lifespan of granule cells (GCs). We agree that developmentally born GCs are not fully replaced. Indeed, multiple studies indicate that some developmentally born GCs can survive for very long periods, up to 18-24 months, essentially the lifetime of the animal (Kaplan, 1985; Petreanu & Alvarez-Buylla, 2002). However, the fraction of total GCs that such long-lived GCs constitute remains an open question, in part because of the difficulty of measuring the lifetime survival of newborn neurons. What there is consensus on is the significant size of the granule-cell population undergoing continuous turnover through adult neurogenesis (reviewed in Lepousez et al., 2013).

      We should clarify that we do not assume that 100% of the granule cell population turns over in an 11-day period. We use “day” to represent a static epoch over which we can implement plasticity rules across two time scales. Critically, we also randomize the turnover, treating every cell in the GC population as equally likely to be replaced. Prior experimental evidence suggests that some GCs are more likely to persist (possibly as a result of experience; Magavi et al., 2005), which may in some regards make our result on stabilization following repeated sensory exposure more dramatic (as the GCs that show the largest change following STDP may also be the ones that are the most stable, and therefore least likely to turn over). We do not include this in our model, as we could not identify a framework for “selecting” which GCs would persist that would not be tautological. The point the reviewer raises is critical, and a discussion of these points is warranted, which we now include in the manuscript (lines 352-361).
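
      As a rough worked example of why uniform random turnover need not replace every cell, consider the simplifying assumption that 10% of GCs (the figure quoted in the reviewer's comment) are selected independently and uniformly at random in each epoch. The function name and the independence assumption are ours for illustration; the model's actual sampling scheme may differ.

```python
def fraction_never_replaced(per_epoch_fraction: float, n_epochs: int) -> float:
    """Expected fraction of cells never selected for replacement, assuming
    each epoch independently replaces a uniformly random subset of cells."""
    return (1.0 - per_epoch_fraction) ** n_epochs

# Assumed values: 10% replaced per epoch, 11 epochs ("days").
remaining = fraction_never_replaced(0.10, 11)
print(f"{remaining:.3f}")  # 0.314
```

      Under this assumption, roughly 31% of the original cells would be expected to persist through all 11 epochs. If instead each epoch drew only from cells not yet replaced, the whole population would be replaced after 10 epochs, so the outcome depends strongly on the sampling scheme.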

      Additionally, there is some evidence that behaviors, such as novelty, can increase the rate of adult neurogenesis (Kamimura et al., 2022; van Praag et al., 1999; Gheusi and Lledo, 2014), suggesting a complex reciprocal relationship between the mechanisms that generate the cells shaping how olfactory stimuli are encoded and the encoding process itself. Our model does not include any of these dynamic features, which represent an additional layer of complexity and may introduce an intermediate time scale, one of behavioral selection and action, that is slower than the milliseconds on which spike-time-dependent plasticity operates but faster than the time scale of neurogenesis. We also include this point in the Discussion (lines 352-361).

      Our 11-day simulation, however, is designed to uncover how plasticity across multiple timescales (STDP and adult neurogenesis) at the network level shapes odor representations as multiple rounds of GC turnover occur. Changing the timescale and magnitude of replacement in the simulations (in terms of either days or percentage of cells replaced) would affect the degree to which drift occurs, but not the phenomenon itself. Additionally, the representational structure in our model at intermediate time points (e.g., days 8-10) would correspond well to scenarios in which some fraction of developmentally born GCs persists in the circuit. Thus, our simulations span a range of possible empirical regimes, from high turnover to partial preservation. We have added discussion to the revised manuscript (lines 352-361) clarifying this point and acknowledging the biological heterogeneity in GC lifespans.

      (3) The authors' model relies on several key assumptions: random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity. These assumptions are not necessarily accurate, as recent work revealed structure in the projections from the olfactory bulb to the piriform cortex and structure within the piriform cortex connectivity itself (Fink et al., bioRxiv, 2025; Chae et al., Cell, 2022; Zeppilli et al., eLife, 2021).

      How do the results of the model relating adult neurogenesis in the bulb to drift in the piriform cortex representations change when considering an alternative scenario in which the olfactory bulb to piriform and intra-piriform connectivity is not fully distributed and indistinguishable from random, but rather is structured?

      Thank you for pointing us to these important studies. We fully agree with the reviewer that the structure of the olfactory system might not be purely random, but we do not believe these papers contradict the level of abstraction used in our model.

      Zeppilli et al. (2021) map molecularly defined projection neuron subtypes and their preferential targeting of different cortical and subcortical regions, but they do not report any fine-scale topographic organization of bulb → piriform connectivity that would contradict a view of randomly distributed input to piriform cortex. Studies from our lab using retrograde tracers in the bulb show some spatial clustering of piriform cortical neurons whose axons project to the bulb (Padmanabhan et al., 2016, 2019), but these studies do not identify any “functional organization” or structure. Chae et al. (2022) focus on distinct long-range functional loops (mitral ↔ piriform vs tufted ↔ AON) and the differential role of cortical feedback, but again at the level of cortical regions rather than individual cells and their connectivity. Notably, our model does not consider the AON.

      Finally, Fink et al. (2025) report a “like-to-like” excitatory connectivity motif within the piriform cortex and an experience-dependent reorganization of inhibitory synapses. As the authors note, “... this like-to-like motif is unlikely to reflect common input from the olfactory bulb”, so it does not conflict with our assumption of broadly random bulb → piriform input. This “like-to-like” motif is reflected in our model by wiring a certain subpopulation of piriform cells. On the other hand, we agree that the experience-dependent changes in inhibitory connectivity within PCx are highly relevant for learning-related plasticity but fall outside the scope of our study. We intentionally omitted piriform plasticity to isolate the contributions of adult neurogenesis in the bulb and plasticity acting on adult-born granule cells. However, incorporating such cortical plasticity is an important direction for future work. We added a discussion (lines 395-405) of this important point raised by the reviewer in the revised manuscript.

      (4) I didn't understand the logic of the low-dimensional space analysis for M/T cells and piriform cortex neurons (Figures 2 & 3). In the authors' model, the full-ensemble M/T responses are reorganized over time, presumably due to the adult-born neurogenesis. Analyzing a lower-dimensional projection of the ensemble trajectories reveals a lower degree of re-organization. This is the same for the piriform cortex, but relatively, the piriform ensembles displayed in a low-dimensional embedding appear to drift more compared to the M/T ensembles.

      This analysis triggers a few questions: which representation is relevant for the brain function - the high or the low-dimensional projection? What fraction of response variance is included in the low-dimensional space analysis? How did the authors decide the low-dimensional cut-off? Why does STDP cause more drift in piriform cortex ensembles vs. M/T ensembles? Is this because of the assumed higher dimensionality of the piriform cortex representations compared to the mitral cells?

      Thank you for these thoughtful questions. We clarify the logic and purpose of the low-dimensional analyses and address each point below.

      (1) Which representation is relevant for brain function, the high-dimensional or low-dimensional one?

      We believe both representations are meaningful, with each capturing different aspects of the neural code. The high-dimensional activity reflects the full variability of individual cell responses, while the low-dimensional projection captures the dominant population-level components that downstream areas are most likely to use for readout. We found that the low-dimensional representations are more stable in the bulb than in PCx, suggesting that information is used differentially between the two areas. The bulb provides a stable, sensory-anchored population code that reliably represents odor identity over time, consistent with both electrophysiological and behavioral studies (Nagayama et al., 2004; Chen et al., 2009; Davison and Katz, 2007; Cavaretta et al., 2018). This is consistent with its role as the first stage of information processing in the olfactory system, providing faithful representations to downstream circuits. The piriform cortex, by contrast, transforms this stable input into a more flexible representation. Drift in its low-dimensional space may reflect ongoing plasticity (Schoonover et al., Nature, 2021), integration of contextual signals, or higher-dimensional computations characteristic of PCx (Fink et al., bioRxiv, 2025), suggesting that it functions more as an associative cortex than as a pure sensory cortex.

      (2) What fraction of variance is included in the low-dimensional space, and how was the cutoff chosen?

      In our simulations, the principal components (PCs) used for the low-dimensional analyses captured the majority of variance relevant for odor identity (~60–70% for M/T cells and ~55–65% for piriform cortex). We now report these fractions explicitly in Methods (lines 937-939).
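      As a rough illustration of how such a variance fraction can be computed, the sketch below takes a trial-by-cell response matrix and returns the share of total variance captured by the leading principal components. It is not the authors' analysis code: the function name `variance_explained`, the synthetic data, and the choice of three components are all assumptions made for the example.

```python
import numpy as np

def variance_explained(responses, n_components):
    """Fraction of total variance captured by the top principal components.

    responses: array of shape (n_samples, n_cells), e.g. trial-by-cell firing rates.
    """
    # Center each cell's responses so the SVD corresponds to PCA.
    centered = responses - responses.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the variance along each PC.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:n_components].sum() / variances.sum()

# Hypothetical data: 3 latent factors mixed into 50 cells, plus independent noise.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
responses = latents @ mixing + 0.5 * rng.normal(size=(200, 50))

print(variance_explained(responses, 3))
```

      Reporting this number alongside the chosen cutoff lets a reader judge how much of the full ensemble activity the low-dimensional analyses leave out.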

      (3) Why does STDP cause more drift in piriform-cortex ensembles than in M/T ensembles? Does this reflect higher dimensionality in piriform cortex?

      In our model, STDP does not cause more drift in PCx. It actually reduces drift and stabilizes PCx representations relative to the condition without STDP (as shown in Fig. 4C2). STDP has a much smaller effect in the bulb because: (1) M/T cells continue to receive stable odor input from the glomeruli and (2) the low-dimensional M/T representation is already stable even without plasticity. We have edited the manuscript to reiterate this point in both the results and discussion.

      The reviewer is correct that the piriform cortex naturally exhibits more drift than the bulb, and their comment that this is due to its substantially higher representational dimensionality is spot on. The PCx contains many more neurons, receives highly divergent OB → PCx inputs, and has dense recurrent connectivity, all of which create many more degrees of freedom through which representations can drift. Additionally, because individual PCx neurons sample from a substantially more diverse combinatorial space of inputs (including feedback to piriform from an array of regions; Illig, 2005, Majak et al., 2004, Chapuis et al., 2013), the dimensionality of the population code is likely higher. While STDP stabilizes the dimensions of the PCx representation that are reinforced during plasticity, some residual drift remains because of the large number of orthogonal dimensions available. Additionally, as the reviewer notes, some forms of plasticity that are not included in the model, such as inhibitory plasticity in PCx, may also affect both the representations and their underlying dimensionality. We include these points in the discussion (lines 381-394).

      (5) Could the authors comment whether STDP at abGC synapses and its impact on decreasing drift represent a new insight, and also put it into context? Several studies (e.g., Lledo, Murthy, Komiyama groups) reported that abGC integrates in the network in an activity-dependent manner, and not randomly, and as such stabilizes the active neuronal responses, which is consistent with the authors' report.

      Related, I couldn't find through the manuscript which synapses involving abGCs they focus on, or what is the relative contribution of the various plastic synapses shown in the cartoon from Figure 4 A1 (circles and triangles).

      We thank the reviewer for raising this question. As the reviewer pointed out, several studies have shown that abGCs integrate into the bulb circuit in an activity-dependent manner. They preferentially form synapses onto mitral/tufted cells that respond to behaviorally important odors; this “selection of surviving cells” is not included in our model. Instead, we use STDP at the synaptic level. This is of course not analogous, but it provides a computational framework into which the selection of surviving abGCs could be incorporated in future studies. It is perhaps notable that in our large-scale simulations, synaptic changes at the population level may reflect some of this activity-dependent selection.

      To that end, our model provides a new insight and suggests a broader function for adult neurogenesis. For example, when certain odors are reinforced in an activity-dependent manner, abGCs born during that period may stabilize the circuits that respond to those odors. The resulting reduction of drift would help keep the representation of those odors stable over time, even while other parts of the circuit continue to change. We now highlight this idea in the Discussion (lines 366-373).

      For the second part of the question: in our model, STDP acts on two sets of connections. It applies to the synapses onto abGCs from M/T cells, GC/SAC cells, and PCx neurons. It also applies to the synapses that abGCs project to, including those onto M/T cells and GC/SAC cells. We have clarified this in the revised Methods (lines 1001-1004).

      (6) The study would be strengthened, in my opinion, by including specific testable predictions that the authors' models make, which can be further food for thought for experimentalists.

      How does suppression of adult-born neurogenesis in the OB impact the stability of mitral cell odor responses? How about piriform cortex ensembles?

      We appreciate the reviewer’s suggestion and formalize the following two predictions from our model:

      Prediction 1: Suppressing adult neurogenesis will reduce spontaneous representational drift in the PCx. Increasing spike-timing-dependent plasticity during periods of experience with a specific odor will selectively stabilize representations of that odor.

      Prediction 2: Adult neurogenesis will not affect AON representations of odor identity or concentration in the same way that PCx representations are altered and drift.

      We include these two ideas in the discussion as experimentally testable predictions.

      Reviewer #2 (Public review):

      Summary:

      The authors address a critical problem in olfactory coding. It has long been known that adult neurogenesis, specifically in the form of adult-born granule cells that embed into the existing inhibitory networks of the olfactory bulb, can potentially alter the responses of Mitral/Tufted neurons that project activity to the Piriform Cortex and to other areas of the brain. Fundamentally, it would seem that these granule cells could alter the stability of neural codes in the OB over time. The authors develop a spiking network model to explore how stability can be achieved both in the OB over time and in the PC, which receives inputs. The model recapitulates published activity recordings of M/T cells and shows how activity in different M/T cells from the same glomerulus shifts over time in ways that, in spite of the shift, preserve population/glomerular level codes. However, these different M/T cells fan out onto different pyramidal cells of the PC, which gives rise to instability at that level. STDP, then, is necessary to maintain stability at the PC level as long as odor environments remain constant. These results may also apply to a similar neurogenesis-based change in the Dentate Gyrus, which generates instability in CA1/3 regions of the hippocampus.

      Strengths:

      A robust network model that untangles important, seemingly contradictory mechanisms that underlie olfactory coding.

      Weaknesses:

      The work is a significant contribution to understanding olfactory coding. But the manuscript would benefit from a brief discussion of why neurogenesis occurs in the first place - e.g., injury, ongoing needs for plasticity, and adapting to turnover of ORNs. There is literature on this topic. It seems counterintuitive to have a process in the MOB (and for that matter in the DG) that potentially disrupts the ability to generate stable codes both in the MOB and PC, and in particular a disruption that requires two different mechanisms - multiple M/T cells per glomerulus in the MOB and STDP in the PC - to counteract.

      We appreciate the reviewer’s suggestion and have added a discussion of this point to the revised manuscript (lines 431-435).

      Given that neurogenesis has an important function, and a mechanism is in place to compensate for it in the MOB, why would it then be disrupted in fan-out projections to the PC? The answer may lie in the need for fan-out projections so that pyramidal neurons in the PC can combinatorially represent many different inputs from the MOB. So something like STDP would be needed to maintain stability in the face of the need for this coding strategy.

      This kind of discussion, or something like it, would help readers understand why these mechanisms occur in the first place. It is interesting that PC stability requires that odor environments be stable, and that this stability drives PC representational stability. This result suggests experimental work to test this hypothesis. As such, it is a novel outcome of the research.

      We agree with the reviewer. The fan-out from the bulb to the piriform cortex is essential for the combinatorial coding that allows PCx neurons to represent many odor features and mixtures. This architecture gives the piriform cortex great coding capacity, but it also makes the system sensitive to small changes in its inputs. As a result, drift that originates in the bulb can spread more easily in PCx. A stabilizing mechanism is therefore needed downstream. In our model, STDP provides this stabilization by reinforcing the dimensions that carry meaningful odor structure. This allows the piriform cortex to keep a stable population code even when its inputs change over time. Neurogenesis supplies the flexibility, the fan-out supplies the expressive power, and STDP supplies the stability. All three elements work together to support a system that must recognize odors reliably while still adapting to new sensory experiences. We have added discussion on this point in the revised manuscript (line 395-405).

      Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights).

      Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      We appreciate the reviewer’s comment and thank them for their thoughtful feedback.

      Weaknesses

      I have one major, general concern that I think must be addressed to permit proper interpretation of the results.

      I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time.

      This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.

      In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.

      This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)

      Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.

      The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.

      We thank the reviewer for highlighting this important issue. We agree that the interpretation of our model requires care to avoid implying that the olfactory bulb exhibits spontaneous drift. As the reviewer points out, the empirical literature shows that M/T-cell tuning is highly stable for infrequently experienced odors, but can change with daily, persistent odor exposure (e.g., Kato et al., 2012; Yamada et al., 2017).

      We thank the reviewer for highlighting the Bhalla and Bower paper, as it is foundational and raises a number of interesting and important points. As the authors noted, there was significant variability in trial-to-trial responses over sessions and days in single neurons. This is likely due to ongoing dynamics (Laurent, 1999), the impact of behaviorally relevant top-down feedback (Chen and Padmanabhan, 2022), decision making (Kay and Laurent, 1999), and an array of factors that our model does not include. In that manuscript, the authors note “the variability of the same neuron recorded over different days…was not statistically different from the within day comparisons.” While these results appear prima facie to differ from ours, there are several reasons why this may not be the case.

      First, different metrics are used for measuring neuronal stability, which may contribute to some of the differences. Second, and perhaps more importantly and interestingly, the authors of that study noted significant within-day trial-to-trial variability, which is not present in our study because our model has none of the richness of behavior that Bhalla and Bower found in the freely behaving rat. This within-day variability (which is much higher than what we report) would reduce the apparent impact of drift across days, a result that complicates the picture of how plasticity across multiple timescales occurs. We thank the reviewer for the insights on this critical study and include these points in our discussion (lines 321-330).

      Neural responses to odors are highly variable across different time scales (Padmanabhan and Urban 2010, Angelo et al 2011, Kapoor and Urban 2006, Friedrich and Laurent, 2001, Smear et al 2011, Wesson et al 2008). Our model includes neither the behavior-dependent selection of surviving cells nor specific rules about which synapses may be preferentially strengthened (due to neuromodulation corresponding to behavioral choice and reinforcement learning). Instead, we aimed to recapitulate the experimental design of a few studies (Kato et al 2012, Yamada et al, 2017) to understand how neurogenesis and drift are related. Over the simulated 10 days, the odor is presented every day, and the network is otherwise frozen between sessions, meaning that the model lacks mechanisms that would normally support recovery during intervals without odor exposure. Under these conditions, adult neurogenesis effectively interacts with repeated experience, producing gradual changes in individual M/T-cell tuning. Thus, our results should be interpreted as modeling experience-dependent changes over the timescale of neurogenesis, not as evidence for spontaneous drift in the bulb. We now state this explicitly in the Discussion to prevent confusion and expand the discussion to incorporate some of these critical ideas (lines 321-330).

      Major comments (all related to the above point)

      (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)." They also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.

      We thank the reviewer. Because this issue is related to the previous comment, we have revised the text to avoid any misleading comparison. We now specify which aspects of our computational model map onto experimental studies, which aspects we cannot recapitulate, and, as a result, where our comparisons are limited.

      (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days"..."consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.

      We agree with the reviewer that some of the cells in Shani-Narkiss Figure 8B showed relatively stable responses (while others did not). However, there is a clear monotonic increase in the “Average differences” over time, from “Same day” to “1 month” to “6 month”, as quantified in their Figure 8B. Although the authors concluded that they "find a relatively stable response of single neurons”, we would argue that their data also provide evidence for what we would term “relatively unstable responses”, as found in our model. Per the reviewer’s suggestion, we now clarify this in the text (lines 194-197).

      (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L3146). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces), Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.

      We appreciate the reviewer’s suggestion and edited the text to make it more accurate (line 319-320).
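      For readers who want to apply the metric the reviewer quotes from Kato et al. to model output, the change index can be sketched as follows. The formula follows the quoted definition; the function name `change_index`, the handling of a zero denominator, and the example values are assumptions for illustration, and the sketch assumes non-negative response magnitudes so that the index stays within −1 to 1.

```python
import numpy as np

def change_index(response_trial, response_day1):
    """Change index for cell-odor pairs, following the definition quoted above:

        CI = (response on trial X - initial response on day 1)
             / (response on trial X + initial response on day 1)

    Assumes non-negative response magnitudes. Pairs with no response on
    either occasion (zero denominator) are returned as NaN; that choice
    is an assumption of this sketch.
    """
    x = np.asarray(response_trial, dtype=float)
    r1 = np.asarray(response_day1, dtype=float)
    denom = x + r1
    with np.errstate(divide="ignore", invalid="ignore"):
        ci = (x - r1) / denom
    return np.where(denom == 0, np.nan, ci)

# Hypothetical responses for three cell-odor pairs:
# a lost response, a newly emerged response, and an unchanged response.
day1 = np.array([5.0, 0.0, 3.0])
later = np.array([0.0, 5.0, 3.0])
print(change_index(later, day1))
```

      A pure change in gain scales the later response relative to day 1 and therefore moves the index away from zero, which is why the reviewer argues that this metric would detect gain changes if they were present.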

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Line 28 "a graduate alteration in sensory perception". We do not know if drift results in changes in perception. If anything, behavioral evidence suggests that perception remains stable in spite of drift. For example, in Driscoll et al. (2017) mice are able to successfully navigate a virtual T maze despite drift, and in Schoonover et al. (2021), mice maintain aversive responses following fear conditioning, despite drift in the piriform. Finally, spatial navigation appears unimpaired despite pronounced drift in the hippocampus (e.g., Climer et al., 2025). It would be more appropriate to say "stimulus-evoked activity patterns" than "sensory perception" or other words that refer to neuronal activity rather than cognition or behavior.

      We edited the text to make it more accurate per the reviewer’s suggestion (line 27).

      (2) In the introduction, the authors state: "This representational drift has led to the hypothesis that PCx, rather than being a primary sensory area, may be more like an association cortical region." (L76-78). However, the hypothesis that PCx operates as an association cortex comes originally from Haberly's work and thinking (e.g., Haberly and Bower, 1984, elaborated in extensive detail in Haberly, 2001). I think it would be appropriate to acknowledge that here.

      We added the references to acknowledge this, per the reviewer’s suggestion (line 77).

      (3) In the methods, the authors elegantly describe how they induce neurogenesis in their model using weight reshuffling (L805-814). I think it could really help the reader understand the model if this idea were also included in the results section. As the results section currently reads, it seems as if their model implemented neurogenesis in a different fashion: "To do this, following elimination of 10% of the GCs in the network, we added new cells and randomly assigned synaptic weights between these abGCs and M/Ts". I appreciate that in their model, shuffling all the weights of a given GC randomly is akin to "elimination", but I feel like at first blush the results section risks giving an impression a bit different than that actually used in the model.

      We edited the text to make it more accurate per the reviewer’s suggestion (line 110-112).
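      The weight-reshuffling idea discussed in this exchange can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name `turnover_step`, the matrix layout (one row per granule cell, one column per M/T cell), and the weight distribution are hypothetical, and the 10% fraction is taken from the passage the reviewer quotes.

```python
import numpy as np

def turnover_step(weights, fraction=0.1, rng=None):
    """Model one round of granule-cell turnover by reshuffling weights.

    weights: array of shape (n_gc, n_mt), row i holding the synaptic
             weights between granule cell i and each M/T cell
             (this layout is an assumption of the sketch).
    A randomly chosen fraction of granule cells has its weights permuted
    across M/T partners, so each cell's weight distribution is preserved
    while its connectivity pattern is replaced.
    """
    rng = np.random.default_rng() if rng is None else rng
    new_weights = weights.copy()
    n_gc = weights.shape[0]
    n_replace = int(round(fraction * n_gc))
    replaced = rng.choice(n_gc, size=n_replace, replace=False)
    for gc in replaced:
        new_weights[gc] = rng.permutation(weights[gc])
    return new_weights, replaced

# Hypothetical network: 100 granule cells, 20 M/T cells.
rng = np.random.default_rng(1)
weights = rng.exponential(scale=1.0, size=(100, 20))
new_weights, replaced = turnover_step(weights, fraction=0.1, rng=rng)
print(len(replaced), "granule cells reshuffled")
```

      Because only the assignment of weights to partners changes, the overall distribution of synaptic weights is identical before and after each step, which is the property the reviewer's summary highlights.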

    1. eLife Assessment

      This manuscript introduces a new low-cost and accessible method for assembling combinatorially complete microbial consortia using basic laboratory equipment, which is a valuable contribution to the field of microbial ecology and biotechnology. The evidence presented is compelling, demonstrating the method's effectiveness through empirical testing on both synthetic colorants and Pseudomonas aeruginosa strains.

    2. Reviewer #1 (Public review):

      This work develops a simple, rapid, low-cost methodology for assembling combinatorially complete microbial consortia using basic laboratory equipment. The motivation behind this work is to make the study of microbial community interactions more accessible to laboratories that lack specialized equipment such as robotic liquid handlers or microfluidic devices. The method was tested on a library of Pseudomonas aeruginosa strains to demonstrate its practicality and effectiveness. It provided a means to explore the complex functional interactions within microbial communities and identify optimal consortia for specific functions, such as biomass production.

      The primary strength of this manuscript lies in its accessibility and practicality. The method proposed by the authors allows any laboratory with standard equipment, such as multichannel pipettes and 96-well plates, to readily construct all possible combinations of microbial consortia from a given set of species. This greatly enhances access to full factorial designs, which were previously limited to labs with advanced technology.
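      To make "all possible combinations" concrete, the sketch below enumerates every consortium that can be formed from a set of strains. It covers only the enumeration, not the authors' pipetting protocol or their R script; the function name `all_consortia` and the strain labels are hypothetical.

```python
from itertools import combinations

def all_consortia(strains):
    """Return every subset of the given strains, including the empty
    consortium (a blank well) and the full community."""
    consortia = []
    for size in range(len(strains) + 1):
        for combo in combinations(strains, size):
            consortia.append(frozenset(combo))
    return consortia

# Hypothetical labels for an 8-strain library.
strains = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
consortia = all_consortia(strains)
print(len(consortia))  # 2**8 = 256 combinations
```

      The count doubles with each added strain, which is why a full factorial design becomes laborious by hand and why a systematic plate-based assembly scheme is useful.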

      Another strength of the manuscript is the measurement and analysis of the biomass of all possible combinations of 8 strains of P. aeruginosa. This analysis provides a concrete example of how the authors' new methodology can be used to identify the best-performing communities and map pairwise and higher-order functional interactions.

      Notably, the authors do exceptionally well in providing a thorough description of the methodology, including detailed protocols and an R script for customizing the method to different experimental needs. This enhances the reproducibility and adaptability of the methodology, making it a valuable resource for researchers wishing to adopt this methodology.

      Comments on revisions:

      I thank the authors for their response. The revisions have addressed all of the issues raised in my original review, and I believe they have improved the clarity of the manuscript.

    3. Reviewer #3 (Public review):

      The authors developed a useful methodology for generating all combinations of multiple reagents using standard lab equipment. This methodology has clear uses for studying microbial ecology, as they demonstrated. The methodology will likely be useful for other types of experiments that require exhaustive testing of all possible combinations of a given set of reagents (e.g., drug-drug antagonism and synergy).

      The authors provided a useful R script that generates a detailed experimental protocol for building the desired combination from any number of reagents. The produced document is useful and has clear instructions. The output of the computer script would be strengthened if graphical output were also provided (similar to the one provided in Figure 1C).

      The authors show that the error rate of the method doesn't go up with the number of combinations using dyes (Figure 2).

      The authors demonstrate the value of their methodology for studying interactions within microbial consortia by assembling all possible combinations of eight strains of Pseudomonas aeruginosa. The value of their methodology for this application is well-founded. However, it is also unclear why specific experimental choices were made for this application. It is unclear why the authors continue to show the absorbance measurements of strain assemblies over the entire wavelength spectrum and not just for ABS 600 nm (Figures 3 and 4). It is also unclear why the authors provided information on the "sum of the three spectra" as this reference line is meaningless and not a reasonable null model for estimating how well specific strain combinations will grow together.

      Figure 5 illustrates the various analysis types that can be performed on the data collected from growing combinations of eight Pseudomonas aeruginosa strains. It is a very informative figure since it provides a "roadmap" on the various ways in which the dataset produced can be explored. The information in Figures 5 and S6 will likely be very useful for a wide audience.

      Comments on revisions:

      We thank the authors for considering the review and providing additional clarifications. The authors disagree with some of the points we raised and decided to reject some of our recommendations. All the points of disagreement are minor and clearly subjective (e.g., stylistic). Congratulations again on this elegant manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work develops a simple, rapid, low-cost methodology for assembling combinatorially complete microbial consortia using basic laboratory equipment. The motivation behind this work is to make the study of microbial community interactions more accessible to laboratories that lack specialized equipment such as robotic liquid handlers or microfluidic devices. The method was tested on a library of Pseudomonas aeruginosa strains to demonstrate its practicality and effectiveness. It provided a means to explore the complex functional interactions within microbial communities and identify optimal consortia for specific functions, such as biomass production.

      The primary strength of this manuscript lies in its accessibility and practicality. The method proposed by the authors allows any laboratory with standard equipment, such as multichannel pipettes and 96-well plates, to readily construct all possible combinations of microbial consortia from a given set of species. This greatly enhances access to full factorial designs, which were previously limited to labs with advanced technology.

      Another strength of the manuscript is the measurement and analysis of the biomass of all possible combinations of 8 strains of P. aeruginosa. This analysis provides a concrete example of how the authors' new methodology can be used to identify the best-performing communities and map pairwise and higher-order functional interactions.

      Notably, the authors do exceptionally well in providing a thorough description of the methodology, including detailed protocols and an R script for customizing the method to different experimental needs. This enhances the reproducibility and adaptability of the methodology, making it a valuable resource for researchers wishing to adopt this methodology.

      We thank the reviewer for their thoughtful comments and positive assessment of our work. Below we detail the changes we have introduced in the manuscript to clarify issues raised by the reviewer.

      While the methodology is robust and well-presented, there are some limitations that should be acknowledged more thoroughly. First, the method's scalability is an important factor. The authors indicate that it should be effective for up to 10-12 species, but there is no discussion of what sets this scale: time, amount of labor, consumables, the likelihood of error, sample volume, etc.

      The 10-12 species estimation is based on our own experience implementing the protocol, and set primarily by time, labor, and consumables (as rightly pointed out by the reviewer) rather than conceptual limitations of the approach. We have added clarifications in the Discussion (lines 401-405) regarding these scalability-limiting factors.
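
      To give a rough sense of how these limiting factors scale, the sketch below counts the consortia in a full factorial design and a lower bound on the number of 96-well plates needed. It assumes one consortium per well, no replicates, and no blank wells; the function name and this layout are illustrative assumptions, not taken from the authors' R script.

```python
import math

WELLS_PER_PLATE = 96  # standard 96-well plate


def full_factorial_size(n_species: int) -> tuple:
    """Return (number of consortia, minimum number of 96-well plates).

    Counts every subset of the species pool, including the empty
    (no-species) control, so the total is 2**n_species.
    Assumes one consortium per well and no replicates (illustrative only).
    """
    n_consortia = 2 ** n_species
    n_plates = math.ceil(n_consortia / WELLS_PER_PLATE)
    return n_consortia, n_plates


for n in (8, 10, 12, 20):
    consortia, plates = full_factorial_size(n)
    print(f"{n} species: {consortia} consortia, at least {plates} plates")
```

      Under these assumptions, going from 8 to 12 species raises the count from 256 to 4096 consortia, which is in line with the authors' point that time, labor, and consumables set the practical limit.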

      Second, this methodology is tailored to construct communities where the abundance of each strain is identical in each combination. Therefore, combinations with a different number of strains also differ in the total initial amount of microbial cells. Moreover, variations in the initial proportions of the same set of strains cannot be readily explored.

      Note that the “density homogenization” step is optional and could be skipped entirely, which would result in the same species being present at variable densities across consortia: specifically, skipping this step would make the density of a species in a consortium inversely proportional to the number of species in that consortium. Further variations in initial abundance could be explored by treating the same strain at two (or more) starting abundances as distinct inputs of the protocol, though this would naturally increase the number of combinations to test.

      We have included a paragraph in the Discussion (lines 416-423) describing how we can, in principle, extend our protocol to explore abundance effects.
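
      The inverse proportionality described in this response can be sketched as follows, assuming that every monoculture starts at the same density and that a consortium of k members is an equal-volume mix of k monocultures. Both functions and the arbitrary density units are hypothetical and only illustrate the arithmetic.

```python
def per_species_density(stock_density: float, n_members: int) -> float:
    """Density of each species after mixing equal volumes of n_members monocultures.

    Without a density homogenization step, each species is diluted
    by the number of members in the consortium.
    """
    if n_members < 1:
        raise ValueError("a consortium needs at least one member")
    return stock_density / n_members


def n_combinations(n_inputs: int) -> int:
    """Number of combinations (including the empty one) for n_inputs protocol inputs."""
    return 2 ** n_inputs


# Per-species density in consortia of 1 to 4 members (stock density = 1.0, arbitrary units)
for k in range(1, 5):
    print(k, per_species_density(1.0, k))

# Entering one strain at two starting abundances adds one input and doubles the design
print(n_combinations(8), n_combinations(9))
```

      Each additional abundance level entered as a separate input doubles the number of combinations, which is the cost the authors refer to.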

      Third, the manuscript only discusses how to construct the combinations, and not how to assay them afterward (e.g. for community function, interspecific interactions, etc.). While details on how to achieve these goals are clearly outside the scope of this work, the use of biomass as an example function may obfuscate this caveat, which should be stated more explicitly.

      We agree that the manuscript focuses exclusively on the construction of microbial communities and does not address how these communities should be assayed afterward. This is an intentional scope decision. The proposed protocol is fully compatible with a wide range of functional, interaction-based, or omics-based assays. Absorbance is mentioned as an illustrative example of a possible readout, rather than as a recommended or exclusive parameter. We have revised the text to explicitly state that the assessment of community function or interspecific interactions lies outside the scope of this work and must be tailored to the specific biological question being addressed.

      Reviewer #1 (Recommendations for the authors):

      A few specific technical notes and notes about clarity:

      (1) It may be worth being more explicit about how to produce replicates. For example, producing technical replicates by inoculating multiple times from the same set of combinations, while biological replicates require making the combinations multiple times.

      We have updated the main text to clarify this point (lines 780-781).

      (2) Figure 2C: May be worth adding some context to these performance numbers. What are typical accuracies? What would they be in a liquid handler?

      Assessing typical accuracies is nuanced since the error depends not only on the assembly steps, but also on potential intrinsic variation of the specific community function being tested and the method used to quantify it. One of the main reasons for including the experiment using colorant combinations was precisely to minimize these other sources of variation. In this experiment, we find that the error we quantify is consistent with cumulative pipetting variation (as a reference, a typical lab micropipette has an error of 0.5-1%). This is now explicitly mentioned in the manuscript.

      (3) Figure 5A: I realize it is unlikely that strains go extinct in these experiments. But it is still worth clarifying that the number of strains is the number inoculated, rather than the one present at the time of measurement.

      We updated the caption of Figure 5A as recommended by the reviewer.

      (4) Figure 5B: I realize this is just for illustration purposes, but you should provide more information about the magnitude of the difference in performance of these combinations and the confidence in their ranking (or variability in performance across replicates).

      Following this suggestion, we have added a paragraph where we report the variation across replicates for the highest-performing consortia (lines 318-323). Indeed, while variation across replicates is small, it is enough to produce an overlap between the confidence intervals of the function of some of the highest-performing consortia. This is now explicitly acknowledged in the manuscript.

      (5) Figure 5C: I believe the bold black lines indicate the combinations shown in panel D, but that is not explicitly stated.

      We have updated the caption of Figure 5C.

      Reviewer #2 (Public review):

      A simple and effective method for combinatorial assembly of microbes in synthetic communities of <12 species.

      Overall, this manuscript is a useful contribution. The efficiency of the method and clarity of the presentation are strengths. It is well-written and easy to follow. The figures are great, the pedagogical narrative is crisp. I can imagine the method being used in lots of other contexts too.

      The authors could better clarify what HOIs mean. They could address challenges with assaying community function. However, neither of these “weaknesses” affects the primary goal of the paper which is methodological.

      We thank the reviewer for the positive assessment. With respect to HOIs, we recognize that defining and quantifying them is a non-trivial subject within the broader field of microbial ecology (see e.g. ref. 24 within the manuscript). Since our aim with this manuscript is methodological, as the reviewer notes, here we have done our best to avoid introducing new or ambiguous definitions. For this reason, we simply adopt a definition given in previous works (including refs. 10, 19, 24, 29, 37, and 38 in the manuscript), where the context-dependence of pairwise interaction terms is taken as a signature of HOIs. With respect to the challenges in assaying community function, please see our responses below.

      Reviewer #2 (Recommendations for the authors):

      Overall, this manuscript is a useful contribution, I appreciate the authors taking the time to write it up! I have a few relatively minor comments.

      (1) It would be nice in the introduction to address why we might want the full factorial construction of communities in the first place. This is an especially relevant question in light of the authors' 2023 Nat E&E paper where they showed that the function of communities can often be learned even when only a fraction of all possible communities is measured. This is addressed in part in the paragraph on line 34, but I think it might be worth expanding a bit given the focus of the paper.

      We sincerely appreciate the reviewer’s feedback. In fact, one of the reasons that make full factorial construction desirable is precisely to test theoretical and computational models of community function, including (but not only) the statistical models developed in our 2023 Nature E&E paper. In that work, we showed that low-order models can explain a substantial fraction of the variation in community function in previously-published datasets, but we also predict that the same models could fail under complex structures of microbial interactions (e.g., strong high-order interactions). The protocol we present here enables the empirical quantification of such interactions, making this prediction (and others) directly testable. We have included that clarification in the revised manuscript (lines 56-58).

      (2) Around line 74, I think it is worth mentioning that even this elegant design will face insurmountable practical challenges (time, liquid handling operations, and number of plates will explode) for full factorial design with 20, 30, 40 species or more. This is relevant for some very complex synthetic consortia that some microbiome groups are constructing (e.g. hCom2 from Huang/Fischbach groups) https://www.sciencedirect.com/science/article/pii/S0092867422009904.

      We agree with the reviewer that full factorial designs become impractical for very large species pools. These limits are now more clearly mentioned in the revised manuscript. We refer the reviewer to our response to comment #1 by Reviewer 1 for further details.

      (3) The binary construction is a really nice clean way to explain the protocol. Appreciate the pedagogy!

      We thank the reviewer for the appreciation.
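
      One way to read the binary construction is that each consortium corresponds to a binary number whose bits mark which species are present. The sketch below only illustrates that encoding with hypothetical species labels; it is not the authors' R script and says nothing about plate layout or pipetting order.

```python
def consortium_from_index(index: int, species: list) -> list:
    """Decode a binary index into the species it contains.

    Bit i of `index` (least significant bit first) marks the presence
    of species[i].
    """
    return [name for i, name in enumerate(species) if (index >> i) & 1]


species = ["A", "B", "C"]  # hypothetical labels

# Enumerate all 2**3 = 8 combinations; the bit string is printed
# most-significant bit first, so the rightmost bit refers to "A".
for index in range(2 ** len(species)):
    bits = format(index, f"0{len(species)}b")
    members = consortium_from_index(index, species)
    print(bits, members if members else "(no species)")
```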

      (4) In the experiment with Pseudomonas strains the consortia are grown in LB. This medium will support growth to relatively high OD (>1). At these densities, the change in OD with density is almost certainly not linear with cell density, and this nonlinearity likely depends on strain identity. In this case, the assumption of additivity may not hold. As a result, some of the observed "interactions" may simply be non-linearity in the assay and not the abundance of bacteria in the communities. Of course, this does not affect the assembly protocol in any way, but it does complicate the interpretation of interactions via this assay. I think this is worth pointing out since other researchers may have to think carefully about the assay they use when constructing these synthetic consortia. I think in this methods paper it is important to emphasize this so other researchers do not mistakenly identify interactions due to issues with the assay.

      We thank the reviewer for pointing out this important aspect. In our experiment, we use Abs<sub>600</sub> simply as an example of a measurable community-level function. The reviewer is absolutely correct in that mapping absorbance to biomass is nuanced at large OD values, where this relationship becomes non-linear. While this is not an issue from the perspective of the protocol itself, it is indeed an important consideration for users who may want to obtain reliable quantifications of biomass. We have updated the manuscript to explicitly mention this potential issue (lines 307-313). We have also emphasized the fact that our focus on Abs<sub>600</sub> is strictly for illustrative purposes, and we have removed all instances where a direct mapping from Abs<sub>600</sub> to biomass was implied in the text.
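
      To illustrate the concern about non-linear readouts, the toy calculation below assumes a hypothetical saturating relationship between cell density and absorbance. The functional form and its constant are invented for illustration and are not fitted to any data. Two strains whose cell densities add perfectly still show an apparent negative "interaction" when measured through such a readout.

```python
def absorbance(density: float, saturation: float = 2.0) -> float:
    """Hypothetical saturating readout.

    Approximately linear at low density and approaching `saturation`
    at high density. Not a calibrated OD curve.
    """
    return saturation * density / (saturation + density)


# Arbitrary density units; the underlying biology here is purely additive.
density_a, density_b = 1.5, 1.5

abs_a = absorbance(density_a)
abs_b = absorbance(density_b)
abs_ab = absorbance(density_a + density_b)  # co-culture with no true interaction

# Deviation from the additive expectation, computed on the readout
apparent_interaction = abs_ab - (abs_a + abs_b)
print(f"apparent interaction: {apparent_interaction:.3f}")
```

      The deviation is negative even though no interaction was built into the densities, which is why the mapping from absorbance to biomass needs care at high OD.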

      (5) Subtle point regarding HOIs. HOI (or pairwise) statistical interactions need not quantitatively be the same as interactions in a Lotka-Volterra sense. I realize the authors do not explicitly use the term "interaction" in a gLV model formalism but this is how the majority of readers will interpret this term. I believe it is a research question as to how pairwise gLV interactions manifest themselves in terms of functional interactions. For example, a purely pairwise LV model could easily have HOI "functional interactions" if the function is total abundance since abundances depend nonlinearly on LV interactions. I think this part of the manuscript could be confusing to readers for this reason. I think the term "functional interaction" really helps with this issue, but just asking the authors to make sure this is clear.

      I say this because ref. 37 is focused on HOIs in an LV sense. Here, as the authors are aware, they are computing statistical "interactions" in the sense of epistasis. Given that they are computing this epistasis averaged across all community compositions, a more appropriate citation might be [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004771] where the same quantity is computed in a protein context.

      We thank the reviewer for pointing out this important issue. Indeed, we use the term “interaction” in a statistical sense (as the deviation of the observed community function from a null, additive expectation) rather than in a Lotka-Volterra sense. We agree that the reference suggested by the reviewer is more appropriate in this context. We have updated the reference list accordingly.

      (6) Figure 5G - a little hard to see. Any way to show this data more clearly? It looks like all interactions have a mean of 0 because of the way the data are presented.

      The reviewer is indeed correct in that, as defined, the interactions that we quantify are background-dependent, and their average across backgrounds lies near zero for all species. More than an issue with the representation, we think that this is an important empirical observation: it indicates that the same species pair may interact positively or negatively depending on its ecological context. We believe that the current representation is most appropriate for making this clear, but we would be open to discussing alternatives if the reviewer had a specific suggestion in mind.
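
      A minimal sketch of a background-dependent pairwise interaction under an additive null, assuming the common epistasis-style definition: the effect of adding species i and j together, minus the sum of their separate effects in the same background. The function values below are made up for illustration and are not data from the manuscript.

```python
def pairwise_interaction(function, i, j, background=frozenset()):
    """Interaction between i and j in a given background B.

    eps_ij(B) = F(B + i + j) - F(B + i) - F(B + j) + F(B)
    where F maps a set of species to the measured community function.
    """
    b = frozenset(background)
    return (
        function[b | {i, j}]
        - function[b | {i}]
        - function[b | {j}]
        + function[b]
    )


# Hypothetical community function for every subset of three species
toy_function = {
    frozenset(): 0.0,
    frozenset("A"): 0.4,
    frozenset("B"): 0.5,
    frozenset("C"): 0.3,
    frozenset("AB"): 1.1,
    frozenset("AC"): 0.7,
    frozenset("BC"): 0.8,
    frozenset("ABC"): 1.0,
}

eps_alone = pairwise_interaction(toy_function, "A", "B")
eps_with_c = pairwise_interaction(toy_function, "A", "B", background={"C"})
print(f"A-B interaction, no background: {eps_alone:+.2f}")
print(f"A-B interaction, background C:  {eps_with_c:+.2f}")
```

      In this toy example the same pair has a positive interaction in one background and a negative one in the other, so the average across backgrounds is close to zero while the individual values are not.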

      Reviewer #3 (Public review):

      The authors developed a useful methodology for generating all combinations of multiple reagents using standard lab equipment. This methodology has clear uses for studying microbial ecology as they demonstrated. The methodology will likely be useful for other types of experiments that require exhaustive testing of all possible combinations of a given set of reagents (e.g., drug-drug antagonism and synergy).

      The authors provided a useful R script that generates a detailed experimental protocol for building the desired combination from any number of reagents. The produced document is useful and has clear instructions. The output of the computer script will be strengthened if graphical output is also provided (similar to the one provided in Figure 1C).

      The authors show that the error rate of the method doesn't go up with the number of combinations using dyes (Figure 2).

      The authors demonstrate the value of their methodology for studying interactions within microbial consortia by assembling all possible combinations of eight strains of Pseudomonas aeruginosa. The value of their methodology for this application is well-founded. However, it is also unclear why specific experimental choices were made for this application. It is unclear why the authors continue to show the absorbance measurements of strain assemblies over the entire wavelength spectrum and not just for ABS 600 nm (Figures 3 and 4). It is also unclear why the authors provided information on the "sum of the three spectra" as this reference line is meaningless and not a reasonable null model for estimating how well specific strain combinations will grow together.

      Figure 5 illustrates the various analysis types that can be performed on the data collected from growing combinations of eight Pseudomonas aeruginosa strains. It is a very informative figure since it provides a "roadmap" on the various ways in which the dataset produced can be explored. The information in Figures 5 and S6 will likely be very useful for a wide audience.

      Reviewer #3 (Recommendations for the authors):

      (1) Congratulations. I think the manuscript lays out a simple and very elegant methodology that will be useful for many. While I think the method is overall well explained and rationalized, the paper can greatly benefit from further expansion of Figure 5 at the expense of Figures 3 and 4.

      We thank the reviewer for their thoughtful assessment of our work. We have considered the recommendations and discuss the following points in response.

      (2) Unless I am missing something, there is no reason to present data collected across the entire wavelength spectrum for microbial assemblies (Figures 3 and 4). Moreover, using the same color palette for bacterial strains (Figure 3A) and colorants (Figure 2) is highly confusing. I suggest considering using only the 600 nm wavelength for any data collected from microbial assemblies and using a very different color palette for bacteria and colorants to avoid misinterpretation of the data.

      We thank the reviewer for this suggestion. Our goal with Figures 3-4 was to illustrate the convenience of the protocol and the ease with which many measurements can be performed in parallel once the combinatorial assembly has been completed. While we focus on Abs<sub>600</sub> for all subsequent analyses, we chose to display the full spectra in Figs. 3-4 in hopes that future studies can make use of our rich dataset to interrogate questions on microbial interactions, with the option to focus on other wavelengths (which can effectively be treated as different community-level functions in their own right; for instance, we have previously used Abs<sub>405</sub> as a proxy for siderophore concentration). We think there is value in Figs. 3-4 in their current form to make this clear to readers.

      (3) Unlike dye absorbance, bacterial carrying capacity has an upper limit, so summing individual population absorbance as a reference line seems unjustified. If the summation of absorbance is meant to provide a "null model" for expected growth, a more suitable model should be considered (e.g., max spectra or a weighted sum of the spectra from individual members).

      We agree with the reviewer that our null model is not biologically constrained, and we did not intend to imply that the additive expectation was derived from biological principles. Instead, this additive expectation should be interpreted as a simple statistical baseline with minimal assumptions. The use of an additive baseline for quantifying microbial interactions has been addressed in the literature (see, e.g., references 10, 19, 24, 29, 37, and 38), and so here we chose to conform to this convention to avoid introducing new, non-standard quantifications of pairwise and higher-order interactions. We have revised the text to make this more explicit.

      (4) The R script is a valuable tool. I think that a valuable improvement will be to also generate visual representations as part of the script’s output such as the colored plates in Figure 1C that are specific to the generated protocol.

      We have updated the script so that it now also outputs a table specifying the location of each consortium within the plates. We chose to make this a text, rather than a graphics output, to ensure cross-device compatibility.

      (5) The discussion rightly acknowledges the potential to extend the protocol to larger libraries using liquid handlers. To facilitate this implementation, it might be beneficial to modify the script output so that the ‘volume’, ‘plate’, and ‘column’ values are tab- or comma-delimited.

      We thank the reviewer for the suggestion. We have modified the output so that it is now tab-delimited.

      (6) Figures 3 and 4 do not provide a lot of insight. I would suggest combining them into a single figure and using only absorbance values at 600 nm. It would also be interesting to add a histogram of these absorbance values and possibly show histograms for subgroups (e.g. all assemblies with more than 3 strains vs all assemblies with 3 or fewer strains).

      With respect to Figs. 3 and 4, we refer the reviewer to our response to comment #2. With respect to the histogram/subgroups plot, we understand that this would be a slightly modified version of the current Fig. 5A, where we show means and standard deviations across all subgroups of 1 to 8 species, and so we find it unclear what this figure would add.

      (7) With the recommendations of removing or reworking Figures 3 and 4, and the fact that Figure 5 is data-rich (and extremely useful), it would be beneficial to split Figure 5 and include the data shown in Figure S6 in the main figure. The analysis in Figure S6 is valuable and it might be beneficial to elevate this analysis to a primary figure and provide a detailed explanation of its rationale and methods in the main text.

      We appreciate this suggestion. In our view, both the text and the figures benefit from a heavy focus on the assembly protocol, as this is the main contribution of this work. While we do think it is valuable to highlight the type and amount of data that can be collected with a full factorial assembly, as well as the types of analyses that can be performed with this data, we are afraid that allocating more space to these analyses may distract readers from the methodology itself. We have therefore chosen to keep the original structure for Figs. 5 and S6.

    1. eLife Assessment

      This study presents valuable findings by reanalyzing previously published MEG and ECoG datasets to challenge the predictive nature of pre-onset neural encoding effects. The evidence supporting the central conclusions remains incomplete, as additional details of the analyses are needed and alternative interpretations, such as the possibility that pre-onset predictive and sensory-evoked responses rely on distinct neural representations, have not been sufficiently addressed. The work may be of interest to researchers in language processing, predictive coding, and related fields.

    2. Reviewer #1 (Public review):

      The manuscript analyzes previously published MEG and ECoG datasets to examine pre-onset neural encoding effects during language processing, replicating effects that have been reported in earlier work and demonstrating that they persist even after controlling for correlations in the stimulus sequence. Replication of these effects across recording modalities and datasets is a valuable contribution, as it strengthens confidence in the robustness of anticipatory neural activity related to upcoming linguistic input. However, I have significant concerns regarding the interpretation of these findings, particularly the conclusion that the absence of temporal generalization between pre- and post-onset activity implies that pre-onset activity does not reflect predictive pre-activation of the upcoming word.

      The central inferential step in this argument relies on an implicit assumption: that if the brain were predicting an upcoming word, the neural representation prior to word onset should resemble, or generalize to, the representation observed after word onset. This assumption is not theoretically necessary and is not supported by a substantial body of work on predictive processing. Many contemporary models posit that predictions are represented in abstract, compressed, or probabilistic formats that differ from sensory-evoked representations, particularly in hierarchical systems such as language (e.g., Rao & Ballard, 1999; Friston, 2005; Federmeier, 2007; Kuperberg & Jaeger, 2016; de Lange et al., 2018). Under such accounts, predictive representations may encode expectations over latent semantic features or probability distributions rather than reinstating the neural code associated with perceptual input.

      In this context, the temporal generalization analyses presented here convincingly demonstrate that pre-onset and post-onset activity do not share a stable representational code. However, this result does not rule out predictive processing per se. Rather, it rules out a specific and relatively strong hypothesis: that prediction takes the form of early reinstatement of the same neural representation used during post-onset word processing. The data are equally consistent with the interpretation that pre-onset activity reflects predictive information expressed in a different representational format that is transformed upon stimulus onset.

      I therefore recommend that the authors substantially soften and clarify their conclusions regarding prediction. Statements suggesting that pre-onset activity does not reflect prediction should be revised to more precisely reflect what is directly supported by the analyses, namely, the absence of representational identity or stable overlap between pre- and post-onset activity. Explicit acknowledgement of alternative interpretations grounded in established predictive processing frameworks would improve theoretical alignment and avoid overstating the implications of the temporal generalization results.

      Overall, the empirical analyses are carefully executed, and the replication across datasets is a strength. However, the current framing risks over-interpreting what the data can rule out about prediction. A clearer distinction between representational equivalence and predictive processing would significantly strengthen the manuscript's theoretical contribution.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that pre-onset neural encoding is likely not a product of predictive processing. They demonstrate this primarily through two analyses:

      (1) They decorrelate the neural responses between pre- and post-word onset and show that this does not eliminate pre-onset neural encoding. This suggests that this pre-onset neural encoding is not a result of pre-activation driven by an underlying predictive process.

      (2) They show that the future word improvement to encoding performance shown in Caucheteux et al. is likely a result deriving from the low temporal resolution in fMRI, as it does not reproduce in MEG or ECoG data, modalities that have a higher temporal resolution better suited to this kind of analysis.

      Strengths:

      Both of the paper's arguments are overall very compelling and point to potential problems in the underlying literature that may require reevaluation. The paper does not make any unreasonable claims. The limitations of the study are appropriately addressed. The paper is well-reasoned and well-written. Overall, I believe the paper is a worthy addition to the literature on this subject.

      Weaknesses:

      One concern is the degree to which the residualization/decorrelation that the authors employ in Figure 4 truly forces the model to unlearn all the interactions between pre- and post-word onset when referencing the neural activity. This point is explicitly noted in Schonmann et al. (which the authors cite): "While residualised word embeddings no longer contain temporal stimulus dependencies, these dependencies are still represented in the neural data, and can hence be 're-learned' when fitting the regression model." I imagine the inverse of this could be true here: the dependencies are still represented in the stimulus and so can be relearned when mapping to the neural data. It is possible that the small positive onset correlation that occurs after decorrelation can be entirely explained by this. This is not a bad thing per se (as it aligns with the overall point of the article), but it is a potential methodological oversight. A clear description of the decorrelation process is necessary in the methods section.
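
      For readers unfamiliar with residualization, here is a generic sketch of the idea, assuming a single scalar feature per word and ordinary least squares with an intercept. It is not the authors' pipeline: real embeddings are high-dimensional, the manuscript's exact procedure is not described here, and the toy sequence is invented.

```python
import math


def residualize(target, predictor):
    """Remove the part of `target` that is linearly predictable from `predictor`."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predictor) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictor, target))
    var = sum((p - mean_p) ** 2 for p in predictor)
    slope = cov / var
    intercept = mean_t - slope * mean_p
    return [t - (intercept + slope * p) for p, t in zip(predictor, target)]


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy word-level feature with a temporal dependency between neighbours
feature = [0.2, 0.5, 0.4, 0.9, 0.7, 1.0, 0.8, 1.3]
current, previous = feature[1:], feature[:-1]

residual = residualize(current, previous)
print(f"before: r = {correlation(current, previous):.3f}")
print(f"after:  r = {correlation(residual, previous):.3f}")
```

      The residualized feature is linearly uncorrelated with the previous word's feature, but nothing in this step changes the neural data, which is the gap the quoted passage points to.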

      The paper correctly notes that their removal of bigram/n-gram information does not entirely exclude all stimulus dependencies. However, removing this fully would be extremely difficult, and the small reduction in performance of the bigram-ablated model does not point to this being a major issue.

      Separately, some of the figures are a little rough. Suggestions have been provided to the authors.

    4. Reviewer #3 (Public review):

      Previous studies have shown that language model embeddings of future words can predict brain responses to language. This has been interpreted as evidence for predictive representations in the brain. The primary finding of the present study is that this index of predictive processing is not consistent with a pre-activation account, but instead suggests continuously evolving representations. A strength of the manuscript is that it uses methods that build on previous studies and shows that previous results replicate in the current datasets, before testing new hypotheses. Addressing some minor weaknesses would further strengthen the results and help ascertain that the conclusions are justified:

      (1) When analyzing neural data, "words with multiple tokens assigned by the model were excluded" (11). I am wondering whether this could have had an influence on the results. I suspect that using only single-token words would bias the dataset towards semantically light, high-frequency words and function words. Pre-activation may differ between those words and more semantically rich, longer words.

      (2) The study only used a context window of 50 tokens for language model predictions (11). This is less than in previous studies, and may constitute a confound when comparing results across studies. This may be particularly relevant in comparison to Caucheteux et al. (2023), whose results suggested more extensive predictions (9), which may require more extensive context.

      (3) The manuscript is largely missing data on the reliability of the results. Some form of significance test, and indication of variability and/or the noise floor in the figures would be helpful.

      A primary concern when analyzing naturalistic speech data is that different speech features are highly correlated across linguistic levels and across time. The manuscript makes a reasonable effort to control for stimulus autocorrelations. It is encouraging that the effect survived this correction. As the manuscript concedes, control is not perfect and controlling for "all regularities inherent to natural speech" remains a challenge (9). This should be kept in mind when interpreting the results.

      Finally, the manuscript also argues that "we observed clear signatures of postdiction, with neural activity reflecting persistent encoding of prior words" (abstract). I did not follow this reasoning. The ostensible evidence for this is that "including the previous word ... improves encoding even after the current word's onset" (Figure 5). However, this is not particularly surprising, because the previous word can often only be recognized around the end of the word, corresponding to the time of the current word onset. Language model embeddings reflect a contextual semantic interpretation of the word, which likely requires further processing after word recognition. I would thus expect that the initial contextual interpretation of a word occurs during presentation of the subsequent word. Evidence for "persistent encoding" should include encoding beyond this point, i.e., over the course of several subsequent words. Contrary to this, Figure 5a (left) suggests that the predictive effect of the previous word (d-1) stops around the offset of the current word (d). This suggests to me that, once controlling for subsequent embeddings, the embedding of a word disappears from the neural activity soon after word recognition.

    5. Author response:

      Reviewer 1:

      We thank the reviewer for bringing a critical theoretical distinction to our attention. We agree that the Temporal Generalization (TG) results specifically rule out the reinstatement of post-onset neural codes, that is, the idea that the brain pre-activates the same neural representation evoked by the stimulus. In fact, we mention in the discussion: "This temporal variability underscores the need for a more nuanced view of what constitutes predictive pre-activation, as no stable representational state appears to persist after word presentation that could serve as its target."

      To our understanding, prediction is rarely explicitly defined in the literature, and the distinction between predictive pre-activation and other forms of prediction is seldom made. Moreover, the idea of compressed or abstract forms of pre-activated representations has not, to our knowledge, been explicitly articulated in the literature. Our TG findings therefore put meaningful constraints on theories of prediction. In the revision, we will expand on this and include a broader description of potential forms of pre-activation. We will emphasize that the TG results specifically rule out that the brain pre-activates the same neural code used for sensory-evoked processing.

      Moreover, although TG analysis does not rule out alternative notions of predictive pre-activation, we believe our second analysis (the inclusion of future word embeddings) provides independent evidence that argues against more abstract forms of prediction. Unlike the TG analysis, this encoding approach is not constrained to a specific neural code; if the brain represented upcoming words in any linearizable format (abstract, probabilistic, or latent), incorporating those embeddings should have improved the brain score at the current word's onset. We found no such improvement until the word was actually heard. In the revised manuscript, we will reformulate the narrative to clarify that while TG alone rejects a specific form of pre-activation, the combined evidence from both analyses suggests there is a broader lack of predictive pre-activation.

      Reviewer 2:

      We thank the reviewer for their constructive feedback and for bringing to our attention the missing information in our Methods section. We realized that the final two sections were inadvertently omitted during formatting changes before submission. These will be restored in the revised version.

      We appreciate the reviewer's careful reading of this analysis and agree that the concern about whether the decorrelation in Figure 4 forces the model to unlearn the associations between pre- and post-onset activity is a valid one. To clarify, this is not what we intended to claim. Rather, our argument follows a different logic: if we assume that pre-onset encoding is purely a signature of predictive pre-activation, then decorrelating the pre- and post-onset brain responses should effectively remove that signature. The fact that pre-onset encoding remains largely intact after this procedure suggests that our initial premise was false; the observed pre-onset encoding is likely not a signature of pre-activation. We would also like to note that in this analysis, we use both residualized neural data and decorrelated embeddings. Therefore, the majority of stimulus dependencies are removed. Nevertheless, as the reviewer notes, some dependencies, such as bigrams and other word co-occurrences, inevitably remain. These dependencies might explain the remaining pre-onset encoding we observed. This aligns with the main message of our paper. In the revision, we will provide a detailed description of the decorrelation process and make this interpretive logic more explicit in the main text.

      Reviewer 3:

      We are grateful for the reviewer’s detailed comments and for raising several points that will significantly improve the clarity and comparability of our study. Specifically, the reviewer’s feedback helped us realize that our evidence for postdiction required further clarification. While the encoding of the immediately preceding word ($d-1$) may involve recognition lags, we observe that word $d-2$ further improves the brain score even after the current word's onset, beyond what is explained by word $d-1$ alone. This may extend beyond simple recognition delays. To address this, we will visualize this effect further in the upcoming version and expand the manuscript to include alternative explanations for this observation, such as extended lexical processing or integration delays.

      To ensure our results are not biased toward high-frequency or function words, we will re-run our analyses including multi-token words. Given that these words constitute a small part of the datasets, we expect our core findings to remain stable.

      In line with our response to reviewer 2, we will more clearly emphasize that despite our extensive controls, we cannot be sure that we accounted for all regularities inherent to natural speech.

      Additionally, we will increase the context windows of the LLM to match the larger windows used in previous literature and add significance tests, error bars, and noise floor indications to our figures to ensure the reliability and variability of our findings are clearly communicated.

    1. eLife Assessment

      The authors developed and validated a gut-on-chip system to mimic the gut environment for studies of Clostridioides difficile infection in vitro. Although the data generated are useful to the field, the evidence provided to support the conclusions is incomplete. Incomplete methodology, as well as discrepancies regarding the proposed mode of action of lipoxin A4, are significant weaknesses.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the potential for the immune mediator, lipoxin A4 (LXA4), to alleviate inflammation/damage caused by the healthcare-associated pathogen, Clostridioides difficile. Using both a novel in vitro "gut-on-a-chip" system and a murine model of disease, the authors demonstrate potential disease attenuation by LXA4. Specifically, LXA4 at select administration times during development of C. difficile infection (CDI) may upregulate markers associated with intestinal barrier integrity (ZO-1) and attenuate immune markers typically associated with inflammation (IL-8, IFN-γ, etc.). Overall strengths of the study include the establishment of a novel in vitro model that incorporates anaerobic and aerobic environmental conditions of the gut, as well as some results suggesting a potential role for LXA4 in modulating CDI. However, critical weaknesses of the manuscript, including incomplete methods and a lack of some critical controls or measurements, lead to only partial support for the authors' conclusions. Collectively, the data suggest alternate potential (and perhaps more likely) mechanisms by which LXA4 might modulate CDI. Specific strengths and weaknesses are listed below.

      Strengths:

      (1) A major strength of the study is the use and description of the gastight, gut-on-a-chip system that allows for co-culture of host cells (with aerobic needs) with anaerobic bacteria. While perhaps this (and other in vitro) system does not exactly "more accurately recapitulate specific host-microbe interactions (line 82)", integration of oxic and anoxic conditions that recapitulate the gut is indeed difficult to incorporate in vitro. Results surrounding C. difficile and Caco-2 cell viability in the described system seem substantiated.

      (2) Assessing LXA4 in both an in vitro and in vivo (mouse) model is a complementary strategy. Results from both experiments seem to support the observation that LXA4 can possibly attenuate C. difficile.

      (3) Overall, the manuscript is well-written and straightforward (albeit lacking in some details; see below).

      Weaknesses:

      (1) A major weakness of the manuscript in its current state is that the methods are incomplete or unclear. Details on how C. difficile was handled (strain info, preparation in experiments, quantification) are lacking. Mouse model information (inoculation, housing, number of animals) is missing, particularly for the second set of mouse experiments, which is not described at all in the methods. An IACUC or similar statement is not included.

      a) For in vitro experiments, how exactly were C. difficile quantified using flow cytometry? This is not exactly clear in the methods or the results, where C. difficile counts are referred to as 'normalized' without specific units (Figure 1D). What are these counts normalized to? How much of the total effluent was measured? This might also explain the discrepancy in C. difficile counts, referred to below.

      b) How exactly were C. difficile quantified for the mouse studies? The authors state that fecal samples were plated on CCFA agar, but the y-axis merely states "numbers of bacteria". Other bacteria grow on CCFA. How were C. difficile specifically enumerated?

      c) Figure 4. For the vancomycin / LXA4 experiments, were mice subjected to antibiotics to render them initially susceptible to C. difficile? If so, this should be included in experimental timelines. If not, how do the investigators know that mice were colonized with C. difficile in each instance (usually mice require abx perturbation for susceptibility)? How was vancomycin administered to mice? In any case, C. difficile loads should be quantified for all conditions in these experiments.

      d) Related to the above (Figure 4 experiments), were all of these measurements taken only 24 hours post-infection? These experiments are not described well in the results and are not described at all in the methods.

      e) How many total mice were included in the study groups, and how were they housed? Cage effects can influence any mouse study, but are especially important in CDI studies, given the importance of the microbiome in the development of CDI.

      f) How were mice inoculated with C. difficile? Was this a spore or vegetative inoculum, and how was it delivered? The stated inoculum of 1x10^9 is quite large.

      g) What is the history/ribotype of the C. difficile strain (1482?) used in all the experiments? How does this compare to other commonly used strains of C. difficile? Different strains differ in overall virulence, disease dynamics, and disease severity in animal and in vitro models.

      (2) Related to some methodological clarifications, there are some missing controls that would bolster support for final interpretations and some odd discrepancies in the study that are not explained.

      a) Figure 1C: How does the mucin layer (i.e., Caco-2 cell differentiation) look under anoxic conditions? This measurement was only included in the oxic conditions.

      b) In initial C. difficile quantification within the system (Figure 1D), C. difficile counts seem to range from 3 - 12 (undefined units). In the C. difficile / LXA4 experiments, these counts only reach ~1.8 (undefined units) in the CDI group. What explains this large discrepancy? Furthermore, the prophylactic LXA4 group seems to hover around < 0.5, similar to what is seen at 0 or 3 hours with C. difficile alone. This suggests that C. difficile might not proliferate at all in the presence of LXA4, perhaps explaining why epithelial barrier functions and immune attenuation are improved.

      c) Figure 2B. What do untreated controls (no CDI, but with or without LXA4) look like compared to the experimental groups? These controls should be included with the main Figure 2 results.

      d) If all metrics in Figure 4 were measured only 24 hours after infection, this is a VERY short timeline for CDI. Depending on the strain, damage might not even be quantifiable by this time point. For instance, C. difficile 630 disease signs only appear 2-4 days post-infection. C. difficile VPI kills mice within 36 hours, but Figure 3 results suggest that mice survive just fine. What is known about this strain's disease dynamics in mice? Alternatively, is it possible that LXA4 alone increases barrier integrity / attenuates inflammation? The inclusion of non-CDI controls (with or without abx; untreated; etc) might help distinguish this.

      (3) Perhaps the largest weakness of the manuscript is the interpretation of how LXA4 might attenuate CDI, which also makes the title misleading. The authors purport that disease attenuation occurs via LXA4 increasing barrier integrity and attenuating inflammation. However, much of the evidence suggests that LXA4 might limit C. difficile colonization. If there is less C. difficile (thus less toxin) in any system, all aspects of the disease will be attenuated. Indeed, their data suggest that there are decreasing amounts of C. difficile in the presence of LXA4, which could be due to direct inhibition of C. difficile or its toxin, removing nutrients necessary for C. difficile growth, or indirect effects on microbes in the gut environment (in mice). Proper quantification of C. difficile, toxin measurements, and dose responses would better distinguish which mechanism is more likely.

      a) The initial LXA4 experiments assessing potential therapeutic effects (mainly Figure 2) were conducted at 6 hours post-infection. What is the C. difficile load and/or toxin burden at this time? In some ways, LXA4 administration at this time point could also be thought of as 'prophylactic', given that damage (and maybe C. difficile virulence?) has not occurred yet.

      b) Is it possible that LXA4 administration prior to C. difficile inoculation influences C. difficile physiology (colonization; toxin production), rather than alleviating C. difficile damage? C. difficile colonization should be quantified in all the LXA4 experiments (only a subset is shown in Figure 2).

      c) Line 213 / Figure 2G. While it is possible that "LXA4 reprograms the intestinal epithelial transcriptome to bolster barrier function and temper immune signaling", the decreased C. difficile measurements in the presence of LXA4 suggest it impacts C. difficile colonization / function. This decreased level of C. difficile (and thus less toxin) could also explain immune response attenuation. Toxin measurements, as well as some C. difficile dose responses within the system, could help distinguish which possibility is more likely.

      d) Both in vitro and in vivo experimental results suggest a prophylactic role for LXA4 in CDI. However, the current experiments cannot distinguish whether this prophylactic response is due to host-specific anti-inflammatory attenuation (which the authors suggest) or due to an impact on C. difficile colonization/function (which is not acknowledged). The effect of LXA4 on C. difficile could be via direct inhibition of C. difficile growth or host remodeling that modulates C. difficile colonization or metabolism.

      e) Figure 4. While the data seem to support some preservation of barrier function and attenuation of inflammatory responses, this could once again be due to delaying, decreasing, or inhibiting C. difficile colonization itself, rather than attenuation by LXA4. Indeed, vancomycin-induced improvements within this short amount of time are likely due to inhibiting C. difficile, as it is an antibiotic used to directly kill C. difficile.

      (4) Other comments:

      a) Given that the current results cannot preclude alternate, if not more likely, explanations for how LXA4 might attenuate CDI, the manuscript should include a more comprehensive discussion. This could include study caveats, C. difficile-specific context about infection (i.e., infection dynamics, context with other experiments).

      b) Dysbiosis: the term is left undefined, and its meaning is context-dependent. For CDI, what does this mean?

      c) Unclear if in vitro intestinal models "more accurately recapitulate specific host-microbe interactions", even considering caveats of animal models. Rather, each model has its own purpose; I would be careful about this phrasing (line 82).

      d) Line 86: C. difficile does not just "thrive under strict anaerobic conditions"; such conditions are necessary for its growth. C. difficile is an obligate anaerobe.

    3. Reviewer #2 (Public review):

      C. difficile infection (CDI) is one of the most common nosocomial intestinal infections with a high rate of disease recurrence. Importantly, antibiotics used to treat CDI are a double-edged sword because disruption of the gut microbiome also increases the susceptibility to CDI. Therefore, there is an unmet need for alternative therapeutic approaches against CDI. CDI pathogenesis is initiated by the cytotoxic toxins TcdA and TcdB that target and induce cell death of intestinal epithelial cells, leading to epithelial barrier breakdown and inflammation. Innate immune cells such as neutrophils and innate lymphoid cells (ILCs) were shown to be crucial to control CDI during the acute phase. Based on previous reports that the pro-resolving mediator Lipoxin A4 (LXA4) inhibits neutrophil infiltration and promotes efferocytosis as well as mucosal repair, the authors reason that LXA4 could be leveraged as a therapy against CDI.

      The authors developed and validated a gut-on-chip (GOC) system to mimic the gut environment for C. difficile infection in vitro studies. LXA4 was able to decrease C. difficile-induced inflammation only when used as a prevention but not as a therapy. IEC RNA-seq revealed that LXA4 treatment upregulates a transcriptional program that reinforces barrier function. These data were replicated in an in vivo model of CDI. Overall, the study provides evidence that LXA4 could be repurposed for CDI treatment, but some claims are not fully supported by the data, such as the synergy between LXA4 and vancomycin, which has not been experimentally tested in vivo.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Completeness and clarity of Methods (Weakness #1).

      We will substantially expand the Methods section to include:

      (a) Detailed information on C. difficile strain ribotype 1382 (correcting the typographical error "1482"), including its virulence characteristics, toxin production dynamics, and rationale for its selection.

      (b) Step-by-step protocols for on-chip bacterial quantification by flow cytometry, including sample collection volume, processing, and the specific normalization procedure (with clarification that normalized values are intended for within-experiment comparisons only).

      (c) Full description of mouse experiments: antibiotic pre-treatment regimen, inoculation details (spores vs. vegetative cells, justification of the 1×10^9 CFU dose), animal numbers, housing conditions, and cage-effect considerations. The IACUC approval statement will be moved from Acknowledgments to Methods.

      (2) Mucin layer characterization under anoxia (Weakness #2a).

      We will clarify in the Methods that mucin staining was performed after the initial oxic culture phase to confirm differentiation prior to anaerobic challenge. We will cite relevant literature discussing the stability of pre-formed mucin layers under short-term anoxic conditions and incorporate this discussion to contextualize our experimental design in the revised Methods.

      (3) Discrepancy in C. difficile counts and mechanism of LXA4 action (Weakness #2b, #3).

      We will provide a detailed explanation of our flow cytometry normalization algorithm, emphasizing that values are only comparable within a given experimental batch. We plan to perform additional in vitro experiments to directly assess the effect of LXA4 on bacterial growth and toxin secretion. These data will help distinguish between direct antibacterial effects and host-mediated protection, and the revised Discussion will incorporate this analysis.

      (4) Missing controls and experimental timelines (Weakness #2c–d).

      We will clarify that Figure 4 presents gut-on-chip experiments, not animal studies. The corresponding methods will be fully described. Additionally, we will include cross-experiment alignment analyses (using the CDI group as a common reference) to integrate negative control data from separate experimental batches. We also plan to generate additional data examining the effect of LXA4 alone (without infection) on epithelial barrier integrity and inflammatory status, which will be included as supplementary controls.

      (5) C. difficile strain characterization (Weakness #1g).

      A comprehensive section on ribotype 1382 will be added to the Methods, detailing its in vitro growth kinetics, toxin production profiles, and disease dynamics in the murine model, with appropriate literature citations.

      (6) Dysbiosis definition and phrasing adjustments (Other comments #b–d).

      We will revise the text to provide a clear definition of dysbiosis in the context of CDI. We will also temper the phrasing in line 82 to more accurately describe the advantages of our GOC system relative to other in vitro models, and correct the description of C. difficile as an obligate anaerobe.

      Reviewer #2 (Public review):

      (1) Synergy between LXA4 and vancomycin in vivo.

      We agree that the synergistic effect observed in the GOC model requires validation in an animal model. We are currently conducting mouse experiments to test the combination of prophylactic LXA4 with vancomycin treatment. The results will be included as a new Figure 5 in the revised manuscript.

      We are confident that these planned revisions will fully address the reviewers' concerns and significantly enhance the rigor and impact of our study.

    1. eLife Assessment

      This fundamental study describes long-range serial dependence of performance on a visual texture discrimination training task that manipulated conditions to induce differing degrees of location transfer of learning. The authors re-analyzed previously published behavioral data, generating compelling evidence from converging approaches that the serial dependence effects persist over multiple days of training, and may share a common causal mechanism with training-induced location transfer. By informing our understanding of the importance of temporal integration to long-term perceptual learning and its propensity towards specificity or generalizability, these results should interest neuroscientists who seek to uncover underlying neural mechanisms for these processes.

    2. Reviewer #1 (Public review):

      This paper presents a reanalysis of a large existing dataset to examine whether serial dependence effects (systematic influences of recent stimulus history on current perceptual judgments) are associated with generalization in perceptual learning. The central hypothesis is that extended, longer-range history effects (beyond the most recent trials) are beneficial for transfer across locations. The authors reanalyze data from a texture discrimination task in which observers discriminated peripheral target orientation against a line background, with performance quantified by stimulus-onset asynchrony thresholds. Three training conditions were compared: a fixed single-location condition, a two-location alternating condition, and a dummy-trial condition with frequent target-absent trials. Transfer was assessed after training at new locations. Serial dependence was quantified using history-sequence analyses and linear mixed-effects models estimating bias weights across stimulus lags, with summary measures distinguishing recent (1-3 trials back) and more distant (4-6 trials back) dependencies.
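
      To illustrate what a per-lag bias estimate captures, the following is a minimal sketch on simulated data. It is a deliberately simplified stand-in for the history-sequence analysis and is not the authors' mixed-effects model: for each lag it compares how often the observer gives one response after the stimulus n trials back was of one type versus the other. The function names (`simulate_trials`, `lag_bias`) and the weights `w_current` and `w_prev` are assumptions chosen purely for illustration.

      ```python
      import random


      def simulate_trials(n_trials, w_current=0.3, w_prev=0.1, seed=0):
          """Simulate binary stimuli (+1/-1) and responses biased by the 1-back stimulus."""
          rng = random.Random(seed)
          stimuli = [rng.choice((-1, 1)) for _ in range(n_trials)]
          responses = []
          for t, stim in enumerate(stimuli):
              prev = stimuli[t - 1] if t > 0 else 0
              # Probability of responding +1 depends on the current and previous stimulus.
              p_plus = 0.5 + w_current * stim + w_prev * prev
              responses.append(1 if rng.random() < p_plus else -1)
          return stimuli, responses


      def lag_bias(stimuli, responses, lag):
          """P(response=+1 | stimulus `lag` trials back = +1) minus the same for -1."""
          counts = {1: [0, 0], -1: [0, 0]}  # past stimulus -> [n "+1" responses, n trials]
          for t in range(lag, len(stimuli)):
              past = stimuli[t - lag]
              counts[past][0] += responses[t] == 1
              counts[past][1] += 1
          return counts[1][0] / counts[1][1] - counts[-1][0] / counts[-1][1]


      stimuli, responses = simulate_trials(20000)
      # One bias value per lag; in this simulation only lag 1 carries a true effect.
      biases = {lag: lag_bias(stimuli, responses, lag) for lag in range(1, 7)}
      ```

      In this toy simulation the bias is concentrated at lag 1 and close to zero at longer lags; the reported finding is that real observers show non-zero weights out to 6-10 trials back. A mixed-effects model would estimate all lags jointly and add per-observer random effects, which this sketch omits.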

      The authors report extended serial dependence effects, persisting up to 6-10 trials back, with substantial cumulative bias that remains stable across multiple days of training and is not correlated with overall performance thresholds. Recent history effects are stronger for faster responses, suggesting a contribution from decision- or response-related processes, whereas more distant effects decline within sessions, potentially reflecting adaptation dynamics. Critically, longer-range serial dependence is significantly stronger in training conditions that promote generalization than in the single-location condition. Individual differences in the strength and decay profile of distant history effects predict the magnitude of transfer across locations, whereas recent history effects do not. History effects are also correlated across trained locations, suggesting stable individual differences.

      The authors interpret longer-range serial dependence as reflecting integrative processes that extract task-relevant structure over time, thereby supporting generalization, while shorter-range effects are attributed to more transient mechanisms such as priming or decision-level bias. The discussion connects these findings to Bayesian accounts of perceptual stability and to concepts of overfitting in machine learning.

      The study offers a novel and thoughtful link between short-term serial dependence and long-term generalization in perceptual learning, helping bridge two literatures that are often treated separately. The large dataset enables robust estimation of individual differences, and the use of mixed-effects modeling appropriately accounts for variability across observers. The empirical distinction between recent and more distant history effects is well-supported and adds important nuance to interpretations of serial dependence. Converging evidence from both group-level comparisons and individual-level correlations strengthens the central conclusions.

      Comments on revisions:

      The authors have effectively addressed my concerns. The new robustness analyses (Supp. Fig. S3), supplementary toy model, clearer DDM-based mechanistic distinctions, and expanded discussion of limitations and generality fully resolve my original points.

    3. Reviewer #3 (Public review):

      Summary:

      This reanalysis of a classic study of visual perceptual learning in a texture discrimination task convincingly demonstrates the presence of sequential dependence effects, commonly seen in response time analyses in 2-alternative tasks, on response accuracy in the texture task in visual periphery and in a simultaneous central letter report at fixation. Overall, this paper provides a new and interesting analysis of the effects of sequential dependencies from trial to trial on performance, learning, and generalizability in perceptual learning.

      Strengths:

      This new analysis of sequential dependency effects (SDEs) extends commonly observed sequential effects in two-choice reaction times to accuracy and relates them to response accuracy during visual learning in a frequently used perceptual learning task. The paper makes a convincing case that different conditions known to impact generalization of learning to a second visual location also express quantitatively distinct n-back SDEs.

      Weaknesses:

      Additional analyses now back up the analysis of effects of SDEs using trials selected to enhance the size of the effects, specifically when the current trial is low visibility and the prior trial is of high visibility. The authors now provide a practical analytic reason for this choice.

      Comments on revisions:

      The revision has successfully addressed comments in the original reviews.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper presents a reanalysis of a large existing dataset to examine whether serial dependence effects (systematic influences of recent stimulus history on current perceptual judgments) are associated with generalization in perceptual learning. The central hypothesis is that extended, longer-range history effects (beyond the most recent trials) are beneficial for transfer across locations. The authors reanalyze data from a texture discrimination task in which observers discriminated peripheral target orientation against a line background, with performance quantified by stimulus-onset asynchrony thresholds. Three training conditions were compared: a fixed single-location condition, a two-location alternating condition, and a dummy-trial condition with frequent target-absent trials. Transfer was assessed after training at new locations. Serial dependence was quantified using history-sequence analyses and linear mixed-effects models estimating bias weights across stimulus lags, with summary measures distinguishing recent (1-3 trials back) and more distant (4-6 trials back) dependencies.

      The authors report extended serial dependence effects, persisting up to 6-10 trials back, with substantial cumulative bias that remains stable across multiple days of training and is not correlated with overall performance thresholds. Recent history effects are stronger for faster responses, suggesting a contribution from decision- or response-related processes, whereas more distant effects decline within sessions, potentially reflecting adaptation dynamics. Critically, longer-range serial dependence is significantly stronger in training conditions that promote generalization than in the single-location condition. Individual differences in the strength and decay profile of distant history effects predict the magnitude of transfer across locations, whereas recent history effects do not. History effects are also correlated across trained locations, suggesting stable individual differences.

      The authors interpret longer-range serial dependence as reflecting integrative processes that extract task-relevant structure over time, thereby supporting generalization, while shorter-range effects are attributed to more transient mechanisms such as priming or decision-level bias. The discussion connects these findings to Bayesian accounts of perceptual stability and to concepts of overfitting in machine learning.

      The study offers a novel and thoughtful link between short-term serial dependence and long-term generalization in perceptual learning, helping bridge two literatures that are often treated separately. The large dataset enables robust estimation of individual differences, and the use of mixed-effects modeling appropriately accounts for variability across observers. The empirical distinction between recent and more distant history effects is well-supported and adds important nuance to interpretations of serial dependence. Converging evidence from both group-level comparisons and individual-level correlations strengthens the central conclusions.

      Several limitations should be addressed. First, the study relies entirely on previously collected data, without experimental manipulations designed to selectively isolate serial dependence mechanisms. Filtering choices, while theoretically motivated, may amplify history effects in ways that are difficult to quantify. Second, sequential dependencies can arise from multiple sources, including gradual updating of internal weight structures, adaptation processes, and history-dependent biases in decision-making. The current analyses do not clearly separate these contributions, limiting mechanistic attribution of long-range effects. Third, the conclusions are based on a single perceptual task, leaving open questions about generality across paradigms. Finally, while the discussion references computational ideas, no explicit modeling is provided to test whether plausible learning rules can jointly account for the observed history profiles and transfer effects.

      We now address these issues in the manuscript (see below for detailed responses) and provide a toy model (supplementary material) where the observed effects are explained by simple learning mechanisms.

      The findings align with theoretical frameworks that conceptualize perceptual learning as gradual reweighting of stable sensory representations at the decision stage (e.g., Petrov et al., 2005). Trial-by-trial updates in these models naturally give rise to sequential dependencies and sensitivity to training statistics. The observation that longer-range history effects predict generalization is consistent with broader temporal integration supporting more flexible learning, while narrower integration may lead to specificity. The results also indicate that multiple mechanisms, including decision-level biases and adaptation, may coexist with reweighting processes, highlighting the value of hybrid accounts.
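
      The statement that trial-by-trial updates give rise to sequential dependencies can be illustrated with a minimal sketch. It assumes a leaky running average as the update rule; this rule, the function names, and the two learning rates are hypothetical choices for illustration and are not the model used in the manuscript or in Petrov et al. (2005). Under this assumption, a slower update rate spreads the influence of past stimuli over more trials, so the ratio of distant (4-6 back) to recent (1-3 back) weights grows.

```python
# Illustrative sketch only: a leaky running average stands in for a
# trial-by-trial template update. The update rule and the learning-rate
# values are assumptions, not the model used in the manuscript.

def lag_weights(learning_rate, max_lag=10):
    """Weight of the stimulus n trials back on the current bias.

    With the update b[t] = (1 - a) * b[t-1] + a * s[t-1], unrolling the
    recursion gives a weight of a * (1 - a) ** (n - 1) for lag n.
    """
    a = learning_rate
    return [a * (1 - a) ** (n - 1) for n in range(1, max_lag + 1)]


def recent_and_distant(weights):
    """Sum the weights over lags 1-3 (recent) and lags 4-6 (distant)."""
    return sum(weights[0:3]), sum(weights[3:6])


if __name__ == "__main__":
    for a in (0.5, 0.1):  # hypothetical fast and slow updaters
        recent, distant = recent_and_distant(lag_weights(a))
        print(f"rate={a}: recent={recent:.3f} distant={distant:.3f} "
              f"ratio={distant / recent:.3f}")
```

      In this sketch the distant-to-recent ratio equals (1 - a) cubed, so it depends only on the assumed update rate and not on the overall size of the bias.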

      In summary, this is a careful and data-rich reanalysis that highlights a potentially important role for serial dependence in enabling generalization during perceptual learning. While the underlying mechanisms remain underspecified, the evidence supporting the reported associations is strong, and the work provides a valuable empirical foundation for further experimental and modeling efforts.

      Reviewer #2 (Public review):

      This manuscript investigates how people's perceptual reports are influenced by events and trials in the past, and how this long-range dependence relates to broader learning across locations in a visual learning task. The authors present clear and internally consistent analyses showing that extended temporal integration is associated with greater generalization of learning. The study is thought-provoking and may contribute meaningfully to understanding how short-term influences and long-term improvement interact, although several interpretational points would benefit from clarification.

      Strengths:

      (1) The manuscript identifies unusually long-range perceptual biases extending up to ten trials back, which is a striking and potentially important finding.

      (2) The association between strong long-range dependence and greater learning generalization is clearly documented and supported by consistent analyses.

      (3) The dataset is large and rich, and the authors apply repeated and well-controlled analyses that give confidence in the stability of the effects.

      (4) The writing is generally clear, and the manuscript raises interesting conceptual links between temporal integration and generalization of learning.

      Weaknesses / Points Requiring Clarification:

      (1) The manuscript repeatedly equates generalization with increased efficiency, but this relationship is not universally true. In some populations or tasks, excessive generalization can reduce task-specific efficiency. The authors should discuss this context-dependence to clarify when generalization is beneficial versus detrimental.

      We agree with the reviewer that generalization does not strictly imply increased efficiency; in some contexts, over-generalization can indeed be detrimental. We now explicitly note in the Introduction that serial dependence can impair performance when stimuli vary randomly across trials. We have reviewed the manuscript to ensure we do not explicitly equate generalization with efficiency. Our argument is specifically that long-range SDEs support the transfer of learning (generalization).

      (2) Serial dependence is also present, though smaller, in the central fixation task. It remains unclear whether this bias could contribute to the serial dependence observed in the main task. The authors should clarify whether the two biases are independent or whether the central-task bias might partially influence orientation judgments in the main task.

      These two tasks are independent: one requires T/L discrimination, the other V/H discrimination. See our detailed response below.

      (3) Several figure captions and labels contain minor inconsistencies in formatting and terminology. Careful proofreading would improve clarity.

      We thank the reviewer for pointing this out and have proofread the captions to improve formatting and terminology consistency throughout.

      Reviewer #3 (Public review):

      This reanalysis of a classic study of visual perceptual learning in a texture discrimination task convincingly demonstrates the presence of sequential dependence effects, commonly seen in response time analyses in 2-alternative tasks, on response accuracy in the texture task in the visual periphery and in a simultaneous central letter report at fixation. Overall, this paper provides a new and interesting analysis of the effects of sequential dependencies from trial to trial on performance, learning, and generalizability in perceptual learning.

      Strengths:

      This new analysis of sequential dependency effects (SDEs) extends commonly observed sequential effects in two-choice reaction times to accuracy and relates them to response accuracy during visual learning in a frequently used perceptual learning task. The paper makes a convincing case that different conditions known to impact generalization of learning to a second visual location also express quantitatively distinct n-back SDEs.

      Weaknesses:

      Most of the new analyses emphasize the effects of SDEs, including trials designed to enhance the size of the effects, specifically when the current trial is low visibility, and the prior trial is of high visibility. Unless there is an argument that learning and subsequent generalization primarily occur in low-visibility trials, the presentation should also include displays and an emphasized discussion of analysis for all trials, unfiltered.

      We analyze effects on close-to-threshold (small-medium SOA) current targets preceded by above-threshold (high SOA) reference targets. This is motivated by both technical issues and theoretical assumptions. In psychophysics, when using percent correct as a measure of performance, bias cannot be reliably estimated at or near ceiling performance, as correct responses leave little room for bias to manifest. Regarding the ‘easy’ targets used as a reference, presenting them at low SOA introduces uncertainty as to the reference orientation against which bias is measured, because their perceptual effect is ambiguous. Theoretically, we note that in perceptual learning with threshold targets, the introduction of clear targets in the absence of feedback enables learning (see Discussion, where we added: 'Most interestingly, in our experiments without feedback on the texture task, the experimental conditions yielding the strongest bias were also found to enhance learning in the absence of feedback (Liu et al., 2012)').

      We have addressed this concern also by conducting additional robustness analyses with unfiltered prior-trial history. We analyzed data without the prior-visibility filter; results are presented in a new Supplementary Figure S3 and confirm our main findings (see addition to Methods: "Finally, to verify that our findings are not artifacts of these filtering choices, we also conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm the robustness of our main findings.").

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) How manipulations of stimulus statistics, uncertainty, or feedback could selectively engage different forms of serial dependence

      We expect serial dependence to be modulated by all these parameters. In classical SDT, stimulus statistics are known to affect response bias, as are temporal correlations in stimulation sequences. We note in the manuscript that we employed random sequences (50% chance for V and 50% for H targets), eliminating expectation-based biases toward either orientation. Stimulus uncertainty is known to increase serial dependence, as we also found here. Feedback is also expected to have an effect; the literature is somewhat ambiguous on this point, and the effect may depend on experimental design. We note that the main task studied here (TDT) had no feedback while the central T/L task did have feedback, and both showed serial dependencies. In the manuscript we point to reviews of SDE where much of this is discussed.

      (2) How explicit computational models could help distinguish decision bias from structural learning

      We use the drift diffusion model (DDM) to distinguish decision bias (starting point in the DDM) from structural learning (changes in drift rate). The DDM predicts that decision bias is short-lived and mainly affects trials with fast reaction times (RTs), while biases due to drift rate asymmetry persist to long RTs. We present these results in Figure 3.

      (3) Whether similar relationships are observed in other perceptual domains

      We are not aware of any other study linking serial dependence and perceptual learning or reporting such a link. We expect the link between long-range serial dependence and learning generalization to extend beyond the TDT (see new paragraph in Discussion). We hope this framework will motivate similar analysis in other labs where comparable datasets exist.

      (4) How sensitive are the results to the filtering choices used in the analysis?

      We analyze effects on close-to-threshold (small-medium SOA) current targets preceded by above-threshold (high SOA) reference targets. This is motivated by both technical issues and theoretical assumptions. In psychophysics, when using percent correct as a measure of performance, bias cannot be reliably estimated at or near ceiling performance, as correct responses leave little room for bias to manifest. Regarding the ‘easy’ targets used as a reference, presenting them at low SOA introduces uncertainty as to the reference orientation against which bias is measured, because their perceptual effect is ambiguous. Theoretically, we note that in perceptual learning with threshold targets, the introduction of clear targets in the absence of feedback enables learning (see Discussion, where we added: 'Most interestingly, in our experiments without feedback on the texture task, the experimental conditions yielding the strongest bias were also found to enhance learning in the absence of feedback (Liu et al., 2012)').

      We have addressed this concern also by conducting additional robustness analyses with unfiltered prior-trial history. We analyzed data without the prior-visibility filter; results are presented in a new Supplementary Figure S3 and confirm our main findings (see addition to Methods: "Finally, to verify that our findings are not artifacts of these filtering choices, we also conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm the robustness of our main findings.").
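
      The drift-diffusion prediction stated in response (2) above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' analysis: the boundary separation, noise level, time step, drift, and starting-point values are all hypothetical, and the function names are introduced here for illustration only. It compares the choice bias in the fast and slow halves of the RT distribution for a starting-point bias versus a drift bias.

```python
import random

# Illustrative sketch only: a basic two-boundary drift-diffusion simulation.
# All parameter values are assumptions chosen for demonstration.

def simulate_trial(rng, drift, start, bound=1.0, dt=0.005, noise=1.0):
    """Return (choice, rt); choice is 1 for the upper bound, 0 for the lower."""
    x, t = start, 0.0
    step_sd = noise * dt ** 0.5
    while 0.0 < x < bound:
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (1 if x >= bound else 0), t


def bias_by_rt(drift, start, n_trials=3000, seed=0):
    """Proportion of upper-bound choices in the fast and slow RT halves."""
    rng = random.Random(seed)
    trials = sorted((simulate_trial(rng, drift, start) for _ in range(n_trials)),
                    key=lambda trial: trial[1])
    half = n_trials // 2
    fast = sum(choice for choice, _ in trials[:half]) / half
    slow = sum(choice for choice, _ in trials[half:]) / (n_trials - half)
    return fast, slow


if __name__ == "__main__":
    print("starting-point bias (fast, slow):", bias_by_rt(drift=0.0, start=0.7))
    print("drift bias (fast, slow):         ", bias_by_rt(drift=1.0, start=0.5))
```

      With these assumed settings, the starting-point bias should be concentrated in the fast half of trials, whereas the drift bias should be of similar size in both halves.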

      Reviewer #2 (Recommendations for the authors):

      (1) Clarify mechanisms underlying long-range serial dependence. Please better distinguish possible sources of serial dependence (e.g., decision bias, adaptation, reweighting) and clarify which interpretations are supported or remain ambiguous given the current analyses

      Our manuscript discusses the mechanisms underlying the dissociation between recent and distant SDEs in the Discussion section. Specifically, we report that:

      Recent SDEs are RT-dependent (stronger with faster responses), consistent with decision-level criterion shifts (Dekel & Sagi, 2020).

      Distant SDEs are RT-independent, consistent with neural reweighting/template updating.

      We also discuss the role of sensory adaptation in truncating long-range integration, supported by within-session decline of SDEs, reduced distant SDEs in the 1loc condition, and the original findings by Harris et al. (2012).

      We have added an explicit acknowledgment that our correlational approach cannot definitively establish causality (see addition to Discussion: "While these converging findings support distinct mechanisms for recent and distant SDEs, our correlational approach cannot definitively establish causality, and targeted experimental manipulations would further strengthen these interpretations.").

      (2) Test robustness to analytic choices

      We have conducted robustness analyses by removing the prior-trial visibility filter. The results are presented in a new Supplementary Figure S3 and confirm that our key findings remain qualitatively unchanged (see addition to Methods referencing Supplementary Figure S3).

      (3) Strengthen the computational link

      We have expanded the Discussion to reference relevant computational models and specify predictions for future modeling work. We now cite Petrov et al. (2005). We provide a toy model implementing trial-by-trial template updates that shows SDEs correlated with learning transfer. Importantly, in this model, long-range SDE is a consequence of learning dynamics (see new paragraph in Discussion, and model simulation in supplementary material).

      (4) Discuss generality and experimental tests. Briefly address whether similar effects are expected across other tasks or sensory domains, and outline experimental manipulations that could causally test the role of serial dependence in generalization.

      We have added discussion of generality across perceptual domains and outlined the prediction that future work could test the SDE-generalization link in other tasks where both phenomena have been documented (see new paragraph in Discussion).

      Reviewer #2 (Public Review - Point 2): Central task SDE independence

      The SDEs observed in the central letter task and peripheral TDT are likely independent, as they involve different stimulus features (letter identity vs. orientation), different response mappings, and show distinct performance patterns across conditions. The absence of condition differences in central-task SDEs (described at the end of the paragraph under "SDE differences between conditions" in the Results section), despite robust differences in TDT SDEs, further suggests that the peripheral orientation biases are not contaminated by central-task response tendencies. Note that the central task was fixed across conditions: it stayed at fixation when the target location was changed and when dummy trials were presented.

      Reviewer #3 (Recommendations for the authors):

      (1) Reference to Falmagne, Cohen, & Dwivedi (1975)

      We have added this reference to the Introduction, acknowledging the historical foundation of sequential effects in perceptual decisions.

      (2) The SDE data of Figure 1 are (per the figure legend) from the 1 loc data of Harris et al., "pooled over all testing days", and filtered for trials with low-visibility current targets (SOA < SOA-threshold+20ms). Specify whether this threshold criterion is on a per-subject basis. State in the legend that "all testing days" includes Days 1-8 (4 days with the first location and another 4 days testing generalization to a second location).

      We have revised the Figure 1 legend to clarify:

      "Days 1–8; 4 days at the first location and 4 days at the second location to assess generalization"

      "calculated on a per-subject basis"

      (3) The leadup emphasizes that the analysis in the figure emphasizes trials where the effect is expected to be as large as possible (cited as 40 +/- 3%), while visible current targets (at n) biases were 5+/-1%.

      See below, after (4).

      (4) Unless a theoretical position associates learning just with low visibility (if so, explain), consider including two other panels showing the sequential dependencies for all trials, and the linear model weights over the last 10 trials for all trials.

      We acknowledge that the main analyses emphasize conditions that maximize SDE expression. To verify robustness, we conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm our main findings.

      There are both theoretical and technical justifications for the filtering applied:

      It is well known that learning, in particular without feedback (as in our TDT), is facilitated by a mixture of threshold level stimuli and suprathreshold easy trials (e.g., Liu et al., 2012).

      Technically, it is impossible to measure bias with highly discriminable stimuli where performance is perfect or close to it, thus such trials are expected to dilute the measured effect. On the other hand, when considering serial effects from low sensitivity trials, we face an uncertainty involved in defining the actual orientation relative to which the bias needs to be computed.

      (5) Figure S1 seems to indicate that average thresholds over all days (location 1 and location 2) are unrelated to the sequential dependence across subjects and that the amount of learning in location 1 is unrelated to the sequential dependencies across subjects in all the varied conditions. Since Figure S1 includes all 50 subjects, it includes some conditions with dummy trials interspersed. Clarify in the description whether the dummy trials are ignored for the purposes of the SDE analyses.

      We have clarified in the Methods how trials are handled in the analysis: "To preserve the precise temporal structure of the data, all trials were included in the sequential n-back count across all experimental conditions, thus dummy trials were counted as time bins but their contribution was ignored. In the Linear Mixed Effects (LME) analysis, we modeled these trial types using distinct regressors: each n-back lag included separate predictors for visible and invisible targets, further differentiated by trial type (dummy vs. target) and relative location (ipsilateral vs. contralateral) where applicable. The SDE values reported here reflect only the influence of relevant target-present history trials; the effects of other history types (e.g., dummy trials), while estimated to ensure the temporal integrity of the model, are not presented."

      (6) The conclusion from this analysis seems to be that the overall average threshold and the amount of initial learning are both uncorrelated with the strength of sequential dependencies across subjects. This conclusion should be added to the description in the main paper.

      This finding is now discussed in the Discussion section, referring to the main Results section [ No significant correlation was found between biases and SOA thresholds across observers (r = -0.13, p = 0.37, average across days 1-8), nor between biases and improvements in performance at the first location (r = -0.09, p = 0.54, average across days 1-4), suggesting that the magnitude of serial dependence does not predict the overall amount of perceptual learning (Supplementary Figure S1)].

      (7) Decay of SDE section clarifications

      We have made the following clarifications:

      RT definition: Added to Methods: "The reaction time (RT) used in the analysis was defined as RT(TDT) – RT (fixation task), where RT for each task was measured from stimulus onset."

      N-back counting: Clarified in Methods (see response to point 5 above): all trials were included in the chronological sequence; the LME analysis assigned separate predictors at each lag for visible/invisible targets and for trial categories (dummy vs. target) and locations (ipsilateral vs. contralateral). The results reported do not include effects of dummy trials, except where response-dependent SDE was reported (Fig 2a, SDE for response key).

      2loc n-back effect: The longer-range effects in the 2loc condition likely reflect reduced adaptation allowing longer temporal integration, combined with the location-selective nature of SDEs.

      RT and mechanism interpretation: The manuscript discusses that the critical observation is the qualitative difference in RT sensitivity between recent and distant SDEs, consistent with the drift-diffusion framework where criterion shifts are RT-dependent while drift bias is RT-independent (Dekel & Sagi, 2020). We have added an acknowledgment of the correlational limitations of this interpretation.

      Moving figures to supplement: We prefer to keep Figures 4 and 5 in the main text as they document important dynamics supporting our mechanistic interpretation.

    1. eLife Assessment

      This study presents important findings that bovine mammary epithelial cells can be infected with both avian and human influenza A viruses, providing a potential site for viral reassortment. The evidence to support these claims is generally solid; however, the evidence suggesting lower permissiveness of cells from other organs is incomplete. The work will be of interest to virologists and evolutionary biologists working on cross-species transmission of viruses and pandemic preparedness.

    2. Reviewer #1 (Public review):

      Summary:

      Here, Pinto and colleagues set out to investigate whether the cow udder is a potential mixing site for the influenza virus. The authors have demonstrated that bovine mammary epithelial cells can be infected with both avian and human influenza A viruses, supporting the idea that the cow udder may be a potential site for reassortment. Furthermore, they demonstrate that the bovine-adapted IAV replicates to similar titers in avian epithelial cells when compared to an AIV precursor virus. This suggests there is no fitness trade-off and confirms the potential for spill-back of the cattle B3.13 into poultry, which has already been observed. Overall, I believe the authors achieved their aims. However, there are instances in which the results do not entirely support the conclusions (noted in weaknesses). Given the ongoing questions surrounding highly pathogenic avian influenza A virus in dairy cows, this work provides valuable evidence for the potential of the cow udder as a site of reassortment. These findings highlight the need for surveillance of influenza A virus incursions into livestock species, particularly cows. Some specific strengths and questions regarding weaknesses have been outlined below.

      Strengths:

      (1) The authors use a diverse range of cell types and influenza A virus strains, as well as a wide range of techniques to address the questions at hand.

      (2) The use of cells from multiple bovine breeds for the MAC-T, bMEC and explants suggests the phenomenon is not unique to a single breed.

      (3) The results suggesting there is no fitness trade-off for Cattle Texas in an avian host are interesting, and confirm the potential for spill-back of the cattle B3.13 into poultry, which has been observed.

      Weaknesses:

      I have listed my complete questions/concerns below. However, there are two main weaknesses of the article in its current state. Firstly, there is no apples-to-apples comparison in terms of determining a preference for IAV to infect the cow udder over other organs (Q4). The mammary gland and respiratory tract are represented by epithelial cells, but for other organs, fibroblasts were chosen. I think the fairer comparison would be to compare epithelial cells from different organs to demonstrate a preference for the mammary gland. Secondly, the main premise of the article relies on bMEC and MAC-T (primary and immortalised mammary epithelial cells), facilitating higher viral growth than the cells from other organs. Yet throughout the article, a 10x higher dose of IAV is used in the bMEC cells compared to everything else (Q6). This raises the question of how much of the results are due to a preference for the mammary epithelial cells, and how much is simply due to the increased dose.

    3. Reviewer #2 (Public review):

      The authors use a library of influenza A viruses from different strains, classified as lab-adapted, human, avian, and swine according to the animal from which they were isolated. They propose that the cow mammary gland serves as a mixing vessel for influenza A viruses. As a first approach, the authors assess susceptibility to infection across different cell types, including continuous and primary cell lines, bovine mammary cells, and mammary explants. All these cells support polymerase activity. Then, they analyzed changes in the bovine virus's viral fitness relative to an avian precursor. The authors use single-gene replacement to study whether and which RNP segments improve viral transcription. As part of this section, they also test IFN-specific antagonism by NS1 to assess the input of segment 8. Quantitative glycomic analysis was performed on the continuous bovine mammary cell line to demonstrate the presence of both α2,3 and α2,6, which is consistent with their observation that these cells can be co-infected with human and avian IAVs simultaneously. The main question, however, is: what is the glycome in the explants, or directly from tissues?

      Overall, the manuscript is clearly written and provides new insights into the behaviour of the cattle isolate, now compared with a representative group of model or precursor HAs of different origins.

      It would be great if a consistent nomenclature for the IAV strains could be used in the study. There is a mix of origin (Texas), animal from which the virus was isolated (mallard), or abbreviations that do not follow guidelines (IAV07). Are the USSR and Udorn not lab-adapted?

      The experimental setup includes bovine mammary primary and continuous cells, as well as mammary explants. Some of the most significant differences, for example, in viral fitness studies and co-infection experiments, are observed in these explants. Perhaps there could be some additional focus on this observation. The implications in comparison to the results obtained in cultured cells could be described. How will the human and other HA subtype viruses fare in the explants?

    4. Reviewer #3 (Public review):

      Summary:

      This excellent manuscript by Pinto, Sharp, and colleagues examines bovine tissue tropism for influenza viruses. They find that bovine flu, as well as other strains, has strong replication in mammary tissue. They also map the genetic changes to influenza that improve replication in bovine cells. Overall, the study is well designed and executed, and the results are very timely.

      Strengths:

      (1) The experiments are well-controlled.

      (2) The figures are well-constructed and easy to follow.

      (3) The Methods and legends are detailed, with sufficient information.

      Weaknesses:

      (1) A comparison to human cells would strengthen the overall impact of the results. Are human mammary cells also uniquely susceptible to influenza? Are bovine mammary cells special in some way?

      (2) For the virus infection studies with segment 8 swaps, it should at least be noted that some of the phenotypes could be driven by NEP.

      (3) The data demonstrating that bMEC can support co-infection are compelling and important, but would be strengthened with a comparison from a different cell type or species. Do mammary cells uniquely support higher co-infection?

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Pinto and colleagues set out to investigate whether the cow udder is a potential mixing site for the influenza virus. The authors have demonstrated that bovine mammary epithelial cells can be infected with both avian and human influenza A viruses, supporting the idea that the cow udder may be a potential site for reassortment. Furthermore, they demonstrate that the bovine-adapted IAV replicates to similar titers in avian epithelial cells when compared to an AIV precursor virus. This suggests there is no fitness trade-off and confirms the potential for spill-back of the cattle B3.13 into poultry, which has already been observed. Overall, I believe the authors achieved their aims. However, there are instances in which the results do not entirely support the conclusions (noted in weaknesses). Given the ongoing questions surrounding highly pathogenic avian influenza A virus in dairy cows, this work provides valuable evidence for the potential of the cow udder as a site of reassortment. These findings highlight the need for surveillance of influenza A virus incursions into livestock species, particularly cows. Some specific strengths and questions regarding weaknesses have been outlined below.

      Strengths:

      (1) The authors use a diverse range of cell types and influenza A virus strains, as well as a wide range of techniques to address the questions at hand.

      (2) The use of cells from multiple bovine breeds for the MAC-T, bMEC and explants suggests the phenomenon is not unique to a single breed.

      (3) The results suggesting there is no fitness trade-off for Cattle Texas in an avian host are interesting, and confirm the potential for spill-back of the cattle B3.13 into poultry, which has been observed.

      Weaknesses:

      I have listed my complete questions/concerns below. However, there are two main weaknesses of the article in its current state. Firstly, there is no apples-to-apples comparison in terms of determining a preference for IAV to infect the cow udder over other organs (Q4). The mammary gland and respiratory tract are represented by epithelial cells, but for other organs, fibroblasts were chosen. I think the fairer comparison would be to compare epithelial cells from different organs to demonstrate a preference for the mammary gland. Secondly, the main premise of the article relies on bMEC and MAC-T (primary and immortalised mammary epithelial cells), facilitating higher viral growth than the cells from other organs. Yet throughout the article, a 10x higher dose of IAV is used in the bMEC cells compared to everything else (Q6). This raises the question of how much of the results are due to a preference for the mammary epithelial cells, and how much is simply due to the increased dose.

      When we set out to test if cow mammary gland cells were particularly susceptible to IAV infection compared to other bovine cell types, we used what was available in the Roslin Institute in the first instance – a mix of primary and continuous cells from various anatomical sites: three epithelial cell types (two mammary, one respiratory tract), two immune cell types and four sets of fibroblasts from various organs. Given the representation of different anatomical sites, cell types and differentiation statuses, we considered this a suitably diverse panel with which to characterise infection dynamics of a broad range of IAVs, before more focussed investigations using the mammary bMEC and explant tissues. Both mammary epithelial cell types grew our library of influenza challenge strains significantly better than the BAT-II respiratory epithelial cells, as well as the two immune cell types and all four fibroblast populations. Of the fibroblast cells, those derived from the brain grew IAV significantly better than the skin and turbinate fibroblasts, while blood-derived macrophages grew virus significantly better than the lymphocytes and non-brain fibroblasts. Therefore, there are “apple to apple” comparisons as well as apple to pear comparisons that give significant differences. We therefore think that our conclusions (in the abstract) that mammary cells are particularly replication competent for IAV, (at the end of the introduction) that “a wide range of cow-derived cells are susceptible” and (in the results section) that “mammary cells showed the highest susceptibility” are entirely justifiable. We do not claim that mammary cells are the only permissive bovine cells, but our evidence suggests they are highly susceptible.

      We used a higher MOI for bMECs because test experiments with WT PR8 and the Cattle Texas 6:2 reassortant showed that MOI 0.01 infections gave more variable results than ones run at MOI 0.1, perhaps because of the intrinsic variability of mixed primary cell populations. We therefore chose to go with the higher MOI. However, the end-point titres between the two conditions were not significantly different, so we do not think this choice is a confounding issue. We will add the comparison of the two MOIs as a supplementary figure in the formal revision.

      Reviewer #2 (Public review):

      The authors use a library of influenza A viruses from different strains, classified in lab-adapted, human, avian, and swine according to the animal from which they were isolated. They propose that the cow mammary gland serves as a mixing vessel for influenza A viruses. As a first approach, the authors assess susceptibility to infection across different cell types, including continuous and primary cell lines, bovine mammary cells, and mammary explants. All these cells support polymerase activity. Then, they analyzed changes in the bovine virus's viral fitness relative to an avian precursor. The authors use single-gene replacement to study whether and which RNP segments improve viral transcription. As part of this section, they also test IFN-specific antagonism by NS1 to assess the input of segment 8. Quantitative glycomic analysis was performed on the continuous bovine mammary cell line to demonstrate the presence of both a2,3 and a2,6, which is consistent with their observation that these cells can be co-infected with human and avian IAVs simultaneously. The main question, however, is: what is the glycome in the explants, or directly from tissues?

      We report quantitative glycomics for the primary bovine mammary epithelial cells as well as the continuous line the referee highlights. However, we agree with R2 that a detailed glycomic analysis of primary bovine mammary tissue would allow a better understanding of the actual glycosylation status in vivo. This has now been undertaken by the authors and is available as a bioRxiv preprint:

      Bovine H5N1 influenza viruses have adapted to more efficiently use receptors abundant in cattle

      Jack A. Hassard, Jiayun Yang, Bernadeta Dadonaite, Jonathan E. Pekar, Jin Yu, Samuel A. S. Richardson, Rute M. Pinto, Kristel Ramirez Valdez, Philippe Lemey, Jessica L. Quantrill, Jinghan Xue, Tereza Masonou, Katie-Marie Case, Jila Ajeian, Maximillian N. J. Woodall, Rebecca A. Ross, Nicolas Hudson, Kan Zhong, Hongzhi Cao, Samuel Jones, Hannah J. Klim, Brian R. Wasik, Desi N. Dermawan, Jean-Remy Sadeyen, Dirk Werling, Dylan Yaffy, Joe James, Alessandro Nunez, Paul Digard, Ian H. Brown, Daniel H. Goldhill, Pablo R. Murcia, Claire M. Smith, Yan Liu, Jesse D. Bloom, Munir Iqbal, Wendy S. Barclay, Stuart M. Haslam, Thomas P. Peacock. bioRxiv 2026.04.02.715584; doi: https://doi.org/10.64898/2026.04.02.715584

      Overall, the manuscript is clearly written and provides new insights into the behaviour of the cattle isolate, now compared with a representative group of model or precursor HAs of different origins.

      It would be great if a consistent nomenclature for the IAV strains could be used in the study. There is a mix of origin (Texas), animal from which the virus was isolated (mallard), or abbreviations that do not follow guidelines (IAV07). Are the USSR and Udorn not lab-adapted?

      We chose the abbreviated names for a variety of reasons. Partly from common usage (e.g. PR8, Udorn), partly for consistency with other already published papers from the FluTrailMap consortia (e.g. Cattle Texas; Dholakia et al 2026), partly to make diversity obvious in certain figures (e.g. H3N1, H5N2 etc) and partly to avoid confusion between viruses that originate from the same geographic area (e.g. AIV07, AIV09, H5N8-20 etc which are all Ck/England/isolate numbers). Overall, we found it more confusing to use the expanded nomenclature. Re AIV07 which the referee criticises for not following naming guidelines – if this is a reference to the EURL nomenclature, AIV07 is the abbreviation for the specific virus A/Chicken/England/053052/2021, our representative virus for EURL genotype EA-2020-C, as we say in the text. We should however have included this nomenclature in Table 1, which otherwise provides a cross-reference for all the names. This will be added in the formal revision to help with clarity.

      As to whether USSR and Udorn are lab adapted – that depends on definitions. There is a continuum of adaptive changes and/or sequence drift starting from the very first growth of an isolate in the laboratory. The viruses we define here as lab adapted are ones that have been deliberately adapted to other hosts or which have very long passage histories in multiple host species resulting in known functionally significant changes. For example, PR8, with 100s of passages in mice, ferrets and embryonated hens' eggs (doi: 10.3390/v12060590), is unarguably lab-adapted. We admit that A/USSR/77 and A/Udorn/307/1972 are probably further along this adaptive pathway than more recent isolates such as A/Norway/3433/2018, but are unaware of any specific reason that would put them into our lab adapted category.

      The experimental setup includes bovine mammary primary and continuous cells, as well as mammary explants. Some of the most significant differences, for example, in viral fitness studies and co-infection experiments, are observed in these explants. Perhaps there could be some additional focus on this observation. The implications in comparison to the results obtained in cultured cells could be described. How will the human and other HA subtype viruses fare in the explants?

      We agree that this is an important and interesting question, and have tested the strains we used for co-infections, human seasonal H1N1 “Norway” and low pathogenic avian influenza “H3N1”, in the mammary explants. Both replicate, the avian virus to 20-fold higher titres. We will add this new information to the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This excellent manuscript by Pinto, Sharp, and colleagues examines bovine tissue tropism for influenza viruses. They find that bovine flu, as well as other strains, has strong replication in mammary tissue. They also map the genetic changes to influenza that improve replication in bovine cells. Overall, the study is well designed and executed, and the results are very timely.

      Strengths:

      (1) The experiments are well-controlled.

      (2) The figures are well-constructed and easy to follow.

      (3) The Methods and legends are detailed, with sufficient information.

      Weaknesses:

      (1) A comparison to human cells would strengthen the overall impact of the results. Are human mammary cells also uniquely susceptible to influenza? Are bovine mammary cells special in some way?

      This is an interesting question. We have not tested mammary gland cells from humans (or any other species of mammal), but we have reported elsewhere (Dholakia et al., Nat Commun. 2026 Jan 16;17(1):1603. doi: 10.1038/s41467-026-68306-6) that Cattle Texas grows well in a variety of human respiratory cells. Here we are considering the bovine mammary organ as a potential reassortment site for IAVs; human mammary organs are unlikely to create this opportunity.

      (2) For the virus infection studies with segment 8 swaps, it should at least be noted that some of the phenotypes could be driven by NEP.

      We agree, and will change the text to acknowledge this in a revised version.

      (3) The data demonstrating that bMEC can support co-infection are compelling and important, but would be strengthened with a comparison from a different cell type or species. Do mammary cells uniquely support higher co-infection?

      We have data showing that co-infection also occurs in the continuous MAC-T udder cell line and will include these data in a revision. We have not tested bovine cells from other organs for co-infection potential as they do not seem to be significant sites of infection in vivo.

    1. eLife Assessment

      This potentially important paper questions the evolutionary origin of the tunicate endoderm, using single-cell sequencing on a developmental series of the ascidian Styela clava that covers metamorphosis and gut development. The authors base their conclusions on a comparison with the development of mouse gut endoderm, where they point out similarities in the origin of tissues, perhaps representing a case of "deep homology". This work has the potential to make a significant contribution to the field of chordate evolution, but in its current form, the evidence it presents is incomplete and is limited by a problematic discussion of evolutionary implications and by major issues regarding the clarity and cogency of data presentation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors employ state-of-the-art single-cell sequencing technologies to map the gene expression profiles of the developing digestive tract in the ascidian Styela clava, a member of the invertebrate sister group to vertebrates. This data has the potential to provide a new perspective on the relationships between the guts of an invertebrate like this ascidian relative to vertebrate systems. Key findings include the elaboration of our understanding that the Styela gut arises from two distinct cellular origins, with this being comparable to the dual embryogenic origin of vertebrate guts (at least, as exemplified by the mouse digestive tract arising from both definitive and visceral endoderm).

      Strengths:

      The resolution that can be achieved from the series of developmental stages analysed by the authors through the metamorphosis and early gut specification and development is vital to the strength of this new dataset. This new scRNAseq data is likely to provide a useful foundation for future work that delves into the functions of various genes within regions of the ascidian gut.

      Weaknesses:

      The main weakness of the manuscript as it currently stands is the lack of clarity about the genetic comparisons between ascidian and mouse, and what the precise genetic underpinnings are for any statements of similarity.

    3. Reviewer #2 (Public review):

      This manuscript explores endodermal lineage specification during metamorphosis in Styela clava. In these organisms, which have a biphasic lifestyle, the endoderm exists as a rudiment in the non-feeding larva and differentiates throughout metamorphosis to build the digestive components of the adult body plan. The authors of this manuscript use scRNA sequencing of individuals throughout the metamorphic process, as well as maturing juveniles, to follow the trajectories of the endodermal precursors. They identify two distinct populations that give rise to the stomach and intestinal lineages, and they suggest that there are homologous relationships between tunicate and vertebrate dual-origin endodermal lineages. Additionally, the authors highlight the role of conserved FGF signal-dependent programs in digestive organ patterning and suggest that endodermal fate restriction occurs earlier in Styela in comparison with the mouse gut.

      Overall, the paper is the first in-depth look at tunicate endodermal fate from a single-cell sequencing perspective and provides a robust framework for understanding the evolutionary origins of the deuterostome/chordate gut. The data is substantial and of great interest. However, we find their discussion of evolutionary implications to be highly problematic, and there are also numerous major issues regarding the clarity and cogency of their data presentation. Thus, we consider that substantial revision is required to provide a more accurate analysis of this data and its evolutionary implications. This revision would not require further experimentation.

    4. Author response:

      We sincerely thank the Reviewing Editor, Senior Editor, and both reviewers for their careful and constructive assessment of our manuscript. We are encouraged that the reviewers recognize the value of our dataset and its potential contribution. We greatly appreciate the thoughtful comments and have carefully considered the reviews. We plan to revise the manuscript accordingly. 

      First, we will revise and refine the cross-species comparative analysis, with particular attention to clarifying the basis of the comparisons between ascidian and mouse endodermal lineages. In particular, we will adopt a more cautious and precise comparative framework, clarify the scope and limitations of the mouse comparison, and broaden the context by incorporating additional vertebrate and invertebrate deuterostome systems where relevant.

      Second, we will strengthen the gene-level interpretation of the identified endodermal populations and clarify the molecular basis for the similarities and differences. In particular, we will more clearly identify the key marker genes defining each population and better explain their relationship to previously described developmental sources.

      Third, we will improve the clarity of the Results presentation, including the description of the two major endodermal progenitor populations and their subcategories, as well as the organization of the text, figures, and figure legends. 

      Fourth, we will substantially rewrite the Discussion, especially the sections dealing with evolutionary implications, to ensure that our interpretations are presented in a more cautious manner.

      These revisions are intended to address the reviewers’ concerns regarding both the evolutionary framing and the presentation of the data. We believe that these revisions, which will include both rewriting and additional analyses, will improve the clarity and rigor of the manuscript. We look forward to submitting a revised version.

      We thank the editors and reviewers again for their time and expertise.

    1. eLife Assessment

      This study presents a valuable finding on the role of intracellular zinc as a regulator of the sperm-specific potassium channel Slo3, demonstrating that zinc export during capacitation contributes to alkalinization-induced membrane hyperpolarization. The electrophysiological evidence supporting zinc-mediated inhibition of Slo3 is solid, though the mechanistic basis of this inhibition is not complete, as the proposed zinc-binding site involving E169 and E205 has not been directly tested through double-mutant analysis. This work will be of interest to reproductive biologists and ion channel biophysicists studying the molecular mechanisms of sperm capacitation.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Andriani et al. show that intracellular zinc is exported from sperm during capacitation and suppresses the alkalinization-induced hyperpolarization in sperm. Intracellular zinc inhibits Slo3 current, which is enhanced by the co-expression of gamma subunit Lrrc52. Computational studies reveal that the Zn binding site on mSlo3 is located near E169 and E205, which are involved in the sustained zinc inhibition of mSlo3 current. The authors propose that intracellular zinc plays a key role in sperm capacitation by inhibiting the Slo3 channel.

      Strengths:

      Overall, the work appears well designed (e.g., oocyte patch-clamp experiments), and clearly presented. Three-dimensional structural modeling and flooding simulations are executed.

      Weaknesses:

      The simple mutagenesis analysis of E169 and E205 showed partial abolishment, but the molecular mechanism by which zinc inhibits Slo3 current is not yet fully shown. The authors should consider performing more extensive experiments, such as creating double mutants or combination mutants involving other residues. Additionally, could other mechanisms explain the role of zinc in regulating the Slo3 current?

      While elucidating the mechanism of Slo3 is interesting, there is substantial literature indicating how zinc regulates channel functions at a molecular level. Given this, the manuscript should provide a deeper understanding by clearly elucidating the molecular mechanism of the regulation of Slo3 current by zinc.

      The manuscript includes no experimental data on the mechanism of intracellular zinc export during sperm capacitation, despite being crucial for the regulation of sperm function.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Andriani and colleagues examine the potential role of Zn flux in sperm and its effect on Slo3 channels. This is an interesting question that is likely critical to how sperm function properly, and Slo3 channels are a possible candidate for a downstream molecule that is impacted by Zn. In this paper, the authors use Zn imaging, sperm motility assays, and electrophysiology to show that Zn flux has impacts on sperm function. They then go on to look at the impact Zn has on Slo3 current and propose a binding site based on MD simulations. Revisions of the paper added new critical controls and improved the description of the methodology.

      Strengths:

      The question of how Zn flux impacts membrane potential and sperm motility is an important one. Moreover, Slo3 may present an interesting candidate for the target of Zn regulation. The combination of methods used here also has the potential to uncover mechanisms of Zn regulation of Slo3.

      Weaknesses:

      The responses sufficiently answered my original concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Andriani et al. show intracellular zinc is exported from sperm during capacitation and suppresses the alkalinization-induced hyperpolarization in sperm. Intracellular zinc inhibits Slo3 current, which is enhanced by the co-expression of gamma subunit Lrrc52. Computational studies reveal that the Zn binding site on mSlo3 is located near E169 and E205, which are involved in the sustained zinc inhibition of mSlo3 current. The authors propose that intracellular zinc plays a key role in sperm capacitation by inhibiting the Slo3 channel.

      Strengths:

      Overall, the work appears well-designed (e.g., oocyte patch-clamp experiments), and clearly presented. Three-dimensional structural modeling and flooding simulations are executed.

      Weaknesses:

      The simple mutagenesis analysis of E169 and E205 showed partial abolishment, but the molecular mechanism by which zinc inhibits Slo3 current is not yet fully shown. The authors should consider performing more extensive experiments, such as creating double mutants or combination mutants involving other residues. Additionally, could other mechanisms explain the role of zinc in regulating the Slo3 current?

      We thank the reviewer for the thoughtful comments regarding the mutagenesis analysis and the possible mechanisms underlying zinc regulation of Slo3. Regarding the suggestion to perform double or combination mutants, we agree that such experiments would provide valuable mechanistic insight. However, due to limited resources, we were not able to perform these additional experiments within the scope of this study. Our current results show that mutations at E169 and E205 partially abolish zinc inhibition, which suggests that the inhibitory mechanism is not mediated through a single residue and is likely more complex.

      Alternative mechanisms that may contribute to zinc modulation of Slo3 include indirect effects through modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites within the Slo3 channel other than the sites discovered in this study. At present, these mechanisms remain speculative and further studies will be required to clarify their contributions. This study provides the foundational basis for understanding how zinc inhibits the Slo3 channel and serves as an important starting point for defining the molecular mechanism in more detail.

      We already acknowledged in the Discussion section that the precise molecular basis of zinc inhibition remains unknown and that future work involving more extensive mutational and structural analyses will be essential to fully resolve this issue.

      We also added the following to the Discussion section:

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      While elucidating the mechanism of Slo3 is interesting, there is substantial literature indicating how zinc regulates channel functions at a molecular level. Given this, the manuscript should provide a deeper understanding by clearly elucidating the molecular mechanism of the regulation of Slo3 current by zinc.

      Thank you for highlighting a very important point that requires deeper discussion and explanation regarding how zinc regulates Slo3 current at the molecular level. As reported, Slo3 is gated by membrane depolarization and, at the same time, this channel is also gated by intracellular pH, particularly alkalinization (Leonetti et al., 2012; Schreiber et al., 1998; X. Zhang et al., 2006). This makes the gating mechanism of this channel complex. The molecular mechanism underlying pH regulation of the Slo3 channel remains unknown (M. D. Lyon et al., 2023). We tested different pH conditions and membrane voltages to elucidate the effect of zinc on the Slo3 channel. Our data suggest that zinc inhibition in mSlo3 channels is dependent on pH (Fig. 2A-E) and voltage (Fig. 2G-H; Fig. 2—figure supplement 1A, B) and exhibits a long-lasting inhibitory effect (Fig. 2I, K).

      However, we are aware that these data alone cannot explain the molecular mechanisms of zinc’s effect on Slo3 current, and our mutagenesis experiments did not provide a straightforward answer either. The single amino acid mutations examined in this study, which contain clustered negative residues, did not significantly alter zinc-mediated current reduction compared to the wild type. As the reviewer pointed out, mutating one single amino acid may not be sufficient to fully identify other contributing residues within the predicted mSlo3 zinc-binding site. Therefore, more extensive mutagenesis studies will be required to fully elucidate the molecular mechanism of zinc inhibition in mSlo3, which this study could not fully resolve.

      On the other hand, when we analyzed the percentage of current recovery of all the mutants, E169A and E205A showed significant current recovery upon wash-out with pH 8.0 alone. Consistent with MD simulations, our electrophysiological recordings demonstrated that the long-lasting inhibitory effect of zinc was partly abolished by these mutations. Thus, our findings highlight the contribution of E169A, located at the lower end of the S3 domain, and E205A, located at the lower region of the S4 domain, to zinc-mediated inhibition of mSlo3 current.

      Additionally, since the molecular mechanism of pH regulation of the Slo3 channel remains unknown, the molecular basis of its dual gating has yet to be elucidated, making it difficult to draw a single definitive conclusion from our current research data on how zinc inhibits mSlo3 current. Nevertheless, this study provides the foundation for understanding possible mechanisms of zinc inhibition. Our VCF data suggest that zinc influences the movement of the VSD of mSlo3, and together with our mutagenesis and MD simulation results, these findings represent an important first step toward elucidating the molecular mechanism of zinc inhibition of the mSlo3 current.

      Intracellular zinc exerts an inhibitory effect on mSlo3, similar to what has been reported for Slo2.2 channels (J. Zhang et al., 2023), high- and low-voltage activated calcium channel families (Sun et al., 2007) and KCNQ4 channels (Gao et al., 2017). These studies identified different regions, amino acids, and possible mechanisms of zinc inhibition among these ion channels. For instance, in Slo2.2 channels, which belong to the same Slo family as Slo3, the zinc-binding site was identified in the RCK2 domain, where cysteine and histidine residues form a canonical zinc binding motif (J. Zhang et al., 2023). In KCNQ4 channels, zinc inhibits the channel activity in a non-canonical manner that depends on its physiological activator, the membrane lipid PI(4,5)P₂ (Gao et al., 2017). Although zinc exerts inhibitory effects on these various voltage-gated potassium and calcium channels, the mechanisms differ. Our data suggest another distinct mechanism of zinc inhibition in the mSlo3 channel, with the identified sites located in the VSD, where zinc influences the voltage-sensor motion and consequently affects the complex gating of Slo3.

      We revised the discussion section as follows, which is also related to the previous comment:

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      The manuscript includes no experimental data on the mechanism of intracellular zinc export during sperm capacitation, despite being crucial for the regulation of sperm function.

      We thank the reviewer for the valuable comment in this regard. We agree that the mechanism of intracellular zinc export during capacitation is crucial for the regulation of sperm function, and it would be an important finding if we could provide experimental data on this. However, there are significant technical difficulties in performing such experiments. Two protein families facilitate the transport of zinc across cellular and intracellular membranes in opposite directions: ZnT and ZIP. ZIP12 has been reported to be highly expressed in mouse testis (Zhu et al., 2022), as well as ZnT-1 (Elgazar et al., 2005). To date, there are no known inhibitors for zinc transporters, and there are also no suitable antibodies available for these transporters, which makes it difficult to design experiments to examine intracellular zinc transport during sperm capacitation. Apart from the two reported zinc transporters, the functional significance of other ZnTs and ZIPs, particularly those related to capacitation, remains largely unclear, leaving the mechanisms of zinc transport in sperm during capacitation poorly understood. Moreover, homozygous Znt-1 knockout mice exhibit a lethal phenotype (Andrews et al., 2004).

      Reviewer #2 (Public review):

      Summary:

      In this paper, Andriani and colleagues are examining the potential role of Zn flux in sperm and its effect on Slo3 channels. This is an interesting question that is likely critical to how sperm function properly and Slo3 channels are a possible candidate for a downstream molecule that is impacted by Zn. In this paper, the authors use Zn imaging, sperm motility assays, and electrophysiology to show that Zn flux impacts sperm function. They then go on to look at the impact Zn has on Slo3 current and propose a binding site based on MD simulations. While the ideas are interesting, the experiments are not well described in many places making understanding the results very difficult. In addition, critical controls are missing throughout the paper.

      Strengths:

      The question of how Zn flux impacts membrane potential and sperm motility is an important one. Moreover, Slo3 presents an interesting candidate for the target of Zn regulation. The combination of methods used here also has the potential to uncover mechanisms of Zn regulation of Slo3.

      Weaknesses:

      Much of the paper lacks experimental description which makes interpretation quite difficult, or a detailed discussion is missing. Examples include:

      (1) Figure 1, particularly the Zn imaging, is not sufficiently described. How is the fluorescence intensity measured? A representative ROI? The whole tail and head? Are the sperm immobile? If not, there is evidence that motion artifacts can significantly distort these sorts of measures from Calcium measurements in Cilia. Were there controls done? Is the small amount of Zn seen in the tail above the background?

      We sincerely thank the reviewer for pointing out important details that we should provide to make this study easier to understand. We respond to the points raised by the reviewer as follows:

      Fluorescence intensity is measured by the signal taken from the whole head and the proximal part of tail in sperm. We have included this in the materials and methods.

      Materials and Methods

      “Fluorescence intensity is measured by the signal taken from the whole head and the proximal part of tail in sperm.”

      Yes sperm is immobile during zinc imaging.

      We added the control data of zinc imaging without capacitation medium and incorporated the data into the graph in Figure 1B. For the control in non-capacitation medium, we use HS medium as newly explained in the methods, results, related figure (Figure 1B), and figure legends.

      Yes, the small amount of Zn seen in the tail is above the background. As shown in Fig. 1A, we confirmed that the signal intensity at the proximal region of the tail was higher than the background. Therefore, the data for this region were calculated after background subtraction.

      (2) The second half of Figure 1 is also not well described. What is the extracellular solution in the recordings? When you apply the Zn ionophore, do you expect influx or efflux? I assume efflux is based on the conclusions but this should be discussed explicitly.

      The extracellular solution in the recordings for Figure 1 is HS solution (HEPES-buffered saline solution), a standard non-capacitation medium. We will include this information in the Materials and Methods.

      Materials and methods

      “HS-based solution was used as the extracellular solution.”

      We assume that intracellular zinc levels increase upon application of zinc ionophore. Previous work has reported that sperm contain approximately 35.7 ng/10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). When zinc pyrithione is applied, it facilitates the influx of Zn<sup>2+</sup> from the surrounding medium into the cell, thereby increasing intracellular zinc concentration. Zinc pyrithione functions both as a zinc source and as a transport facilitator, allowing Zn<sup>2+</sup> to cross the otherwise impermeable lipid membrane without compromising membrane integrity.

      (3) Figure 2H labels the Y axis, "normalized current". Normalized to what? Why do neither of the curves end at 1? A better description of what this figure represents is needed.

      Normalization for Figure 2H was performed by dividing the absolute mSlo3 current at pH 8.0 at each voltage by the absolute current at the pre-determined highest voltage that still produced a stable mSlo3 current (i.e., good patch, good clamp). In this analysis, +140 mV was chosen as the highest voltage for normalization, since in a few cells the patch was lost at +160 mV and +180 mV. As in the control condition, the absolute mSlo3 current in the presence of 100 µM zinc was normalized to the absolute control current at +140 mV. This information has been included in the figure legends and the Materials and Methods section of the revised manuscript.

      Materials and Methods section:

      The figure legend for Figure 2H has been updated.
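
      The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and all current values are hypothetical, and only the choice of the control current at +140 mV as the reference comes from the response.

```python
def normalize_to_reference(currents, reference_current):
    """Divide each absolute current by the absolute reference current."""
    if reference_current == 0:
        raise ValueError("reference current must be non-zero")
    return {v: abs(i) / abs(reference_current) for v, i in currents.items()}


# Hypothetical absolute currents (nA) keyed by voltage (mV); not data from the study.
control = {100: 1.0, 120: 1.6, 140: 2.0, 160: 2.5}
zinc = {100: 0.3, 120: 0.5, 140: 0.8, 160: 1.0}

# Both conditions share one reference: the control current at +140 mV.
reference = control[140]
control_norm = normalize_to_reference(control, reference)
zinc_norm = normalize_to_reference(zinc, reference)
```

      With a shared reference, the control curve equals 1 only at +140 mV and exceeds 1 at higher voltages, while the zinc curve stays below 1 wherever the current is inhibited. This would explain why neither curve needs to end at 1.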

      (4) The alpha fold simulations are not well described. How many Zn binding sites were found? Are all of the histidine mutations in Figure 4 Supplement 1 the ones that were found?

      We thank the reviewer for the question. In our AlphaFold3 input, we only input the transmembrane region of the protein. From there, we found four sites located as follows:

      Given that we are only interested in the intracellular side of the membrane, we are only interested in the site with the highest pLDDT value (confidence values). On the IC side, there are only two sites, where the other sites are located near the pore domain. The site is near E310 and K319.

      Author response image 1.

      AlphaFold3 prediction of the Zn binding site on IC side of Slo3

      The histidines in Fig. 4—figure supplement 1 are all histidines that are not in the transmembrane region. These residues were not included in the initial inputs for AlphaFold3. However, we conducted MD simulations including these residues and we were able to show that a few of these residues are in contact with Zn. We have now plotted the minimum distance between each of these residues and Zn in the flooding simulations.

      Author response image 2.

      MD simulations of histidine residues located on the IC side of Slo3

      Minimum distances between histidines in Fig. 4—figure supplement 1 and Zn<sup>2+</sup> from the flooding simulations. Different colors indicate different repeats.
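
      A minimum-distance analysis of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than the authors' pipeline: coordinates are hypothetical, periodic boundary conditions are ignored, and the 4 Å contact cutoff is used only as an illustrative threshold.

```python
import math

CONTACT_CUTOFF_ANGSTROM = 4.0  # illustrative threshold; an assumption of this sketch


def min_distance(residue_atoms, ions):
    """Smallest Euclidean distance between any residue atom and any ion."""
    return min(math.dist(atom, ion) for atom in residue_atoms for ion in ions)


def min_distance_per_frame(frames):
    """frames: iterable of (residue_atoms, ions) pairs, one per trajectory frame."""
    return [min_distance(atoms, ions) for atoms, ions in frames]


# Hypothetical coordinates in angstroms; not taken from the simulations.
frames = [
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(4.0, 0.0, 0.0), (0.0, 5.0, 0.0)]),
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(9.0, 0.0, 0.0)]),
]
distances = min_distance_per_frame(frames)
in_contact = [d <= CONTACT_CUTOFF_ANGSTROM for d in distances]
```

      Plotting `distances` against frame index for each repeat gives a trace of the type described in the caption above; frames where the value drops below the cutoff indicate residue-ion contact.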

      (5) There is no discussion of physiological intracellular Zn concentration. How much Zn is inside the sperm? How much is likely free vs buffered? Is 100uM a reasonable physiological concentration?

      We estimated the intracellular zinc concentration in sperm based on human sperm data, which report a zinc concentration of approximately 35.7 ng/10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). Considering the volume of a typical human sperm is about 15 µm<sup>3</sup> (Laufer et al., 1977), this translates to an estimated intracellular zinc concentration of approximately 400 mM, although the concentration of free zinc must be much lower than this level. Although exact intracellular zinc concentrations in mouse sperm are not well-documented, this estimate supports the observation of elevated zinc in non-capacitated sperm.

      There are a number of areas where the interpretation is not well supported by the data including:

      (6) You say in the Figure 4 supplement, that "we did not observe any significant decrease in the percentage of current inhibition." But that is a pretty misleading statement. There are large changes (increases) in the amount of zinc inhibition. These might be allosteric changes but I don't think you can safely eliminate these as relevant Zn binding sites. Also, some of these mutations appear to allow at least some unbinding of Zn.

      In our MD simulations, H720 is not at the zinc binding site and, therefore, mutation to arginine would indeed eliminate its binding. We show this in the minimum-distance analysis between Zn and H720, in which they remain more than 4 Å from each other (n=3), as shown in Author response image 2.

      The Slo3/Slo1 RCK2 chimera also showed large increases in the amount of zinc inhibition, and this region might serve as a potential binding site. We agree that the statement “we did not observe any significant decrease in the percentage of current inhibition.” is misleading, and we have therefore revised our interpretation.

      We revised the result section as follows:

      “However, the percentage of current inhibition varied across the mutated constructs, showing either increases or no appreciable change (Fig. 4—figure supplement 1B, C).”

      (7) Following up on the above point, it seems unfair to conclude that the D162S, E169A, and E205 mutants are part of the inhibitory binding site for Zn when the mutation has no effect on inhibition and only an effect on the washout. The mutations on the intracellular side also had an impact on the washout so it seems equally likely that they are the critical residues based on your data.

      We thank the reviewer for this important point. We agree that the absence of a strong reduction in the initial zinc inhibition makes it challenging to assign any single residue as a definitive zinc binding site. However, our interpretation is based not only on the electrophysiological data but also on the MD simulations, which consistently identified E169 and E205 as residues that frequently interact with zinc and stabilize zinc occupancy within the VSD region. Although the mutations did not markedly reduce the peak level of zinc inhibition, both E169A and E205A significantly altered the long-lasting inhibitory component during washout, which is consistent with the MD-predicted interactions. In contrast, the intracellular mutations affected washout but were not supported by MD simulations as potential zinc interaction sites. Taken together, these combined datasets support the idea that E169 and E205 contribute to zinc modulation of Slo3 in the VSD, even though additional residues or mechanisms are likely involved.

      (8) Nowhere in the paper do you make the specific link between Zn flux and membrane hyperpolarization via Slo3. You show that Zn flux changes the ability of the sperm to hyperpolarize and you show that Slo3 is inhibited by Zn but the connection between the two is not demonstrated. There appears to be a specific Slo3 blocker. If you use this in sperm, do you no longer see the Zn effect?

      Thank you for pointing out the need to clarify this point. It is already known that sperm capacitation is associated with an increase in intracellular pH (Vredenburgh‐Wilberg & Parrish, 1995; Y. Zeng et al., 1996), hyperpolarization of the membrane (Arnoult et al., 1999; Y. Zeng et al., 1995) and elevation of the intracellular Ca<sup>2+</sup> concentration (Breitbart, 2002; Publicover et al., 2007) through diverse ion channel activities. To explore whether these pathways are influenced by intracellular zinc, we used patch-clamp techniques to measure the membrane potential (Vm), as shown in Fig. 1D-K. It has been reported that, under whole-cell current clamp of mouse epididymal spermatozoa, the resting membrane potential is hyperpolarized after intracellular alkalinization (Navarro et al., 2007). We mentioned this in lines 100-108 of the manuscript.

      Next, our findings from the experiments using mouse spermatozoa suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH- and voltage-dependent potassium channel Slo3 as responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel and found that zinc inhibits mSlo3 current. We explained the rationale for this experiment in lines 143-150.

      We added the following sentence to clarify the text:

      “During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010).”

      Therefore, the text was modified into:

      “Our findings suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH-and voltage-dependent potassium channel Slo3 is responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel.”

      Regarding the specific inhibitor, as the reviewer pointed out, a new Slo3 inhibitor, VU0546110, was reported to be more than 40-fold selective for human Slo3 over Slo1 (M. Lyon et al., 2023). However, the effect of VU0546110 on mSlo3 has not yet been tested. Mouse and human Slo3 respond similarly to certain inhibitors but differ in their responses to several others (M. D. Lyon et al., 2023), making it uncertain whether VU0546110 will work on mSlo3.

      (9) In the second half of Figure 1, the authors suggest that there is "no hyperpolarization" in 100uM Zn. That is not really true. It is reduced but not absent.

      We modified the wording of “no hyperpolarization in 100 µM Zn” to “alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group.”

      “In contrast, alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group”

      (10) The claim that Lrrc52 with Slo3 shows a higher current inhibition at pH 7.5 than pH 8 is not well supported because there are only 3 replicates in the 7.5 case. In addition, the claim made in the text that 100uM ZnCl2 "already inhibited mSlo3+Lrrc52 at pH7.5", contrasted with mSlo3 alone, is not tested statistically.

      Thank you for the valuable comment. Although Fig. 3F shows a statistical difference, we agree that having only three replicates at pH 7.5 may somewhat weaken the conclusion. Following this suggestion, we have revised the sentence as follows:

      “Alkalinization appeared to increase the percentage of current inhibition by 100 µM ZnCl<sub>2</sub>.”

      We provided statistical analysis to compare pH 7.5 between mSlo3 alone and mSlo3+Lrrc52 in the Figure 3—figure supplement 1D:

      The statistical analysis showed that 100 µM zinc significantly inhibited the mSlo3 + Lrrc52 current at pH 7.5 compared to the mSlo3 current alone. We have incorporated the necessary changes into the revised manuscript and updated the figure legends accordingly.

      In a number of places, better controls are needed.

      (11) How specific is this effect for Zn? Mg2+, for instance, is also a divalent cation that is in the hundreds of uM range inside the cell. Does it exert the same effect? Each ion certainly has unique preferred coordination geometries, does your predicted binding with MD show what you might expect for tetrahedral coordination with Zn? Did you test other divalent cations functionally or in silico?

      To answer this question, we built another AlphaFold3 model with Mg<sup>2+</sup> instead of Zn<sup>2+</sup>. We did not opt for all-atom MD simulations because of their computational cost. In this model, the Mg<sup>2+</sup> ions all cluster at the pore domain and do not reside anywhere near the Zn<sup>2+</sup> site identified in both the MD simulations and the AF3 model.

      Author response image 3.

      AlphaFold3 model of Slo3 channel with Mg<sup>2+</sup>

      The Slo3 AlphaFold model from residue M1 to L330. The colour gradient reflects the pLDDT score range from 1.73 to 95.69. Purple sticks highlight E169, N171 and E205. In this study, we did not examine other divalent cations in our electrophysiological recordings. Exploring their effects will be an important direction for future research.

      (12) For the VCF experiments, a significantly higher concentration of Zn was used (10mM). What is the reason for this? There is no discussion of how much a "puff" is. Assuming you are using the RNA injector it is probably on the order of 50nL or less. Assuming the volume of an oocyte is 1uL that would argue that the final concentration is 500uM or higher. But this is also complicated by potential local effects of high Zn at the injection site, artifacts of injecting that much metal, and the fact that a great deal of the Zn will likely be bound to other things inside the cell. Better controls are needed for this experiment.

      As pointed out by the reviewer, the volume of an oocyte is estimated to be approximately 1 µL. We performed manual injections using a glass needle of the type typically used for RNA injection. However, because the injections were done manually during real-time VCF recording (as illustrated in the experimental scheme), the exact volume of solution injected into each oocyte could not be precisely controlled. We estimated each drop to be approximately 50 nL, resulting in a final concentration of around 500 µM, as described by the reviewer.
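
      The dilution estimate above can be checked with a short calculation. This sketch assumes instantaneous, uniform mixing and no intracellular zinc buffering, both of which are simplifications; the function name is hypothetical, and the input values are the ones quoted in the exchange (10 mM stock, about 50 nL injected, about 1 µL oocyte volume).

```python
def final_concentration_um(stock_mm, injected_nl, cell_volume_nl):
    """Final concentration in µM after injection, assuming uniform mixing and no buffering."""
    total_volume_nl = cell_volume_nl + injected_nl
    return stock_mm * 1000.0 * injected_nl / total_volume_nl


# Values quoted in the review exchange: 10 mM stock, ~50 nL drop, ~1 µL (1000 nL) oocyte.
estimate_um = final_concentration_um(stock_mm=10.0, injected_nl=50.0, cell_volume_nl=1000.0)

# A smaller drop, as may happen with manual injection, lowers the estimate proportionally.
half_drop_um = final_concentration_um(stock_mm=10.0, injected_nl=25.0, cell_volume_nl=1000.0)
```

      Accounting for the added injection volume gives roughly 476 µM, close to the approximately 500 µM figure obtained when the 50 nL is neglected relative to 1 µL. Free Zn<sup>2+</sup> would be expected to be lower once intracellular binding is considered.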

      The rationale for using a relatively high concentration was to ensure that the zinc concentration inside the oocyte reached an effective level, since manual injection may sometimes deliver less than 50 nL of solution. In some cases, injections failed entirely due to the technical difficulty of the method. Because VCF recordings are already technically difficult, we aimed to ensure that zinc injection was successful in oocytes that exhibited a robust fluorescence signal by injecting an excess amount of zinc that would not disrupt normal oocyte conditions. For example, 10 mM zinc was prepared in an acidic solution (pH 2.5). We verified that this acidic condition did not affect mSlo3 current by performing control injections with the acidic solution alone, since the mSlo3 current is not activated under acidic pH conditions.

      Author response image 4.

      VCF control experiments: vehicle injection.

      Reviewer #3 (Public review):

      Summary:

      The study titled "Zinc is a Key Regulator of the Sperm-Specific K+ Channel (Slo3) Function" aims to investigate the role of intracellular zinc in sperm capacitation and its regulation of the sperm-specific Slo3 potassium channel. Capacitation is a crucial physiological process that enables sperm to fertilize an egg, and membrane hyperpolarization through Slo3 activation is a well-established event in this process. The authors propose that intracellular zinc dynamically decreases during capacitation and inhibits Slo3-mediated K⁺ currents, thereby playing a regulatory role in sperm function.

      Strengths:

      (1) Novel Contribution to Sperm Physiology.

      The study provides new insights into how zinc dynamics contribute to sperm capacitation, specifically through its direct inhibition of Slo3 activity. Previous research has focused primarily on extracellular zinc's effect on sperm function; this work expands the discussion to intracellular zinc regulation, an area with limited prior investigation.

      (2) Strong Electrophysiological Evidence.

      The study employs inside-out patch-clamp recordings in Xenopus oocytes to demonstrate zinc's direct inhibition of Slo3 currents. The observed slow dissociation of zinc from Slo3 suggests a long-lasting regulatory effect, adding to the understanding of ion channel modulation in sperm cells.

      (3) Molecular Mechanistic Insights

      Using Molecular Dynamics (MD) simulations and mutagenesis, the authors identify potential zinc-binding sites within Slo3's voltage-sensing domain (VSD), particularly E169 and E205. These computational predictions are supported by electrophysiological recordings, strengthening the argument that zinc directly binds and inhibits Slo3.

      (4) Physiological Relevance and Functional Implications

      The study suggests that zinc inhibition of Slo3 could contribute to sperm motility regulation during capacitation.

      The authors provide sperm motility assays as supporting evidence, showing that zinc chelation affects motility only after capacitation has begun, suggesting a dynamic role of intracellular zinc in the capacitation process.

      Weaknesses:

      While the study presents compelling electrophysiological data and molecular insights, there are several critical gaps that must be addressed before fully supporting the physiological relevance of the findings.

      (1) The authors should measure the effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.

      We thank the reviewer for the valuable comments to strengthen the physiological relevance of our findings. We provided additional data on Slo3 currents measured by perforated patch-clamp recording in sperm cells treated with zinc pyrithione (ZnPy), before and after the addition of 10 mM NH<sub>4</sub>Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 currents were recorded before and after the application of 10 mM NH<sub>4</sub>Cl. These data have been integrated into Figure 1L-N and Figure 1—figure supplement 1A, B.

      It is worth noting that the Slo3 current in this recording might contain other endogenous currents, as no specific blocker was used. Nonetheless, the data showed that the Slo3 current in sperm tends to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy. The fold change calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl was lower in the ZnPy-treated group than in the control group.

      We also provided data normalized to cell capacitance as suggested; however, the capacitance obtained from the sperm recordings reflects the membrane of the entire head and midpiece of the spermatozoon. Slo3 channels, on the other hand, are not expressed throughout the entire spermatozoon, so the cell capacitance acquired from these recordings does not accurately reflect the area where the Slo3 channels are localized. Although we included normalization of Slo3 currents to cell capacitance before and after ZnPy application, this normalization should be interpreted with caution for the reasons mentioned above. The corresponding figure has been included in the supplementary data Figure 1—figure supplement 1A, B.

      We added sentences to the Results section as follows:

      “We also measured Slo3 current using perforated patch-clamp recordings in spermatozoa treated with ZnPy, before and after the addition of NH<sub>4</sub> Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 current were recorded before and after the application of 10 mM NH<sub>4</sub>Cl (Fig. 1L-N; Fig. 1—figure supplement 2A, B). Slo3 current in sperm tended to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy (Fig. 1L, M). There was a decrease in the fold change calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl of ZnPy treated group compared to the control group (Fig. 1N). Taken together, these results confirmed that intracellular zinc indeed inhibits alkalinization-induced hyperpolarization in mouse sperm.”

      (2) Lack of Controls in Non-Capacitated Sperm

      The claim that zinc is exported from sperm during capacitation needs stronger experimental validation.

      The authors did not include a control group of non-capacitated sperm in key fluorescence imaging experiments, making it difficult to confirm that the observed zinc decrease is capacitation-specific rather than a general zinc redistribution process.

      To strengthen this conclusion, experiments should be performed in non-capacitating conditions to determine whether intracellular zinc levels remain unchanged.

      We added the control group of non-capacitated sperm in key fluorescence imaging experiments, as integrated in Figure 1B.

      The following text was revised and added in the Results and Figure Legend sections:

      “We observed that there was a gradual and significant decrease in fluorescence intensity in both regions (Fig. 1B), particularly prominent in the flagellum (Fig. 1C). This decline suggests the active release of intracellular zinc from sperm flagellum occurs during capacitation. In contrast, the fluorescence intensity of the control group of non-capacitated sperm remained unchanged (Fig. 1B).”

      Figure Legend 1B was modified accordingly.

      (3) Unclear Role of Zinc in Physiological Capacitation

      The study clearly demonstrates zinc inhibition of Slo3 but does not sufficiently establish how this affects capacitation at a functional level.

      Additional motility and capacitation markers should be analyzed to confirm that zinc influences sperm behavior beyond Slo3 inhibition.

      We thank the reviewer for this valuable comment. We fully agree that zinc can influence sperm physiology through multiple mechanisms and that its overall effects on capacitation are complex. However, the main goal of our study is to investigate the mechanism and to determine whether intracellular Zn<sup>2+</sup> directly inhibits Slo3. Our results from both the heterologous expression system and the sperm membrane potential recordings consistently support this conclusion.

      For these reasons, we believe that adding such assays would not clarify the role of Slo3 in capacitation but rather risk confounding interpretation. Instead, we have expanded the Discussion to explicitly acknowledge these limitations and to emphasize that future studies combining genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to fully define its physiological impact.

      We added sentences to the discussion section in the revised manuscript as follows:

      “Although these results support a mechanistic link between zinc and Slo3 activity, future studies that combine genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to define its physiological impact in more detail. Within this context, this study highlights the potential importance of intracellular zinc in the regulation of sperm capacitation.”

      (4) Insufficient Data on Zinc-Slo3 Specificity

      The authors should consider using quinidine, a known washable Slo3 inhibitor, to confirm that zinc acts specifically on Slo3 channels rather than other endogenous ion channels.

      The study would benefit from including washout controls in the inside-out patch-clamp recordings, as seen in Figure 3-Supplement 1, to confirm that zinc inhibition is reversible or long-lasting.

      We thank the reviewer for raising the point regarding the need to confirm that the current observed in our recordings indeed represents Slo3 current by using a specific blocker such as quinidine, as there is a possibility that endogenous currents might also be present and that zinc could act on those endogenous currents. Performing experiments with quinidine would indeed be crucial to demonstrate the specificity of Slo3 current in our patch-clamp recordings.

      However, in our current experimental protocol, we apply ramp pulses multiple times and require a long series of recordings within a single session in one patch as described in the materials and methods as well as Figure 2I, Figure 4—figure supplement 1C, Figure 5B (pH 8.0 → 100 µM zinc → pH 8.0, to observe the washout effect). Incorporating quinidine into this sequence would make the protocol even longer (pH 8.0 → quinidine → washout → pH 8.0 → 100 µM zinc), which increases the likelihood of patch loss before completing the full set.

      Furthermore, we have ensured that the recorded current corresponds to Slo3 by using appropriate experimental conditions, specifically the suitable voltage range for activation, a high intracellular pH (pH 8.0), and high-potassium solutions in our recordings.

      (5) Missing Discussion of Zinc's Role in CatSper Regulation

      The study focuses solely on Slo3 but does not mention CatSper, the principal Ca<sup>2+</sup> channel essential for sperm capacitation.

      Zinc has been reported to inhibit CatSper activity, which could significantly impact sperm function.

      The discussion should address whether zinc's effect on Slo3 represents a broader regulatory mechanism influencing multiple ion channels during capacitation.

      Thank you for the comment. To the best of our knowledge, there have been no reports showing that CatSper activity is directly regulated by zinc ions.

      Furthermore, in our patch-clamp recordings with NH<sub>4</sub>Cl and ZnPy, we observed that the normal CatSper current increased even in the presence of ZnPy, which makes it challenging to conclude whether zinc directly affects CatSper channel activity.

      We added sentences to the discussion section in the revised manuscript as follows:

      “In addition to that, to date, there are only few reports on the effect of zinc on other sperm ion channels, and none have been reported in mouse sperm. One important study was reported by (Jeschke et al., 2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help preventing premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence for zinc acting directly on CatSper (Jeschke et al., 2021).”

      Final Assessment

      This work presents important findings on zinc regulation of Slo3 channels, supported by strong electrophysiological and molecular analyses. However, the physiological relevance of these findings remains unclear because controls are missing and additional functional assays are needed. Addressing these issues would significantly enhance the manuscript's scientific rigor and impact.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Most of the specific comments and suggestions are in the public review. Minor additional comments primarily focused on presentation and textual errors are here.

      (1) There is something strange happening in Figure 6D in the -100ish range. I think it's likely related to the reversal potential of K+.

      Thank you for pointing this out. Yes, in Figure 6D the plot behaves strangely around -100 mV. As the reviewer pointed out, we also think that this is related to the reversal potential of potassium ions.

      (2) There are a number of errors in the text that make following it difficult. For instance, multiple times the authors say "In consistent" (line 120 as an example) when I think they mean consistent with.

      We replaced “in consistent” with “consistent with” throughout the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      The authors provide well-described experiments, particularly those examining the effects of intracellular zinc on Slo3 channels using inside-out patch-clamp recordings. However, some experimental designs intended to assess the physiological relevance of these findings during capacitation require additional controls and data before the authors' claims can be fully supported.

      Comments

      Major Concerns & Suggested Improvements

      Line 65: "In the present study, we find that intracellular zinc is exported during capacitation, indicating that zinc dynamics in spermatozoa play an important role in fertilization."

      This claim requires additional experimental data to be fully supported.

      Thank you for pointing it out. We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Line 79: "Intracellular zinc is exported from sperm during capacitation."

      The authors should include controls in non-capacitated conditions to determine whether zinc export is specific to capacitation or a general process in sperm cells.

      Again, we have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Figures - General Comment:

      In all figures, please replace SEM (Standard Error of the Mean) with Standard Deviation (SD) for consistency and a more accurate representation of variability.

      SEM (Standard Error of the Mean) has been replaced with SD (Standard Deviation) in all figures (main figures and supplements) as well as in the numerical descriptions.
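
      For readers comparing the original and revised error bars, the relationship between the two statistics can be sketched as follows. The sample values are hypothetical and the helper name is an assumption of this sketch; the only fact relied on is the standard definition SEM = SD / √n for the sample standard deviation.

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(len(values))
    return sd, sem


# Hypothetical measurements from five cells; not data from the study.
sd, sem = sd_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])
```

      Because SEM shrinks with the square root of n while SD does not, SD error bars are always at least as wide as SEM bars for the same data and describe cell-to-cell variability more directly.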

      Figure 1

      Panel B:

      Include a non-capacitating media control to confirm that the observed decrease in zinc-sensitive dye fluorescence is not due to artifact/photobleaching.

      We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Perform an experiment with capacitating media supplemented with a higher concentration of zinc. If intracellular zinc export is a real effect, added extracellular zinc should prevent or reduce this phenomenon.

      We appreciate the reviewer’s suggestion; however, we believe that supplementing the medium with high concentrations of zinc is unsuitable for validating the export phenomenon due to confounding physiological factors. Our preliminary tests demonstrated that increasing extracellular zinc triggers a drastic increase in intracellular zinc as well (Author response image 5). Furthermore, the high concentration of BSA in the capacitation medium acts as a potent zinc buffer, precluding precise control over free Zn<sup>2+</sup> levels. Therefore, the inherent difficulty in maintaining defined extracellular and intracellular Zn<sup>2+</sup> gradients makes the interpretation of such data highly problematic. Future studies will focus on identifying the specific zinc transporters involved and characterizing their molecular mechanisms.

      Author response image 5.

      Zinc addition

      Clarify whether the "n" value represents different cells or multiple recordings from the same cell.

      n value represents different cells.

      Supplemental Figure 1:

      Incorporate Δ (delta) comparison between 10 min and 2 hours under control conditions and in the presence of TPEN.

      Here we provide data:

      Author response image 6.

      Δ comparison between control and TPEN

      Provide statistical analysis for these comparisons to make the effects of capacitation clearer.

      We performed the calculation and statistical analysis; however, there was no statistically significant difference, as shown in Author response image 6, due to the high variability of the individual data.

      Figure 2

      Panel C:

      Incorporate inhibition at pH 7.4 and 6.0 for direct comparison.

      Recording the inhibitory effect of zinc at pH 6.0 is not possible because there would be no current to begin with, as mSlo3 is gated by both voltage and alkaline pH.

      Panel D:

      Include a washout control, similar to what is shown in Panel A.

      We added a washout control trace to Figure 2D.

      Panel E:

      Provide a longer reference trace in the absence of zinc to clearly visualize the control condition. The current reference segment is too short to properly assess baseline activity.

      Although we do not have a longer reference trace in the absence of zinc for Figure 2E, we instead show the trace recorded under the application of 0.1 µM zinc in Figure 2—figure supplement 1A to illustrate the current behavior.

      Panels G-H:

      Include inside-out patch-clamp traces and quantification of zinc washout effects.

      Inside-out patch traces are shown in Figure 2G, as we applied a step-pulse protocol. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Panels I-K:

      Provide additional traces. In Panel I, the inhibition by zinc is clear, but in Panel J, the reduction appears less distinct and could be due to rundown or an artifact. Additional controls should clarify this.

      Figure 2K presents the most representative trace among five recorded cells. The apparent reduction is less distinct, likely due to an artifact caused by a bubble in the rapid perfusion system during solution exchange. However, at the end of zinc application (t = 50 s), the current amplitude was clearly reduced compared with that at t = 0–10 s.

      Figure 3

      Panel D:

      Include additional data showing the transition to pH 6 and washout with pH 7.5, similar to the experimental design in Panels A and B.

      We included additional data showing the raw trace of the application of pH 6.0 in Figure 3D, and we also included the transition to pH 6 and washout with pH 7.5 in Figure 3E.

      Figure 3-Supplement 1:

      Include zinc washout experiments. This approach is one of the best ways to evaluate the reversibility of zinc inhibition on the channel.

      As mentioned above, in this recording we recorded step pulses up to +180 mV. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Figure 6

      Zinc Inhibition Specificity:

      The authors should use quinidine, a known washable Slo3 inhibitor, to assess Slo3 activity before and after zinc injection.

      This experiment would confirm that zinc specifically inhibits Slo3, rather than affecting other endogenous channels.

      We sincerely thank the reviewer for this valuable suggestion. However, given the technical difficulty of these experiments, which involve lengthy VCF recordings and manual zinc injections that significantly compromise oocyte health, it is not feasible to apply quinidine at this stage.

      Moreover, we observed voltage-dependent fluorescence changes around the VSD, and this change was influenced by the application of zinc, confirming that zinc specifically inhibits Slo3 rather than affecting other endogenous channels.

      Discussion - Key Revisions Needed

      Line 308: "Our results demonstrated that intracellular zinc is exported from spermatozoa during capacitation."

      This claim needs to be supported by experiments using non-capacitated conditions.

      Additionally, measuring maximum and minimum zinc concentrations under different conditions would improve the interpretation of fluorescence intensity changes.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 309: "We further discovered that intracellular zinc regulates alkalinization-induced hyperpolarization in mice spermatozoa, mediated by Slo3 channel."

      Additional controls are needed to substantiate this claim.

      At this stage of the study, we do not have access to Slo3 knockout (KO) mice; therefore, performing additional experiments is not feasible.

      Line 316: "Using FluoZin3-AM for zinc imaging, we confirmed the presence of intracellular zinc in sperm (Fig. 1A), which is consistent with previous findings (Henkel et al., 1999). Our observations revealed that treatment with capacitation medium induced a decrease in zinc fluorescence intensity (Fig. 1B, C), suggesting that zinc levels are dynamic during capacitation."

      This statement must be supported by negative controls, including non-capacitated sperm conditions.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 327: "We also observed that zinc chelator significantly affected the sperm motility only after, but not before, capacitation (Fig. 1-figure supplement 1)."

      Data presentation should be revised to highlight the effects of capacitation itself.

      The discussion should specify which motility parameters were affected and why others were not.

      In the text we mentioned that:

      “We incubated the isolated spermatozoa with cell permeable Zn<sup>2+</sup> chelator N,N,N',N'-Tetrakis(2-pyridylmethyl)ethylenediamine (TPEN) and measured the motility parameters before and after capacitation. We found that VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity) were influenced by the TPEN treatment only after the capacitation, as shown in Fig. 1—figure supplement 1. These results demonstrate that the dynamics of zinc levels during capacitation potentially contributes to sperm motility, highlighting the importance of zinc action in sperm physiology.”

      Indeed, we observed that zinc chelator significantly affected the sperm motility specifically in VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity) only after, but not before, capacitation (Fig. 1—figure supplement 1). Of note, it has been recently reported that all these motility parameters (VAP, VCL, and VSL) are reduced by Slo3-specific inhibitors in human sperm (M. Lyon et al., 2023). These findings are consistent with the idea that endogenous zinc dynamics control sperm motility through Slo3 during the capacitation process.

      Figure legend is revised accordingly.

      Line 369: "Structural determinants of zinc inhibition in the mSlo3 channel."

      The authors should include an analysis of the evolutionary conservation of the mutated sites across Slo1, Slo2, and Slo3.

      If Slo3 has a unique regulatory mechanism, these sites should show high sequence variability compared to other Slo channels.

      If these sites are highly conserved, the authors should explain how Slo3 differs functionally from Slo1 and Slo2 despite this conservation.

      We thank the reviewer for the valuable suggestions regarding the inclusion of additional discussion points on the structural determinants of zinc inhibition in the mSlo3 channel. We performed sequence alignment by using ClustalO between mSlo3, mSlo1, and mSlo2.2. It is worth noting that only human and frog variants of Slo2.1 sequence are available in the database, so we included only Slo2.2 subtype, as our focus was on Slo3 in mouse sperm.

      Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, while in contrast E205 (mSlo3 numbering) is not. To date, there have been no reports examining the residues corresponding to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their zinc sensitivity. This might be because the zinc-binding sites in both channels are already well defined: they are located in the RCK1 domain for Slo1 (Hou et al., 2010) and in the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified binding site in Slo2.2 is conserved in Slo2.1 but not present in Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. However, this does not rule out the possibility that regions surrounding E191 or E176 could provide additional insights into zinc regulation in these channels, which could be of interest for future studies.

      Interestingly, in contrast to E169, E205 is not conserved across the Slo family, making this residue unique to the mouse Slo3 channel and potentially a determinant of zinc sensitivity in mSlo3. Given that E205 is located in the S4 domain, and given our VCF results showing that zinc inhibition influences the motion of the voltage-sensing domain of mSlo3, E205 represents an important residue to be explored in future studies. Furthermore, because this residue is unique to Slo3, it highlights the distinct functional properties of Slo3, such as its gating by both membrane voltage and alkalinization, with a voltage range of activation that differs from that of mSlo1 (Li et al., 2024) and with ligands and gating mechanisms distinct from those of Slo2 (J. Zhang et al., 2023).

      We added the sequence alignment results to Figure 5—figure supplement 1F.
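
      A conservation check of this kind reduces to mapping a residue number in one sequence to its column in the alignment and comparing the residues found in that column. The sketch below illustrates the idea only; it is not the ClustalO workflow used for the figure, the helper names are hypothetical, and the toy sequences are invented placeholders, not Slo channel sequences.

```python
def alignment_column(aligned_seq, residue_number):
    """Map a 1-based ungapped residue number to a 0-based alignment column."""
    seen = 0
    for col, char in enumerate(aligned_seq):
        if char != "-":  # gaps do not count toward residue numbering
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("residue number exceeds sequence length")


def residues_at(alignment, reference_name, residue_number):
    """Return the character each sequence has in the reference residue's column."""
    col = alignment_column(alignment[reference_name], residue_number)
    return {name: seq[col] for name, seq in alignment.items()}


def is_conserved(alignment, reference_name, residue_number):
    """True if every sequence carries the same residue in that column."""
    column = residues_at(alignment, reference_name, residue_number)
    return len(set(column.values())) == 1


# Toy aligned sequences (equal length, '-' marks a gap); placeholders only.
toy = {
    "ref": "MK-EAL",
    "seqB": "MKTEGL",
    "seqC": "M--DAL",
}
print(residues_at(toy, "ref", 3))
print(is_conserved(toy, "ref", 1), is_conserved(toy, "ref", 3))
```

      Numbering by the reference sequence mirrors the "mSlo3 numbering" convention used above, where a single residue number is mapped onto the corresponding positions in the other channels.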

      We revised the results section as follows:

      “Additionally, we performed sequence alignment by using ClustalO between mSlo3, mSlo1, and mSlo2.2. It is worth noting that only human and frog variants of Slo2.1 sequence are available in the database, so we included only Slo2.2 subtype, as our focus was on Slo3 in mouse sperm. Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, while in contrast E205 (mSlo3 numbering) is not. (Figure 5—figure supplement 1F).”

      We revised the discussion section as follows:

      “Based on sequence alignment, E169 (mSlo3 numbering) is conserved among Slo family channels in mice, whereas E205 (mSlo3 numbering) is not (Fig. 5—figure supplement 1F). To date, no studies have examined the corresponding residues to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their potential zinc sensitivity, likely because the established zinc binding sites in these channels are located in the RCK1 domain for Slo1 (Hou et al., 2010) and the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified zinc binding site in Slo2.2 is conserved in Slo2.1 but is absent in both Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. Although regions surrounding E191 or E176 may still provide additional insights into zinc regulation and could be of interest for future investigation, E205 stands out because, unlike E169, it is not conserved across the Slo family, making it unique to mSlo3 and potentially a specific determinant of zinc sensitivity in this channel.”

      Figure legend is revised accordingly.

      Line 392: "Physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm."

      The authors should mention the effects of zinc on CatSper channels, as CatSper is also crucial for capacitation.

      Slo3 inhibition may represent only one component of zinc's broader regulatory role during capacitation.

      We thank the reviewer for raising this important point regarding the physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm. We agree that we should have also discussed the effect of zinc on CatSper channels, as this channel is crucial for capacitation. To date, there are only a few reports on the effect of zinc on CatSper channels, and none have been reported in mouse sperm. One important study was reported by Jeschke et al. (2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help prevent premature activation of CatSper in the ejaculate and act as a dilution sensor, which facilitates the escape of sperm into the female genital tract (Jeschke et al., 2021). Taking this into consideration, as the reviewer pointed out, zinc inhibition of Slo3 may represent only one component of zinc’s broader regulatory role during capacitation.

      We added a sentence to the discussion section in the revised manuscript as follows:

      “In addition, to date, there are only a few reports on the effect of zinc on other sperm ion channels, and none have been reported in mouse sperm. One important study was reported by Jeschke et al. (2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help prevent premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence for zinc acting directly on CatSper (Jeschke et al., 2021).”

      The study presents valuable insights into the role of intracellular zinc in sperm capacitation and Slo3 channel function. However, the physiological impact of these findings remains unclear due to insufficient controls and missing key experimental data. The suggested revisions would strengthen the validity of the claims made by the authors and improve the overall scientific rigor of the manuscript.

      Key Areas for Improvement:

      Control experiments in non-capacitated conditions.

      Increased statistical rigor in figure analyses.

      More detailed experiments to confirm specificity of zinc action on Slo3.

      Expanded discussion of zinc's role beyond Slo3, including CatSper regulation.

      The authors should measure these effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.

      By addressing these concerns, the manuscript will provide a more robust foundation for understanding zinc's regulatory role in sperm physiology and capacitation.
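
      The capacitance normalization suggested above amounts to expressing each current as a current density (pA/pF), so that cells of different membrane area can be compared, and then expressing inhibition as the fractional loss of that density. A minimal sketch follows; the function names and the numbers in the example are hypothetical and are not data from the study.

```python
def current_density(current_pa, capacitance_pf):
    """Current density in pA/pF from whole-cell current (pA) and capacitance (pF)."""
    if capacitance_pf <= 0:
        raise ValueError("capacitance must be positive")
    return current_pa / capacitance_pf


def fractional_inhibition(control_density, zinc_density):
    """Fraction of the control current density lost in the presence of zinc."""
    if control_density == 0:
        raise ValueError("control current density must be non-zero")
    return 1.0 - zinc_density / control_density


# Hypothetical values for illustration only.
control = current_density(current_pa=200.0, capacitance_pf=2.0)
with_zinc = current_density(current_pa=50.0, capacitance_pf=2.0)
print(f"inhibition = {fractional_inhibition(control, with_zinc):.0%}")
```

      Repeating this calculation at several intracellular zinc concentrations would give the points needed for a concentration-response relationship.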

    1. eLife Assessment

      The study presents valuable findings of an optimized E. coli cell-free protein synthesis (eCFPS) system that has been simplified by reducing the number of core components from 35 to 7; furthermore, the findings communicate a simplified 'fast lysate' preparation that eliminates the need for traditional runoff and dialysis steps. It is interesting that the system's robustness is exhibited by its applicability to nanoluc, a protein that expresses readily in many systems, to more challenging proteins like the functional self-assembling vimentin and the active restriction endonuclease BsaI. Despite the study representing an advancement towards simplifying protein expression workflows, the evidence supporting some of the claims remains incomplete: performance or efficiency claims of the new system need to be supported by comparisons with typical cell-free expression systems. Despite this shortcoming, the paper remains of interest to scientists in cell and molecular biology, microbiology, biotechnology and protein synthesis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors presented a simplified E. coli cell-free protein synthesis (eCFPS) system that reduces core reaction components from 35 to 7, improving protein expression levels. They also presented a "fast lysate" protocol that simplifies extract preparation, enhancing accessibility and robustness for diverse applications.

      Strengths:

      The authors present a valuable new protocol for eCFPS, which simplifies its application.

      Weaknesses:

      The authors provide data for optimization but offer insufficient explanation of the fundamental mechanisms underlying the phenomenon.

      Comments on revisions:

      The authors have adequately addressed the concerns raised by the reviewers. However, the data added by the authors on this revision raised new concerns.

      On page 17, lines 358-363, and Figure 3G, the authors compared the nLuc production of mRNA-based and DNA-based reactions using initial and optimized lysates.

      The authors concluded that the optimized system showed significantly enhanced transcription, which compensated for the decrease in translational efficiency. If this interpretation is correct, the low yield of the initial system is simply due to the insufficient level of effective T7 RNA polymerase in the initial lysate. Supplementing the initial lysate with sufficient T7 RNA polymerase could potentially make it outperform the optimized system, in which case the optimized system would not be much superior to the initial system in protein production performance. This could be easily verified by quantifying mRNA using the real-time PCR method in both the initial and optimized systems.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have made a convincing argument that the current system of in vitro translation using E. coli extracts can be significantly optimized to work with far fewer components, while maintaining activity. They have showcased their improved activity using not only physical but also functional readouts.

      Strengths:

      The experiments are designed in a very logical and easy to understand manner, which makes it easier not only to follow the paper, but also reproduce the results. Functional assays with the synthesized proteins are a good way to demonstrate functionality and applicability of the system.

      Weaknesses:

      The production of the lysate requires special instrumentation, limiting accessibility.

      Comments on revisions:

      Thank you, authors, for addressing the minor concerns outlined in my comments. I have no further recommendations.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to overcome the challenges associated with complex, conventional prokaryotic cell-free protein synthesis (CFPS) systems, which require up to thirty-five components, by developing a streamlined and efficient E. coli CFPS platform to encourage broader adoption. The main objective was to reduce the number of reaction components from thirty-five to seven, while also developing an accessible 'fast lysate' preparation protocol that eliminates time-consuming runoff and dialysis steps. The authors also sought to demonstrate the robustness and translational quality of this streamlined system by efficiently synthesising challenging functional proteins, including the cytotoxic restriction endonuclease BsaI and the self-assembling intermediate filament protein vimentin.

      Strengths:

      This study presents several key strengths of the optimised E. coli cell-free protein synthesis system in terms of its design, performance and accessibility.

      - The reaction mixture has been dramatically simplified, with the number of essential core components successfully reduced from up to thirty-five in conventional systems to just seven.
      - The "fast lysate" protocol is a significant advance in terms of procedure.
      - The system's ability to synthesise challenging, functional proteins is evidence of its robustness.

      Weaknesses:

      (1) Title: "A simplified and highly efficient cell-free protein synthesis system for prokaryotes".
      - This title is misleading since one would expect a simplified and highly efficient cell-free protein synthesis system to yield similar protein levels compared to current cell-free protein synthesis systems. What this study shows is that the composition of cell-free protein synthesis systems can be simplified while maintaining a certain level of protein synthesis. Here, optimisation does not involve maintaining protein synthesis yield while simplifying the cell-free protein synthesis system; rather, it involves developing a simplified cell-free protein synthesis system. As mentioned in my comments below, this study lacks a comparison of protein levels with a typical cell-free protein synthesis system.
      - What do the authors mean by "highly efficient"? Highly efficient compared to what experimental conditions? If one is interested in the yield of protein synthesis, is this simplified system highly efficient compared to current systems?

      (2) Figures 1, 3-5:
      - What do relative luciferase units represent? How are these units calculated?
      - In this system, the level of expression depends mainly on the level of NLuc transcripts and the efficiency of NLuc translation. How did the authors ensure that the chemical composition of the different eCFPS buffers only affected protein translation and not transcript levels? In other words, are luciferase units solely an indicator of protein synthesis efficiency, or do they also depend on transcription efficiency, which could vary depending on the experimental conditions?
      - How long were the eCFPS reactions allowed to proceed before performing the luciferase activity measurement? Depending on the reaction time, the absence or presence of certain compounds may or may not impact NLuc expression. For example, it can be assumed that tRNA does not significantly affect NLuc levels over a short period of time, and that endogenous tRNA in the lysate is present at sufficient concentrations. However, over a longer period of time, the addition of tRNA could be essential to achieve optimal NLuc levels.
      - The authors show that tRNA and amino acids are not strictly essential for the expression of NLuc, likely due to residual amounts within the cell lysate. However, are the protein levels achieved without added amino acids and tRNA sufficient for biochemical assays that require a certain amount of protein? It is important to note that the focus here is on optimising the simplicity of the buffer rather than the level of protein expression. In fact, the simplicity of the buffer is prioritised over the amount of protein produced. This should be made clear.
      - How would the NLuc level compare if all the components were optimised individually and present in an optimised buffer, compared to a buffer optimised for simplicity as described by the authors?

      (3) Line 71, Streamlining eCFPS: removal of dispensable components. This title is misleading because it creates the false impression that proteins can be produced in vitro without the addition of certain compounds. While this is true, the level of protein produced may not be sufficient for subsequent biochemical analyses. This should be made clear.

      (4) Figure 2: In the legend, change "(A) Protein expression levels of the eCFPS system measured at varying concentrations of KGlu and MgGlu2" to "(A) Protein expression levels of the eCFPS system using a Nanoluciferase (NLuc) reporter DNA measured at varying concentrations of KGlu and MgGlu2".

      (5) Lines 302-303: "The thorough optimization of the seven core components was a critical step in achieving high protein expression levels". What are "high expression levels"? Compared to what?

      Comments on revisions:

      The authors have adequately addressed the issues I had raised in my initial review.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents valuable findings of an optimized E. coli cell-free protein synthesis (eCFPS) system that has been simplified by reducing the number of core components from 35 to 7; furthermore, the findings communicate a simplified 'fast lysate' preparation that eliminates the need for traditional runoff and dialysis steps. This study is an advance towards simplifying protein expression workflows, and the evidence provided is solid, starting with nanoluc, a protein that expresses readily in many systems, to applications to more challenging proteins like the functional self-assembling vimentin and the active restriction endonuclease BsaI. Data on the underlying mechanisms and efficiency of the presented system in terms of protein yield relative to other known cell-free systems would greatly enhance the findings' significance and the strength of the evidence. The paper remains of interest to scientists in microbiology, biotechnology and protein synthesis.

      We thank the editors for the positive assessment of our optimized E. coli cell-free protein synthesis (eCFPS) system and the "fast lysate" preparation.

      As suggested, we have significantly strengthened the evidence by adding:

      (1) Mechanism data: We have integrated a detailed analysis of the endogenous metabolic pathways (amino acids and nucleotides) into the Discussion section, supported by literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      (2) Efficiency comparisons: We have added quantitative comparisons of absolute protein yields between our simplified 7-component system and the conventional 35-component system (now in Figure S3 E-F), demonstrating that our system matches or exceeds traditional titers.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors only provided the data for optimization, leaving the underlying mechanism that explains the phenomena unexplained.

      We appreciate this feedback. To address the mechanism of how protein synthesis persists without exogenous additives, we have expanded the Discussion to explain how the "fast lysate" retains active endogenous enzymes. By omitting runoff and dialysis, our system preserves the metabolic capacity to synthesize amino acids (e.g., Cys and Trp from Ser) and nucleotides from residual precursors, as supported by the literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      Reviewer #2 (Public review):

      The production of the lysate requires special instrumentation, limiting accessibility. While the strengths of the study are well-emphasized, the limitations are not mentioned.

      We thank the reviewer for this point. While a high-pressure homogenizer is common in many molecular biology labs, we acknowledge it may be a barrier for some. We have now included a dedicated Limitations paragraph in the Discussion addressing accessibility and the inherent challenges of prokaryotic systems in producing complex human proteins requiring post-translational modifications.

      Reviewer #3 (Public review):

      (1) Clarification on "highly efficient" and the lack of comparison with typical high-yield systems.

      We have clarified "highly efficient" as a holistic balance of high yield, robustness, and simplified preparation. Crucially, we added absolute yield data (sfGFP standard curve) to Figure S3E-F demonstrating that our 7-component system performs comparably to or better than traditional high-yield protocols.

      (2) How did the authors ensure chemical composition only affected translation and not transcription?

      This is a key distinction. We performed new experiments using pre-transcribed mRNA templates (Figure S3G) to isolate translational effects. While translation efficiency slightly decreased in the simplified buffer, the overall protein yield increased significantly due to a dramatic boost in transcription efficiency, confirming the system's net performance gain.
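
      The absolute yields mentioned in response (1) rest on a standard curve, in which signals from known amounts of a reference protein are fitted to a line and unknown samples are read back through that line. The sketch below shows one common way to do this with an ordinary least-squares fit; it is an illustration under the assumption of a linear response, not the authors' procedure, and the function names, units, and calibration values are hypothetical.

```python
def fit_standard_curve(concentrations, signals):
    """Least-squares line signal = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("calibration concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def signal_to_concentration(signal, slope, intercept):
    """Invert the fitted line to estimate concentration from a measured signal."""
    if slope == 0:
        raise ValueError("slope must be non-zero to invert the curve")
    return (signal - intercept) / slope


# Hypothetical calibration points (concentration, fluorescence), illustration only.
slope, intercept = fit_standard_curve([0.0, 1.0, 2.0, 3.0], [5.0, 15.0, 25.0, 35.0])
print(signal_to_concentration(20.0, slope, intercept))
```

      Readings that fall outside the calibrated range should be diluted and re-measured rather than extrapolated, since the linear assumption may not hold there.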

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are specific concerns that need to be addressed:

      (1) On page 4, lines 103-109, the authors speculate that protein synthesis persists even in the absence of amino acids like arginine, cysteine, and tryptophan. They suggest that this is likely due to residual amounts of these amino acids present in the cell lysate. Yokoyama et al. demonstrated that these amino acids are generated from other amino acids by endogenous amino acid metabolic enzymes in the cell lysate (J. Biomol. NMR 48, 193, (2010), doi: 10.1007/s10858-010-9455-3.). Cysteine and tryptophan can be derived from serine. In this context, asparagine and glutamine can be disregarded because they are synthesized from aspartate and glutamate, respectively. A more in-depth analysis is required to interpret the results accurately.

      We thank the reviewer for this insightful comment and for pointing us toward the relevant literature. We agree that the persistence of protein synthesis in the absence of exogenous amino acids like Arg, Cys, and Trp is driven by the robust metabolic capacity of our "fast lysate."

      Unlike conventional protocols, our "fast lysate" procedure deliberately omits runoff and dialysis steps, ensuring the maximal retention of active endogenous metabolic enzymes and residual small-molecule pools. As demonstrated by Yokoyama et al. (2010), E. coli cell extracts retain functional enzymes capable of synthesizing acid-sensitive amino acids from precursors or more stable amino acids. We have integrated a detailed mechanistic analysis of these endogenous metabolic pathways into the Discussion section and have cited Yokoyama et al. (2010) to support this interpretation.

      (2) On page 4, lines 111-115, the authors demonstrated that protein synthesis could occur even in the absence of CTP or UTP, provided ATP and GTP are present. This phenomenon can also be attributed to the analogous complementary actions of metabolic pathways.

      We agree with the reviewer's assessment. The ability of the optimized eCFPS to function without exogenous CTP/UTP relies on the same principle of endogenous metabolic conversion mentioned above. The omission of dialysis ensures that the lysate retains not only residual nucleotide pools but also the full suite of nucleotide metabolic enzymes. Powered by our optimized energy regeneration system, these enzymes maintain sufficient levels of CTP and UTP to support transcription and translation. This explanation has been added to the Discussion section to clarify the robustness of our system.

      (3) On Figure 3A, protein synthesis kinetics are presented in a stair plot instead of the commonly used scatterplot. Is there a specific reason for choosing the stair plot?

      We chose the stair plot representation to more clearly visualize the cumulative process of protein synthesis and its stabilization over discrete time intervals. Given that sampling occurred every 10 minutes, a stair plot effectively highlights the "plateau" phases and the incremental nature of accumulation, which can sometimes be obscured by dense scatter plots.

      (4) On Figure 3C. It is unclear which system is referred to as the "initial" system in Figure 3C. Which data point on Figures 3A and 3B corresponds to this "initial" system?

      We apologize for the lack of clarity. In Figure 3C, "initial" refers to the traditional 35-component system prior to our streamlining process. Figures 3A and 3B characterize the performance of the final optimized system alone. To resolve this ambiguity, we have updated the legend for Figure 3 to explicitly define the "initial" system as the pre-optimization control.

      (5) In Figure 5D, previously reported eCFPS and the system using "fast lysate" were compared. The only difference between the two systems seems to be the type of lysate used, according to the Supplementary table. Are the optimal concentrations for the components the same for both lysates, or is there still room for optimization for "fast lysate"?

      The "fast lysate" primarily differs from conventional lysates in its preparation speed and the retention of endogenous cofactors/enzymes. While the optimal salt and energy concentrations remained consistent across both lysates in our tests, the "fast lysate" provides a higher baseline signal due to the endogenous T7 RNA polymerase and metabolic factors. We believe this demonstrates the robustness of the optimized reaction buffer across varying lysate preparation qualities.

      (6) The study suggests that the removal of DTT didn't negatively affect protein expression. However, based on my experience, certain proteins, especially those with cysteine residues on their surface, tend to aggregate without DTT. Did the authors attempt to express such proteins, or did they draw this conclusion based on the limited number of proteins tested?

      This is a valid concern. We based our conclusion on the functional expression of BsaI and vimentin, two proteins that are inherently prone to aggregation and misfolding. Their successful synthesis suggests that the intrinsic reducing capacity of the lysate (e.g., glutathione and thioredoxin systems) is sufficient for many targets (Prinz et al. 1997). However, we acknowledge that specialized cysteine-rich proteins may still require exogenous DTT. We have addressed this in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 77-78 "we iteratively evaluated the contribution of individual constituents through luciferase reporter assays" - where is all the data? Please use an appropriate figure citation. Figure 1 cherry picks some components, but I think all data should be included.

      We have structured the data presentation to show dispensable components in Figure 1 (where removal does not inhibit reaction) and essential components in Figure 2 (where 0-concentration results in zero activity). This ensures a logical flow of the "streamlining" narrative. All raw data for these screenings have been included in the Source Data files.

      (2) Line 127 typo "concentrations".

      We thank the reviewer for pointing out this error. The typo "concentrations" has been corrected.

      (3) Figure 2: "protein expression levels" measured how?/what is the unit of the vertical bar on the right? I'm assuming that this experiment was conducted for discrete concentrations and thus generated discrete data points. However, the graph makes it seem as if this is continuous data. Kindly change the type of graphing to indicate that this is discrete data, showing each data point.

      We appreciate the reviewer's suggestion. Protein expression levels were measured using the Nanoluciferase (NLuc) reporter gene assay. We utilized heatmaps/contour plots because our data are bivariate, representing the simultaneous optimization of two concentrations (e.g., Mg<sup>2+</sup> and K<sup>+</sup> in Figure 2A). For such matrix-based screenings, heatmaps are significantly more effective than scatter plots at conveying synergistic trends and identifying optimal reaction landscapes. Notably, this visualization approach for discrete biochemical optimization data was successfully employed by the Ban lab in their recent study on translation system optimization (Bothe and Ban 2024). The vertical color bar on the right represents the relative expression ratio, normalized to the maximum yield. Although we have provided a scatter plot of this discrete data for reference (see Author response image 1), we believe it appears visually cluttered due to the high density of data points, making it difficult to discern overarching trends. Heatmaps, by contrast, offer a much clearer representation of the optimal reaction landscape. To maintain transparency, the discrete concentration points tested are clearly reflected by the axis ticks, and all raw discrete data are available in the Source Data files.

      Author response image 1.

      (4) Also, for all figures: the way the units are presented (DTT/mM) is confusing to me; it could just be something like [DTT] (mM).

      We have revised all figures and tables to follow the standard format (e.g., [Component] (unit)) as suggested.

      (5) Do the sucrose gradient sedimentation data have replicates? If so, please indicate statistics.

      The sucrose gradient data provided (Figure 5C) is intended as qualitative evidence that the "fast lysate" method preserves intact 70S ribosomes across different preparation batches. This experiment has been performed independently multiple times with consistent results, demonstrating the high reproducibility of our preparation method. While we did not perform a quantitative comparative analysis of ribosome concentration, the consistency of the peaks confirms the integrity of the translational machinery.

      (6) Line 457: fix the red line.

      We thank the reviewer for pointing this out. The formatting issue has been resolved in the revised manuscript.

      (7) Please mention the limitations of this study in the discussion.

      We thank the reviewer for this suggestion. We have added a paragraph to the Discussion addressing the limitations of prokaryotic systems regarding complex eukaryotic post-translational modifications and chaperone requirements.

      (8) Please include all uncropped gels in the source data, alongside the raw data, as you have already done.

      As requested, we have provided all original, uncropped gel images in the Source Data files, alongside the raw data, to ensure full transparency and compliance with the journal's data sharing policies.

      Reviewer #3 (Recommendations for the authors):

      (1) The study lacks a comparison of protein levels with a typical cell-free protein synthesis system.

      We have performed new quantitative experiments (now included in Figure S3 E-F) to measure absolute protein yields. Our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while utilizing significantly fewer components. We have also clarified in the text that "highly efficient" refers to the synergistic balance of high yield, low cost, and simplified preparation time.

      (2) What do the authors mean by "highly efficient", often used in the manuscript?

      We thank the reviewer for the opportunity to clarify our terminology. We have performed new quantitative experiments (now included in Figure S3) to measure absolute protein yields, demonstrating that our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while utilizing significantly fewer components.

      In the context of this manuscript, we use the term "highly efficient" as a holistic descriptor that encapsulates three key dimensions of the system:

      (1) Performance Superiority: Achieving higher expression levels and faster kinetics compared to conventional 35-component systems.

      (2) Functional Robustness: The ability to efficiently synthesize challenging targets, such as cytotoxic proteins (BsaI) and aggregation-prone proteins (vimentin), which often fail in simplified systems.

      (3) Practical Utility: A drastic reduction in preparation time and cost through the "fast lysate" protocol and the removal of 28 auxiliary components, thereby lowering the barrier to adoption.

      This definition aligns with the study's core objective: developing a system where efficiency is measured not only by final yield but by the synergy between high performance and extreme ease of use.

      (3) In this article, the term 'optimisation' is used as a synonym for 'simplification'. In biochemistry, optimisation commonly refers to an increase in yield, or the same yield achieved more easily or at a lower cost. In this case, however, we have no idea how this new system compares to a conventional expression system in terms of yield.

      We thank the reviewer for this conceptual clarification. We agree that in biochemistry, "optimization" typically implies an improvement in yield or cost-effectiveness. In our study, we use the term to describe the process of achieving a superior balance between system simplicity and protein production. To address the reviewer's concern regarding the lack of a direct yield comparison, we have added new data in Figure S3. This figure provides a side-by-side comparison of protein yields between our simplified 7-component system and the conventional 35-component system. The results demonstrate that our system not only matches the performance of the traditional setup but frequently exceeds it in terms of final protein titer, while significantly reducing the reagent cost and preparation complexity. Thus, the simplification achieved in this work represents a true biochemical optimization of the cell-free synthesis process.

      (4) The levels of transcripts of the proteins studied were not determined in any of the experiments performed. Therefore, it is unknown whether the effects of different experimental conditions on NLuc, GFP or other protein expression are due to an effect on transcription, translation, or both.

      This is an excellent point. We performed a new set of experiments using mRNA templates instead of DNA to isolate the effects on translation (Figure S3G). Our results indicate that while the system's overall boost in NLuc expression is partially attributable to enhanced transcription efficiency, the translation machinery remains highly robust. We have updated the Results and Discussion to reflect this distinction.

      References

      Bothe, Adrian, and Nenad Ban. 2024. “A Highly Optimized Human in Vitro Translation System.” Cell Reports Methods 4 (4): 100755.

      Kigawa, T., T. Yabuki, Y. Yoshida, M. Tsutsui, Y. Ito, T. Shibata, and S. Yokoyama. 1999. “Cell-Free Production and Stable-Isotope Labeling of Milligram Quantities of Proteins.” FEBS Letters 442 (1): 15–19.

      Prinz, W. A., F. Aslund, A. Holmgren, and J. Beckwith. 1997. “The Role of the Thioredoxin and Glutaredoxin Pathways in Reducing Protein Disulfide Bonds in the Escherichia Coli Cytoplasm.” The Journal of Biological Chemistry 272 (25): 15661–67.

      Yokoyama, Jun, Takayoshi Matsuda, Seizo Koshiba, and Takanori Kigawa. 2010. “An Economical Method for Producing Stable-Isotope Labeled Proteins by the E. Coli Cell-Free System.” Journal of Biomolecular NMR 48 (4): 193–201.

    1. eLife Assessment

      This is a valuable study that presents human single nuclei RNA-seq and spatial transcriptomics data of the developing outflow tract and adult aortic valves that will facilitate research in this area. Data presented are solid, with bioinformatics analyses showing cell lineage and trajectory relationships, intriguingly suggesting persistence of embryonic signature in adult aortic valve cells. The latter results would be strengthened by experimental validation.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Bobola et al reports single nuclear expression analysis with some supporting spatial expression data of human embryonic and fetal cardiac outflow tracts compared to adult aortic valves. The transcription factor GATA6 is identified as a top regulator of one of the mesenchymal subpopulations and potential interacting factors and downstream target genes are identified bioinformatically. Additional bioinformatic tools are used to describe cell lineage relationships and trajectories for developmental and adult cardiac cell types.

      Strengths:

      The strengths of the study are studies of human tissue and extensive gene expression data that will be valuable to the field.

      Weaknesses:

      In the revised manuscript the data remain largely correlative since functional relationships in cell lineages and gene regulatory interactions are based on coexpression data and bioinformatic analyses that were not subjected to further validation.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Leshem et al. presents a transcriptomic analysis of the developing human outflow tract (OFT) at embryonic and fetal stages using snRNAseq and spatial transcriptomics. Additionally, the authors analyze transcriptomic data from the adult aortic valve to compare embryonic and adult cell populations, aiming to identify persistent embryonic transcriptional signatures in adult cells. A total of 15 clusters were identified from the embryonic and fetal OFT samples, including three mesenchymal and four endothelial clusters. Using SCENIC analysis on the embryonic snRNAseq data, the authors identified GATA6 as a key regulator of valve precursor cells. Spatial transcriptomic analysis of four fetal OFT sections further revealed the spatial distribution of mesenchymal nuclei, smooth muscle cells, and valvular interstitial cells. Trajectory analysis identified two distinct developmental origins of fetal mesenchymal cells: the neural crest and the second heart field. Finally, the authors used snRNAseq data from the adult aortic valve to propose that embryonic transcriptional signatures persist in a subset of adult cells.

      Strengths:

      (1) The study offers a rich and detailed dataset, combining snRNA-seq and spatial transcriptomics in human embryonic and fetal OFT, which are challenging to obtain.

      (2) The use of SCENIC and trajectory analysis adds mechanistic insight into cell lineage and regulatory programs during valve development.

      (3) This study confirms GATA6 as a key regulator of valve precursor cells.

      (4) Comparison between embryonic/fetal and adult datasets represents a novel attempt to trace persistence of developmental transcriptional programs.

      Weaknesses:

      (1) A major limitation is the lack of experimental validation to support key conclusions, particularly the claim of persistent embryonic transcriptional signatures in adult cells.

      (2) The manuscript would benefit from a clearer discussion of how these results advance beyond previous studies in human heart and valve development.

      (3) The comparison between embryonic and adult data is interesting but would be more convincing with additional evidence supporting the proposed persistence of embryonic transcriptional signatures in adult cells.

      Comments on revisions:

      The final section of the results concludes with the search for a distribution pattern similar to JAG1. The authors end their article by identifying the FOXC1 and OSR1 genes without providing further validation for their discovery, which is regrettable.

    4. Reviewer #3 (Public review):

      Leshem et al have generated a transcriptional cell atlas of the human outflow tract at two developmental timepoints and its adult valvular derivatives. This carefully performed study provides a useful resource for the study of known genes implicated in outflow tract defects and potentially also to discover new disease genes. The authors reveal neural crest and mesodermal contributions to different outflow tract components and show that GATA6, known to play a role in arterial valve development, controls a set of genes expressed in endocardial derived cells during valve development. Interestingly the results reveal intersection with GLI3 and suggest lineage persistence of gene expression through to the adult timepoint, a main new finding of this study.

      Comments on revisions:

      The authors have carefully addressed previous comments, including the addition of new analysis pointing to potential cooperation between GATA6 and GLI3.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Bobola et al reports single-nucleus expression analysis with some supporting spatial expression data of human embryonic and fetal cardiac outflow tracts compared to adult aortic valves. The transcription factor GATA6 is identified as a top regulator of one of the mesenchymal subpopulations, and potential interacting factors and downstream target genes are identified bioinformatically. Additional bioinformatic tools are used to describe cell lineage relationships and trajectories for developmental and adult cardiac cell types.

      Strengths:

      The studies of human tissue and extensive gene expression data will be valuable to the field.

      Weaknesses:

      (1) The expression data are largely confirmatory of previous studies in humans and mice. Thus, it is not clear what novel biological insights are being reported. While there is some novelty and impact in using human tissue, there are extensive existing publications and data sets in this area.

      (2) Major conclusions regarding spatial localization, differential gene expression, or cell lineage relationships based on bioinformatic data are not validated in the context of intact tissues.

      (3) The conclusions regarding lineage relationships are based on common gene expression in the current study and may not reflect cellular origins or lineage relationships that have previously been reported in genetic mouse models.

      (4) An additional limitation is the exclusive examination of adult aortic valve leaflets that represent only a subset of outflow tract derivatives in the mature heart. The conclusion, as stated in the title regarding adult derivatives of the outflow tract, is not accurate based on the limited adult tissue evaluated, exclusive bioinformatic approach, and lack of experimental lineage analysis of cell origins.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Leshem et al. presents a transcriptomic analysis of the developing human outflow tract (OFT) at embryonic and fetal stages using snRNAseq and spatial transcriptomics. Additionally, the authors analyze transcriptomic data from the adult aortic valve to compare embryonic and adult cell populations, aiming to identify persistent embryonic transcriptional signatures in adult cells. A total of 15 clusters were identified from the embryonic and fetal OFT samples, including three mesenchymal and four endothelial clusters. Using SCENIC analysis on the embryonic snRNAseq data, the authors identified GATA6 as a key regulator of valve precursor cells. Spatial transcriptomic analysis of four fetal OFT sections further revealed the spatial distribution of mesenchymal nuclei, smooth muscle cells, and valvular interstitial cells. Trajectory analysis identified two distinct developmental origins of fetal mesenchymal cells: the neural crest and the second heart field. Finally, the authors used snRNAseq data from the adult aortic valve to propose that embryonic transcriptional signatures persist in a subset of adult cells.

      Strengths:

      (1) The study offers a rich and detailed dataset, combining snRNA-seq and spatial transcriptomics in human embryonic and fetal OFT, which are challenging to obtain.

      (2) The use of SCENIC and trajectory analysis adds mechanistic insight into cell lineage and regulatory programs during valve development.

      (3) This study confirms GATA6 as a key regulator of valve precursor cells.

      (4) Comparison between embryonic/fetal and adult datasets represents a novel attempt to trace persistence of developmental transcriptional programs.

      Weaknesses:

      (1) A major limitation is the lack of experimental validation to support key conclusions, particularly the claim of persistent embryonic transcriptional signatures in adult cells.

      (2) The manuscript would benefit from a clearer discussion of how these results advance beyond previous studies in human heart and valve development.

      (3) The comparison between embryonic and adult data is interesting, but would be more convincing with additional evidence supporting the proposed persistence of embryonic transcriptional signatures in adult cells.

      Reviewer #3 (Public review):

      Leshem et al have generated a transcriptional cell atlas of the human outflow tract at two developmental timepoints and its adult valvular derivatives. This carefully performed study provides a useful resource for the study of known genes implicated in outflow tract defects and potentially also for discovering new disease genes. The authors reveal neural crest and mesodermal contributions to different outflow tract components and show that GATA6, known to play a role in arterial valve development, controls a set of genes expressed in endocardium-derived cells during valve development. Interestingly, the results suggest lineage persistence of expression of certain genes through to the adult timepoint, a main new finding of this study.

      The following points should be addressed to reinforce the conclusions and emphasize the novel features of this study.

      (1) It would be helpful to clarify how these new findings confirm or diverge from what is known from analysis of neural crest and mesodermal lineage contributions to different cell populations in the mouse heart. Did the authors identify any human-specific populations of cells, such as the LGR5 population reported by Sahara et al?

      (2) The authors should clarify in the introduction and results that they consider the endocardium to be on the SHF trajectory as indicated in Figure S4C. Please add a reference for this point.

      (3) The GATA6 results are interesting and support this experimental approach. The paper would be reinforced if the authors could provide any functional validation (in addition to their GATA6 genomic occupancy data) that the designated target genes are regulated by GATA6. This might involve looking at mutant mouse embryos or cultured cells. Do the authors consider that GATA6 may regulate the endocardial to mesenchymal transition during the early stages of valve development? Or the valve interstitial cell versus fibroblast fate choice?

      (4) Do the new findings reveal whether human valves have a direct SHF to VIC trajectory (ie, without transiting through endocardium) as has been recently shown in the murine non-coronary valve leaflet? Relevant to this point, Figure 5E appears to show contributions to a single adult aortic valve leaflet - this should be explained, or corrected.

      We sincerely thank the Editor and the Reviewers for their constructive and insightful comments. We have carefully addressed the majority of the points raised and believe the revisions have substantially strengthened the manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Overall, the reviewers felt that integrating these datasets with prior snRNAseq datasets on human OFT (de Bono et al, 2025) would enhance analyses and provide broader context.

      Several human fetal heart single-cell datasets have been published, including De Bono et al, 2025. We carefully considered whether integrative analyses with these datasets would further strengthen our study. However, there are substantial differences in anatomical scope: most published datasets encompass broad cardiac regions, whereas our study specifically targets the OFT, enabling higher-resolution characterization of OFT-specific cell states. Integration across datasets with markedly different regional compositions would likely be driven by large-scale anatomical differences rather than yield additional OFT-specific insight. In addition, cross-study integration requires batch correction. When datasets differ in anatomical scope, developmental timing, and experimental protocols, stronger correction may be needed, increasing the risk of overcorrection and potential loss of biologically meaningful OFT-specific signals.

      Importantly, our dataset has been deposited in the Human Cell Atlas and is fully available for future comparative analyses. We therefore believe that broader cross-dataset integration is best undertaken within such harmonized frameworks as more closely matched datasets become available.

      Overall, cluster annotations should be more rigorous, which may be facilitated by comparisons with earlier studies.

      We have clarified all the points raised by the reviewer regarding cluster annotation. Specifically: (1) the “cardiac” cluster has been renamed “cardiac muscle” to more accurately reflect its transcriptional identity; and (2) we now explicitly state that mesenchymal populations not resolved in the initial global analysis (across all samples) were subsequently defined through dedicated subclustering analyses performed separately for the adult and developmental datasets. These clarifications have been incorporated into the revised manuscript.

      Citation of other spatial transcriptomics studies on human OFT would be useful.

      We apologise for missing these contributions. They have now been added to the text.

      Can the authors identify a human-specific population of cells, such as the LGR5 population reported by Sahara et al?

      While our dataset does not reveal a novel single-gene marker comparable to the human-specific LGR5 marker described by Sahara et al., it does identify a distinct GATA6-enriched embryonic mesenchymal population that functions as a human valve progenitor lineage. Using regulatory network analysis, RNA velocity, lineage tracing and spatial transcriptomics, we show that this GATA6-driven program is specifically associated with semilunar valve morphogenesis and that its transcriptional signature persists in fetal and adult VIC populations. Thus, the novelty of our study lies in defining this human GATA6-regulated valve progenitor population and its lineage trajectory, rather than in the identification of previously unreported single marker genes.

      “….Although we have not defined a novel single-gene marker (analogous to LGR5 [Sahara et al]), our identification of a GATA6 network highlights…..”

      Further investigation of the specific role of GATA6 would strengthen findings.

      FISH studies would indicate whether GATA6 is involved in EMT or fibroblast versus valve interstitial cell fate choice.

      We have added a panel to Fig. S2 (D), showing that GATA6 expression is not restricted to specific outflow tract populations. In CS16-17 embryos, GATA6-expressing nuclei are detected across all embryonic clusters. Given this broad expression pattern, FISH analysis would not distinguish whether GATA6 functions in EMT or in fibroblast versus valve interstitial cell fate specification. While we cannot exclude the possibility that GATA6 contributes to EMT, we observe that its expression levels are highest in cluster 4 (post-EMT) cells. This suggests that GATA6 activation is more likely a consequence of the transition rather than its initiating cause (shown in Fig. S2D).

      Functional validation of some proposed GATA6 targets would strengthen findings.

      To our knowledge, there are currently no publicly available datasets defining the GATA6 regulatory network in human OFT cells or valvular fibroblast progenitors. Existing datasets focus primarily on cardiomyocytes, which arise from a distinct developmental lineage. Given the well-established cell-type and context dependence of transcription factor activity, these datasets are unlikely to provide meaningful insight into regulatory relationships within the valvular lineage examined here.

      As noted in the original submission, we previously leveraged published mouse GATA6 ChIP-seq data from E11.5 OFT (DOI: https://doi.org/10.7554/eLife.31362) as independent support for the GATA6 regulon identified in our human dataset. In this revised version, we have now extended this analysis by formally quantifying the overlap between the cluster 4 GATA6 regulon and genes bound by GATA6 in the mouse OFT dataset. Using a hypergeometric enrichment test, we found that the observed overlap is approximately two-fold greater than expected by chance and highly significant (p = 1.2 × 10<sup>-33</sup>). This statistical analysis strengthens our original interpretation and provides quantitative support that the identified regulon is strongly enriched for bona fide GATA6-bound targets in a closely related developmental context.
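
      For readers unfamiliar with this kind of test, the following is a minimal sketch of a one-sided hypergeometric enrichment calculation. It is an illustration only and not the code used for the analysis above: the function name `hypergeom_enrichment` and all gene counts in the example are hypothetical placeholders, not values from the manuscript.

```python
from math import comb


def hypergeom_enrichment(population, successes, draws, overlap):
    """One-sided hypergeometric enrichment (illustrative sketch).

    population: size of the gene universe considered
    successes:  genes in the universe carrying the annotation (e.g. bound in ChIP-seq)
    draws:      size of the gene set being tested (e.g. a regulon)
    overlap:    genes in the tested set that also carry the annotation
    Returns (fold_enrichment, p_value), where p_value = P(X >= overlap).
    """
    # Overlap expected by chance if the tested set were drawn at random.
    expected = draws * successes / population
    fold = overlap / expected

    # Exact upper tail, summed with integer arithmetic before dividing.
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(overlap, upper + 1)
    )
    return fold, tail / total


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate a two-fold enrichment.
    fold, p = hypergeom_enrichment(
        population=20000, successes=2000, draws=300, overlap=60
    )
    print(f"fold enrichment = {fold:.2f}, p = {p:.3g}")
```

      The choice of gene universe (`population`) strongly affects the resulting p-value, so in practice it should be restricted to genes that could have been detected in both datasets.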

      In addition, we examined the spatial expression pattern of the GATA6 regulon gene set and found that it specifically localizes to the semilunar valves (OFT derivatives), consistent with GATA6 activity in this developmental context. This new analysis has been incorporated into Figure 2F of the revised manuscript.

      Collectively, the cross-species binding enrichment and valve-specific expression pattern provide orthogonal support for the biological relevance of the identified GATA6 regulon and strengthen the mechanistic interpretation of GATA6 function in OFT and valve development.

      As GATA6 has been previously identified in mouse studies, can the authors identify novel transcription factors potentially involved in OFT development?

      To identify additional transcription factors potentially involved in OFT development and to define regulators that may confer specificity to GATA6 activity, we compared the GATA6 regulon with the regulons of other cluster 4 transcription factors identified by SCENIC (SOX4, GLI3, RARG, ETV1, GLIS3, BACH2, ZNF423, FOXO3, ZBTB20).

      While all cluster 4 regulators share some downstream targets, the GLI3 regulon showed approximately twice the degree of overlap with the GATA6 regulon compared to the other factors. This suggests a potential functional interaction between GATA6 and GLI3 in OFT-associated mesenchyme. Consistent with this, cooperation between GATA6 and GLI3 has been reported in mouse limb development. These findings have now been incorporated into the Results section, and co-expression of GATA6 and GLI3 in CS16-17 populations is shown in Figure S2D-E.
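
      One simple way to express such a regulon comparison is the fraction of targets in a reference regulon that are shared with each other regulon. The sketch below assumes this overlap measure, which may differ from the metric actually used in the analysis; the function names, regulator labels, and gene identifiers are hypothetical placeholders.

```python
def overlap_fraction(reference, other):
    """Fraction of the reference regulon's targets that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


def rank_by_overlap(reference, regulons):
    """Rank regulators by how much of the reference regulon they share."""
    scores = {
        tf: overlap_fraction(reference, targets)
        for tf, targets in regulons.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder target sets; these are not real regulon contents.
    reference_regulon = {"gene_a", "gene_b", "gene_c", "gene_d"}
    other_regulons = {
        "TF_X": {"gene_a", "gene_b", "gene_z"},
        "TF_Y": {"gene_a", "gene_q"},
        "TF_Z": {"gene_r"},
    }
    for tf, score in rank_by_overlap(reference_regulon, other_regulons):
        print(f"{tf}: {score:.2f}")
```

      Because regulons differ in size, a raw overlap fraction can favor larger regulons; a size-aware statistic such as the Jaccard index or an enrichment test is a reasonable alternative.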

      Although GATA6 has previously been implicated in OFT development, SCENIC analysis provides mechanistic insight by defining the downstream gene programs active in specific human embryonic lineages. Thus, the novelty of our findings lies not in re-identifying GATA6, but in characterizing its regulon in human OFT- and valve-associated mesenchyme and identifying potential cooperating regulators such as GLI3.

      Embryonic signatures in adult valve cells are an interesting finding that should be further explored by pseudotime trajectories, which may also indicate whether SHF cells have a direct trajectory to VIC (without transiting endocardium), as recently shown in mice.

      We included all embryonic populations, including cardiac progenitor cells (SHF), in the pseudotime trajectory analysis. However, we did not observe evidence of a direct trajectory from SHF cells toward VIC. In contrast, the same analysis consistently identified a trajectory linking endocardial cells to VIC, supporting an endocardial origin in our dataset.

      Reviewer #1 (Recommendations for the authors):

      (1) Major conclusions regarding cell lineages and derivatives are based on common gene expression patterns and bioinformatic tools. Thus, these conclusions are not based on empirical data, and assumptions regarding lineages based on gene expression may not be accurate. The language related to lineage analysis, derivative, and longitudinal gene expression is not supported by data. For example, studies in mice have shown that aortic valve interstitial cells from endocardial cushions and neural crest-derived lineages have overlapping patterns of ECM gene expression and cannot be easily distinguished in adults. Thus, it is not possible to determine derivation and cell origins based on gene expression alone.

      While we fully acknowledge that gene expression-based analyses provide correlative rather than direct lineage-tracing evidence, the Reviewer’s statement that “it is not possible to determine derivation and cell origins based on gene expression alone,” and the example cited in support, appear to equate global transcriptional similarity with the distinct embryonic transcriptional signatures that underpin our analysis.

      As the Reviewer notes, a given differentiated cell type can derive from different embryonic progenitors. Due to functional convergence, differentiated cells often exhibit highly similar expression profiles that reflect their shared function rather than developmental origin. Consequently, discriminating embryonic origins based on global expression profiles, or even for highly distinctive genes of differentiated cells, is very challenging. The example cited by the Reviewer - overlapping ECM gene expression in aortic valve interstitial cells derived from endocardial cushions and neural crest - illustrates precisely this point.

      However, our analysis does not rely on global transcriptional similarity or on markers of mature differentiated cells. Instead, we specifically identified gene sets that are highly distinctive of embryonic clusters prior to the onset of differentiation. These signatures are enriched for transcription factors and signaling molecules that define developmental identity, rather than functional effector genes associated with mature cell states. We have shown that these embryonic signatures persist in fetal cells (which already express differentiated markers but are developmentally closer to the embryonic stage relative to adult cells) and remain detectable, albeit attenuated, in adult cells. It is these distinctive embryonic transcriptional signatures, rather than global or shared functional gene expression, that we have used to infer potential lineage relationships.

      We fully acknowledge that this constitutes correlative evidence rather than direct lineage tracing, which is not feasible in human studies. However, the persistence of embryonic regulatory signatures into fetal and adult stages provides a biologically plausible link to developmental origin. This persistence most plausibly reflects partial retention of ancestral embryonic transcriptional programs in descendant cells, rather than de novo activation later in life of embryonic genes that were never previously expressed in that cell’s lineage.

      (2) Most of the findings related to cell composition, gene expression, and cell lineages seem to be largely confirmatory of previous reports. Novel findings should be emphasized and validated in the tissues.

      We agree that several aspects of our dataset reproduce and extend findings from previous human and animal studies, which we regard as an important validation of the atlas. However, our study also provides multiple novel insights that are directly supported by our spatial data. Specifically, we (i) identify a GATA6-enriched embryonic mesenchymal valve progenitor population, (ii) delineate its GATA6 transcriptional regulon and direct targets implicated in OFT and valve disease, and (iii) trace its embryonic transcriptional signature into fetal and adult valve interstitial cell populations. These findings are strengthened by our spatial transcriptomic data, which maps the GATA6 regulon and key targets to the semilunar valves and adjacent arterial root, providing in situ validation of both cell identity and gene expression patterns (see Fig. 3 and the newly added Fig. 2F). We have revised the Discussion to more explicitly highlight these novel aspects and their spatial validation in the final

      “In summary, our work goes beyond confirming previously reported cell types by (i) defining a GATA6-regulated human valve progenitor lineage and its descendants, (ii) establishing distinct embryonic origins for smooth muscle and valvular fibroblasts, and (iii) demonstrating persistence of embryonic signatures in adult valve cell populations. These findings are directly supported in tissue by our spatial transcriptomics data, which map these lineages and regulatory programs to defined anatomical domains within the human OFT and semilunar valves.”

      (3) The developing outflow tract of the heart contributes to more than just the aortic valve leaflets in adults. Additional conotruncal structures need to be evaluated in order to define adult derivatives of the developing outflow tract as described in the title.

      The title has been changed to reflect that only adult aortic valves were examined.

      (4) Major conclusions regarding the GATA6 regulatory network and downstream target genes are not validated in the context of the developing outflow tract or adult valves. Is GATA6 expression restricted to specific outflow tract populations? Is GATA6 binding or responsive gene expression detected for the indicated target genes?

      We performed additional analyses that further reinforce the relationship between GATA6 and its target genes and support the biological relevance of GATA6 downstream targets in arterial valve development. Below, we address the specific questions raised by the reviewer.

      (1) Is GATA6 expression restricted to specific outflow tract populations?

      GATA6 expression is not restricted to specific outflow tract populations. In CS16-17 embryos, GATA6-expressing cells are detected across all embryonic clusters; however, expression levels are highest in cluster 4 (valve precursor cells).

      Despite this broad expression pattern, SCENIC identifies GATA6 activity (i.e., a GATA6 regulon) specifically in cluster 4. This apparent restriction of GATA6 regulatory activity to cluster 4 may be explained, at least in part, by its elevated expression levels within this cluster. Alternatively, given that transcription factors often act in a combinatorial manner, GATA6 may co-regulate its target genes in cluster 4 together with additional cluster-specific regulators. To explore this possibility, we compared the GATA6 regulon with the regulons of other cluster 4 transcription factors identified by SCENIC (namely SOX4, GLI3, RARG, ETV1, GLIS3, BACH2, ZNF423, FOXO3, ZBTB20) in order to identify potential co-regulatory modules. As expected, since these regulons are sampled from the subset of genes enriched in cluster 4, all regulators share a substantial proportion of downstream targets with GATA6. However, GLI3 stands out, showing approximately twice the degree of overlap compared to the other factors. This suggests a functional interaction between GATA6 and GLI3, consistent with previously reported cooperation in mouse limb development. These results have been incorporated into the Results section, and the expression of GATA6 and GLI3 in CS16-17 cell populations is shown in Fig. S2DE.
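
      The regulon comparison described above can be sketched as a simple set-overlap ranking. This is an illustration only, not the pipeline used in the study: the overlap metric (fraction of reference targets shared), the function names, and the toy gene sets are all assumptions introduced here.

```python
def overlap_fraction(reference: set[str], other: set[str]) -> float:
    """Fraction of the reference regulon's targets that also appear in `other`."""
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


def rank_coregulators(reference: set[str], regulons: dict[str, set[str]]):
    """Rank candidate co-regulators by how much of the reference regulon they share."""
    scores = {tf: overlap_fraction(reference, targets) for tf, targets in regulons.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: gene and factor names are placeholders, not real regulons.
reference_regulon = {"G1", "G2", "G3", "G4"}
candidate_regulons = {
    "TF_A": {"G1", "G2", "G3", "X1"},
    "TF_B": {"G1", "X2"},
    "TF_C": {"X3"},
}
ranking = rank_coregulators(reference_regulon, candidate_regulons)
```

      In this toy case `TF_A` shares three of the four reference targets and ranks first; a factor that stands out in such a ranking would be a candidate co-regulator worth following up.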

      (2) Is GATA6 binding or responsive gene expression detected for the indicated target genes?

      We were unable to find public data describing the GATA6 regulatory network or its downstream targets in the specific human cell types examined here (OFT cells; valvular fibroblast progenitors). Available datasets focus primarily on cardiomyocytes, which arise from a distinct lineage, and because transcription factor function is highly cell-type and context dependent, these datasets are unlikely to be helpful in inferring regulatory relationships in the valvular lineage.

      The strongest validation for the GATA6 regulon identified in this study comes from the mouse GATA6 occupancy data (this was included in the original manuscript). Although derived from a different species, GATA6 binding has been profiled in a highly related developmental context, the OFT. To assess the relevance of these data to our human findings, we performed a hypergeometric test comparing the GATA6 regulon identified in cluster 4 (this study) with genes bound by GATA6 in E11.5 mouse OFT ChIP-seq data (DOI: https://doi.org/10.7554/eLife.31362). The observed overlap is substantially greater than expected by chance: it is approximately twice the expected value, and the enrichment is highly significant (p = 1.2 × 10<sup>-33</sup>). Biologically, this strongly supports the interpretation that many genes within the GATA6 regulon are likely to be direct GATA6 targets, or at minimum are strongly associated with GATA6 binding, rather than representing a random gene set. This analysis has been added to the revised manuscript.
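
      For readers unfamiliar with the test, a minimal sketch of a one-sided hypergeometric enrichment calculation follows. It is not the code used in the study: the function name, the choice of background gene universe, and the toy numbers are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_enrichment(population: int, successes: int, draws: int, observed: int):
    """Return (P(X >= observed), expected overlap) for sampling without replacement.

    population: size of the background gene universe (an assumption in any real analysis)
    successes:  genes in the universe that are bound by the factor
    draws:      size of the regulon being tested
    observed:   regulon genes that are also bound
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum the exact upper tail of the hypergeometric distribution.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    p_value = float(Fraction(tail, total))
    expected = draws * successes / population
    return p_value, expected


# Hypothetical toy numbers, chosen only to keep the arithmetic checkable by hand.
p_value, expected = hypergeom_enrichment(population=20, successes=5, draws=5, observed=3)
```

      Comparing `observed` with `expected` gives the fold enrichment, and `p_value` gives the probability of an overlap at least that large under random sampling from the chosen background.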

      In this revised version of the manuscript, we also overlaid the expression of GATA6 regulon genes onto our fetal spatial transcriptomics data. The GATA6 regulon was identified in embryonic cluster 4, whose expected trajectory is fetal valvular fibroblasts (cluster 12). Remarkably, GATA6 regulon genes are expressed in both the aortic and pulmonary valves, and their expression pattern aligns closely with HAPLN1-positive valvular fibroblasts (cluster 12), further supporting the biological relevance of this gene set. These new data have been added to Fig 2(F).

      Together, the strong enrichment of GATA6 regulon genes among GATA6-bound targets in the OFT, and the specific expression of this gene set within the arterial valves (cluster 4 descendant cells), support the biological relevance of GATA6 downstream targets in arterial valve development and disease. In addition, we identify GLI3 as a potential GATA6 co-binding partner.

      (5) What are "cardiac" cell types in the embryonic single cell clustering? Are these cardiomyocytes? Cardiac is an ambiguous term if the cells being analyzed are all in the heart.

      Thank you for highlighting this ambiguity. The “cardiac” population refers specifically to cardiac muscle cells. We have updated the labels in Fig. 1E, 1F, and Fig. S3A to make this explicit.

      (6) The methods and analytical tools seem fairly standard for single nuclear gene expression and spatial genomics studies. What are the new tools and resources being reported? The "novel lineage tracing algorithm" mentioned in the methods is not well described. A Cellxgene VIP app is mentioned, but is not described in detail. Also, it seems to be housed on a local server, which is not optimal.

      The description of the lineage tracing algorithm has been expanded in the Methods section of the paper.

      The data has been submitted to the Human Cell Atlas, a coordinated global effort to systematically map human cell types using standardized, interoperable formats. Public access via cell x gene enables interactive visualization, gene-level queries, and cross-dataset comparisons without requiring advanced computational expertise. This broad accessibility enhances reproducibility, facilitates integration with complementary single-cell and spatial datasets, and maximizes the visibility, transparency, and long-term impact of our work.

      (7) Only adult aortic valves from females were included in the study.

      The rationale for using female tissues has been explained in the Results section:

      We collected female samples to mitigate individual variability and maximise the possibility to analyse healthy aortic valves, justified by the lower incidence and severity of aortic disease in females versus males.

      (8) In many of the figures, the font size of the text is too small to read.

      We have increased the font size in all figures where this was compatible with the layout. For the larger plots, additional enlargement would necessitate scaling the panels beyond the allowable page dimensions, and therefore could not be implemented.

      (9) "CAT" is not a commonly used abbreviation for congenital heart anomalies related to persistent truncus arteriosus.

      CAT is now the preferred term for PTA as latinised terms are no longer used.

      Reviewer #2 (Recommendations for the authors):

      Overall, this study is thoughtfully conducted and offers valuable observations that contribute to our understanding of valve morphogenesis. However, my main concern is the lack of experimental validation to support the findings, particularly the conclusion regarding the persistence of transcriptional signatures in adult cells, which is not sufficiently substantiated or clearly argued. It is unclear how this study advances beyond previous research in humans.

      Major points:

      (1) Several recent studies have applied spatial transcriptomics to human embryonic and fetal hearts, including OFT (Asp et al., 2019; Queen et al., 2023; Farah et al., 2024; De Bono et al., 2025). It is disappointing that the authors did not acknowledge these important contributions.

      We apologise for missing these contributions. They have now been added to the text.

      (2) The present study used snRNAseq to explore the transcriptional signature of the fetal OFT. A similar approach was used by De Bono et al. (2025) to analyze fetal hearts. Integrating these complementary snRNAseq datasets could enhance the current analysis and provide broader context for the findings.

      The reviewer suggested that integrating our datasets with prior snRNA-seq datasets on human OFT (De Bono et al., 2025) could enhance the analyses and provide broader context. While several fetal heart datasets have been published (e.g., Sahara et al.), our study focuses specifically on the OFT. These other studies do not perform cross-dataset comparisons. We therefore do not see a strong rationale for integrating ours, especially given that those datasets cover much larger regions of the heart.

      (3) Figure 1 presents 18 distinct clusters identified through unsupervised clustering. The authors classify three of these clusters broadly as mesenchymal cells. However, the term "mesenchymal cells" lacks precision. The authors should clarify why these clusters were not more specifically defined as fibroblasts or myofibroblasts based on marker expression.

      Clustering of the full dataset does not provide sufficient resolution to distinguish all mesenchymal cell types. The clusters broadly annotated as mesenchymal comprise heterogeneous populations, including both undifferentiated embryonic mesenchymal cells and more differentiated fetal mesenchymal cells. These mesenchymal clusters were therefore further subclustered, and the resulting cell identities are described in detail in the Results sections corresponding to Fig. 2 and Fig. 3.

      (4) The authors used SCENIC on their snRNAseq datasets to infer key cell fate regulators and identified GATA6 as a top regulator of embryonic mesenchymal cluster 4. However, the rationale for focusing on GATA6, which is already known to be associated with CHD in humans, is not fully convincing. Why not investigate a transcription factor whose role in valve development remains unexplored?

      There are two key outcomes from a SCENIC analysis: (1) the identification of major transcriptional regulators driving the differentiation of a given cluster, and (2) the identification of their regulons (the downstream gene programs they control). While GATA6 is indeed already known to be associated with CHD in humans, including valve malformations and major OFT defects, its downstream targets in the relevant human developmental lineages have not been defined. Understanding these targets is essential for clarifying the molecular basis of GATA6-mediated CHD. Thus, the significance of our result does not lie in the rediscovery of GATA6 as a CHD-related factor, but in identifying the genes it regulates in embryonic OFT- and valve-associated mesenchyme. These GATA6-controlled genes in the OFT and valves represent biologically plausible candidate genes for human OFT defects, as disruption of GATA6 targets could similarly contribute to CHD.

      In this revised version we have performed a hypergeometric test showing that GATA6 regulon genes are significantly enriched among genes bound by GATA6 in the OFT. Biologically, this strongly supports the interpretation that many genes within the GATA6 regulon are likely to be direct GATA6 targets, or at minimum are strongly associated with GATA6 binding in the OFT, rather than representing a random gene set.

      We have also mapped the expression of the GATA6 regulon to the semilunar valves. Collectively, these analyses demonstrate that the GATA6 regulon captures a biologically coherent and developmentally relevant program, offering new mechanistic insight into how GATA6 influences OFT and valve formation and how its disruption may contribute to CHD.

      (5) Several studies have already suggested a role for GATA6 in EMT. Do the authors propose that GATA6 regulates this process during embryonic valve development? Once again, validation using FISH would be important to support these findings.

      We do not propose that GATA6 directly regulates EMT during embryonic valve development. Rather, we make two independent observations: (1) cluster 4 derives from cluster 7 (likely through EMT); (2) GATA6 regulates cluster 4-specific genes.

      The first observation is supported by RNA velocity, which links cluster 7 to cluster 4. Supporting this interpretation, endothelial cluster 7 is enriched for genes associated with arterial valve development, and mesenchymal cluster 4 cells are identified as progenitors of fetal valve fibroblasts. Because cluster 7 is endothelial and cluster 4 is mesenchymal, this trajectory suggests an endothelial-to-mesenchymal transition.

      Second, SCENIC analysis identifies GATA6 as a regulator of cluster 4 genes. Additionally, the GATA6 regulon shows distinct localization to the formed valves in fetal cells (new data added to Fig 2F). Together these findings support the notion that GATA6 regulates a gene program specific to the cell populations that will give rise to the valves and that these genes remain selectively expressed in valve cells once the arterial valves have formed.

      While we cannot exclude the possibility that GATA6 contributes to EMT, we observe that GATA6 expression levels are highest in cluster 4 (post-EMT) cells, suggesting that its activation may be a consequence of the transition rather than its initiating cause (now shown in Fig S2D).

      For validation using FISH, please see our response to point 6 below.

      (6) I found it curious that the ST section was used to validate MECOM expression (Figure 2I), while ST had not yet been introduced at this point in the manuscript. Validation using FISH would have been a more appropriate approach.

      Thank you for drawing attention to this discrepancy. Spatial transcriptomics is now introduced before the MECOM analysis, in the Results section pertaining to Figure 2F:

      “…spatial transcriptomic analysis of a later stage (12pcw) OFT shows that GATA6 regulon is mainly restricted to the aortic and pulmonary valves (Fig 2F)”.

      With regard to this and the above comment concerning FISH, while RNA FISH/RNAscope would provide an additional orthogonal approach, the Visium-based spatial transcriptomics platform directly measures MECOM transcripts in tissue sections and, in our view, represents an appropriate and sufficiently sensitive method for validating its spatial distribution in the human OFT. We have therefore relied on the spatial transcriptomics dataset to confirm and validate gene expression patterns, rather than performing additional FISH experiments. We now explicitly state that this approach serves as an independent in situ validation of gene expression, including MECOM.

      (7) "Spatial resolution of mesenchymal nuclei in the OFT" section: It is unclear which cluster the authors are referring to in this section.

      As mentioned in the text, we “mapped the five fetal mesenchymal clusters to distinct structures in the OFT” and used distinctive markers to confirm spatial assignments.

      (8) The authors should justify their choice to use Cell2location instead of a deconvolution method.

      We selected cell2location because it provides a probabilistic, hierarchical Bayesian framework that explicitly models technical variability across both single-cell reference data and spatial transcriptomics platforms. Rather than relying on predefined marker genes or simple linear regression, cell2location leverages the full transcriptomic profile of reference single-cell data and incorporates a factor analysis-based framework to model shared transcriptional signatures and latent structure across cell types. This approach improves discrimination between closely related cell states and reduces sensitivity to gene selection bias. Additionally, the probabilistic formulation yields uncertainty estimates for inferred cell abundances, enhancing interpretability and statistical rigor. Together, these features make cell2location particularly well suited for resolving complex cellular composition in our fetal human tissue spatial transcriptomics data.

      (9) Figure 3: Cluster 9 is identified as endothelial, yet it includes markers such as MYH11 among its top genes, a gene more commonly associated with cells at the base of the aorta. This raises questions about the accuracy of the cluster annotation.

      We could not find the definition of cluster 9 as endothelial to which the reviewer refers. In Fig 3, both in the Results text and in the figure legend, cluster 9 is identified as smooth muscle, which is consistent with MYH11 expression. The endothelial cluster is shown in Fig S3C.

      (10) The approach used to trace embryonic signatures in adult cells, based on overlap with the top 100 genes in embryonic clusters, relies largely on gene expression similarity, without incorporating lineage inference tools such as RNA velocity or pseudotime analysis. This limits the ability to distinguish true developmental relationships from shared functional programs. I believe that the use of aggregated adult samples may mask individual variability. Validation in separate samples (AV1 and AV3) lacks statistical rigor. The observed lower expression of embryonic genes in adult cells further complicates interpretation, raising the possibility that these signatures reflect residual expression rather than persistent lineage markers.

      We thank the reviewer for the opportunity to clarify our approach.

      We fully agree that tools such as RNA velocity and pseudotime are powerful for capturing short-term dynamic transcriptional changes and inferring lineage trajectories within continuous developmental processes. Indeed, we applied RNA velocity and identified a transition between clusters 7 and 4 in embryonic cells (Fig 2). However, as noted in the Results section, “trajectory inference methods failed to establish lineage relationships between embryonic and fetal populations”. These methods assume temporal continuity and comparable transcriptional kinetics between cells. When comparing samples separated by large developmental intervals (e.g., embryonic versus adult tissues), these assumptions do not hold: RNA velocity vectors become unreliable and may even yield biologically meaningless directions. Therefore, rather than forcing a continuous trajectory across temporally distant datasets, we employed an anchoring approach designed to identify conserved transcriptional programs and potential lineage correspondences between embryonic and adult cell types.

      To address the concern about individual variability, we performed analyses both on aggregated adult samples and on individual replicates (AV1 and AV3). The results were highly consistent across both levels of analysis, and statistical significance was supported by very low p-values, indicating that the observed patterns are robust and reproducible. We therefore believe our analysis in independent samples is statistically sound.

      Finally, we agree that adult cells display lower expression of embryonic genes, and we acknowledge that these signatures may represent residual rather than persistent expression. This observation aligns with our intended interpretation: our goal was not to demonstrate enduring embryonic marker expression, but to highlight that adult cells retain transcriptional traces that connect them to their developmental origins.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify if MEIS1, JAG1, ROR1, PRDM6 have been previously implicated in neural crest cell development. Are these then new potential regulators of neural crest cells? The same applies to SOX6 for the mesodermal population.

      We selected these genes (MEIS1, JAG1, ROR1, and PRDM6 in cluster 20, and SOX6 in cluster 4) because they serve as distinctive markers of specific embryonic clusters, not because we propose them as new regulators of neural crest (NC) cells. Because their expression remains restricted at later developmental stages, they allow reliable tracing of bona fide descendant cells originating from cluster 20 and cluster 4 into fetal and adult tissues. Since cluster 20 is, based on transcriptional profiles, the embryonic mesenchymal cluster most closely related to the NC lineage, MEIS1, JAG1, ROR1, and PRDM6 enable tracing of NC-descendant cells. Nonetheless, these four genes have all been linked to neural crest biology, either through known functional roles or through specific expression patterns associated with NC development.

      Similarly, SOX6 was selected for its restricted expression in cluster 4, a pattern that is preserved in its descendant populations, making it a suitable marker for tracking the mesoderm-derived lineage.

      (2) Please comment in the text whether any regional transcriptional differences (rather than cell type differences) were detected between the aortic and pulmonary regions.

      We have added the following text to the Results section related to Fig 3: “No molecular differences or distinguishing markers were identified between the aortic and pulmonary valves.”

      (3) There appear to be no myocardial cells in the adult valve tissue - the authors could discuss what the fate of myocardium is in the embryonic OFT. Are they only looking at a subset of derivatives of the embryonic OFT?

      Our adult dataset represents the aortic valve complex and adjacent arterial root tissue (a subset of outflow tract derivatives) rather than the entire outflow tract (this has now been specified in the title). Spatial transcriptomic analysis identified myocardial gene expression within the ventricular and outflow tract walls at CS16-19, but not within the valve leaflet cluster (Queen et al., 2023). This is consistent with previous observations that myocardium contributes to the arterial root and supports early cushion formation, but does not persist in mature valve tissue, which becomes predominantly fibrous and populated by valve interstitial cells. This explanation has been added to the analysis of cell populations in the valves.

      (4) Please equate Carnegie stages 13-23 to embryonic days or weeks of gestation in the first paragraph to help the general reader.

      We have added the suggested clarification and noted that this period spans four weeks of human development, rather than the three weeks previously indicated. The text has been updated accordingly.

      (5) I suggest rewriting the first sentence of the introduction using the plural, as there are many different types of CHD.

      The sentence has been changed accordingly.

      (6) It would be helpful to add the persistence of embryonic signatures into adult valve cell types in Figure 4E.

      We thank the reviewer for this helpful suggestion. To address this point, we have now added an analysis of the persistence of embryonic signatures in adult valve cell types to Figure 4E. Specifically, we selected 10 representative genes from the 100-gene embryonic signature lists of cluster 4 and cluster 20 and projected their expression onto the t-SNE shown in Figure 4E. The combined (module) expression of these 10 genes is now shown in Figure S6E, and the expression of the individual genes is presented in the newly added Figure S7.

      We would like to clarify that our statistical framework identifies potential descendant populations based on significant enrichment of an embryonic gene signature. Therefore, individual embryonic genes are not necessarily expected to be expressed exclusively or uniformly within a single adult population.
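
      To make the notion of a combined (module) expression concrete, the sketch below averages the expression of a gene set per cell. It is an assumption-laden illustration, not the scoring method used in the study: the data layout (gene name mapped to per-cell values), the plain mean, and the toy values are all hypothetical.

```python
def module_score(expression: dict[str, list[float]], genes: list[str]) -> list[float]:
    """Per-cell mean expression across the signature genes present in the matrix."""
    present = [gene for gene in genes if gene in expression]
    if not present:
        raise ValueError("none of the signature genes are in the expression matrix")
    n_cells = len(expression[present[0]])
    return [
        sum(expression[gene][cell] for gene in present) / len(present)
        for cell in range(n_cells)
    ]


# Hypothetical toy matrix: three genes measured in three cells.
toy_expression = {
    "A": [1.0, 0.0, 2.0],
    "B": [3.0, 0.0, 0.0],
    "C": [5.0, 5.0, 5.0],
}
# "missing" is absent from the matrix and is skipped rather than treated as zero.
scores = module_score(toy_expression, ["A", "B", "missing"])
```

      A cell can receive a high module score even when some individual signature genes are silent in it, which is why single genes need not be uniformly expressed within one population.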

      (7) Please explain how the 2-dimensional plot in 2J relates to the other plots.

      The plot originally shown in Fig 2J (now Fig 2K) was generated by applying RNA velocity exclusively to CS16-17 nuclei. Developmental nuclei (excluding adult samples) were subclustered as shown in Fig S2AB, resulting in the 5 clusters of embryonic nuclei analysed in Fig 2J: cardiac muscle (2, 17), endothelial (7), and mesenchymal (4, 20).

    1. eLife Assessment

      This important study examines the potential role of ARHGAP36 transcriptional regulation by FOXC1 in controlling sonic hedgehog signaling in human neuroblastoma. While there are many solid findings that strongly support this signaling pathway, there are some aspects of the study that are underdeveloped, particularly the generalizability in the context of cancer cells.

    2. Reviewer #1 (Public review):

      This thoughtful and thorough mechanistic and functional study reports ARHGAP36 as a direct transcriptional target of FOXC1 which regulates Hedgehog signaling (SUFU, SMO, and GLI family transcription factors) through modulation of PKAC. Clinical outcome data from patients with neuroblastoma, one of the most common extracranial solid malignancies in children, demonstrate that ARHGAP36 expression is associated with improved survival. Although this study largely represents a robust and near-comprehensive set of focused investigations on a novel target of FOXC1 activity, several significant omissions undercut the generalizability of the findings reported.

      (1) It is notable that the volcano plot in Fig. 1a does not show evidence of canonical Hedgehog gene regulation even though the subsequent studies in this paper clearly demonstrate that ARHGAP36 regulates Hedgehog signal transduction. Is this because canonical Hedgehog target genes (GLI1, PTCH1, SUFU) simply weren't labeled? Or is there a technical limitation that needs to be clarified? A note about Hedgehog target genes is made in conjunction with Table S1, but the justification or basis of defining these genes as Hedgehog targets is unclear. More broadly, it would be useful to see ontology analyses from these gene expression data to understand FOXC1 target genes more broadly. Ontology analyses are included in a supplementary table, but network visualizations would be much preferred.

      (2) Likewise, the ChIP-seq data in Fig. 2 are under-analyzed, focusing only on the ARHGAP36 locus and not more broadly on the FOXC1 gene expression program. This is a missed opportunity that should be remedied with unbiased analyses intersecting differentially expressed FOXC1 peaks with differentially expressed genes from RNA-sequencing data displayed in Fig. 1.

      (3) RNA-seq and ChIP-seq data strongly suggest that FOXC1 regulates ARHGAP36 expression, and the authors convincingly identify genomic segments at the ARHGAP36 locus where FOXC1 binds, but they do not test if FOXC1 specifically activates this locus through the creation of a luciferase or similar promoter reporter. Such a reagent and associated experiments would not only strengthen the primary argument of this investigation but could serve as a valuable resource for the community of scientists investigating FOXC1, ARHGAP36, the Hedgehog pathway, and related biological processes. CRISPRi targeting of the identified regions of the ARHGAP36 locus is a useful step in the right direction, but these experiments are not done in a way to demonstrate FOXC1 dependency.

      (4) It would be useful to see individual fluorescence channels in association with images in Fig. 3b.

      (5) Perhaps the most significant limitation of this study is the omission of in vivo data, a shortcoming the authors partly mitigate through the incorporation of clinical outcome data from pediatric neuroblastoma patients in the context of ARHGAP36 expression. The authors also mention that high levels of ARHGAP36 expression were also detected in "specific CNS, breast, lung, and neuroendocrine tumors," but do not provide clinical outcome data for these cohorts. Such analyses would be useful to understand the generalizability of their findings across different cancer types. More broadly, how were high, medium, and low levels of ARHGAP36 expression identified? "Terciles" are mentioned, but such an approach is not experimentally rigorous and RPA or related approaches (nested rank statistics, etc) are recommended to find optimal cutpoints for ARHGAP36 expression in the context of neuroblastoma, "specific CNS, breast, lung, and neuroendocrine" tumor outcomes.
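
      For context, the tercile grouping questioned in point (5) can be sketched as follows. This is a generic illustration with made-up expression values, not the authors' analysis; the function name and the group labels are assumptions.

```python
from statistics import quantiles


def assign_terciles(values: list[float]) -> list[str]:
    """Label each value low, mid, or high using the two tercile cut points."""
    lower_cut, upper_cut = quantiles(values, n=3)
    labels = []
    for value in values:
        if value <= lower_cut:
            labels.append("low")
        elif value <= upper_cut:
            labels.append("mid")
        else:
            labels.append("high")
    return labels


# Hypothetical expression values for nine patients.
labels = assign_terciles([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

      Tercile cut points depend only on the expression distribution and ignore outcome, which is the contrast with the outcome-driven cutpoint methods recommended above.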

      Comments on revisions:

      I am underwhelmed by this revision, for which I recommended more visualizations of already-generated bioinformatic data that the authors have not provided. Some attempts were made (e.g. network analysis), but other suggestions for improvement were not incorporated (e.g. more comprehensive ChIP-seq analysis). Beyond these relatively straightforward missed opportunities for improvement, there remains a lack of in vivo data, and the clinical relevance of these findings is unclear due to potential sources of bias in the data sets analyzed.

    3. Reviewer #2 (Public review):

      FOXC1 is a transcription factor essential for the development of neural crest-derived tissues and has been identified as a key biomarker in various cancers. However, the molecular mechanisms underlying its function remain poorly understood. In this study, the authors used RNA-seq, ChIP-seq, and FOXC1-overexpressing cell models to show that FOXC1 directly activates transcription of ARHGAP36 by binding to specific cis-regulatory elements. Elevated expression of FOXC1 or ARHGAP36 was found to enhance Hedgehog (Hh) signaling and suppress PKA activity. Notably, overexpression of either gene also conferred resistance to Smoothened (SMO) inhibitors, indicating ligand-independent activation of Hh signaling. Analysis of public gene expression datasets further revealed that ARHGAP36 expression correlates with improved 5-year overall survival in neuroblastoma patients. Together, these findings uncover a novel FOXC1-ARHGAP36 regulatory axis that modulates Hh and PKA signaling, offering new insights into both normal development and cancer progression.

      Main strengths of the study are:

      (1) Identification of a novel signaling pathway involving FOXC1 and ARHGAP36, which may play a critical role in both normal development and cancer biology.

      (2) Mechanistic investigation using RNA-seq, ChIP-seq, and functional assays to elucidate how FOXC1 regulates ARHGAP36 and how this axis modulates Hh signaling.

      (3) Clinical relevance demonstrated through analysis of neuroblastoma patient datasets, linking ARHGAP36 expression to improved 5-year overall survival.

      Comments on revisions:

      Consider adding subsection titles to the Results section to better organize the findings and improve readability.

      The authors may consider adding a statement in paragraph 4 of the Results section or in the Discussion noting that ARHGAP36 has been reported to inhibit PKAC activity and promote PKAC degradation.

    4. Reviewer #3 (Public review):

      Summary:

      The focus of the research is to understand how transcription factors with high expression in neural crest cell derived cancers (e.g., neuroblastoma) and roles in neural crest cell development function to promote malignancy. The focus is on the transcription factor FOXC1 and using murine cell culture, gain- and loss of function approaches and ChIP profiling, among other techniques, to place PKC inhibitor ARHGAP36 mechanistically between FOXC1 and another pathway associated with malignancy, Sonic Hedgehog (SHH).

      Strengths:

      Major strengths are the mechanistic approaches to identify FOXC1 direct targets, definitively showing that FOXC1 transcriptional regulation of ARHGAP36 leads to dysregulation of SHH signaling downstream of ARHGAP36 inhibition of PKC. Starting from a screen of Foxc1 OE to get to ARHGAP36 and then using genetic and pharmacological manipulation to work through the mechanism is very well done. There is data that will be of use to others studying FOXC1 in mesenchymal cell types, in particular the FOXC1 ChIP-seq.

      Weaknesses:

      Work is almost all performed in NIH3T3 or similar cells (mouse cells, not patient or mouse-derived cancer cells), so the link to neuroblastoma that forms the major motivation of the work is not clear. The authors look at ARHGAP36 levels in association with the neuroblastoma patient survival; however, the finding, though interesting and quite compelling, is misaligned with what the literature shows about FOXC1 and SHH, their high expression is associated with increased malignancy (also maybe worse outcomes?). Therefore, ARHGAP36 expression may be more complicated in a tumor cell or may be unrelated to FOXC1 or SHH, leaving one to wonder what the work in NIH3T3 cells, though well done, is telling us about the mechanisms of FOXC1 as an oncogene in neuroblastoma cells or in any type of cancer cell. Does it really function as an SHH activator to drive tumor growth? The 'oncogenic relevance' and 'contribution to malignancy' claimed in the last paragraph of the introduction are currently weakly supported by the data as presented. This could be improved by studying some of these mechanisms in patient-derived neuroblastoma cells with high FOXC1 expression. Does inhibiting FOXC1 change SHH and ARHGAP36 and have any effect on cell proliferation or migration? Alternatively, does OE of FOXC1 in NIH3T3 cells increase their migration or stimulate proliferation in some way, and is this dependent on ARHGAP36 or SHH? Application of their mechanistic approaches in cancer cells or looking for hallmarks of cancer phenotypes with FOXC1 OE (and dependent on SHH or ARHGAP36) could help to make a link with cellular phenotypes of malignant cells.

      In the revised manuscript, the authors did not add studies in any malignant cell type (mouse or human, neuroblastoma or other) with Foxc1 overexpression to test whether the mechanisms they identify in the mouse fibroblasts are present in cancer cells, or whether these relate to cellular phenotypes of malignancy (migration or proliferation). Therefore, the strengths and weaknesses identified by this reviewer in the prior version are the same.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This thoughtful and thorough mechanistic and functional study reports ARHGAP36 as a direct transcriptional target of FOXC1… Although this study largely represents a robust and near-comprehensive set of focused investigations on a novel target of FOXC1 activity, several significant omissions undercut the generalizability of the findings reported.

      (1) It is notable that the volcano plot in Figure 1a does not show evidence of canonical Hedgehog gene regulation, even though the subsequent studies in this paper clearly demonstrate that ARHGAP36 regulates Hedgehog signal transduction. Is this because canonical Hedgehog target genes (GLI1, PTCH1, SUFU) simply weren't labeled? Or is there a technical limitation that needs to be clarified? A note about Hedgehog target genes is made in conjunction with Table S1, but the justification or basis of defining these genes as Hedgehog targets is unclear. More broadly, it would be useful to see ontology analyses from these gene expression data to understand FOXC1 target genes more broadly. Ontology analyses are included in a supplementary table, but network visualizations would be much preferred.

      Space constraints precluded labelling the volcano plot with all 285 significantly differentially expressed genes. So rather than labelling just Hedgehog pathway members, we labelled the most dysregulated genes (those with more than a 4-fold change: log₂ fold change below -2 or above +2), and the full list of DEGs is provided in the supplemental Excel file. We have added the suggested network analysis, and for additional rigor also included protein interaction partners of Gli1 and Arhgap36 (Fig. S12).
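
      As an illustration of the labelling cut-off described in this response, the sketch below selects genes whose absolute log₂ fold change exceeds 2 (that is, more than a 4-fold change in either direction). The function name, the gene names, and the fold-change values are hypothetical placeholders and are not taken from the study.

```python
def genes_to_label(log2_fold_changes, min_abs_log2fc=2.0):
    """Return gene names with |log2 fold change| above the cut-off,
    ordered from the most to the least dysregulated."""
    selected = [
        (gene, lfc)
        for gene, lfc in log2_fold_changes.items()
        if abs(lfc) > min_abs_log2fc
    ]
    # Sort by magnitude so the strongest changes come first.
    selected.sort(key=lambda item: abs(item[1]), reverse=True)
    return [gene for gene, _ in selected]


# Hypothetical values for illustration only (not data from the manuscript).
example_degs = {"gene_a": 3.1, "gene_b": -2.4, "gene_c": 1.2, "gene_d": -0.5}
labelled = genes_to_label(example_degs)
```

      A log₂ cut-off of 2 corresponds to a 4-fold change because 2² = 4; genes below the cut-off would remain in the full DEG list but go unlabelled on the plot.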

      (2) Likewise, the ChIP-seq data in Figure 2 are under-analyzed, focusing only on the ARHGAP36 locus and not more broadly on the FOXC1 gene expression program. This is a missed opportunity that should be remedied with unbiased analyses intersecting differentially expressed FOXC1 peaks with differentially expressed genes from RNA-sequencing data displayed in Figure 1.

      We agree that genome-wide analysis of ChIP-seq data from Foxc1 over-expression is worthwhile, not least for diverse malignancies where FOXC1 is over-expressed. We chose to restrict the focus of this paper in order to define, as comprehensively as we could, the FOXC1 - ARHGAP36 relationship. Our ChIP and RNA-seq datasets are freely available to other researchers via GEO (GSE297865/GSE297719). A future manuscript will integrate ChIP-seq and RNA-seq with ATAC-seq: replicate ATAC-seq experiments permit rigorous characterization of genes transcriptionally regulated by Foxc1 as well as Foxc1’s pioneering abilities. However, these additional assays, and particularly validation of findings, take significant time and so lie beyond the scope of the current manuscript.
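
      For readers unfamiliar with the kind of intersection raised in point (2), a minimal sketch follows. It assumes peaks are given as (chromosome, start, end) tuples and genes as a transcription start site (TSS) position, and it treats a gene as a candidate direct target when a peak lies within a fixed window of its TSS and the gene is also differentially expressed. The 10 kb window, the function names, and all coordinates are assumptions made for illustration; this is not the authors' pipeline.

```python
def genes_near_peaks(peaks, tss_by_gene, window=10_000):
    """Return genes whose TSS lies within `window` bp of any peak
    on the same chromosome."""
    hits = set()
    for gene, (chrom, pos) in tss_by_gene.items():
        for peak_chrom, start, end in peaks:
            if peak_chrom == chrom and start - window <= pos <= end + window:
                hits.add(gene)
                break  # one nearby peak is enough
    return hits


def bound_and_regulated(peaks, tss_by_gene, degs, window=10_000):
    """Intersect peak-associated genes with differentially expressed genes."""
    return genes_near_peaks(peaks, tss_by_gene, window) & set(degs)


# Hypothetical coordinates and gene names, for illustration only.
peaks = [("chr1", 1_000, 1_500), ("chr2", 50_000, 50_400)]
tss_by_gene = {
    "gene_a": ("chr1", 5_000),   # near the chr1 peak, differentially expressed
    "gene_b": ("chr2", 90_000),  # differentially expressed, but no nearby peak
    "gene_c": ("chr2", 50_200),  # inside a peak, but not differentially expressed
}
degs = {"gene_a", "gene_b"}
candidates = bound_and_regulated(peaks, tss_by_gene, degs)
```

      Dedicated interval tools would be preferable at genome scale; the nested loop here is only meant to make the logic of the intersection explicit.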

      (3) RNA-seq and ChIP-seq data strongly suggest that FOXC1 regulates ARHGAP36 expression, and the authors convincingly identify genomic segments at the ARHGAP36 locus where FOXC1 binds, but they do not test if FOXC1 specifically activates this locus through the creation of a luciferase or similar promoter reporter. Such a reagent and associated experiments would not only strengthen the primary argument of this investigation but could serve as a valuable resource for the community of scientists investigating FOXC1, ARHGAP36, the Hedgehog pathway, and related biological processes. CRISPRi targeting of the identified regions of the ARHGAP locus is a useful step in the right direction, but these experiments are not done in a way to demonstrate FOXC1 dependency.

      We agree and undertook the suggested luciferase reporter assays. The results demonstrate that transcriptional activity is dependent on Foxc1 and abrogated by mutation of the predicted Foxc1-binding motifs (Fig. S8).

      (4) It would be useful to see individual fluorescence channels in association with images in Figure 3b.

      The figure has been revised to provide individual fluorescence channel data, as suggested.

      (5) Perhaps the most significant limitation of this study is the omission of in vivo data, a shortcoming the authors partly mitigate through the incorporation of clinical outcome data from pediatric neuroblastoma patients in the context of ARHGAP36 expression. The authors also mention that high levels of ARHGAP36 expression were also detected in "specific CNS, breast, lung, and neuroendocrine tumors," but do not provide clinical outcome data for these cohorts. Such analyses would be useful to understand the generalizability of their findings across different cancer types. More broadly, how were high, medium, and low levels of ARHGAP36 expression identified? "Terciles" are mentioned, but such an approach is not experimentally rigorous, and RPA or related approaches (nested rank statistics, etc) are recommended to find optimal cutpoints for ARHGAP36 expression in the context of neuroblastoma, "specific CNS, breast, lung, and neuroendocrine" tumor outcomes.

      The issue of analyzing in vivo data for neuroblastoma is addressed in more detail below, as it is also raised by the other reviewers. The neuroblastoma data represent the initial findings after the Foxc1-Arhgap36 link was defined. There is vastly more that could and should be undertaken to determine the mechanism(s) underlying ARHGAP36’s beneficial association with survival in this tumor. This is the ongoing focus for the lab.

      The original text omitted details of the cancer expression datasets surveyed that revealed high levels of ARHGAP36 expression were also detected in "specific CNS, breast, lung, and neuroendocrine tumors". This oversight has been corrected – when submitting, we omitted to upload a supplemental file (Table S4) that provided these data, which were derived from the following four sites (TCGA, TARGET, PCAWG and CCLE). However, these excellent online resources infrequently provide clinical outcome data.

      The three independent neuroblastoma cohorts were analyzed identically. Each was stratified into an ordered dataset for ARHGAP36 expression, and then divided into three equal-sized groups [terciles]. Stratification into smaller subgroups [quartiles/quintiles] would have been equally feasible. The same methodology is used by the UCSC Xena browser for Kaplan-Meier survival analysis, and offers the advantage of avoiding a priori assumptions; it is thus agnostic regarding the data. We agree that there is scope for additional approaches, including recursive partitioning analyses, but suggest it may be better to reserve these for the future, not least in analyses that test the reported ARHGAP36-survival association in additional neuroblastoma datasets.
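
      The tercile stratification described above can be sketched as follows: samples are rank-ordered by expression and split into three groups of (near-)equal size, with no expression threshold chosen in advance. The function name, sample identifiers, and expression values are hypothetical; the sketch illustrates the general approach rather than the exact code used for the cohorts or by the UCSC Xena browser.

```python
def stratify_terciles(expression_by_sample):
    """Assign each sample to 'low', 'medium' or 'high' by rank-ordered expression.

    Groups are equal-sized when the sample count is divisible by three;
    otherwise the sizes differ by at most one. Tied values are kept in
    input order because Python's sort is stable.
    """
    ordered = sorted(expression_by_sample, key=expression_by_sample.get)
    n = len(ordered)
    first_cut, second_cut = n // 3, (2 * n) // 3
    groups = {}
    for rank, sample in enumerate(ordered):
        if rank < first_cut:
            groups[sample] = "low"
        elif rank < second_cut:
            groups[sample] = "medium"
        else:
            groups[sample] = "high"
    return groups


# Hypothetical expression values, for illustration only.
expression = {"s1": 0.2, "s2": 5.0, "s3": 1.1, "s4": 9.3, "s5": 0.7, "s6": 3.4}
tercile_of = stratify_terciles(expression)
```

      The resulting group labels would then feed a Kaplan-Meier comparison. Because the cut points depend only on rank, the same procedure applies unchanged to each cohort, which is the sense in which it avoids a priori assumptions about the data.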

      Reviewer #2 (Public review):

      FOXC1 is a transcription factor essential for the development of neural crest-derived tissues and has been identified as a key biomarker in various cancers. … Together, these findings uncover a novel FOXC1-ARHGAP36 regulatory axis that modulates Hh and PKA signaling, offering new insights into both normal development and cancer progression.

      The main strengths of the study are:

      (1) Identification of a novel signaling pathway involving FOXC1 and ARHGAP36, which may play a critical role in both normal development and cancer biology.

      (2) Mechanistic investigation using RNA-seq, ChIP-seq, and functional assays to elucidate how FOXC1 regulates ARHGAP36 and how this axis modulates Hh signaling.

      (3) Clinical relevance demonstrated through analysis of neuroblastoma patient datasets, linking ARHGAP36 expression to improved 5-year overall survival.

      The main weaknesses of the study are:

      (1) Lack of validation in neuroblastoma models - the study does not directly test its findings in neuroblastoma cell models, limiting translational relevance.

      We agree that the mechanisms by which increased ARHGAP36 levels are protective are important to define. Despite many months of experiments manipulating ARHGAP36 expression, which induce quite rapid death of neuroblastoma cells in vitro, the precise mechanism(s) remain unresolved. Currently, we are endogenously labelling multiple neuroblastoma lines with Histone 2B-mCherry to facilitate live cell imaging and differentiate effects on proliferation and apoptosis. In the interim, we believe publication of the current dataset allows other researchers to independently test our findings for this pediatric malignancy. We are also establishing collaborations to access patient tissue samples, which will facilitate investigation of non-cell-autonomous mechanisms mediated via the tumor microenvironment.

      (2) Incomplete mechanistic insight into PKA regulation - the study does not fully elucidate how FOXC1-ARHGAP36 regulates PKAC activity at the molecular level.

      Other laboratories elegantly demonstrated that ARHGAP36’s effect on Hedgehog output is mediated by one motif blocking PKAC activity and the targeting of PKAC for degradation [PMIDs 25024229, 27713425, 30598432]. With these effects well established, we limited experiments to confirming that Foxc1-induced Arhgap36 reduced PKAC and pT197 PKAC levels to those seen with ectopic Arhgap36 expression.

      (3) Insufficient discussion of clinical outcome data - while ARHGAP36 expression correlates with improved survival in neuroblastoma, the manuscript lacks a clear interpretation of this unexpected finding, especially given the known oncogenic roles of FOXC1, ARHGAP36, and Hh signaling.

      ARHGAP36 expression may influence neuroblastoma survival via multiple mechanisms. Considering just canonical Hedgehog, possibilities include: cell cycle modulation, symmetric vs asymmetric cell division, maintenance of cancer stem cells, EMT, metastasis… Others include Hedgehog’s anti-apoptotic roles and the diverse mechanisms by which PKA influences cell function and survival. Faced with such diversity, we focused the discussion on what the presented data demonstrate.

      Reviewer #3 (Public review):

      Summary:

      The focus of the research is to understand how transcription factors with high expression in neural crest cell-derived cancers (e.g., neuroblastoma) and roles in neural crest cell development function to promote malignancy. The focus is on the transcription factor FOXC1 and using murine cell culture, gain- and loss-of-function approaches, and ChIP profiling, among other techniques, to place PKC inhibitor ARHGAP36 mechanistically between FOXC1 and another pathway associated with malignancy, Sonic Hedgehog (SHH).

      Strengths:

      Major strengths are the mechanistic approaches to identify FOXC1 direct targets, definitively showing that FOXC1 transcriptional regulation of ARHGAP36 leads to dysregulation of SHH signaling downstream of ARHGAP36 inhibition of PKC. Starting from a screen of Foxc1 OE to get to ARHGAP36 and then using genetic and pharmacological manipulation to work through the mechanism is very well done. There is data that will be of use to others studying FOXC1 in mesenchymal cell types, in particular, the FOXC1 ChIP-seq.

      Weaknesses:

      Work is almost all performed in NIH3T3 or similar cells (mouse cells, not patient or mouse-derived cancer cells), so the link to neuroblastoma that forms the major motivation of the work is not clear. The authors look at ARHGAP36 levels in association with the neuroblastoma patient survival; however, the finding, though interesting and quite compelling, is misaligned with what the literature shows about FOXC1 and SHH, their high expression is associated with increased malignancy (also maybe worse outcomes?). Therefore, ARHGAP36 expression may be more complicated in a tumor cell or may be unrelated to FOXC1 or SHH, leaving one to wonder what the work in NIH3T3 cells, though well done, is telling us about the mechanisms of FOXC1 as an oncogene in neuroblastoma cells or in any type of cancer cell. Does it really function as an SHH activator to drive tumor growth? The 'oncogenic relevance' and 'contribution to malignancy' claimed in the last paragraph of the introduction are currently weakly supported by the data as presented. This could be improved by studying some of these mechanisms in patient-derived neuroblastoma cells with high FOXC1 expression. Does inhibiting FOXC1 change SHH and ARHGAP36 and have any effect on cell proliferation or migration? Alternatively, does OE of FOXC1 in NIH3T3 cells increase their migration or stimulate proliferation in some way, and is this dependent on ARHGAP36 or SHH? Application of their mechanistic approaches in cancer cells or looking for hallmarks of cancer phenotypes with FOXC1 OE (and dependent on SHH or ARHGAP36) could help to make a link with cellular phenotypes of malignant cells.

      The manuscript stems from the lab’s findings that Foxc1 influences cilia-mediated signaling (Hedgehog and PDGFRalpha), offering an explanation for FOXC1’s pleiotropic phenotypes. Due to FOXC1’s largely unexplained roles in malignancy, the effects on Hedgehog prompted investigation of differential gene expression in NIH3T3 cells when Foxc1 was over-expressed. This identified Arhgap36 as a prime candidate for the Hedgehog pathway alterations, and most of the paper reports the characterization of this relationship. The final, small component of the paper tests the relevance in neural crest-derived cells, where Foxc1 has key roles. Neuroblastoma’s frequent lethality has created a network of highly supportive researchers with shared datasets, and these survival data were assayed. This in turn revealed that high levels of ARHGAP36 expression were associated with a favorable survival outcome.

      Defining the underlying molecular mechanisms for this novel association is clearly important. As outlined above, one challenge reflects the diversity of potential mechanisms, coupled with the requirement to validate those identified from 2-D culture in patient-derived tumor explants as well as immuno-deficient model organisms. Such experiments take significant time, and our present focus is on manipulating ARHGAP36 expression directly, rather than by altering FOXC1 expression, which inevitably has even more diverse effects.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The study would be strengthened by validating key findings, such as the resistance to Hh inhibition, in neuroblastoma cell lines to enhance disease relevance.

      Planned future experiments include in vitro evaluation of PKA antagonists and agonists on neuroblastoma survival.

      The authors show that FOXC1/ARHGAP36 reduces PKAC protein levels; however, it is unclear whether this regulation occurs at the transcriptional level. Assessing PKAC mRNA expression would help explain the mechanism. Additionally, if PKAC is transcriptionally downregulated, overexpression of PKAC can be used to test whether it reverses the FOXC1/ARHGAP36-induced activation of Hh signaling.

      The RNA-sequencing data exclude this possibility at the transcriptional level, since PKA is not significantly differentially expressed (Table S1). Instead, Figures 1 and 3 support Foxc1 inducing Arhgap36 expression, with elevated Arhgap36 protein levels reducing those of PKAC and catalytically active pT197 PKAC, in both the cytoplasm and adjacent to the basal body.

      The Discussion should address the potential effects of ARHGAP36 overexpression on other signaling pathways, particularly Hh and PKA signaling, in neuroblastoma. These effects may help interpret the observed association between ARHGAP36 expression and clinical outcomes in patients. Of note, it has been reported that Hh may correlate with better survival in neuroblastoma (Cancers, 2021 Apr 15;13(8):1908; J Pediatr Surg. 2010 Dec;45(12):2299).

      Both Hedgehog signaling and protein kinase A have broad effects on normal cell biology, which are likely more extensive in malignant cells. Consequently, although it is tempting to propose why ARHGAP36 overexpression is associated with enhanced survival, it may be better to wait until the causative mechanisms have been defined.

      If treatment information for the patient cohorts is available, it should be included as it may enhance the interpretability of the survival analyses.

      This is an excellent suggestion, although at present this information is not available to us. As the manuscript moves forward to publication, we will be liaising with the corresponding authors of the three datasets [GSE49711, E-MTAB-178191 and TARGET] to explore such additional clinical possibilities.

      The 'A' label in Figures S9 and S10 should be removed, as neither figure contains sub-panels.

      This has been corrected, as suggested.

      Reviewer #3 (Recommendations for the authors):

      Other comments:

      (1) Figure 5A, B: Unclear how meaningful the inhibitor experiments are in the absence of SHH (presumably none in the media or made by NIH3T3 cells?), other than as a control for the FOXC1 OE treated with Smo antagonists. A potentially better experiment could be to take malignant cells with high FOXC1 and high SHH signaling and put on Smo inhibitors.

      Figure 5A demonstrates Foxc1’s induction of GLI1 expression is not dependent on Hedgehog ligand. While certainly feasible to repeat in malignant cells strongly expressing FOXC1, doing this comprehensively would require testing lines from many or all of the ~15 malignancies where FOXC1 has a defined contribution.

      (2) Figure 6: the Gli2-mGFP seem to have higher levels of ciliary Sufu, they also have higher levels of Gli1 (see Figure 1C), does the Gli2-mGFP expression change SHH signaling? What controls have the authors done to test if this is a serious confound in their studies? They use it for most experiments, this is important to address.

      Although Gli2-mGFP expression affects Hedgehog signaling, in the absence of Gli2 (e.g. untransformed NIH3T3) Foxc1 induces Arhgap36 expression. The scope for interaction between Foxc1 and Gli2 represents an additional motivation for the ATAC-seq experiments described above to better determine if these two transcription factors have synergistic effects.

      (3) Figure 3B: (1) Please use color-blind friendly LUTs for the signals (same comment for other figures), (2) The Gli2-mGFP line with the current color scheme is confusing; it looks like only 647 and 555 secondaries were used, did they not image with the mGFP? Why not? (3) What is the evidence that these are basal bodies? (4) Why did the authors use cycloheximide in these IF experiments? Was this also done in other methods? The reasoning behind this is missing.

      For now, we have included separate channels for Figure 3. In future manuscripts we will adopt the suggestion of moving to either magenta and green, or cyan and magenta combinations for depicting immunofluorescence.

    1. eLife Assessment

      This valuable study utilizes a newly developed approach to culture T. gondii bradyzoites in myotubes, and then takes advantage of the antiparasitic compound collection known as the Pathogen Box, to find compounds that target both tachyzoite and bradyzoite forms of the parasite. A set of compounds yielding patterns consistent with targeting the mitochondrial bc1 complex was explored further, with convincing evidence for changes in ATP production in bradyzoites to support the conclusions about the importance of this complex. The paper will be interesting for parasitologists studying drug discovery of apicomplexan parasites.

    2. Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth. This is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex, suggesting a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge with interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some of the targets. The authors have employed mock treatment and non-metabolic inhibitor controls to alleviate these challenges.

    3. Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. In the revised version of the manuscript, the authors present convincing evidence that MMV1028806 targets the mitochondrial electron transport chain (ETC) of the parasite (although they don't identify the actual target in the ETC). The revised manuscript also nicely addresses my other criticisms of the original version. Overall, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors. In addition to insights into candidate bradyzoite inhibitors, the study also provides new insights into the physiological role of the mitochondrial electron transport chain of bradyzoites, and raises a host of interesting questions around the functional roles of mitochondria in this stage of the parasite.

      Weaknesses:

      As noted in my previous review, the authors present convincing evidence that one of the compounds they have identified (MMV1028806) is targeting the mitochondrial electron transport chain (ETC). However, in the absence of an assay that directly measures bc1 activity (e.g. an enzymatic assay), they cannot be certain that it targets the bc1 complex in the ETC. I appreciate that the authors have toned down some of the conclusions around this. I do still think there are some places where the text is overstating the finding (noted below).

      Line 30. "Stable isotope-resolved metabolic profiling on tachyzoites and bradyzoites identified the mitochondrial bc1-complex as a target of bradyzocidal compounds".

      Line 546. "Metabolic profiling and stable isotope tracing in treated tachyzoites suggested the inhibition of the mitochondrial bc1-complex by MMV1028806 and the reference compound BPQ."

      Line 622. "In addition to abundance data, the incorporation of ¹³C and ¹⁵N stable isotopes from glucose and glutamine, respectively, into TCA cycle and pyrimidine biosynthesis intermediates suggest the bc1-complex as a target."

    4. Reviewer #3 (Public review):

      Summary:

      The authors described an exciting 400-drug screen using the MMV Pathogen Box to select compounds that effectively affect the bradyzoite stage of the medically important parasite Toxoplasma. This work utilises a bradyzoite culture technique that was published recently by the same group. They focused on compounds that directly affected the mitochondrial electron transport chain (mETC) bc1-complex and compared them with other bc1 inhibitors described in the literature, such as atovaquone and HDQs. They further provide metabolomics analysis of inhibited parasites, which serves to provide support for the target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite stage in culture, and thus have the potential to battle chronic infection, is needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, which is demonstrated to have a similar metabolic outcome to buparvaquone. Furthermore, the authors evaluated the importance of ATP production in the tachyzoite and bradyzoite stages and under treatment with atovaquone and HDQs.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth.

      Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex, and suggests a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge of interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are some inconsistencies in some patterns that underscore the complexity of metabolic systems.

      Thank you for reviewing the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. In the revised version of the manuscript, the authors present convincing evidence that MMV1028806 targets the mitochondrial electron transport chain (ETC) of the parasite (although they don't identify the actual target in the ETC). The revised manuscript also nicely addresses my other criticisms of the original version. Overall, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors. In addition to insights into candidate bradyzoite inhibitors, the study also provides new insights into the physiological role of the mitochondrial electron transport chain of bradyzoites, and raises a host of interesting questions around the functional roles of mitochondria in this stage of the parasite.

      Weaknesses:

      In the revised manuscript, the authors have included additional oxygen consumption rate data that indicate that MMV1028806 targets the mitochondrial electron transport chain (ETC). These data are convincing. On line 481, the authors state that "treatments with ATQ, BPQ, MMV1028806, and antimycin A resulted in substantially reduced oxygen consumption levels relative to the DMSO control and suggest indeed a blockage of the mETC consistent with the inhibition of the bc1-complex." The OCR assay the authors use is still only an indirect measure of bc1 activity. Given that most OCR-inhibiting compounds in T. gondii are bc1 inhibitors, it is possible (and perhaps likely) that MMV1028806 is targeting this complex. However, the data cannot rule out that it is targeting another component of the ETC (or potentially even a TCA cycle enzyme). Without a direct test that MMV1028806 inhibits bc1 complex activity, the authors should be more cautious in their interpretation (e.g. by acknowledging the limitations of their conclusion, or acknowledging other possible targets). Similarly, the conclusion on line 622 that "... we confirmed the bc1-complex as a target" is overstating the findings. The phrasing on lines 683-695 is more appropriate: "... suggesting that it also targets complex III or a functionally linked site within the mitochondrial electron transport chain."

      We are grateful for the thorough review of the updated manuscript and the identification of the minor issues. We addressed all of them as detailed below. We also tempered our conclusions regarding the identification of the bc1-complex as a target in line 616:

      “In addition to abundance data, the incorporation of ¹³C and ¹⁵N stable isotopes from glucose and glutamine, respectively, into TCA cycle and pyrimidine biosynthesis intermediates suggest the bc1-complex as a target.”

      Reviewer #3 (Public review):

      Summary:

      The authors described an exciting 400-drug screening using an MMV pathogen box to select compounds that effectively affect the medically important Toxoplasma parasite bradyzoite stage. This work utilises a bradyzoite culture technique that was published recently by the same group. They focused on compounds that directly affected the mitochondrial electron transport chain (mETC) bc1-complex and compared them with other bc1 inhibitors described in the literature such as atovaquone and HDQs. They further provide metabolomics analysis of inhibited parasites which serves to provide support for the target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite stage in culture and thus have the potential to battle chronic infection is needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, that is demonstrated to have a similar metabolic outcome to buparvaquone. Furthermore, the authors evaluated the importance of ATP production in tachyzoite and bradyzoite stages and under treatment with atovaquone/HDQs.

      Thank you for reviewing the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thanks for making appropriate updates. I believe it makes the report stronger. Just please double-check proof-reading in newly added text: for example "integration" is misspelled in Figure 4 legend (C, E).

      Typos have been corrected throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      I congratulate the authors on an excellent study. I have several minor comments for the authors to consider before publication.

      Line 99. Schistosoma –

      Corrected

      Line 123. What was the pH of the bicarb-free RPMI medium?

      Added “at pH 7.2”

      Line 218 (and again on line 687). "RHku80" - are these just standard RH strain parasites? Or do the authors mean to imply that the ku80 gene has been knocked out in this line? If the latter, RH∆ku80 may be a better way to describe this line.

      We harmonized all mentions of this strain to RH∆ku80.

      Line 225. "Parasites were incubated in medium with one of the following treatments ..." How long were the parasites incubated in the different treatments before the plate was read? Was there any preincubation? I think not, but it would help to state this so the reader can appreciate that the effects of the compounds on OCR is likely an immediate (rather than a secondary) effect.

      This is indeed a good suggestion. There was no pre-incubation and we changed the text to: “Parasites were incubated in medium with one of the following treatments immediately before measurement: … “

      Figure S2A. Check the spelling of Toxoplasmosis.

      Done, we corrected this sentence.

      Figure S2B. do you mean 'tachyzoidal' or 'tachyzocidal'? 'bradyzoidal' or 'bradyzocidal'?

      We clarified the formulation of the legends for Fig S2.

      Figure S2D. The "Tachyzoite lowest cytotoxicity" and "Bradyzoite lowest cytotoxicity" columns are, I think, depicting compound toxicity in host cells. Would it be clearer to rename these columns relative to the host cells being tested? e.g. "HFF/KD3 myotube lowest cytotoxicity"

      Good suggestion and we changed the designation accordingly.

      Line 369. "We found that tachyzocidal, bradyzocidal and dually active compounds possess a statistically significantly higher lipophilicity and this trend appeared more accentuated for bradyzocidal and dually active compounds." Significantly higher than what? Need to be clearer about the comparison being made: i.e. to non-active compounds.

      You are correct and we corrected this sentence accordingly.

      Line 500. "we attribute these changes to inhibition of host mitochondria (Fig. 5A)." The reason for referencing Figure 5A here isn't clear. Do the authors mean to point out that host mitochondrial membrane potential is affected by compound treatment? This could be stated more clearly.

      We deleted the reference to Fig 5A. We did not systematically measure the effect of the inhibitors on the membrane potential of the host mitochondria. We also changed the sentence to emphasize the speculative nature of this assertion: “we attribute these changes to potential inhibitory effects on host mitochondria”.

      Line 840. 'hurdling mechanisms'. The authors don't explain what they mean by this expression.

      We truncated the figure title to: “Untargeted metabolomic analysis of bradyzoites treated with bc1-complex inhibitors shows an energy imbalance.”

    1. eLife assessment

      This study presents an important finding of dynamic reprogramming of global H3K4me2 during mouse oocyte-to-embryo transition. While the H3K4me2 epigenome data is convincing, the interpretation and the potential mechanistic claims of the authors are incomplete in the current shape with the primary concerns regarding the contribution of Kdm1b or Kdm1a, as well as the specificity of the inhibitor and the antibody. The work will be of interest to researchers interested in epigenetic reprogramming.

    2. Reviewer #1 (Public Review):

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

    3. Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, the published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Would it be a reason that H3K4me2 has fewer H3K4me2 peaks in MII oocytes than GV oocytes?

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).
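      The overlap percentages quoted above can be re-derived from the stated gene counts. The following is a minimal sketch, assuming the counts are exactly as written in this comment; the function name `overlap_percent` is hypothetical and not taken from the manuscript. The sperm ratio comes out just under 93%, so the quoted 92% appears to be truncated rather than rounded.

```python
def overlap_percent(shared: int, total: int) -> float:
    """Return the share of `total` marked genes that are also marked in the other set, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * shared / total


# Gene counts as quoted in the review comment on Figure 3C.
gv_overlap = overlap_percent(13_183, 13_428)      # GV oocyte genes also marked at the 4-cell stage
sperm_overlap = overlap_percent(14_049, 15_112)   # sperm genes also marked at the 4-cell stage

print(f"GV oocyte vs 4-cell: {gv_overlap:.1f}%")
print(f"Sperm vs 4-cell:     {sperm_overlap:.1f}%")
```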

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development. The authors should have cited the paper and described the role of KDM1a in early embryos.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.
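      To make the thresholds in this comment concrete, here is a minimal sketch of how genes might be sorted into "gained" and "lost" sets under a fold change > 2 and P < 0.01 rule. It assumes the cutoff is applied symmetrically (ratio above 2 or below 1/2); the manuscript's actual pipeline is not described here, and the names `GeneStat` and `classify` as well as the toy values are hypothetical illustrations, not data from the study.

```python
from typing import List, NamedTuple, Tuple


class GeneStat(NamedTuple):
    gene: str
    fold_change: float  # treated / control signal ratio (assumed definition)
    p_value: float


def classify(stats: List[GeneStat],
             fc_cutoff: float = 2.0,
             p_cutoff: float = 0.01) -> Tuple[List[str], List[str]]:
    """Split genes into (gained, lost) lists using fold-change and P-value cutoffs."""
    gained, lost = [], []
    for s in stats:
        if s.p_value >= p_cutoff:
            continue  # not significant at the chosen P cutoff
        if s.fold_change > fc_cutoff:
            gained.append(s.gene)
        elif s.fold_change < 1.0 / fc_cutoff:
            lost.append(s.gene)
    return gained, lost


# Hypothetical toy values, for illustration only.
toy = [
    GeneStat("geneA", 3.1, 0.001),   # passes both cutoffs: gained
    GeneStat("geneB", 0.4, 0.005),   # passes both cutoffs: lost
    GeneStat("geneC", 5.0, 0.2),     # large change, but not significant
    GeneStat("geneD", 1.3, 0.0001),  # significant, but change too small
]
gained, lost = classify(toy)
print("gained:", gained)
print("lost:", lost)
```

      A per-gene threshold count of this kind records how many genes cross a cutoff, not the magnitude of the global signal, which is one reason it need not scale with a whole-nucleus fluorescence intensity.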

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

    4. Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      Thank you very much for your valuable suggestions. H3K4me3 data for humans and mice have already been published, and our previous data revealed the unique pattern of H3K4me3 in human oocytes and early embryos (Science. 2019 Jul 26;365(6451):353-360). This study therefore focuses mainly on the localization of H3K4me2 in mouse oocytes and preimplantation embryos, how it is erased and re-established during the mammalian parental-to-zygotic transition, and its function. The combined analysis of H3K4me2 and H3K4me3 is not our main focus, although we do not rule out that it could reveal new relationships between these two histone modifications. Our data so far tend to show that H3K4me2 not only acts as a precursor of H3K4me3 but also plays an independent role.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      Thank you for pointing this out, and we thank the reviewer again for the important suggestion. A previous study (Binda C, Valente S, Romanenghi M, Pilotto S, Cirilli R, Karytinos A, Ciossani G, Botrugno OA, Forneris F, Tardugno M, Edmondson DE, Minucci S, Mattevi A, Mai A. Biochemical, structural, and biological evaluation of tranylcypromine derivatives as inhibitors of histone demethylases LSD1 and LSD2. J Am Chem Soc. 2010 May 19;132(19):6827-33) indicated that TCP is an irreversible inhibitor of both LSD1 and LSD2 (see also: Human LSD2/KDM1b/AOF1 Regulates Gene Transcription by Modulating Intragenic H3K4me2 Methylation, Mol Cell. 2010 Jul 30;39(2):222–233). According to our data, however, the level of LSD1 is very low in early mouse embryos, so TCP mainly inhibits the function of LSD2 at these stages.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      We thank the reviewer for the insightful comments. The H3K4me2 antibody was purchased from Millipore (cat. 07030). Figure 2A shows the specific enrichment of H3K4me2 in promoter and distal regions. Although some batches of H3K4me2 antibody are known to cross-react with H3K4me3, the antibody we used for CUT&RUN appears to have low cross-reactivity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      Thank you for pointing this out, and we will add references to support the statement in the paper as suggested.

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

      Thank you for your kind comments. We will revise the expression in the manuscript later.

      Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, the published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Would it be a reason that H3K4me2 has fewer H3K4me2 peaks in MII oocytes than GV oocytes?

      First of all, thank you for your valuable advice. The published papers showed strong staining of H3K4me2 at the zygote and 2-cell stages (Ancelin et al., 2016), which is interesting. We think the discrepancy may come from different confocal imaging parameters. We imaged every stage from GV to blastocyst with the same settings; had we imaged only the zygote and 2-cell stages with different settings, we might also have seen weak fluorescence at the 2-cell stage. We will cite this reference and discuss it in the resubmitted version.

      You also asked whether the lower number of H3K4me2 peaks in MII oocytes than in GV oocytes could result from extrusion of the first polar body. The logic is sound. However, the first polar body extruded at the MII stage remains within the zona pellucida, and we collected it together with the oocyte in the CUT&RUN experiment; therefore, the DNA content of the MII samples is not halved relative to GV. After further discussion, we believe that the reduction of H3K4me2 peaks at the MII stage compared with the GV stage may be closely related to oocyte maturation. Stage-specific histone modifications allow the chromatin structure to change appropriately across the stages of meiosis. It has been confirmed that H3K4me3 gradually decreases from the GV to the MII stage during human oocyte maturation, whereas H3K27me3 does not change from GV to MII.

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).

      Thank you very much for your suggestion. Further analysis of which regions show parental allele-specific H3K4me2 in preimplantation embryos will make the study more interesting. We will discuss this in depth in the resubmitted version.

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development.

      The authors should have cited the paper and described the role of KDM1a in early embryos.

      In our analysis of this experiment, we consider the expression of KDM1A during early mouse embryonic development to be low relative to that of KDM1B. Similarly, the transcriptome data we cite also show that KDM1A is expressed at elevated levels during oocyte maturation and fertilization compared with immature oocytes. In addition, we did not discuss the effects of the loss of maternal KDM1a on embryonic development. We believe that the absence of maternal KDM1b blocks embryonic development, and we will cite and discuss the references later.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      After re-checking the data, we found a problem with the normalization method of our heat map; we will remake the heat map and submit it in the revised version. With reference to Figure 4A, the level of Kdm1b is indeed higher than that of Kdm1a.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      We will further mine the transcriptome data to confirm the specificity of TCP for KDM1b. In addition, TCP treatment of the whole fertilized egg in this study increased the H3K4me2 level, and its retarding effect on embryo development was more pronounced than that obtained by crossing females in which KDM1B had been knocked down with normal males.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.

      Thank you for your question. In the experimental group, the H3K4me2 fluorescence value measured by immunofluorescence (IF) increased 1,000-fold (Figure 4E), whereas the H3K4me2-related genes identified by CUT&RUN (CR) showed 445 changes in total, up- and down-regulated (Figure 6A). In our opinion, immunofluorescence is a semi-quantitative analysis and cannot be compared directly with the quantitative CUT&RUN analysis, because the two rely on different analysis models and threshold settings.

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

      Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    1. eLife Assessment

      This study characterizes several novel activities of SARS-CoV-2 helicase nsp13, providing valuable insights into potentially new functions of this essential RNA-processing enzyme in the virus life cycle. However, the experimental evidence to support the authors' claims is incomplete. In addition, the placement of the polyhistidine affinity tag on nsp13 may cause artifacts, raising concerns about the interpretation of the results.

    2. Reviewer #1 (Public review):

      In the manuscript by Li et al., the authors perform a comprehensive study on the template and cofactor determinants of the SARS-CoV-2 nsp13 protein. They find that, alongside the classical processive unwinding ability of helicases driven by ATP consumption, other chaperone-like and ATP-independent functions exist for this enzyme. By testing DNA and RNA oligos in several conformations, the authors show that these functions are highly dependent on template identity, but also on the ratio of ATP to divalent cations. Ultimately, it is suggested that these distinct mechanisms of action are employed by nsp13 to orchestrate viral replication.

      Overall, this study provides some novel insights into the functionality of a central and conserved enzyme of a relevant human pathogenic virus. While the approach is important and adds to the field, particularly by characterizing the chaperoning activities and adding G-quadruplexes as templates, previous studies have already identified several determinants of nsp13 template binding and processing in vitro (Sommers et al., 2023, JBC; Park et al., 2025, JBC). In addition, some issues regarding experimental design need to be addressed to increase the cogency and biological relevance of the study.

      (1) Generally, low concentrations of monovalent cations (20 mM), as used throughout this study, may influence helicase activity and artificially enhance protein binding/oligomerization, which could favor the observed chaperoning activity (Venus et al., 2022, Methods). In contrast, some helicases, such as HCV NS3, are inhibited by higher K+ concentrations (Gwack et al., 2004, FEBS). Thus, the influence of higher concentrations of monovalent cations should be tested in relevant assays, as intracellular K+ levels are usually >100 mM. Additionally, this could significantly affect template stability. For instance, in some G4 assays, the addition of the trap already leads to observable duplex formation (Figure 5), which may be due to low K+ conditions.

      (2) As in most publications that focus strictly on helicase (or other enzymatic) functions, the activity of the isolated protein is examined. However, particularly in the case of nsp13, core functions rely on other factors, such as nsp7/8 and other components of the replication-transcription complex (RTC). The overall structure and oligomerization state of nsp13 are altered within the complex (Chen et al., 2022, NSMB). The inclusion of such factors in key experiments would greatly improve the biological relevance of the findings.

      (3) In Figure 4, the authors claim that Mg2+ concentration inhibits RNA unwinding. While this is likely considering previous findings, it must be validated that duplex stabilization is not the primary cause for the observed lower dissociation rates. As the template is only 12 bp long with extensive overhangs, higher ion concentrations may significantly stabilize base pairing by reducing fraying effects. Similarly, in Figure 6, template-dependent effects of Mg2+/ATP should be ruled out.

      (4) It is not entirely clear to me by which principle the templates were chosen. In my opinion, it would improve the overall comparability of the experimental results if, for instance, the blunt-ended duplex had the same sequence as the oligos with overhangs, since factors such as length, G/C content, Tm, etc., may play a significant role in binding and unwinding. Similarly, the oligos for binding and unwinding should be kept somewhat comparable, e.g., the G4 for the binding assay has 3 stacks, whereas RG1 has only 2. This discrepancy could make a significant difference. Thus, key experiments should be repeated using comparable sequence pairs.

      Moreover, in the initial characterization of the binding abilities (Figure 1), the authors should include blunt-ended controls (duplex/hairpin) and, importantly, a pseudoknot (PK), as these structures are crucial for multiple steps in the viral life cycle (frameshifting, replication). Specifically, the PK in the 3'UTR (Sola et al., 2011, RNA Biology) may be an interesting target structure for unwinding assays, as it recruits the RTC, and, to my knowledge, no studies are available regarding nsp13 function at a PK. This would be particularly interesting in combination with nsp7/8 (Ohyama et al., 2024, JACS Au).