  1. Jan 2026
    1. eLife Assessment

      This important study provides solid evidence to support the anti-tumor potential of citalopram, originally an anti-depression drug, in hepatocellular carcinoma (HCC). In addition to their previous report on directly targeting tumor cells via glucose transporter 1 (GLUT1), the authors tried to uncover additional working mechanisms of citalopram in HCC treatment in the current study. The data here suggest that citalopram may regulate the phagocytic function of TAMs via C5aR1, or CD8+ T cell function, to suppress HCC growth in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, rather than via the classical citalopram receptor. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring the potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs), and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1, thereby potentiating CD8+ T cell responses in vivo. Finally, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between low 5-HT and superior CD8+ T cell function against tumors.

      Strengths:

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the anti-depression drug citalopram plays an immune-regulatory role on TAMs via a new target, C5aR1, in HCC.

      Comments on revised version:

      The authors have already addressed the previous comments.

    3. Reviewer #2 (Public review):

      Summary:

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition, while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target.

      Strength:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a comprehensive strategy for HCC therapy. By highlighting the potential of existing drugs like citalopram for repurposing, the study also underscores the feasibility of translational applications. During revision, the authors experimentally demonstrated that TAMs have lower GLUT1 levels, further strengthening their claim of C5aR1 modulation-dependent TAM improvement for tumor therapy.

      Comments on revised version:

      The authors have addressed most of my concerns about the paper.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, rather than via the classical citalopram receptor. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring the potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs), and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+ T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+ T cells, probably via GLUT3- but not GLUT1-mediated glucose uptake. Lastly, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between low 5-HT and superior CD8+ T cell function against tumors. Although the data are informative, the rationale for working on additional mechanisms and the logical link among the different parts are not clear. In addition, some of the conclusions are not fully supported by the current data.

      Strengths: 

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the anti-depression drug citalopram plays an immune-regulatory role on TAMs via a new target, C5aR1, in HCC.

      Comments on revised version: 

      The authors have addressed most of my concerns about the paper.

      We thank the reviewer. We appreciate the reviewer’s constructive suggestions, which helped improve the clarity and robustness of the study.

      Reviewer #2 (Public review):

      Summary: 

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition, while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target.

      Strengths:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a comprehensive strategy for HCC therapy. By highlighting the potential for existing drugs like citalopram to be repurposed, the study also emphasizes the feasibility of translational applications. During revision, the authors experimentally demonstrated that TAMs have lower GLUT1 levels, which further strengthens their claim of C5aR1 modulation-dependent TAM improvement for tumor therapy.

      Weaknesses:

      The authors proposed that CD8+ T cells have a TAM-independent role upon citalopram treatment. However, this claim requires further investigation to confirm that the effect is truly "TAM independent".

      We appreciate the reviewer’s insightful comment regarding the interpretation of CD8+ T cell roles. In this study, in vitro analyses show that citalopram directly enhances CD8+ T cell activity, as evidenced by increased CFSE proliferation, upregulation of activation markers, and cytotoxic effector readouts (Figures S10A–E). Accordingly, we infer that citalopram activates CD8+ T cells in a TAM-independent manner in vitro.

      Our in vivo data indicate that the primary anti-tumor mechanism of citalopram involves targeting C5aR1+ TAMs, which subsequently enhances CD8+ T cell immunity. This conclusion is supported by the near-complete ablation of citalopram’s therapeutic effect upon TAM depletion with clodronate liposomes (Figure S5). Additionally, citalopram reduces serum serotonin (5-HT) levels (Figure 4E), recapitulating the serotonergic state of Tph1−/− mice. Notably, the anti-tumor effect and CD8+ T cell activation induced by citalopram exceed those observed in Tph1−/− mice (Figures 4G–I), suggesting that 5-HT reduction contributes to CD8+ T cell activation but operates alongside other mechanisms in vivo, prominently including TAM targeting. As suggested, we further tested CD8+ T cell activity in the context of macrophage depletion. The result showed that citalopram did not further enhance CD8+ T cell cytotoxicity after macrophage depletion, indicating that TAM-dependent pathways are central to CD8+ T cell–mediated anti-tumor immunity and largely underlie the anti-tumor effects of citalopram.

      To accurately reflect our main findings, we have made several revisions to the manuscript. First, we have revised the title to “Citalopram exhibits immune-dependent anti-tumor effects by modulating C5aR1+ TAMs”. In the Results section, the conclusions have been updated to: “These data not only corroborate recent reports that SSRIs modulate CD8+ T cell function via a serotonergic-dependent mechanism, but also reveal additional in vivo regulatory avenues by which citalopram affects CD8+ T cells, such as its ability to reprogram C5aR1+ TAMs. Notably, in the context of macrophage depletion, CD8+ T cell cytotoxicity was not further enhanced by citalopram, indicating that TAM-dependent pathways are central to CD8+ T cell-mediated anti-tumor immunity and largely underlie the anti-tumor effects of citalopram”. In the Discussion, we have included the following content: “Although citalopram directly stimulates CD8+ T cells in vitro, the TAM-independent activation is not evident in vivo within the complex TME, as CD8+ T cell responses are abolished by macrophage depletion, indicating that the in vivo effects of citalopram on CD8+ T cells and tumor growth are largely TAM-dependent”.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Fig S5 and Fig 3: To improve clarity regarding the roles of TAMs and CD8+ T cells, can the authors experimentally demonstrate the macrophage-independent function of CD8+ T cells? An experiment in Fig 3J using or not using Clodro-Liposome to deplete TAMs would be more informative.

      We thank the reviewer for the insightful suggestion. In this study, in vitro analyses show that citalopram directly enhances CD8+ T cell activity, as evidenced by increased CFSE proliferation, upregulation of activation markers, and cytotoxic effector readouts (Figures S10A–E). Therefore, we conclude that citalopram induces TAM-independent CD8+ T cell activation in vitro. Previously, in Figure S5, we analyzed the therapeutic effect of citalopram after macrophage depletion by clodronate liposomes and also probed the immune profiles. The result showed that CD8+ T cell cytotoxic activities were not significantly affected by citalopram in this context (Figure S5E), indicating that the TAM-dependent pathway is central to CD8+ T cell-mediated anti-tumor immunity and to the anti-tumor effects of citalopram. We have incorporated this result into the revised manuscript.

      Fig S4: The figure panel showing sample/treatment annotations is missing.

      Thank you for pointing this out. We have updated Fig. S4 to include explicit sample identifiers, treatment group labels, and drug concentrations.

      Since Glut3 is vital in both TAMs and CD8+ T cells, the authors should discuss the interaction between Glut3 and Citalopram. Additionally, include details about the structural homology between Glut1 and Glut3 in the discussion.

      Thank you for the suggestion. Citalopram was docked into the GLUT1 substrate-binding pocket, with the best poses showing an electrostatic interaction centered on E380 accompanied by hydrophobic contacts within the pocket (our previous publication, Dong et al. Cell Reports 2024). Although GLUT1 and GLUT3 share a highly conserved core substrate-binding pocket, isoform-specific regulation arises from features outside the canonical site. Structural homology between GLUT1 and GLUT3 is high in the transmembrane core, but regulatory features, such as the cytosolic Sugar Porter (SP) motif network, the conserved A motif, lipid interfaces, and gating dynamics, differ between the two isoforms (PMID: 33536238). These regulatory differences can alter pocket accessibility, coupling to conformational transitions, and allosteric communication with the cytosol, such that a ligand binding GLUT1 in the inward-facing state may not stabilize a GLUT3 conformation that yields appreciable transport inhibition. Consistently, functional experiments have indicated robust GLUT1 engagement in cancer cells (Dong et al. Cell Reports 2024), while equivalent GLUT3 inhibition has not been observed in TAMs (Figure S8), suggesting isoform-selective targeting by citalopram. We have included this discussion in the revised manuscript.

      Fig 3O: Please clarify the statement regarding the requirements of CD8 T cells for the pro-tumor phenotype of C5aR1+ TAMs. Specify whether this relates to a pro- or anti-tumor effect of CD8 T cells.

      Thanks. As suggested, we have improved the statement as follows: “depletion of CD8+ T cells abrogated the C5aR1+ TAM-mediated enhancement of tumor growth (Figure 3O), suggesting that the anti-tumor effects of CD8+ T cells are required for the pro-tumor phenotype of C5aR1+ TAMs”.

    1. eLife Assessment

      This fundamental work significantly advances our understanding of gravity sensing and orientation behavior in the ctenophore, an animal of major importance in understanding the evolution of nervous systems. Through comprehensive reconstruction with volumetric electron microscopy, and time-lapse imaging of cilia motion, the authors provide compelling evidence that the aboral nerve net coordinates the activity of balancer cilia. The resemblance to the ciliomotor circuit in marine annelids provides a fascinating example of how neural circuits may convergently evolve to solve common sensorimotor challenges.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons which exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function which ultimately allow the animal to correct its orientation. It explains how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ's balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because it is apparently the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

      Comments on revisions:

      The authors have satisfactorily addressed the minor issues that I brought up in my original review.

    4. Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons, while also giving some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

      Comments on revisions:

      This manuscript was already strong from the start, and I am fully satisfied with the revisions, which corrected a few glitches and points of clarification.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons that exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum, where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function, which ultimately allows the animal to correct its orientation. It represents an example of systems neuroscience explaining how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective, showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear, and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims, and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

      Thank you very much for these comments.

      Weaknesses:

      The evidence supporting the claim that the neural circuitry presented here controls the cilia beating is more correlational because it only relies on the fact that the location of the two types of ANN neurons coincides with the quadrants that are affected in the behavioral recordings. Discussing ways by which causality could be established might be helpful.

      We have now added further discussion in a new “Future Directions” section explaining that, for example, calcium imaging or targeted neuron ablations could be used in future work to establish causality. This would require the development of genetic delivery techniques to introduce, e.g., the GCaMP calcium sensor or transgenic reporters.

      The explanation of the relevance of this work could be improved. The conclusion that the work hints at coordination instead of feedforward sensory-motor control is explained over only a few lines. The authors could provide a more detailed explanation of how the two models compete (coordination vs feedforward sensory-motor control), and why choosing one option over the other could provide advantages in this context.

      We added a more detailed explanation about the two types of model and why we believe that a coordination model is more compatible with our connectome data.

      “An alternative model for the function of the nerve net would be a feedforward sensory-motor system, in which balancer cells provide mechanosensory input to motor effectors via the nerve net, similar to a reflex arc. None of our observations support such a sensory-motor model. There are no synaptic pathways from balancer cells or any other sensory cells to the nerve net. The only synaptic input to ANNs comes from the bridge cells (discussed below) and from each other. The three synaptically interconnected ANNs may generate an endogenous rhythm that controls balancer cilia and is influenced by bridge input. ANNs may also be influenced by neuropeptides secreted by other aboral organ neurons. Such chemical inputs may underlie the flexibility of gravitaxis and its modulation by other cues (e.g. light). Overall, the coordination model parsimoniously explains both the ANN wiring topology and the observed dynamics, whereas a simple feedforward reflex does not.”

      Since the fact that the ANN neurons form a syncytium is an important finding of this study, it would be useful to have additional illustrations of it. For instance, pictures showing anastomosing membranes could typically be added in Figure 2.

      We have now included a movie (Video 3) showing a volumetric reconstruction of a segment of an ANN neuron, which highlights the anastomosing morphology in greater detail than static images.

      “Video 3. Volumetric reconstruction of a single ANN Q1-4 neuron showing syncytial soma (cyan) and nuclei (magenta). The rotating view highlights the anastomosing morphology, although not all fine details could be reconstructed due to data limitations.”

      Also, to better establish the importance of the study, it could be useful to explain why the balancers’ cilia spontaneously beat in the first place (instead of being static and just acting as stretch sensors).

      We have discussed in more detail why it may be important for the balancer cilia to beat.

      “The observation that balancer cilia beat spontaneously, even in the absence of external tilt, suggests that they are active sensory oscillators rather than static stretch sensors. Their spontaneous beating could set a dynamic baseline of sensitivity, which can then be modulated by ANN inputs or sensory changes during tilt. Such a dynamic system may be more sensitive to small deflections and be more responsive [@Lowe1997]. Thus, the regulated beating of balancer cilia should not be seen as noise, but as an adaptive feature that enables flexible and robust graviceptive responses. The ctenophore balancer may thus use active ciliary oscillations for enhanced sensorimotor integration similar to other sensory systems [@Wan_2023].”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ’s balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day-old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because it is apparently the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

      We thank the reviewer for these comments.

      Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day-old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons. This also gives some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Thank you for these positive comments on the paper.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In consultation, the reviewers recommend that improving the evidence to “exceptional” would require additional perturbation experiments (e.g., ablation of specific neurons), as Reviewer 1 suggests. They also recommend adding a “Future Directions” section to the manuscript, because it opens up so many new experimental directions.

      We have added a new “Future Directions” section at the end of the Discussion. Carrying out the proposed perturbation or calcium imaging experiments would require significant additional work and method development. We are actively working on establishing mRNA and DNA injection into ctenophore zygotes to enable live imaging, cell labelling, or ablations in the future.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data, or analyses:

      To establish causality (neurons control balancer cilia), an important experiment would be to manipulate each of these neuronal populations (e.g., by ablating them) and measure the effect of these ablations on the beating frequency of the balancer cilia of the four quadrants. Moreover, direct observation of neuronal activity (e.g., by using calcium imaging) would also provide more compelling evidence for neuronal control.

      We agree with the reviewer that such perturbation experiments would be needed to establish causality. Such experiments are currently still not possible in ctenophores and would require significant technology development. We discuss such experiments in the “Future Directions” section and also place this in the context of the currently available techniques in ctenophores. We are actively working on this, but waiting for such technological breakthroughs and new experiments would significantly delay the publication of a version of record of the paper.

      Recommendations for improving the writing and presentation:

      ANN neurons are described in great detail, though SNN neurons are described more loosely. Perhaps a more detailed description of SNN neurons would be helpful.

      We added the information on SNNs to show that these cells are distinct from the ANN neurons. Since our focus is on the aboral organ, we did not aim for a comprehensive reconstruction of SNNs. Several of the processes of the SNNs are also truncated and outside our EM volume. We have nevertheless added additional details about the morphology and connectivity of SNN neurons.

      “Near the periphery of the aboral organ, we identified four further anastomosing nerve-net neurons. These resembled the previously reported syncytial subepithelial nerve net (SNN) neurons in the body wall of Mnemiopsis (Figure 2–figure supplement 1C–G) and were clearly distinct from the ANN neurons (both in location and morphology). SNN neurons show a blebbed morphology and contain dense core vesicles @Burkhardt2023 but no synapses.”

      Minor corrections to the text and figures:

      (1) Figure 2 C): “mitochondia” instead of “mitochondria”.

      corrected

      (2) Figure 3. Title: “balancer and and bridge”.

      corrected

      (3) Figure 3.C) “shown in xxx color”

      corrected

      Reviewer #2 (Recommendations for the authors):

      Clearer usage of the terms statocyst, aboral organ, aboral nerve net, statolith, dome, and lithocytes would be helpful. For readers not familiar with ctenophore anatomy, things can get a bit confusing. A single schematic with all of these terms would be helpful. In Figure 1E, there is a label “dc”. Should this be “do”?

      We have added an annotated schematic to Figure 1, explaining these terms.

      Figure 1C “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      Reviewer #3 (Recommendations for the authors):

      My comments are numerous, but mostly minor suggestions for improving the clarity.

      [Suggested insertions/changes are indicated by square brackets]

      (1) [It would be much easier to review this if there were line numbers, or with a double-spaced manuscript that was more accommodating for markup.]

      Thank you for this comment. We have increased the line spacing in the revised version. (We set the CSS line-height property on the html ‘body’ element to 2em).

      (2) The terms statolith, statocyst, and lithocytes can be confusing, so it would be nice to have an upfront definition of how they relate to each other.

      We have now explained these terms in the Introduction and have also improved the annotation of Figure 1.

      Figure 1C. “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      (3) Statolith is spelled as statolyth in the early pages, but statolith in the later pages. I think -lith is more common, but in any case, these should be standardized.

      corrected to ‘statolith’

      ABSTRACT:

      (1) Differential load[s] on the balancer cilia [lead] to altered

      changed

      (2) We used volume electron microscopy (vEM) to image the aboral organ.

      changed

      (3) also form reciprocal connections with the bridge cells.

      corrected

      INTRODUCTION:

      (1) “identify conserved neuronal markers in ctenophores” - confusing - does this mean conserved across ctenophores, or conserved in ctenophores and other animals?

      changed to “classical neuronal markers”

      (2) “either increase or decrease their [ciliary] activity, indicating” - otherwise it sounds like the balancers are increasing activity.

      changed to “balancer cells may either increase or decrease their ciliary activity”

      (3) after “matches the setup used in high-speed imaging experiments”, it might be nice to add a statement like “Future studies could potentially investigate activity in the inverted orientation, when the statolith is suspended below the cilia, to see if the response differs.”

      In this sentence we referred to the orientation of the animals in our figures. There is a consensus among ctenophore researchers that when depicting ctenophores, the aboral organ should face downwards. However, for this paper we chose the opposite orientation to better match our experiments and help interpret the results. We changed the text to: “In this study, we represent ctenophores with their aboral organ facing upwards (“balancer-up” posture), as this configuration facilitates intuitive interpretation of balance-like functions and matches the setup used in high-speed imaging experiments.”

      We added the sentences “Future experiments could also explore how orientation affects the response of balancer cilia. For example, when the statolith is suspended below the cilia (the “balancer-down” posture), ciliary beating patterns may differ from what we observed here in the “balancer-up” configuration.” to the “Future Directions” section.

      (4) “abolished by calcium[-]channel inhibitors”

      corrected

      (5) “By functional imaging, we uncovered” - It is not clear what functional imaging is. Maybe a few-word definition here, and be sure to explain in the methods.

      changed to “By high-speed ciliary imaging”. The details of the imaging are explained in the Methods section under “Imaging the Activity of Balancer Cilia”.

      RESULTS:

      (1) “five-day-old” - is it worth saying post-fertilization here?

      Thank you for pointing this out. In accordance with Presnell et al. (2022), we use post-hatching as the reference. We have revised the text in the Materials and Methods section to read: “5-day-old (5 days post-hatching)”

      (2) “We classified these cells into cell types [based on …]” - specify a bit about how you classified them based on morphology, the presence of organelles, etc.

      We added a clarification. “Our classification was based on i) ultrastructural features (e.g. number of cilia), ii) cell morphology (e.g. nerve net or bridge cells), iii) unique organelles (e.g. lamellate body, plumose cells), and iv) similarities to cell types previously described by EM. Our classification agrees with the cell types identified in the 1-day-old larva [@ferraioli2025].”

      (3) “CATMAID only supports [bifurcating] skeleton trees” - Correct?

      yes, a node in CATMAID cannot be fused to another node of the same skeleton to represent anastomoses
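
      The limitation described here can be illustrated with a minimal sketch. It assumes only that a skeleton is stored as a tree of nodes; the function name `is_tree` and the toy node numbering are hypothetical and do not reflect CATMAID's actual data model or API. A branching neurite passes the tree check, whereas adding one edge that fuses two existing nodes (an anastomosis) closes a loop and fails it.

```python
# Hypothetical illustration: this is NOT CATMAID code, only a generic tree check.
def is_tree(n_nodes, edges):
    """Return True if the undirected edges form one tree over nodes 0..n_nodes-1."""
    parent = list(range(n_nodes))  # union-find forest, one set per node

    def find(x):
        # Follow parent pointers to the set representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # both ends already connected: the edge closes a loop
        parent[root_a] = root_b
    # A tree over n nodes has exactly n - 1 edges (and, checked above, no loops).
    return len(edges) == n_nodes - 1


# Toy skeleton: node 1 branches into nodes 2 and 3.
branching = [(0, 1), (1, 2), (1, 3)]
# Same skeleton with the two branches fused, as in an anastomosing neurite.
anastomosing = branching + [(2, 3)]

print(is_tree(4, branching))
print(is_tree(4, anastomosing))
```

      In a tree-only representation, the fused variant would have to be stored with the loop left open at one point, which is consistent with the authors' statement above.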

      FIGURE 1:

      (1) It is not worth redrawing and renumbering everything, but I wish the lateral view in A matched the rotated aboral view in B, instead of having to do two rotations to get the alignment to coincide. (Rotating panel B 90° clockwise would make them match, but then it wouldn’t coincide with all the subsequent figures.)

      Thank you for the suggestion. We have replaced panel A with a lateral view that now matches panel B.

      (2) The labels on Figure 1 are a mix of two typefaces (Helvetica and Myriad?). They should be standardized to all use one typeface (preferably Helvetica).

      we have changed the font to Helvetica

      (3) Panel C legend: arrows are not really arrows. Say “Eye icons” or something like that. Can you show the location of the anal pores in the DIC image?

      Changed to ‘eye icons’. The anal pores are usually closed and only open briefly therefore it is not clear where exactly they would be, so indicating their position would be misleading.

      (4) Panel F, I cannot see the lines mentioned in the legend at all, except for maybe a tiny wisp in a couple of places. Either omit or make visible.

      changed to “The spheres indicate the position of nuclei in the reconstructed cells.”

      (5) Panel G. “Cells are color coded according to quadrants”… but unfortunately, the color scale is 90° off of what is presented in the rest of the panels and the paper. Q1 and Q3 have been blue, but now Q2+4 are blue/purple, while Q1+3 are orange/yellow. Again, it seems like too much work to recolor panel G, but in future, it would be nice to maintain that consistency, especially since other panels specifically mention the consistent colors.

      We have changed the color code in panels B, C and E to match G and the subsequent panels/figures.

      RESULTS: Aboral synaptic nerve net

      (1) “We reconstructed three aboral nerve-net (ANN) neurons” - out of how many total? Were these three just the first ones traced, or are they likely to be all of the multi-domain neurons? One can’t tell if these are the top 3 (out of X), or if there are other multi-quad neurons that were not traced. Are there any Q1Q4 or Q2Q3 neurons? Specify overall composition.

      There are only three ANN neurons in the aboral organ. These are all completely reconstructed and contained within the volume. We have clarified this in the text. “We identified and reconstructed three aboral nerve-net (ANN) neurons, each exhibiting a syncytial morphology characterized by anastomosing membranes and multiple nuclei (ranging from two to five) (Figure 2A and B, Figure 2–figure supplement 1C). These three neurons are the only fully reconstructed ANN neurons contained within the volume. Several small ANN-like fragments were also observed at the periphery of the aboral organ, but their connectivity to the main ANN remains uncertain.”

      FIGURE 2:

      (1) Panel C: “N > 2 cells for each cell type” - is that supposed to say “N > 2 mitochondria”? More than 2 cells in all the types shown in the graph.

      It is number of cells for each cell type

      (2) Panel D: Is this the wrong caption? I can only see green and black circles, not red, yellow, or blue. Make them larger or “flat” (circled, not shaded spheres) if they are supposed to be visible

      Thank you for pointing this out. The caption was incorrect and has been corrected to match the figure.

      (3) Panel E: Amazing to see the cross-network connections!

      Thank you

      (4) Again, it is great to see the three ANN mapped out, but … are there other connections that weren’t mapped in this study? Other high-level coordinating neurons? ANN_Q1Q4 or Q2Q3?

      The reconstruction is complete and there are no other neurons or connections. Given the large size of ctenophore synapses, we are confident that we identified all or most synapses and their connections.

      RESULTS: Synaptic connectome

      (1) “displaying rotational symmetry” - This is one of the things I am most curious about. Where is the evidence of rotational symmetry in the network diagram? Is it the larger number of connections to Q2 and Q4? Any evidence of rotational symmetry, like Q1 and Q3 connect to Q2 and Q4 respectively, but not the other way around?

      Changed to “displaying biradial symmetry”. We do not consider the slight difference in synapse number from ANN Q1-4 to the Q1-Q3 vs. Q2-Q4 balancers as significant or strong enough evidence for a single rotational symmetry (i.e. 180-degree rotation).

      (2) “Surprisingly” - this *was* really surprising. There have to be some afferent neurons connecting from the balancers, don’t there? I can’t remember the connections to the SNN, but is there a tertiary set of ANNs that connect between the balancers and the top 3 ANNs? I would like a little more discussion about this.

      Indeed, this is why this is so surprising. Most people would have expected some output connections from the balancer to the nerve net or elsewhere. There are none. We have the complete balancer network and all balancer cells are ‘sink nodes’ (inputs only) (Figure 3–figure supplement 1).

      We added a short statement at the beginning of the “Bridge Cells as Feedback Regulators of Ciliary Rhythms” section noting that no direct connections from the balancers to the ANN were found and that all balancer cells act as sink nodes (inputs only; Figure 3–figure supplement 1). This highlights that bridge cells are indeed the sole neuronal input to the ANN circuit.
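
      The notion of a ‘sink node’ used above can be made concrete with a short sketch. It assumes the connectome is available as a list of (presynaptic cell, postsynaptic cell, synapse count) triples; the cell labels and synapse counts below are invented placeholders, not values from the paper, and `sink_nodes` is a hypothetical helper name. A cell counts as a sink if it receives at least one synapse and makes none.

```python
from collections import defaultdict


def sink_nodes(edges):
    """Return cells with incoming synapses but no outgoing ones, sorted by name."""
    in_deg = defaultdict(int)   # synapses received per cell
    out_deg = defaultdict(int)  # synapses made per cell
    for pre, post, n_synapses in edges:
        out_deg[pre] += n_synapses
        in_deg[post] += n_synapses
    return sorted(cell for cell in in_deg if out_deg.get(cell, 0) == 0)


# Placeholder wiring (labels and counts are illustrative assumptions only):
# the bridge cells and the ANN neuron are reciprocally connected, and the
# ANN neuron synapses onto balancer cells, which make no synapses themselves.
toy_connectome = [
    ("bridge", "ANN_Q1-4", 5),
    ("ANN_Q1-4", "bridge", 3),
    ("ANN_Q1-4", "balancer_Q1", 10),
    ("ANN_Q1-4", "balancer_Q2", 12),
]

print(sink_nodes(toy_connectome))
```

      On this toy input only the two balancer entries are returned, because the bridge and ANN entries each have outgoing synapses.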

      Figure 3:

      (1) As you know, during development, the diagonally opposite cells have a shared heritage and shared functionality. Are there neuronal signatures that correspond to the rotational symmetry that we see, for example, in the position of the anal pores?

      We did not find any evidence in neuronal complement for a diagonal symmetry, suggesting that neuronal organization does not simply mirror the organism’s rotational body symmetry.

      (2) Do you have the information to say whether there are any diagonal or asymmetric connections? Can’t tell if those would have shown up in the mapping efforts or if you focused on the major ones only.

      Based on our complete mapping, we did not find evidence for a diagonal pattern. The connectivity instead shows a biradial organization.

      (3) “extending across opposite quadrant regions” - to me, opposite would be diagonally opposite, but this looks like a set of cells between Q1 and Q2 is connecting to a sister-set in Q3+Q4. I wonder if, in a more detailed view, you could see whether this is a rotational correspondence, rather than a reflection. There are some subtle hints of this in the aboral view, with some cells on the right of the blue cluster and the left of the magenta cluster.

      changed to “extending across tentacular-axis-symmetric quadrant regions” for clarity

      (4) As with Figure 2, I do not see any circles/spheres that are yellow, red, or blue! There are some traces of what appear to be other neurons that have these colors, but nothing that would suggest the localization of mitochondria.

      Thank you for pointing this out. We have corrected the caption to match the figure, as in the previous item.

      (5) The connectivity map is very cool, but the caption does not seem to correspond to the version included in the manuscript. I don’t see any hexagons; all arrows seem to have the same thickness.

      changed to: “Complete connectivity map of the gravity-sensing neural circuit. Cells belonging to the same group are shown as diamonds, and the number of cells is added to their labels. The number of synapses is shown on the arrows.”

      RESULTS: Dynamics of balancer cilia

      (1) The orientation of the stage+larvae is a bit hard to follow. Maybe say the sagittal or tentacular plane is parallel to the sample stage and the gravity vector?

      we added “Larvae were oriented with their sagittal or tentacular plane parallel to the sample stage.”

      (2) “We could simultaneously image Q1(3) and Q2(4).” The meaning of the numbers in () is not clear. Either way that I try to interpret it does not match the diagrams. Should this say viewing the tentacular plane, you can image Q1 and 4 or Q2 and 3?

      Thank you for spotting this mistake, we have changed to: “In larvae with their sagittal plane facing the objective, we could compare balancer-cilia movements between Q1 vs. Q2 or Q3 vs. Q4. In other larvae oriented in the tentacular plane, we could simultaneously image Q1 and Q4 or Q2 and Q3.”

      (3) Typo: episod[e]s were excluded

      Corrected

      DISCUSSION:

      This section is quite clean. Maybe mention some future directions:

      We have added a “Future Directions” section

      (1) Do these networks change during development? Five-days-old is still quite undeveloped - what would it look like in an adult specimen? Would you expect a larger version of the same or more diverse connections?

      As far as we know from work on aboral organs in adult ctenophores, the same structures and cells can be found. We do not know how the network will develop. We know that at 5 days the balancer is fully functional and the animals can orient and their behaviour is coordinated. So the wiring may not change extensively later in development. In the 1-day-old larva, Ferraioli et al. did not distinguish ANN neurons as a separate population, as these were merged with SNNs in their dataset. This suggests that significant cellular and circuit maturation likely occurs between 1 and 5 days.

      METHODS: Imaging the Activity of Balancer Cilia

      (1) “we selected only larvae whose aboral-oral axis was oriented nearly perpendicular to the gravitational vector”. Shouldn’t this be “nearly parallel to the gravity vector” not perpendicular?

      Thank you for spotting this, corrected.

    1. eLife Assessment

      This study presents an important investigation into the molecular function of AT-HOOK MOTIF NUCLEAR LOCALIZED 15 (AHL15), a member of the AHL protein family, identifying it as a potential regulator of three-dimensional gene-loop organization within transcribed gene bodies. The authors support this claim with compelling genome-wide evidence, integrating AHL15 binding profiles with transcriptional and chromatin accessibility changes, as well as demonstrating overlap with genes known to form loops across transcribed regions. The evidence supporting the claims of the authors is solid. Collectively, these findings will be of broad interest to biologists seeking to understand the core regulatory mechanisms underlying gene expression.

    2. Reviewer #1 (Public review):

      The study by Luden et al. seeks to elucidate the molecular functions of AHL15, a member of the AT-HOOK MOTIF NUCLEAR LOCALIZED (AHL) protein family, whose overexpression has been shown to extend plant longevity in Arabidopsis. To address this question, the authors conducted genome-wide ChIP-sequencing analyses to identify AHL15 binding sites. They further integrated these data with RNA-sequencing and ATAC-sequencing analyses to compare directly bound AHL15 targets with genes exhibiting altered expression and chromatin accessibility upon ectopic AHL15 overexpression.

      The analyses indicate that AHL15 preferentially associates with regions near transcription start sites (TSS) and transcription end sites (TES). Notably, no clear consensus DNA-binding motif was identified, suggesting that AHL15 binding may be mediated through interactions with other regulatory factors rather than through direct sequence recognition. The authors further show that AHL15 predominantly represses its direct target genes; however, this repression appears to be largely independent of detectable changes in chromatin accessibility.

      In addition to the AHL protein family, the globular H1 domain-containing high-mobility group A (GH1-HMGA) protein family also harbors AT-hook DNA-binding domains. Recent studies have shown that GH1-HMGA proteins repress FLC, a key regulator of flowering time, by interfering with gene-loop formation. The observed enrichment of AHL15 at both TSS and TES regions, therefore, raises the intriguing possibility that AHL15 may also participate in regulating gene-loop architecture. Consistent with this idea, the authors report that several direct AHL15 target genes are known to form gene loops.

      Overall, the conclusions of this study are well supported by the presented data and provide new mechanistic insights into how AHL family proteins may regulate gene expression.

      However, it is important to note that the genome-wide analyses in this study rely predominantly on ectopic overexpression of AHL15 at developmental stages when the gene is not usually expressed. Moreover, loss-of-function phenotypes for AHL15 have not been reported, leaving unresolved whether AHL15 plays a physiological role in regulating plant longevity under native conditions. It therefore remains possible that longevity control is mediated by other AHL family members rather than by AHL15 itself. In this regard, the manuscript's title would benefit from more accurately reflecting this broader implication.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Luden et al. investigates the molecular function and DNA-binding modes of AHL15, a transcription factor with pleiotropic effects on plant development. The results contribute to our understanding of AHL15 function in development, specifically, and transcriptional regulation in plants, more broadly.

      Strengths:

      The authors developed a set of genetic tools for high-resolution profiling of AHL15 DNA binding and provided exploratory analyses of chromatin accessibility changes upon AHL15 overexpression. The generated data (ChIP-Seq, ATAC-Seq and RNA-Seq) are a valuable resource for further studies. The data suggest that AHL15 does not operate as a pioneer TF, but is likely involved in gene looping.

      Weaknesses:

      While the overall message is conveyed clearly and convincingly, I see one major issue concerning motif discovery and interpretation. The authors state that because HOMER detected highly enriched motifs at frequencies below 1%, they conclude that "a true DNA binding motif would be present in a large portion of the AHL15 peaks (targets) and would be rare in other regions of the genome (background)."

      I agree that the frequency below 1% is unexpectedly low; however, this more likely reflects problems in data preprocessing or motif discovery rather than intrinsic biological properties of a transcription factor that possesses a DNA-binding domain and is known to bind AT-rich motifs. As it is, Figure 2 cannot serve as a main figure in the manuscript: it suggests that the generated ChIP-Seq peakset is dominated by noise (or that motif discovery was done improperly) rather than that AHL15 binds nonspecifically.

      Since key methodological details on the HOMER workflow are missing in the M&M section, it is not possible to determine what went wrong. Looking at other results, i.e. the reasonably structured peak distribution around TSS/TTS and consistent overlap of the peaks between the replicas, I assume that the motif discovery step was done improperly.

      Therefore, I recommend redoing the motif analysis, for example, by restricting the search to the top-ranked peaks (e.g. TOP1000) and by using an appropriate background set (HOMER can generate good backgrounds, but it was not documented in the manuscript how the authors did it). If HOMER remains unsuccessful, the authors should consider complementary methods such as STREME or MEME, similar to the approach used for GH1-HMGA (https://pmc.ncbi.nlm.nih.gov/articles/PMC8195489). If the peakset is of good quality, I would expect the analysis to identify an AT-rich motif with a frequency substantially higher than 1%, more likely in the range of at least 30%. If such a motif is detected, it should be reported clearly, ideally with positional enrichment information relative to TSS or TTS. It would also be informative to compare the recovered motif with known GH1-HMGA motifs.

      If de novo motif discovery remains inconclusive, the authors should, at a minimum, assess enrichment of known AHL binding motifs using available PWMs (e.g. from JASPAR). As it stands, the claim that "our ChIP-seq data show that AHL15 binds to AT-rich DNA throughout the Arabidopsis genome with limited sequence specificity (Figure 2A, Figure S2-S4)" is not convincingly supported.
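
      The frequency comparison at the heart of this point (how often a motif occurs in top-ranked peaks versus in background sequences) can be sketched as follows. This is a minimal illustration and not the HOMER, STREME or MEME workflow; the helper names `top_peaks` and `motif_frequency`, the toy sequences, the scores, and the regular expression `[AT]{6}` (any run of six A/T bases) are all assumptions made for the example, not a known AHL15 motif or data from the manuscript.

```python
import re


def top_peaks(peaks, n):
    """Keep the sequences of the n highest-scoring peaks.

    Each peak is a (sequence, score) pair.
    """
    ranked = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    return [sequence for sequence, _score in ranked[:n]]


def motif_frequency(sequences, pattern):
    """Fraction of sequences that contain at least one match to the regex."""
    if not sequences:
        return 0.0
    regex = re.compile(pattern)
    hits = sum(1 for sequence in sequences if regex.search(sequence))
    return hits / len(sequences)


# Toy data (invented for illustration only).
peaks = [("GGAATATTCC", 9.0), ("CCAAATTTGG", 7.5), ("GCGCGCGCGC", 1.2)]
background = ["GCGCGCGCGC", "CCGGAATTCC", "GGCCGGCCGG", "ATATATGCGC"]
placeholder_motif = "[AT]{6}"  # assumed AT-rich pattern, not a real PWM

target_freq = motif_frequency(top_peaks(peaks, 2), placeholder_motif)
background_freq = motif_frequency(background, placeholder_motif)
print(target_freq, background_freq)
```

      A real analysis would use position weight matrices and a matched background (for example GC-content matched), but the quantity being compared, target frequency against background frequency, is the same one that the criterion quoted above refers to.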

      Another point concerns the authors' hypothesis regarding the role of AHL15 in gene looping. While I like this hypothesis and it is good to discuss it in the discussion section, the data presented are not sufficient to support the claim, stated in the abstract, that AHL15 "regulates 3D genome organization," as such a conclusion would require additional, dedicated experiments.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigated the role of AHL15 in the regulation of gene expression using AHL15 overexpression lines. The results show that more genes are downregulated when AHL15 is upregulated, and that its binding does not affect chromatin accessibility. Further, the authors found that AHL15 binds in regions depleted in histone modifications and other epigenetic signatures. Subsequently, they investigated the presence of AHL15 in gene chromatin loops and found overlaps with both upregulated and downregulated genes. The methods are appropriately described, but could be improved to include the analysis of self-looping gene boundaries.

      Strengths:

      Their study clearly showed a lack of any specific sequence enrichment in the AHL15 binding sites, other than these being AT-rich, suggesting that AHL proteins do not recognize a specific DNA sequence but are recruited to their AT-rich target sites in another way. The study does suggest significant enrichment of AHL15 binding sites at TSS and TES, and AHL15 sites are depleted of any histone marks. They also identified that AHL15 binding sites overlap with self-looping gene boundaries.

      Weaknesses:

      The claim that AHL15 acts as a repressor and genes regulated by it are downregulated needs to be investigated based on AHL15 binding sites, to show enrichment/ depletion of AHL15 binding sites in overexpressing genes and repressed genes. The authors should provide data to support plant longevity with AHL15 overexpression using the DEX-induced system to support the claims in the title. Calculation of the enrichment score of AHL15 peaks in the self-looping genes that are upregulated or downregulated, and discussion about the different effects of AHL15 binding on self-looping regions to regulate gene expression may be helpful to understand the significance of the study. Motif enrichment in upregulated and downregulated genes separately to identify binding sequence preferences may be useful. It is not clear how the overlap of AHL15 peaks with self-looping genes has been carried out.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study by Luden et al. seeks to elucidate the molecular functions of AHL15, a member of the AT-HOOK MOTIF NUCLEAR LOCALIZED (AHL) protein family, whose overexpression has been shown to extend plant longevity in Arabidopsis. To address this question, the authors conducted genome-wide ChIP-sequencing analyses to identify AHL15 binding sites. They further integrated these data with RNA-sequencing and ATAC-sequencing analyses to compare directly bound AHL15 targets with genes exhibiting altered expression and chromatin accessibility upon ectopic AHL15 overexpression.

      The analyses indicate that AHL15 preferentially associates with regions near transcription start sites (TSS) and transcription end sites (TES). Notably, no clear consensus DNA-binding motif was identified, suggesting that AHL15 binding may be mediated through interactions with other regulatory factors rather than through direct sequence recognition. The authors further show that AHL15 predominantly represses its direct target genes; however, this repression appears to be largely independent of detectable changes in chromatin accessibility.

      In addition to the AHL protein family, the globular H1 domain-containing high-mobility group A (GH1-HMGA) protein family also harbors AT-hook DNA-binding domains. Recent studies have shown that GH1-HMGA proteins repress FLC, a key regulator of flowering time, by interfering with gene-loop formation. The observed enrichment of AHL15 at both TSS and TES regions, therefore, raises the intriguing possibility that AHL15 may also participate in regulating gene-loop architecture. Consistent with this idea, the authors report that several direct AHL15 target genes are known to form gene loops.

      Overall, the conclusions of this study are well supported by the presented data and provide new mechanistic insights into how AHL family proteins may regulate gene expression.

      However, it is important to note that the genome-wide analyses in this study rely predominantly on ectopic overexpression of AHL15 at developmental stages when the gene is not usually expressed. Moreover, loss-of-function phenotypes for AHL15 have not been reported, leaving unresolved whether AHL15 plays a physiological role in regulating plant longevity under native conditions. It therefore remains possible that longevity control is mediated by other AHL family members rather than by AHL15 itself. In this regard, the manuscript's title would benefit from more accurately reflecting this broader implication.

      The ahl15 loss-of-function phenotype has previously been described in Karami et al., 2020 (Nat. Plants), Rahimi et al., 2022a (New Phyt.), and Rahimi et al., 2022b (Curr. Biol.), showing that ahl15 loss-of-function results, among other effects, in accelerated vegetative phase change and flowering, a reduced number of leaves produced by axillary meristems in short-day-grown plants, and reduced secondary growth in the inflorescence stem. The dominant-negative ahl15 delta-G allele, expressing a mutant protein lacking the conserved G motif in the PPC domain, shows these phenotypes more clearly in the heterozygous ahl15 +/- background, and is embryo lethal in the homozygous ahl15 background (Karami et al., 2021, Nature Comm.). In addition, we recently showed that leaf senescence is significantly accelerated in the ahl15 loss-of-function mutant (Luden et al., 2025, BioRxiv). These results show that AHL15 is involved in several aspects of ageing in Arabidopsis, and we will adjust the introduction to discuss these previous findings more explicitly.

      I agree with reviewer 1 on the possibility that multiple AHLs could have an effect on longevity, which is partially supported by the delayed flowering time observed in the AHL20, AHL27, or AHL29 overexpression lines (Karami et al., 2020, Street et al., 2008). However, the induction of the AHL15-GR fusion alone by DEX shows a clear delay of developmental phase transitions and the aging process in general, indicating that AHL15 by itself is able to extend longevity as other AHLs are not affected by DEX treatment (proven by the fact that their expression is not significantly changed in our RNA-seq analysis of DEX-treated 35S:AHL15-GR seedlings).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Luden et al. investigates the molecular function and DNA-binding modes of AHL15, a transcription factor with pleiotropic effects on plant development. The results contribute to our understanding of AHL15 function in development, specifically, and transcriptional regulation in plants, more broadly.

      Strengths:

      The authors developed a set of genetic tools for high-resolution profiling of AHL15 DNA binding and provided exploratory analyses of chromatin accessibility changes upon AHL15 overexpression. The generated data (ChIP-Seq, ATAC-Seq and RNA-Seq) are a valuable resource for further studies. The data suggest that AHL15 does not operate as a pioneer TF, but is likely involved in gene looping.

      Weaknesses:

      While the overall message is conveyed clearly and convincingly, I see one major issue concerning motif discovery and interpretation. The authors state that because HOMER detected highly enriched motifs at frequencies below 1%, they conclude that "a true DNA binding motif would be present in a large portion of the AHL15 peaks (targets) and would be rare in other regions of the genome (background)."

      I agree that the frequency below 1% is unexpectedly low; however, this more likely reflects problems in data preprocessing or motif discovery rather than intrinsic biological properties of a transcription factor that possesses a DNA-binding domain and is known to bind AT-rich motifs. As it is, Figure 2 cannot serve as a main figure in the manuscript: it suggests that the generated ChIP-Seq peakset is dominated by noise (or that motif discovery was done improperly) rather than that AHL15 binds nonspecifically.

      Since key methodological details on the HOMER workflow are missing in the M&M section, it is not possible to determine what went wrong. Looking at other results, i.e. the reasonably structured peak distribution around TSS/TTS and consistent overlap of the peaks between the replicas, I assume that the motif discovery step was done improperly.

      Therefore, I recommend redoing the motif analysis, for example, by restricting the search to the top-ranked peaks (e.g. TOP1000) and by using an appropriate background set (HOMER can generate good backgrounds, but it was not documented in the manuscript how the authors did it). If HOMER remains unsuccessful, the authors should consider complementary methods such as STREME or MEME, similar to the approach used for GH1-HMGA (https://pmc.ncbi.nlm.nih.gov/). If the peakset is of good quality, I would expect the analysis to identify an AT-rich motif with a frequency substantially higher than 1%, more likely in the range of at least 30%. If such a motif is detected, it should be reported clearly, ideally with positional enrichment information relative to TSS or TTS. It would also be informative to compare the recovered motif with known GH1-HMGA motifs.

      If de novo motif discovery remains inconclusive, the authors should, at a minimum, assess enrichment of known AHL binding motifs using available PWMs (e.g. from JASPAR). As it stands, the claim that "our ChIP-seq data show that AHL15 binds to AT-rich DNA throughout the Arabidopsis genome with limited sequence specificity (Figure 2A, Figure S2-S4)" is not convincingly supported.

      Another point concerns the authors' hypothesis regarding the role of AHL15 in gene looping. While I like this hypothesis and it is good to discuss it in the discussion section, the data presented are not sufficient to support the claim, stated in the abstract, that AHL15 "regulates 3D genome organization," as such a conclusion would require additional, dedicated experiments.

      The motifs discovered by HOMER are ranked by their enrichment over background, of which the highest-scoring motifs are very rare in the AHL15-bound targets, but even rarer in the background, which is why they score highly on the percent enrichment score. As expected by reviewer 2, we identified AT-rich motifs that were present in a larger percentage of AHL15 targets (found in 3-18% of targets, depending on the motif, see for example motif #5 in figure S4A), which can be seen at the right tail of the histograms shown in figures 2B-C and figures S2-S4B-C. However, these motifs were also common in the background and were therefore not considered as significantly enriched in the AHL15-bound regions, with a target:background ratio of <2. As most of these motifs were flagged by HOMER as possible false-positives, and to limit the size of the (supplemental) figures, we did not show each of the motifs identified by HOMER in table form. We can include the full tables of de novo motifs identified by HOMER, including possible false-positive results for clarification.

      Although the identification of AT-rich motifs shows that AHL15 (and very likely most other AHL proteins as well) binds AT-rich regions, it does not sufficiently explain the binding of AHL15 to its target genes, as these motifs are found at almost equal frequencies in non-AHL15-bound regions. In addition, a sequence found at this frequency in the genomic background is, in our view, too nonspecific to be considered a transcription factor binding site. Based on this, we concluded that AHL15 lacks a specific binding motif that can define the genes it binds.
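
The ranking argument above can be illustrated with a minimal sketch. The motif names and percentages below are invented for illustration and are not values from the HOMER output; the cutoff of 2 simply mirrors the target:background ratio mentioned in the response, and HOMER's own significance calculation is not reproduced here.

```python
def enrichment_ratio(pct_targets, pct_background):
    """Target:background frequency ratio for one motif (inputs in percent)."""
    if pct_background <= 0:
        raise ValueError("background frequency must be positive")
    return pct_targets / pct_background


def passes_ratio(motifs, min_ratio=2.0):
    """Flag each motif whose target:background ratio reaches min_ratio."""
    return {
        name: enrichment_ratio(targets, background) >= min_ratio
        for name, (targets, background) in motifs.items()
    }


# Hypothetical motifs: (percent of target peaks, percent of background regions).
motifs = {
    "rare_high_ratio": (0.8, 0.1),   # rare in targets, even rarer in background
    "common_at_rich": (18.0, 12.0),  # frequent in targets, but also in background
}

flags = passes_ratio(motifs)
```

In this toy case the rare motif clears the ratio cutoff despite covering under 1% of targets, while the common AT-rich motif does not, which is the pattern the response describes.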

      We will update the methods section to include more details on the HOMER analysis, and will also run the analysis in the top1000 shared peaks as suggested by reviewer 2.

      Reviewer #3 (Public review):

      Summary:

      This study investigated the role of AHL15 in the regulation of gene expression using AHL15 overexpression lines. The results show that more genes are downregulated when AHL15 is upregulated, and that its binding does not affect chromatin accessibility. Further, the authors found that AHL15 binds in regions depleted of histone modifications and other epigenetic signatures. Subsequently, they investigated the presence of AHL15 at gene chromatin loops and found overlaps with both upregulated and downregulated genes. The methods are appropriately described, but could be improved to include the analysis of self-looping gene boundaries.

      Strengths:

      Their study clearly showed a lack of any specific sequence enrichment in the AHL15 binding sites, other than these being AT-rich, suggesting that AHL proteins do not recognize a specific DNA sequence but are recruited to their AT-rich target sites in another way. The study does suggest significant enrichment of AHL15 binding sites at TSS and TES, and AHL15 sites are depleted of any histone marks. They also identified that AHL15 binding sites overlap with self-looping gene boundaries.

      Weaknesses:

      The claim that AHL15 acts as a repressor and genes regulated by it are downregulated needs to be investigated based on AHL15 binding sites, to show enrichment/depletion of AHL15 binding sites in overexpressing genes and repressed genes. The authors should provide data to support plant longevity with AHL15 overexpression using the DEX-induced system to support the claims in the title. Calculation of the enrichment score of AHL15 peaks in the self-looping genes that are upregulated or downregulated, and discussion about the different effects of AHL15 binding on self-looping regions to regulate gene expression may be helpful to understand the significance of the study. Motif enrichment in upregulated and downregulated genes separately to identify binding sequence preferences may be useful. It is not clear how the overlap of AHL15 peaks with self-looping genes has been carried out.

      A metagenome plot of AHL15 binding around genes that are differentially expressed upon DEX treatment can be found in Figure 3F. This analysis shows that AHL15 binding near differentially expressed genes is more pronounced compared to all AHL15-bound genes, and that AHL15 binding near the TSS is especially enriched for upregulated genes.

      As also suggested by reviewer 2, we will run a motif enrichment analysis on the differentially expressed genes that are bound by AHL15 to see if any motifs are enriched compared to the background and overrepresented in the AHL15-bound genes.

      Plant longevity in 35S:AHL15-GR plants treated with DEX has been shown by Karami et al. (2020; Nature Plants). DEX treatment extended vegetative development after flowering in Arabidopsis and tobacco, enhanced overall biomass in Arabidopsis and tobacco, and re-initiated vegetative growth in senescent tobacco. Recently we also showed that it delays leaf senescence in Arabidopsis (Luden et al., 2025, bioRxiv). All these observations will be discussed in more detail in the text. In addition, we show in Figure 4C-D of this manuscript that 35S:AHL15-GR plants treated a single time with DEX at 10 days after germination have a significantly delayed flowering time.

      The enrichment of AHL15 ChIP-seq peaks in self-looping genes will be analyzed as suggested and compared to a random set of genes as a control, and the methods section will be updated to clarify how the analyses on self-looping genes were carried out.
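
One way such a comparison against random gene sets could be set up is sketched below. This is an illustrative resampling scheme under stated assumptions, not the analysis the authors will run: the function name, the gene identifiers, and the choice of size-matched random draws from a gene universe are all hypothetical.

```python
import random


def overlap_enrichment(bound_genes, gene_set, universe, n_draws=1000, seed=0):
    """Compare the overlap of bound_genes with gene_set against size-matched
    random gene sets drawn from universe.

    Returns (observed overlap, fold enrichment over the random mean,
    empirical one-sided p-value).
    """
    rng = random.Random(seed)
    bound = set(bound_genes)
    target = set(gene_set)
    observed = len(bound & target)
    pool = sorted(universe)  # fixed order keeps the draws reproducible
    null = [
        len(bound.intersection(rng.sample(pool, len(target))))
        for _ in range(n_draws)
    ]
    mean_null = sum(null) / n_draws
    fold = observed / mean_null if mean_null > 0 else float("inf")
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + sum(x >= observed for x in null)) / (n_draws + 1)
    return observed, fold, p_value


# Hypothetical toy data: 200 genes, 40 bound, 20 self-looping (all bound).
universe = [f"g{i}" for i in range(200)]
bound_genes = universe[:40]
self_looping = universe[:20]

observed, fold, p_value = overlap_enrichment(bound_genes, self_looping, universe)
```

A real analysis would likely also match the random sets on properties such as gene length or expression level, which this sketch does not attempt.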

    1. eLife Assessment

      This fundamental study advances our understanding of population-level immune responses to influenza in both children and adults. The strength of the evidence supporting the conclusions is compelling, with high-throughput profiling assays and mathematical modeling. The work will be of interest to immunologists, virologists, vaccine developers, and those working on mathematical modeling of infectious diseases.

    2. Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      Comments on revisions:

      Thanks to the authors for the revised version of the manuscript. This version contains extended explanations clarifying the growth analysis by MLR. The other points of the initial report were addressed as well by language adjustments. As discussed during the revision process, future work might focus on the observed heterogeneity among the serum titers to different strains and its causes, which requires additional in-depth analysis.

    3. Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, that will be relevant across pathogens (assuming the assay can be appropriately adapted). I only had a few comments, focused on maximising the information provided by the sera.

      Comments on revisions:

      These concerns were all addressed in the revised paper.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      We appreciate the reviewer’s clear summary of our work.

      Thanks to the authors for the revised version of the manuscript. A few concerns remain after the revision:

      (1) We appreciate the additional computational analysis the authors have performed on normalizing the titers with the geometric mean titer for each individual, as shown in the new Supplemental Figure 6. We agree with the authors' statement that, after averaging again within specific age groups, "there are no obvious age group-specific patterns." A discussion of this should be added to the revised manuscript, for example in the section "Pooled sera fail to capture the heterogeneity of individual sera," referring to the new Supplemental Figure 6.

      However, we also suggested that after this normalization, patterns might emerge that are not necessarily defined by birth cohort. This possibility remains unexplored and could provide an interesting addition to support potential effects of substitutions at sites 145 and 275/276 in individuals with specific titer profiles, which as stated above do not necessarily follow birth cohort patterns.

      The reviewer is correct that there remains heterogeneity among the serum titers to different strains that we cannot easily explain via age group, and suggests that additional patterns could emerge. We certainly agree that explaining this heterogeneity remains an interesting goal, but as described in the manuscript we have analyzed the possible causes of the heterogeneity as exhaustively as possible given the available metadata. At this point, the most we can say is that the strain-specific neutralization titers are highly heterogeneous in a way that cannot be completely explained by birth cohort. We agree that further analysis of the cause is an area for future work, and have made all of our data available so that others can continue to explore additional hypotheses. It may be that these questions can only be answered by experiments on sera from newer cohorts where more detailed metadata on infection and vaccination history are available.

      (2) Thank you for elaborating further on the method used to estimate growth rates in your reply to the reviewers. To clarify: the reason that we infer from Fig. 5a that A/Massachusetts has a higher fitness than A/Sydney is not because it reaches a higher maximum frequency, but because it seems to have a higher slope. The discrepancy between this plot and the MLR inferred fitness could be clarified by plotting the frequency trajectories on a log-scale.

      For the MLR, we understand that the initial frequency matters in assessing a variant's growth. However, when starting points of two clades differ in time (i.e., in different contexts of competing clades), this affects comparability, particularly between A/Massachusetts and A/Ontario, as well as for other strains. We still think that mentioning these time-dependent effects, which are not captured by the MLR analysis, would be appropriate. To support this, it could be helpful to include the MLR fits as an appendix figure, showing the different starting and/or time points used.

      Multinomial logistic regression is a widely used technique to estimate viral growth rates from sequencing counts (PLoS Computational Biology, 20:e1012443; Nature, 597:703-708; Science, 376:1327-1332). As the reviewer points out, it does assume that the relative viral growth rates are constant over the time period analyzed. However, most of the patterns mentioned by the reviewer are not deviations from this assumption, but rather just due to the fact that frequencies are plotted on a linear scale. More specifically, our multinomial logistic regression implementation defines two parameters per variant: the initial frequency and the growth rate. The absolute variant growth rate is effectively the slope of the logit-transformed variant frequencies. Each variant's relative fitness depends on that variant's growth rate relative to a predefined baseline variant. Plotting frequencies on a logit scale does help emphasize the importance of the slope by showing exponential growth as a linear trajectory. We have added a new Supplemental Figure 9 that plots the frequencies from Figure 5A on a logit scale. As can be seen the frequency trajectories are closer to linear on the logit scale.
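
The relationship between growth rates and slopes can be sketched numerically. The parameters below are hypothetical and this is not the authors' implementation; it only assumes the standard multinomial logistic (softmax) form with one intercept and one growth rate per variant. Under that form, the log ratio of a variant's frequency to the baseline's is exactly linear in time, with slope equal to the relative growth rate, whereas the logit of a single variant's frequency is only approximately linear when more than two variants compete.

```python
import math


def mlr_frequencies(intercepts, growth_rates, t):
    """Variant frequencies at time t under a multinomial logistic model."""
    scores = [a + r * t for a, r in zip(intercepts, growth_rates)]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def log_ratio(freqs, variant, baseline=0):
    """Log frequency ratio of a variant to the baseline variant."""
    return math.log(freqs[variant] / freqs[baseline])


# Hypothetical parameters; variant 0 is the baseline.
intercepts = [0.0, -3.0, -6.0]    # set the initial frequencies
growth_rates = [0.0, 0.05, 0.12]  # per unit time, relative to the baseline

times = [0, 50, 100]
ratios = [
    log_ratio(mlr_frequencies(intercepts, growth_rates, t), variant=2)
    for t in times
]
slope_first = (ratios[1] - ratios[0]) / (times[1] - times[0])
slope_second = (ratios[2] - ratios[1]) / (times[2] - times[1])
```

Both slopes recover the relative growth rate assigned to variant 2, regardless of the frequency that variant eventually reaches, which is why a linear-scale frequency plot can be misleading about fitness.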

      We have updated the results text to clarify the nature of the fixed relative growth rates per strain and to refer to this new supplemental figure as follows:

      To estimate the evolutionary success of different human H3N2 influenza strains during 2023, we used multinomial logistic regression, which uses sequence counts to estimate fixed strain growth rates relative to a baseline strain for the entire analysis time period (in this case, 2023) [50–52]. Relative growth rates estimated by multinomial logistic regression represent relative fitnesses of strains over that time period. There were sufficient sequencing counts to reliably estimate growth rates in 2023 for 12 of the HAs for which we measured titers using our sequencing-based neutralization assay libraries (Figure 5a,b and Supplemental Figure 9). We estimated strain growth rates relative to the baseline strain of A/Massachusetts/18/2022. Note that these growth rates estimate how rapidly each strain grows relative to the baseline strain, rather than the absolute highest frequency reached by each strain. Each strain’s absolute growth rate corresponds to the slope of the strain’s logit-transformed frequencies at the end of the analysis time period (Supplemental Figure 9).

      As the reviewer notes, the multinomial logistic regression implementation assumes a fixed growth rate for each strain over the time period being analyzed. This limitation causes the inferred growth rates to emphasize the latest trends in the analysis time period. For example, at the end of December 2023 in Figure 5A, the A/Ontario/RV00796/2023 strain is growing rapidly and replacing all other variants. Correspondingly, the multinomial logistic regression infers a high growth rate for that Ontario strain relative to the A/Massachusetts/18/2022 baseline strain. However, the A/Massachusetts/18/2022 strain was growing relative to other strains in the first half of 2023 since it has a higher growth rate than they do. That said, there are modest deviations from linearity on the logit scale shown in the added supplementary figure, likely because the assumption of a fixed set of relative growth rates over the analyzed time period is an approximation.

      We have added the following text to the discussion to highlight this limitation of the multinomial logistic regression:

      Our comparisons of the neutralization titers to the growth rates of different H3N2 strains were limited by the fact that only a modest number of strains had adequate sequence data to estimate their growth rates. Strains with more sequencing counts tend to be those with moderate-to-high fitness, which therefore limited the dynamic range of growth rates across strains we were able to analyze. Relatedly, the multinomial logistic regression infers a single fixed growth rate per strain for the entire analysis time period of 2023, and cannot represent changes in relative fitness of strains over that relatively short time period. Additionally, because the strains for which we estimated growth rates are phylogenetically related, it is difficult to assess the statistical significance of the correlation [53], so it will be important for future work to reassess the correlations with new neutralization data against the dominant strains in future years.

      (3) Regarding my previous suggestion to test an older vaccine strain than A/Texas/50/2012 to assess whether the observed peak in titer measurements is virus-specific: We understand that the authors want to focus the scope of this paper on the relative fitness of contemporary strains, and that this additional experimental effort would go beyond the main objectives outlined in this manuscript. However, the authors explicitly note that "Adults across age groups also have their highest titers to the oldest vaccine strain tested, consistent with the fact that these adults were first imprinted by exposure to an older strain." This statement gives the impression that imprinting effects increase titers for older strains, whereas this does not seem to be true from their results, but only true for A/Texas. It should be modified accordingly.

      We agree with the reviewer’s suggestion that the specific language describing the potential trend of adults having the highest titers to the oldest strain tested could be further caveated. To this end, we have made the following edits to the portion of the main text that they highlighted:

      Adults across age groups also have their highest titers to the oldest vaccine strain tested (Figure 6), consistent with the fact that these adults were likely first imprinted by exposure to an older strain more antigenically similar to A/Texas/50/2012 (the oldest strain tested here) than more recent strains. Note that a similar trend towards adult sera having higher titers to older vaccine strains was also observed in a more recent study we have performed using the same methodology described here [60].

      Notably, this trend of adults across age groups having the highest titers to the oldest vaccine strains tested has held true in subsequent work we've performed with H1N1 viruses (Kikawa et al., 2025 Virus Evolution, DOI: https://doi.org/10.1093/ve/veaf086). In that more recent study, we again saw that adults (cohorts EPIHK, NIID, and UWMC) tended to have their highest titers to the oldest cell-passaged strain tested (A/California/07/2009), whereas children (cohort SCH) had more similar neutralization titers across strains. These additional data therefore support the idea that adults tend to have their highest titers to older vaccine strains, a finding that is also consistent with substantial prior work (e.g., Science, 346:996-1000).

      Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, that will be relevant across pathogens (assuming the assay can be appropriately adapted). I only had a few comments, focused on maximising the information provided by the sera. These concerns were all addressed in the revised paper.

      We thank this reviewer for the summary of our work and their helpful comments in the first revision.

      Reviewer #3 (Public review):

      The authors use high throughput neutralisation data to explore how different summary statistics for population immune responses relate to strain success, as measured by growth rate during the 2023 season. The question of how serological measurements relate to epidemic growth is an important one, and I thought the authors present a thoughtful analysis tackling this question, with some clear figures. In particular, they found that stratifying the population based on the magnitude of their antibody titres correlates more with strain growth than using measurements derived from pooled serum data. The updated manuscript has a stronger motivation, and there is substantial potential to build on this work in future research.

      Comments on revisions:

      I have no additional recommendations. There are several areas where the work could be further developed, which were not addressed in detail in the responses, but given this is a strong manuscript as it stands, it is fine that these aspects are for consideration only at this point.

      We appreciate this reviewer’s summary of our work, and we are glad they feel the motivation is stronger in the revised manuscript.

    1. eLife Assessment

      This important manuscript evaluates how sample size and demographic balance of reference cohorts affect the reliability of normative models. The evidence supporting the conclusions is convincing. This work will be of interest to clinicians and scientists working with normative models.

    2. Reviewer #1 (Public review):

      This is a well-designed and carefully executed study that delivers clear and actionable guidance on the sample size and representative demographic requirements for robust normative modelling in neuroimaging. The central claims are convincingly supported.

      The study has multiple strengths. First, it offers a comprehensive and methodologically rigorous analysis of sample size and age distribution, supported by multiple complementary fit indices. Second, the learning-curve results are compelling and reproducible and will be of immediate utility to researchers planning normative modelling projects. Third, the study includes both replication in an independent dataset and an adaptive transfer analysis from UK Biobank, highlighting both the robustness of the results and the practical advantages of transfer learning for smaller clinical cohorts. Finally, the clinical validation effectively ties the methodological work back to real-world clinical application.

      One dataset-dependent limitation worth noting concerns age-distribution coverage: the larger negative effects observed under left-skewed sampling reflect a mismatch between younger training samples and older test cohorts. Importantly, the authors explicitly quantify this effect using simulation-based coverage analyses and demonstrate that it accounts for the observed asymmetry in sampling performance. By identifying and empirically characterising this constraint, the study appropriately bounds the generalisability of its conclusions while strengthening their interpretability.

    3. Reviewer #2 (Public review):

      Summary:

      The authors test how sample size and demographic balance of reference cohorts affect the reliability of normative models in ageing and Alzheimer's disease. Using OASIS-3 and replicating in AIBL, they change age and sex distributions and number of samples and show that age alignment is more important than overall sample size. They also demonstrate that models adapted from a large dataset (UK Biobank) can achieve stable performance with fewer samples. The results suggest that moderately sized but demographically well-balanced cohorts can provide robust performance.

      Strengths:

      The study is thorough and systematic, varying sample size, age, and sex distributions in a controlled way. Results are replicated in two independent datasets with relatively large sample sizes, thereby strengthening confidence in the findings. The analyses are clearly presented and use widely applied evaluation metrics. Clinical validation (outlier detection, classification) adds relevance beyond technical benchmarks. The comparison between within-cohort training and adaptation from a large dataset is valuable for real-world applications.

      The work convincingly shows that age alignment is crucial and that adapted models can reach good performance with fewer samples.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary: 

      Overall, this is a well-designed and carefully executed study that delivers clear and actionable guidance on the sample size and representative demographic requirements for robust normative modelling in neuroimaging. The central claims are convincingly supported. 

      Strengths: 

      The study has multiple strengths. First, it offers a comprehensive and methodologically rigorous analysis of sample size and age distribution, supported by multiple complementary fit indices. Second, the learning-curve results are compelling and reproducible and will be of immediate utility to researchers planning normative modelling projects. Third, the study includes both replication in an independent dataset and an adaptive transfer analysis from UK Biobank, highlighting both the robustness of the results and the practical advantages of transfer learning for smaller clinical cohorts. Finally, the clinical validation ties the methodological work back to clinical application.  

      We are grateful for the reviewer’s positive overall evaluation and for the constructive feedback, which has helped us refine and clarify the manuscript.

      Weaknesses: 

      There are two minor points for consideration: 

      (1) Calibration of percentile estimates could be shown for the main evaluation (similar to that done in Figure 4E). Because the clinical utility of normative models often hinges on identifying individuals outside the 5th or 95th percentiles, readers would benefit from visual overlays of model-derived percentile curves on the curves from the full training data and simple reporting of the proportion of healthy controls falling outside these bounds for the main analyses (i.e., 2.1. Model fit evaluation). 

      We thank the reviewer for this helpful point. To address this, we implemented two complementary analyses that evaluate the accuracy of percentile estimates in the main evaluation (Section 2.1, Model fit evaluation).

      (a) Percentage of healthy controls (HC) outside the extreme centiles (added to the main figure)

      For each sampling strategy and sample size, we now report the proportion of healthy controls falling outside the predicted 2.5th and 97.5th percentiles, to remain consistent with the 1.96 threshold used throughout the study. Under perfect calibration, this proportion should be close to 2.5%. This metric was computed for every ROI, model run, sample size, and sampling condition. The results are now shown in the main model-fit figure alongside MSLL, EV, Rho, SMSE, and ICC, and the corresponding statistics have been added throughout. This directly quantifies how well the centile estimates capture tail behavior, which is essential for the clinical interpretation of normative deviations. See the added plots to Figure 2 and Figure 3 (see also Table 2-3 in the revised main manuscript and replication in AIBL and transfer learning experiments in Supplementary Materials Figure S1, S10-11, S18-19, S28-29, Table S1-2, S5-6, S9-10).
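
The tail-calibration check can be sketched as follows. This is a minimal illustration rather than the authors' code: the z-scores are made up, and it assumes that the 2.5% target applies to each tail separately (roughly 5% across both tails combined) for a well-calibrated model with a threshold of 1.96.

```python
def tail_proportions(z_scores, threshold=1.96):
    """Fraction of deviation scores below -threshold and above +threshold."""
    n = len(z_scores)
    if n == 0:
        raise ValueError("need at least one z-score")
    below = sum(z < -threshold for z in z_scores) / n
    above = sum(z > threshold for z in z_scores) / n
    return below, above


# Hypothetical deviation scores for eight healthy controls in one ROI.
z_scores = [-2.5, -1.0, 0.0, 0.3, 1.2, 2.1, -0.4, 0.8]

below, above = tail_proportions(z_scores)
```

With realistic cohort sizes, proportions well above the nominal rate in either tail would point to centiles that are too narrow, and proportions well below it to centiles that are too wide.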

      (b) Centile curve overlays (added to the Supplementary Figures)

      To visually demonstrate calibration, we now include additional overlays of model-derived percentile curves against those obtained using the full training set. These are shown for key ROIs, multiple sample sizes and different sampling strategies in Supplementary Materials (Figure S9 and S27). These overlays illustrate where centile estimation diverges, particularly at age extremes. 

      Together, these additions provide both quantitative and qualitative evidence of percentile calibration across sampling regimes and sample sizes.

      (2) The larger negative effect of left-skewed sampling likely reflects a mismatch between the younger training set and the older test set; accounting explicitly for this mismatch would make the conclusions more generalizable. 

      We agree with the reviewer that the large negative effect of left-skewed training reflects a mismatch between the training and test age distributions. 

      To characterize the expected age distributions produced by each sampling strategy, we simulated the procedures used in the main analyses by repeatedly drawing training samples under all sampling conditions (representative, left-skewed, right-skewed, and the predefined sex-ratio settings). Simulations were performed at a fixed sample size (n = 200), generating 1000 samples per condition, and the resulting age distributions were summarized separately for males and females (Supplementary Materials section 5.1). These simulated distributions show that left-skewed sampling produces a more pronounced shift toward younger ages than the corresponding shift toward older ages under right-skewed sampling, particularly in OASIS-3, with smaller differences observed in AIBL (Tables S14–S15).

      To further quantify how these sampling-induced age profiles align with the empirical age structure of the test cohorts, we computed an age-bin coverage metric based on distribution intersection. Age was discretized into 20 quantile-based bins using the full training set of each dataset (OASIS-3 and AIBL) as reference.

      For each sampling strategy (Representative, Left-skewed, Right-skewed), sample size, and dataset, we generated 1000 independent training samples using the same sampling procedures as in the main analyses. For each sampled training set, age-bin count distributions were computed and compared to the corresponding HC test-set age-bin counts.

      Coverage was defined as:

      where i indexes age bins, and n_train(i) and n_test(i) are the numbers of individuals in bin i in the sampled training set and HC test set, respectively. This metric quantifies the fraction of the test-set age distribution that is “covered” by the sampled training set and ranges from 0 (no test-set ages covered) to 1 (complete coverage of the test-set age distribution). For each condition, the mean and standard deviation of the coverage across repetitions were computed.
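
      A minimal sketch of one coverage definition consistent with this description is given here. It assumes a histogram intersection of age-bin counts normalized by the test-set total, i.e. the sum over bins of min(n_train, n_test) divided by the sum over bins of n_test. The exact formula and the function name `age_bin_coverage` are assumptions made for illustration, not taken from the manuscript.

```python
def age_bin_coverage(train_counts, test_counts):
    """Histogram-intersection coverage of test-set age bins by a training sample.

    Assumed form: sum_i min(n_train[i], n_test[i]) / sum_i n_test[i].
    Both arguments are per-bin counts over the same age bins.
    """
    if len(train_counts) != len(test_counts):
        raise ValueError("train and test must use the same age bins")
    total_test = sum(test_counts)
    if total_test == 0:
        raise ValueError("test set is empty")
    # A bin contributes only as many test subjects as the training set can match.
    overlap = sum(min(tr, te) for tr, te in zip(train_counts, test_counts))
    return overlap / total_test


# Hypothetical four-bin example: the third bin has no training subjects.
print(age_bin_coverage([5, 3, 0, 2], [2, 2, 2, 2]))
```

      Under this assumed definition, a value of 1 means every test-set age bin is matched by at least as many training subjects, and a value of 0 means the training sample shares no occupied bins with the test set.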

      We show that, under left-skewed sampling, age coverage remains markedly reduced across all sample sizes in OASIS-3 compared with the AIBL dataset (see Figure S37). This suggests that the poorer performance observed with left-skewed training may stem from reduced coverage of the test age range. We added the following to the Discussion (page 27):

      “The left-skewed sampling had overall a greater effect than right-skewed sampling in both model evaluation and clinical validation, likely due to (1) the dataset’s original bias toward older individuals, making younger-skewed samples less representative, and (2) the older age structure of the AD population, which exacerbates mismatch when younger HC are used to calibrate models in the clinical population. This asymmetry is also reflected in the coverage analysis, where left-skewed sampling resulted in poorer age coverage of the target population at the same sample size (Supplementary Materials section 5.4.)”

      Reviewer #2:

      Summary: 

      The authors test how sample size and demographic balance of reference cohorts affect the reliability of normative models in ageing and Alzheimer's disease. Using OASIS-3 and replicating in AIBL, they change age and sex distributions and number of samples and show that age alignment is more important than overall sample size. They also demonstrate that models adapted from a large dataset (UK Biobank) can achieve stable performance with fewer samples. The results suggest that moderately sized but demographically well-balanced cohorts can provide robust performance. 

      Strengths: 

      The study is thorough and systematic, varying sample size, age, and sex distributions in a controlled way. Results are replicated in two independent datasets with relatively large sample sizes, thereby strengthening confidence in the findings. The analyses are clearly presented and use widely applied evaluation metrics. Clinical validation (outlier detection, classification) adds relevance beyond technical benchmarks. The comparison between within-cohort training and adaptation from a large dataset is valuable for real-world applications. 

      The work convincingly shows that age alignment is crucial and that adapted models can reach good performance with fewer samples. However, some dataset-specific patterns (noted above) should be acknowledged more directly, and the practical guidance could be sharper. 

      We are grateful for the reviewer’s positive overall evaluation and for the constructive comments that guided our revisions and strengthened the manuscript.

      Weaknesses: 

      The paper uses a simple regression framework, which is understandable for scalability, but limits generalization to multi-site settings where a hierarchical approach could better account for site differences. This limitation is acknowledged; a brief sensitivity analysis (or a clearer discussion) would help readers weigh trade-offs. 

      We thank the reviewer for this insightful point. We agree that hierarchical Bayesian regression provides clear advantages in multi-site settings, particularly when site-level variability is substantial or when federated learning is required. In our case, both OASIS-3 and AIBL include only a small number of sites, and the primary aim of the study was to isolate the effects of sample size and covariate composition rather than to model site-related structure. For these reasons, implementing HBR was beyond the scope of the present work, but we fully acknowledge its relevance for studies with larger or more heterogeneous site configurations. To clarify this distinction, we added a dedicated paragraph in the Discussion (page 28) that situates warped BLR and HBR within different data scenarios and outlines the circumstances under which each approach is preferable.

      “From a methodological perspective, the choice between warped BLR and HBR should primarily be guided by the structure of site effects and by computational constraints. HBR explicitly models site-level variation through hierarchical random effects, enabling information sharing across sites and supporting federated-learning implementations in which site-specific updates can be combined without sharing raw data (Bayer et al., 2022; Kia et al., 2021; Maccioni et al., 2025). This structure provides more stable estimates when site-specific sample sizes are small or acquisition differences are substantial. In contrast, warped BLR treats site as a fixed-effect covariate when site adjustment is required and does not implement hierarchical pooling, but offers simpler inference and substantially lower computational cost while accommodating non-Gaussian data distributions through the warping transformation (C. J. Fraza et al., 2021). These properties make warped BLR practical in settings where site heterogeneity is limited or adequately controlled, whereas HBR may be preferable in strongly multisite contexts or when federated learning is required for privacy-preserving data integration.”

      Other than that, there are some points that are not fully explained in the paper: 

      (1) The replication in AIBL does not fully match the OASIS results. In AIBL, left-skewed age sampling converges with other strategies as sample size grows, unlike in OASIS. This suggests that skew effects depend on where variability lies across the age span. 

      Recommendation: Replication differences across datasets (age skew): 

      In OASIS, left-skewed (younger-heavy) training harms performance and does not fully recover with more data; in AIBL, performance under left-skew appears to converge toward the other conditions as training size grows. Given AIBL's smaller size and older age range, please explain this discrepancy. Does this imply that the effect of skew depends on where biological variability is highest across the age span (e.g., more variability from ~45-60 in OASIS vs ≥60 in AIBL), rather than on "skew" per se? If so, the paper should say explicitly that skewness must be interpreted relative to the age-variability profile of the target population, not just counts. 

      We thank the reviewer for this thoughtful comment. To examine whether differences in age-related variability could explain the replication patterns, we quantified how regional variance changed with age by computing age-binned variance profiles in the HC training sets of OASIS-3 and AIBL. Age was discretized into 10 quantile-based bins for each dataset separately. For each ROI and each age bin, we calculated the sample variance of the ROI values within that bin. The bin center was defined as the mean age of individuals in the corresponding bin. We then summarized variance across ROIs by computing, for each age bin, the median variance and its interquartile range (25th–75th percentile). These summary profiles (median and IQR across ROIs as a function of bin-centered age) are shown in Author response image 1. As shown in this plot, OASIS-3 and AIBL display comparable levels of variance across their respective age ranges, and the profiles do not suggest pronounced shifts in variability that would account for the divergent behavior of the left-skewed models.

      Author response image 1.

      Median ROI variance across age bins for OASIS-3 and AIBL. Shaded areas represent variability across regions within each age bin.
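
      The age-binned variance profile described in this response can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, quantile bins are approximated by splitting subjects into equal-sized groups by age rank (ties are split arbitrarily), and the interquartile range uses the inclusive quantile method.

```python
import statistics


def quantile_bins(ages, n_bins=10):
    """Assign each subject to one of n_bins roughly equal-sized bins by age rank."""
    order = sorted(range(len(ages)), key=lambda i: ages[i])
    bins = [0] * len(ages)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(ages)
    return bins


def variance_profile(ages, roi_values, n_bins=10):
    """Median and IQR across ROIs of the within-bin sample variance.

    roi_values maps a ROI name to a list of values aligned with ages.
    Returns one dict per usable bin, ordered from youngest to oldest.
    """
    bins = quantile_bins(ages, n_bins)
    profile = []
    for b in range(n_bins):
        members = [i for i, k in enumerate(bins) if k == b]
        if len(members) < 2:
            continue  # sample variance needs at least two subjects
        variances = [
            statistics.variance([vals[i] for i in members])
            for vals in roi_values.values()
        ]
        if len(variances) >= 2:
            q25, _, q75 = statistics.quantiles(variances, n=4, method="inclusive")
        else:
            q25 = q75 = variances[0]
        profile.append({
            "center_age": statistics.mean(ages[i] for i in members),
            "median_var": statistics.median(variances),
            "q25": q25,
            "q75": q75,
        })
    return profile


# Hypothetical toy data: eight subjects, two ROIs, two age bins.
toy_ages = [0, 1, 2, 3, 4, 5, 6, 7]
toy_rois = {"roi_a": [1, 1, 1, 1, 0, 2, 0, 2], "roi_b": [0, 2, 0, 2, 1, 1, 1, 1]}
for row in variance_profile(toy_ages, toy_rois, n_bins=2):
    print(row)
```

      Plotting `median_var` against `center_age`, with the `q25` to `q75` band shaded, would give a figure of the kind described in the caption above.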

      Instead, the coverage analysis recommended by the reviewer in comment #5 and introduced in our response to Reviewer 1, comment #2 indicates that the replication differences between OASIS-3 and AIBL are primarily driven by the age coverage of the sampled training sets relative to the test cohorts. In AIBL, which has a narrower and predominantly older age range, left-skewed sampling shows slightly lower coverage than right-skewed sampling, but coverage increases steadily with sample size, and the strategies converge as n grows. In contrast, OASIS-3 spans a broader lifespan and is itself skewed toward older ages; under left-skewed sampling, coverage of the test-set age range increases more slowly and remains comparatively lower even at large n. This slower recovery of age coverage explains why left-skewed performance does not recover in OASIS-3 and why the discrepancies between left- and right-skewed sampling are more pronounced in this dataset. The corresponding age-coverage curves are reported in Supplementary Figure S37. 

      Furthermore, this difference is also reflected in the expected age distributions obtained from repeated simulations of the sampling procedures (Supplementary Materials section 5.1, Tables S14–S15), where left-skewed sampling induces a larger shift toward younger ages than right-skewed sampling induces toward older ages, especially in OASIS-3, with smaller differences observed in AIBL. 

      For more details on both analyses see also our response to Reviewer 1, comment #2.

      (2) Sex imbalance effects are difficult to interpret, since sex is included only as a fixed effect, and residual age differences may drive some errors. 

      Recommendation: Sex effects may be confounded with age:

      Because sex is treated only as a fixed effect, it is unclear whether errors under sex-imbalance scenarios partly reflect residual age differences between female and male subsets. Please report (or control for) age distributions within each sex-imbalance condition, and clarify whether the observed error changes are truly attributable to sex composition rather than age composition. 

      To address the concern that sex-imbalance effects could be driven by residual age differences, we now explicitly report the age distributions by sex for the original training and test datasets, as well as the expected age distributions induced by each sampling condition, obtained by repeated simulation of the sampling procedure (Supplementary Materials section 5.1, Tables S13-15). Table S13 shows very similar age distributions for the HC training and test sets across sexes within each dataset. Tables S14–S15 further show that, within each sampling strategy, the age distributions of females and males are highly similar, including under sex-imbalanced conditions. These summaries confirm that the sampling procedures do not introduce systematic age-structure differences between sexes.

      In addition, we extended the statistical models for tOC and MSE to explicitly include age, sex, and all higher-order interactions with diagnosis, sample size, and sex-ratio sampling (Supplementary Materials section 5.2, Table S17 for direct training and Table S19 for transferred models). For completeness, we also included age and sex in the age-sampling models (Supplementary Table S16 for direct training, Table S18 for transferred models). These analyses revealed no significant main effects of age under sex-imbalanced sampling and only very small effect sizes in isolated higher-order interactions. Together, these results indicate that age did not introduce residual confounding in our analyses.

      We now report in the Results section (page 15) the following: 

      “Supplementary analysis (Tables S17, S19) also showed that the main effect of age was not significant for either MSE or tOC, and no significant age × sex-ratio interactions were observed. While some higher-order interactions involving age, diagnosis, and sex-ratio reached statistical significance, all associated effect sizes were very small and inconsistent across outcomes, indicating that the observed error changes are not driven by residual age confounding.”

      And in the Methods section (page 36): 

      “Age distributions were summarized separately for males and females in the original training and test sets (Supplementary Table S13) and the expected age distributions resulting from the skewed-age sampling and the sex-imbalance sampling procedures were obtained by repeated simulations at a fixed sample size and are reported in Supplementary Tables S14–S15.”

      (3) In Figure 3, performance drops around n≈300 across conditions. This consistent pattern raises the question of sensitivity to individual samples or sub-sampling strategy. 

      Recommendation: Instability around n ≈ 300 (Figure 3):

      Several panels show a consistent dip in performance near n=300. What drives this? Is the model sensitive to particular individuals being included/excluded at that size, or does it reflect an interaction with the binning/selection scheme? A brief ablation (e.g., alternative sub-sampling seeds or bins) would help rule out artefacts. 

      We thank the reviewer for highlighting this point. To assess whether the observed dip at n = 300 reflected sensitivity to the specific individuals selected or to the sub-sampling scheme, we re-ran the analysis at n = 300 using 20 independent random seeds (Supplementary Materials section 5.3). This ablation showed no systematic decrease in performance across repetitions, indicating that the original effect was driven by stochastic sampling variability rather than a systematic model instability or binning interaction. We now report this control analysis in the Supplementary Materials (Figure S36). We have clarified this point in the Results (page 10):

      “A consistent dip in performance was observed around n = 300 for the left-skewed sampling condition in the original analysis (Figure 3). To assess whether this reflected sensitivity to the specific subsampling or stochastic sampling variability, we repeated the analysis for this specific sample size using 20 independent random seeds (Figure S36); the absence of a consistent effect across repetitions indicates that the original pattern was driven by sampling variability rather than a systematic model artifact.”

      (4) The total outlier count (tOC) analysis is interesting but hard to generalize. For example, in AIBL, left-skew sometimes performs slightly better despite a weaker model fit. Clearer guidance on how to weigh model fit versus outlier detection would strengthen the practical message. 

      Recommendation: Interpreting total outlier count (tOC): 

      The tOC findings are interesting but hard to operationalize. In AIBL, even for n>40, left-skewed training sometimes yields slightly better tOC discrimination and other strategies plateau. Does this mean that a better model fit on the reference cohort does not necessarily produce better outlier-based case separation? Please add a short practical rule-set: e.g., when optimizing for deviation mapping/outlier detection, prioritize coverage of the patient-relevant age band over global fit metrics; report both fit and tOC sensitivity to training-set age coverage. 

      We thank the reviewer for this important point. Apparent improvements in tOC-based separation under left-skewed training should not be interpreted as indicating a better model or superior deviation mapping. In particular, in AIBL, left-skew can sometimes yield slightly larger group differences in tOC despite weaker overall model fit. This reflects an inflation of deviation magnitude in AD rather than improved separation per se. Crucially, relative ranking between HC and AD remains preserved across sampling strategies, as shown by the classification analysis in the main manuscript (Figure 5C), indicating that enhanced tOC contrast under left-skew does not translate into improved case discrimination. Instead, it reflects a systematic shift in deviation scale due to age-mismatched training.

      We now clarify this distinction in the Discussion of the main manuscript on page 26:

      “Importantly, apparent increases in HC–AD separation in total outlier count should not be interpreted as evidence of superior model quality. Age-mismatched training can rescale deviation magnitudes and inflate tOC in specific subgroups without improving true case–control separability, as shown by the classification task (Figure 5C). Model fit metrics and outlier-based measures therefore capture complementary but distinct aspects of normative model behavior and should be interpreted jointly rather than in isolation.”

      (5) The suggested plateau at n≈200 seems context dependent. It may be better to frame sample size targets in relation to coverage across age bins rather than as an absolute number. 

      Recommendation: "n≈200" as a plateau is context-dependent: 

      The suggested threshold for stable fits (about 200 people) likely depends on how variable the brain features are across the covered ages. Rather than an absolute number, consider reporting a coverage-aware target, such as a minimum per-age-bin coverage or an effective sample size relative to the age range. This would make the guidance transferable to cohorts with different age spans. 

      We agree that the observed performance plateau around n≈200 is context dependent and may shift with the covered age range, anatomical variability, and feature of interest. In the present study, this stabilization was evaluated within the specific datasets and age spans considered, and extending it to broader lifespan ranges or different biological contexts will require dedicated future work.

      To clarify this point, we added an explicit age-coverage analysis in the Supplementary Materials (section 5.4), as introduced in our response to Reviewer 1, comment #2. This analysis shows that, under representative sampling, the point at which age coverage becomes complete closely coincides with the saturation of model fit and stability metrics. At the same time, we note that normative models operate in continuous covariate space, such that reliable interpolation can still be achieved even when intermediate age ranges are less densely sampled, provided that surrounding age ranges are sufficiently represented. This makes rigid minimum per-bin requirements difficult to define in a generalizable way.

      Rather than proposing a universal sample-size threshold, we now emphasize that both learning-curve analyses and age-coverage assessments offer a more transferable way to identify when performance approaches saturation for a given dataset. This clarification is now included in the Discussion on page 25:

      “This is further supported by the coverage analysis reported in the Supplementary Materials (section 5.4), which shows that under representative sampling, the point of full age coverage closely coincides with the saturation of model fit and stability metrics. Rather than proposing a universal sample size threshold, we therefore encourage readers to perform learning-curve analyses, complemented by age coverage assessments, in their own datasets to empirically assess when performance approaches saturation for their specific age range and population.”

      We also address it in the Limitations (page 29): 

      “In addition, the observed stabilization of model performance around 200–300 participants was evaluated within the specific age ranges and cohorts examined here and may shift in broader lifespan settings or in populations with different sources of biological variability.”

      (6) Minor inconsistency in training-set size: 

      The manuscript mentions 691 in Methods, but the figures/scripts label is 692. Please correct for consistency. 

      Thank you for pointing out this inconsistency; the error in the Methods section has been corrected.

    1. eLife Assessment

      This valuable study provides insights into the role of Pten mutations in SHH-medulloblastoma, by using mouse models to resolve the effects of heterozygous vs homozygous mutations on proliferation and cell death throughout tumorigenesis. The experiments presented are convincing, with rigorous quantifications and orthogonal experimentation provided throughout, and the models employing sporadic oncogene induction, rather than EGL-wide genetic modifications, represent an advancement in experimental design. However, additional experimentation focused on a greater characterization of macrophage phenotypes (e.g., microglia vs circulating monocytes) would enhance this study. The work will be of interest to medical biologists studying general cancer mechanisms, as the function of Pten may be similar across tumor types.

    2. Reviewer #1 (Public review):

      This study investigates how Pten loss influences medulloblastoma development in mouse models of Shh-driven MB. Previous studies have shown that Pten heterozygosity can accelerate tumorigenesis in models where the entire GNP compartment harbours MB-promoting mutations, raising questions about how Pten levels and context interact, especially when MB-initiating mutations occur sporadically in the cerebellum. Here, the authors create an allelic series combining sporadic, cell-autonomous induction of oncogenic SmoM2 with Pten loss in granule neuron progenitors. In contrast to previous studies, Pten heterozygosity does not significantly impact tumour development from sporadic SmoM2 induction, whereas complete Pten loss accelerates tumour onset. Analysis of Pten-deficient tumours reveals accumulation of death-resistant differentiated cells and reduced macrophage infiltration. At early stages, Pten-deficient pre-tumour cells exhibit increased proliferation and EGL hyperplasia, indicating that Pten loss drives proliferation but shifts cells towards differentiation.

      Strengths

      This study raises the bar for modelling and interpreting the effects of secondary mutations on MB development. It is carefully executed, and the models, which use sporadic oncogene induction rather than EGL-wide genetic manipulations, represent an advance in experimental design. The deeper phenotyping, including single-cell RNA-seq and target validation, adds rigor. This work extends previous work on Shh-MB and Pten by showing that Pten heterozygosity in GNPs is likely not responsible for the accelerated tumour development reported in earlier studies. The evolution of these Pten-deficient tumours from proliferative to post-mitotic and death-resistant is an important observation with potential clinical significance.

      Minor weakness

      The absence of an effect of Pten heterozygosity on tumour development in their model suggests non-cell-autonomous effects, but this is not directly demonstrated. Changes in macrophage recruitment warrant further exploration and represent an interesting avenue for future investigation.

    3. Reviewer #2 (Public review):

      The authors sought to answer several questions about the role of the tumor suppressor PTEN in SHH-medulloblastoma formation. Namely, whether Pten loss increases metastasis, understanding why Pten loss accelerates tumor growth, and the effect of single-copy vs double-copy loss on tumorigenesis. Using an elegant mouse model, the authors found that Pten mutations do not increase metastasis in a SmoM2-driven SHH-medulloblastoma mouse model, based on extensive characterization of the presence of spinal cord metastases. Upon examining the cellular phenotype of Pten-null tumors in the cerebellum, the authors made the interesting and puzzling observation that Pten loss increased the differentiation state of the tumor, with fewer cycling cells, seemingly in contrast to the higher penetrance and decreased latency of tumor growth.

      The authors then examined the rate of cell death in the tumor. Interestingly, Pten-null tumors had fewer dying cells, as assessed by TUNEL. In addition, the tumors expressed differentiation markers NeuN and SyP, which are rare in SHH-MB mouse models. This reduction in dying cells is also evident at earlier stages of tumor growth. By looking shortly after Pten-loss induction, the authors found that Pten loss had an immediate impact on increasing the proliferative state of GCPs, followed by enhancing the survival of differentiated cells. These two pro-tumor features together account for the increased penetrance and decreased latency of the model. While heterozygous loss of Pten also promoted proliferation, it did not protect against cell death.

      Interestingly, loss of Pten alone in GCPs caused an increase in cerebellar size throughout development. The authors suggest that Pten normally constrains GCP proliferation, although they did not check whether reduced cell death is also contributing to cerebellum size.

      Lastly, the authors examined macrophage infiltration and found that there was less macrophage infiltration into the Pten-null tumors. Using scRNA-seq, they suggest that the observed reduction in macrophages might be due to an immunosuppressive tumor microenvironment.

      This mouse model will be of high relevance to the medulloblastoma community, as current models do not reflect the heterogeneity of the disease. In addition, the elegant experimentation into Pten function may be relevant to cancer biologists outside of the medulloblastoma field.

      Strengths:

      The in-depth characterisation of the mouse model is a major strength of the study, including multiple time points and quantifications. The single-cell sequencing adds a nice molecular feature, and this dataset may be relevant to other researchers with specific questions of Pten function.

      Weaknesses:

      Adequately addressed in revisions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study provides insights into the role of Pten mutations in SHH-medulloblastoma, by using mouse models to resolve the effects of heterozygous vs homozygous mutations on proliferation and cell death throughout tumorigenesis. The experiments presented are convincing, with rigorous quantifications and orthogonal experimentation provided throughout, and the models employing sporadic oncogene induction, rather than EGL-wide genetic modifications, represent an advancement in experimental design. However, the study remains incomplete, such that the biological conclusions do not extend greatly from those in the extant literature; this could be addressed with additional experimentation focused on cell cycle kinetic changes at early stages, as well as greater characterization of macrophage phenotypes (e.g., microglia vs circulating monocytes). The work will be of interest to medical biologists studying general cancer mechanisms, as the function of Pten may be similar across tumor types.

      We appreciate the summary of the importance of our work and agree that it provides a foundation for future experiments addressing underlying mechanisms, including the role of macrophages in tumor progression/regression.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates how Pten loss influences the development of medulloblastoma using mouse models of Shh-driven MB. Previous studies have shown that Pten heterozygosity can accelerate tumorigenesis in models where the entire GNP compartment has MB-promoting mutations, raising questions about how Pten levels and context interact, especially when cancer-causing mutations are more sporadic. Here, the authors create an allelic series combining sporadic, cell-autonomous induction of SmoM2 with Pten loss in granule neuron progenitors. In their models, Pten heterozygosity does not significantly impact tumor development, whereas complete Pten loss accelerates tumour onset. Notably, Pten-deficient tumours accumulate differentiated cells and show reduced cell death and decreased macrophage infiltration. At early stages, before tumour establishment, they observe EGL hyperplasia and more pre-tumour cells in S phase, leading them to suggest that Pten loss initially drives proliferation but later shifts towards differentiation and accumulation of death-resistant, postmitotic cells. Overall, this is a well-executed and technically elegant study that confirms and extends earlier findings with more refined models. The phenotyping is strong, but the mechanistic insight is limited, especially with respect to dosage effects and macrophage biology.

      Strengths:

      The work is carefully executed, and the models, which use sporadic oncogene induction rather than EGL-wide genetic manipulations, represent an advance in experimental design. The deeper phenotyping, including single-cell RNA-seq and target validation, adds rigor.

      Weaknesses:

      The biological conclusions largely confirm findings from previous studies (Castellino et al, 2010; Metcalf et al, 2013), showing that germline or conditional Pten heterozygosity accelerates tumorigenesis, generates tumors with a very similar phenotype, including abundant postmitotic cells, and reduced cell death.

      We respectfully point out that we have added new insights not covered in the previous, more abbreviated studies. First, we are the first to show that in a sporadic model, heterozygous loss of Pten does not lead to accelerated or more aggressive disease. This is an important finding, since this is the case for many patients and only germline PTEN mutant humans are likely to have more aggressive tumors. Also, the previous studies did not examine tumor progression at neonatal stages or analyze spinal cord metastasis. We found a different phenotype at some early stages than at end stage, which provides new insights. Our study is also the only one to apply a mosaic analysis to study cell behaviors at early stages of progression, including proliferation and differentiation/survival. We are also the first to demonstrate a reduction in macrophages in Pten mutant SHH-MB.

      The second stated goal - to understand why Pten dosage might matter - remains underdeveloped. The difference between earlier models using EGL-wide SmoA1 or Ptch loss versus sporadic cell-autonomous SmoM2 induction and Pten loss in this study could reflect model-specific effects or non-cell-autonomous contributions from Pten-deficient neighbouring cells in the EGL, for example. However, the study does not explore these possibilities. For instance, examining germline Pten loss in the sporadic SmoM2 context could have provided insight into whether dosage effects are cell-autonomous or dependent on the context.

      We thank the reviewer for suggesting this experiment and agree it would be an informative one for other groups to perform as a follow-up to our work, allowing a direct comparison of mosaic vs germline loss of Pten in the same sporadic SHH-MB model. We would also like to point out that we do show a dosage effect of lowering vs removing Pten when only sporadic GCPs also have an activating mutation in SMO. Please see the comments above for the additional new mechanistic insight we have provided.

      The observations on macrophages are intriguing but preliminary. The reduction in Iba1+ cells could reflect changes in microglia, barrier-associated macrophages, or infiltrating peripheral macrophages, but these populations are not distinguished. Moreover, the functional relevance of these immune changes for tumor initiation or progression remains unexplored.

      We agree, further studies of the influence of Pten mutations on macrophage phenotypes will be interesting.

      Reviewer #2 (Public review):

      The authors sought to answer several questions about the role of the tumor suppressor PTEN in SHH-medulloblastoma formation: namely, whether Pten loss increases metastasis, why Pten loss accelerates tumor growth, and the effect of single-copy vs double-copy loss on tumorigenesis. Using an elegant mouse model, the authors found that Pten mutations do not increase metastasis in a SmoM2-driven SHH-medulloblastoma mouse model, based on extensive characterization of the presence of spinal cord metastases. Upon examining the cellular phenotype of Pten-null tumors in the cerebellum, the authors made the interesting and puzzling observation that Pten loss increased the differentiation state of the tumor, with fewer cycling cells, seemingly in contrast to the higher penetrance and decreased latency of tumor growth.

      The authors then examined the rate of cell death in the tumor. Interestingly, Pten-null tumors had fewer dying cells, as assessed by TUNEL. In addition, the tumors expressed differentiation markers NeuN and SyP, which are rare in SHH-MB mouse models. This reduction in dying cells is also evident at earlier stages of tumor growth. By looking shortly after Pten-loss induction, the authors found that Pten loss had an immediate impact on increasing the proliferative state of GCPs, followed by enhancing the survival of differentiated cells. These two pro-tumor features together account for the increased penetrance and decreased latency of the model. While heterozygous loss of Pten also promoted proliferation, it did not protect against cell death.

      Interestingly, loss of Pten alone in GCPs caused an increase in cerebellar size throughout development. The authors suggest that Pten normally constrains GCP proliferation, although they did not check whether reduced cell death is also contributing to cerebellum size.

      Lastly, the authors examined macrophage infiltration and found that there was less macrophage infiltration in the Pten-null tumors. Using scRNA-seq, they suggest that the observed reduction in macrophages might be due to an immunosuppressive tumor microenvironment.

      This mouse model will be of high relevance to the medulloblastoma community, as current models do not reflect the heterogeneity of the disease. In addition, the elegant experimentation into Pten function may be relevant to cancer biologists outside of the medulloblastoma field.

      Strengths:

      The in-depth characterisation of the mouse model is a major strength of the study, including multiple time points and quantifications. The single-cell sequencing adds a nice molecular feature, and this dataset may be relevant to other researchers with specific questions of Pten function.

      Weaknesses:

      One weakness of the study was the examination of the macrophage phenotype, which did not include quantification (only single images), so it is difficult to assess whether this reduction of macrophages holds true across multiple samples. Future studies will also be needed to assess whether Pten-mutated patient medulloblastomas also have a differentiation phenotype, but this is difficult to assess given the low number of samples worldwide.

      We thank the reviewer for highlighting the importance of our sporadic mutant approach and new findings. As stated above, we agree that further studies of the influence of Pten mutations on macrophage phenotypes will be interesting, as will studies of human samples once large numbers can be obtained. All conclusions about macrophages are based on analyzing 3 independent tumors/genotype, which was stated in the Figure legends. For all end stage tumors the sections were collected from one lateral edge of the tumor to the midline, and for earlier stages from one side of the brain to the other; thus we believe the reported phenotypes are consistent within tumors and stages.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points 

      (1) The authors should state explicitly that early EGL analyses sample the same cerebellar region across animals (e.g., matched lobule or distance from the midline) because position-dependent effects are possible. 

      We agree this is an important aspect of the rigor of the study and are sorry this was not clear enough. We had stated in the legends to Figures 4 and 5 that midline sections were analyzed and, when the entire EGL was not quantified, the region analyzed was shown. We now include more details in all relevant Figure legends and in the Methods section.

      (2) It is not clear from Figure 3i-k that TUNEL density in Syp-high regions differs between Pten+/- and Pten-/- tumors. 

      We have added a new graph as Figure 3 Supplemental Figure 1D with this direct comparison. Indeed, there is no difference between the Syp-high regions of Pten+/- and Pten-/- tumors as these regions of Pten+/- tumors have no detectable PTEN protein and thus have the same behavior as Pten-/- tumors (reduced cell death).

      (3) The authors interpret the increase in the %EdU+ GFP+ cells in the EGL as evidence of a faster cell cycle. However, EdU labeling alone does not demonstrate altered cell cycle kinetics; this would require a dedicated assay. It would also be informative to combine EdU with Ki67 staining. This could clarify whether the effect reflects changes in differentiation (for example, if a higher proportion of GFP+ pre-tumor cells remain Ki67+) or whether the increase in EdU simply reflects a greater fraction of cells being in cycle. Such an analysis might even reveal no change in cycling if the proliferation index in controls is lower.

      We are sorry we did not make our analysis sufficiently clear in Figure 5 and Figure 6. The quantification of EdU+ cells was restricted to the outer EGL (region defined by containing GFP+ and EdU+ cells) where all cells should be Ki67+. We cannot perform co-staining of Ki67 and GFP, since antigen retrieval for Ki67 removes the epitope for our GFP antibody. We have revised the wording in the figure legends and results sections.

      (4) Some of the stains are unconvincing: for example, in Figure 2E,F the p27 staining is difficult to distinguish from the background, and in Figure 7G,E the CD31+ blood vessels are difficult to see.

      As requested, in Fig. 2 we adjusted the level of the green color for P27 to reduce the background in A, B, E, F using Photoshop. In Fig. 7G, H we adjusted the level of the green color for CD31 to reduce the background.

      (5) Line 158: "unlike a SmoA2 model with germline or broad deletion of Pten in the cerebellum, where heterozygous deletion is sufficient..." That paper refers to the NeuroD2-SmoA1 mouse model, so this statement should be clarified.

      We have made this edit.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the final discussion paragraph about Kmt2d does not add much to the study, as it seems obvious that the mechanisms of tumor formation would differ between two different tumor suppressor genes, but this is only my opinion. 

      We respectfully think it is interesting, even if expected, so have left it in the Discussion.

      (2) There is also a typo on line 342 that changes the meaning of the sentence: mTORC1 signaling is significantly 'unregulated'; 

      We thank the reviewer for noticing this mistake. We have changed 'unregulated' to 'upregulated'.

      (3) Figure 9Q,R mislabeled: not mTORC1, but instead UPR  

      Asns is included in the Hallmark MTORC1 signaling gene list as well as in the Unfolded Protein Response gene list. We have made a note of this in the Figure legend.

    1. eLife Assessment

      This manuscript presents a valuable study of the activity and functional relevance of different circuits in the dentate gyrus of mice performing a pattern separation task. The study is likely to be of interest to those studying the subregional organization and cell type-specific functions of the dentate gyrus. However, the strength of evidence for the study's conclusions is currently incomplete.

    2. Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

    3. Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.
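
      The per-animal correlation suggested in Major Comment (1) could be sketched as follows. This is a minimal illustration, not the authors' analysis: the values are hypothetical placeholders, the variable names (`days_to_criterion`, `trap_density`) are assumptions, and Spearman's rank correlation is chosen here only because it does not assume a linear relationship with small group sizes.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-animal values (one entry per mouse), for illustration only.
days_to_criterion = [4, 6, 7, 9, 12, 15]
trap_density = [310.0, 295.0, 240.0, 250.0, 180.0, 150.0]  # cells per mm^2

rho = spearman(days_to_criterion, trap_density)
print(f"Spearman rho = {rho:.3f}")
```

      The same function could be applied to each activation metric in turn (mGC vs abDGC density, SB vs IB bias), with correction for multiple comparisons.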

    4. Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which appear to potentially underlie null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to be targeting these neurons, as judged by their location in the DG.

    1. eLife Assessment

      This useful study by Palo et al proposes that FRG1 functions as a negative regulator of Nonsense-Mediated mRNA decay (NMD) by associating with the exon junction complex (EJC) and destabilizing UPF1 independently of DUX4. The authors present solid evidence to dissect the relationship between FRG1 and DUX4 in NMD. However, the evidence to support the claim that FRG1 is a component of the EJC or the NMD machinery is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dixit and colleagues investigate the role of FRG1 in modulating nonsense-mediated mRNA decay using human cell lines and zebrafish embryos. They present data from experiments that test the effect of normal, reduced or elevated levels of FRG1 on NMD of a luciferase-based NMD reporter and on endogenous mRNA substrates of NMD. They also carry out experiments to investigate FRG1's influence on UPF1 mRNA and protein levels, with a particular focus on the possibility that FRG1 regulates UPF1 protein levels through ubiquitin-mediated proteolysis of UPF1. The experiments described also test whether DUX4's effect on UPF1 protein levels and NMD could be mediated through FRG1. Finally, the authors also present experiments that test for physical interaction between UPF1, the spliceosome and components of the exon junction complex.

      Strengths:

      A key strength of the work is its focus on an intriguing model of NMD regulation by FRG1, which is of particular interest as FRG1 is positively regulated by DUX4, which has been previously implicated in subjecting UPF1 to proteasome-mediated degradation and thereby causing NMD inhibition. The data showing that the DUX4-mediated effect on UPF1 levels is diminished upon FRG1 depletion suggest that DUX4's regulation of NMD could be mediated by FRG1.

      Weaknesses:

      A major weakness and concern is that many of the key conclusions drawn by the authors are not supported by the data, and there are also some significant concerns with experimental design. More specific comments below describe these issues:

      (1) Multiple issues lower the confidence in the experiments testing the effect of FRG1 on NMD.

      (a) All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small. This assay is the key experimental approach throughout the manuscript. However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control reporter that can distinguish transcriptional from post-transcriptional effects of the experimental manipulations. A control reporter lacking a 3'UTR intron is described in Baird et al, from which the authors obtained their NMD reporter. Because the effects of FRG1 depletion on luciferase activity are small, it is necessary not only to measure steady-state levels of the NMD reporter mRNA but also to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      (b) It is unusual to use luciferase enzymatic activity as a measurement of RNA decay status. Such an approach can at least be justified if the authors can test how many-fold the luciferase activity changes when NMD is inhibited using a chemical inhibitor (e.g., SMG1 inhibitor) or knockdown of a core NMD factor.

      (c) The concern about the direct effect of FRG1 on NMD is further amplified by the small effects of FRG1 knockout on steady-state levels of endogenous NMD targets (Figure 1A and B: ~20% reduction in reporter mRNA in MCF7 cells; Figure 1M, only 18 endogenous NMD targets shared between FRG1_KO and FRG1_KD).

      (d) The question about transcriptional versus post-transcriptional effects is also important in light of the authors' previous work that FRG1 can act as a transcriptional regulator.

      (2) In the experiments probing the relationship between DUX4 and FRG1 in NMD regulation, there are some inconsistencies that need to be resolved.

      (a) Figure 3 shows that the inhibition of NMD reporter activity caused by DUX4 induction is reversed by FRG1 knockdown. Although levels of FRG1 and UPF1 in DUX4 uninduced and DUX4 induced + FRG1 knockdown conditions are similar (Figure 5A), why is the reporter activity in DUX4 induced + FRG1 knockdown cells much lower than DUX4 uninduced cells in Figure 3?

      (b) In Figure 3, it is important to know the effect of FRG1 knockdown in DUX4 uninduced conditions.

      (c) On line 401, the authors claim that MG132 treatment leads to a "time-dependent increase in UPF1 protein levels" in Figure 5C. However, upon proteasome inhibition, UPF1 levels significantly increase only at the 8 h time point, while the change at 12 and 24 hours is not significantly different from the control.

      (3) There are multiple issues with experiments investigating ubiquitination of UPF1:

      (a) Ubiquitin blots in Figure 6 are very difficult to interpret. There is no information provided either in the text or figure legends as to which bands in the blots are being compared, or about what the sizes of these bands are, as compared to UPF1. Also, the signal for Ub in most IP samples looks very similar to or even lower than the input.

      (b) Western blot images in Figure 6D appear to have been adjusted for brightness/contrast to reduce background, but in such a way that pixel intensities are not linearly altered. This image appears to be the most affected, although some others show similar patterns (e.g., Figure 5C).

      (4) The experiments probing physical interactions of FRG1 with UPF1, spliceosome and EJC proteins need to consider the following points:

      (a) There is no information provided in the results or methods section on whether immunoprecipitations were carried out in the absence or presence of RNases. Each RNA can be bound by a plethora of proteins that may not be functionally engaged with each other. Without RNase treatment, even such interactions will lead to co-immunoprecipitation. Thus, experiments in Figure 6 and Figure 7A-D should be repeated with and without RNase treatment.

      (b) Also, the authors' claim that FRG1 is a "structural component" of EJC and NMD complexes seems to be an overinterpretation. As noted in the previous comment, these interactions could be mediated by a connecting RNA molecule.

      (c) A negative control (non-precipitating protein) is missing in Figure 7 co-IP experiments.

      (d) Polysome analysis is missing important controls. FRG1 and EIF4A3 co-sedimentation with polysomes could simply be due to their association with another large complex (e.g., spliceosome), which will also co-sediment in these gradients. This possibility can at least be tested by Western blotting for some spliceosome components across the gradient fractions. More importantly, a puromycin treatment control needs to be performed to confirm that FRG1 and EIF4A3 are indeed bound to polysomes, which are separated into ribosome subunits upon puromycin treatment. This leads to a shift of the signal for ribosomal proteins and any polysome-associated proteins to the left.
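
      To make concrete what "testing decay rates" in point (1a) would yield, the following is a minimal sketch of estimating an mRNA half-life from a transcription shut-off time course, assuming first-order decay. The time points and remaining fractions are hypothetical placeholders, not data from the manuscript, and the condition names `control` and `perturbed` are illustrative only.

```python
from math import log


def decay_half_life(times_h, fraction_remaining):
    """Fit ln(fraction) = intercept - k * t by ordinary least squares.

    Returns (k, half_life), where k is the first-order decay constant
    per hour and half_life = ln(2) / k is in hours.
    """
    y = [log(f) for f in fraction_remaining]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(y) / n
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times_h, y))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    k = -sxy / sxx
    return k, log(2) / k


# Hypothetical reporter mRNA levels, normalized to t = 0, after shut-off.
times = [0, 1, 2, 4]                     # hours
control = [1.0, 0.5, 0.25, 0.0625]       # placeholder values
perturbed = [1.0, 0.707, 0.5, 0.25]      # placeholder values

k_control, t_half_control = decay_half_life(times, control)
k_perturbed, t_half_perturbed = decay_half_life(times, perturbed)
print(f"control half-life:   {t_half_control:.2f} h")
print(f"perturbed half-life: {t_half_perturbed:.2f} h")
```

      A change in half-life between conditions, with no change for a control reporter lacking the 3'UTR intron, would point to an effect on decay rather than on transcription.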

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Palo et al present a novel role for FRG1 as a multifaceted regulator of nonsense-mediated mRNA decay (NMD). Through a combination of reporter assays, transcriptome-wide analyses, genetic models, protein-protein interaction studies, ubiquitination assays, and ribosome-associated complex analyses, the authors propose that FRG1 acts as a negative regulator of NMD by destabilizing UPF1 and associating with spliceosomal, EJC, and translation-related complexes. Overall, the data, while consistent with the authors' central conclusions, are undermined by several overstated claims, particularly regarding structural roles and mechanistic exclusivity. Further experimental evidence would be required to support the claims as presented.

      Strengths:

      (1) The integration of multiple experimental systems (zebrafish and cell culture).

      (2) Attempts to gain a mechanistic understanding of the relationship between FRG1 and UPF1.

      Weaknesses:

      (1) Overstatement of FRG1 as a structural NMD component.

      Although FRG1 interacts with UPF1, eIF4A3, PRP8, and CWC22, core spliceosomal and EJC interactions (PRP8-CWC22 and eIF4A3-UPF3B) remain intact in FRG1-deficient cells. This suggests that, while FRG1 associates with these complexes, this interaction is not required for their assembly or structural stability. Without further functional or reconstitution experiments, the presented data are more consistent with an interpretation of FRG1 acting as a regulatory or accessory factor rather than a core structural component.

      (2) Causality between UPF1 depletion and NMD inhibition is not fully established.

      While reduced UPF1 levels provide a plausible explanation for decreased NMD efficiency, the manuscript does not conclusively demonstrate that UPF1 depletion drives all observed effects. Given FRG1's known roles in transcription, splicing, and RNA metabolism, alterations in transcript isoform composition and apparent NMD sensitivity may arise from mechanisms independent of UPF1 abundance. To directly link UPF1 depletion to altered NMD efficiency, rescue experiments testing whether UPF1 re-expression restores NMD activity in FRG1-overexpressing cells would be important.

      (3) Mechanism of FRG1-mediated UPF1 ubiquitination requires clarification.

      The ubiquitination assays support a role for FRG1 in promoting UPF1 degradation; however, the underlying mechanism remains unexplored. It is unclear what role FRG1 plays in UPF1 ubiquitination (does it function as an adaptor, recruit an E3 ubiquitin ligase, or influence UPF1 ubiquitination indirectly through transcriptional or signaling pathways?).

      (4) Limited transcriptome-wide interpretation of RNA-seq data.

      The RNA-seq data analysis relies heavily on a small subset of "top 10" genes. Additionally, the criteria used to define NMD-sensitive isoforms are unclear. A more comprehensive transcriptome-wide summary, indicating how many NMD-sensitive isoforms are detected and how many are significantly altered, would substantially strengthen the analysis.

      (5) Clarification of NMD sensor assay interpretation.

      The logic underlying the NMD sensor assay should be explained more clearly early in the manuscript, as the inverse relationship between luciferase signal and NMD efficiency may be counterintuitive to readers unfamiliar with this reporter system. Inclusion of a schematic or brief explanatory diagram would improve accessibility.

      (6) Potential confounding effects of high MG132 concentration.

      The MG132 concentration used (50 µM) is relatively high and may induce broad cellular stress responses, including inhibition of global translation (it is known that proteasome inhibition shuts down translation). Controls addressing these secondary effects would be essential to strengthen the conclusion that UPF1 stabilization specifically reflects proteasome-dependent degradation.

      (7) Interpretation of polysome co-sedimentation data.

      While the co-sedimentation of FRG1 with polysomes is intriguing, this approach does not distinguish between direct ribosomal association and co-migration with ribosome-associated complexes. This limitation should be explicitly acknowledged in the interpretation.

      (8) Limitations of PLA-based interaction evidence.

      The PLA data convincingly demonstrate close spatial proximity between FRG1 and eIF4A3; however, PLA does not provide definitive evidence of direct interaction and is known to be susceptible to artefacts. Moreover, a distance threshold of ~40 nm still allows for proteins to be in proximity without being part of the same complex. These limitations should be clearly acknowledged, and conclusions should be framed accordingly.
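
      The transcriptome-wide summary requested in point (4) could take a form like the sketch below. Everything here is an assumption for illustration: the record fields (`nmd_sensitive`, `log2fc`, `padj`), the cutoffs, and the example isoforms are hypothetical and do not reflect the authors' pipeline or data.

```python
def summarize_nmd_isoforms(isoforms, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Count NMD-sensitive isoforms detected and significantly altered.

    Each isoform is a dict with keys 'nmd_sensitive' (bool),
    'log2fc' (float) and 'padj' (float). The cutoffs are assumptions.
    """
    nmd = [iso for iso in isoforms if iso["nmd_sensitive"]]
    significant = [
        iso for iso in nmd
        if iso["padj"] < padj_cutoff and abs(iso["log2fc"]) >= min_abs_log2fc
    ]
    return {
        "detected": len(nmd),
        "significant": len(significant),
        "up": sum(1 for iso in significant if iso["log2fc"] > 0),
        "down": sum(1 for iso in significant if iso["log2fc"] < 0),
    }


# Hypothetical differential-expression records, for illustration only.
isoforms = [
    {"id": "iso1", "nmd_sensitive": True, "log2fc": 1.8, "padj": 0.001},
    {"id": "iso2", "nmd_sensitive": True, "log2fc": 0.3, "padj": 0.40},
    {"id": "iso3", "nmd_sensitive": True, "log2fc": -1.2, "padj": 0.02},
    {"id": "iso4", "nmd_sensitive": False, "log2fc": 2.5, "padj": 0.0001},
    {"id": "iso5", "nmd_sensitive": True, "log2fc": 1.1, "padj": 0.20},
]

summary = summarize_nmd_isoforms(isoforms)
print(summary)
```

      Reporting these counts alongside the explicit criteria used to call an isoform NMD-sensitive would let readers judge how representative the "top 10" genes are.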

    4. Reviewer #3 (Public review):

      The manuscript by Palo and colleagues identifies FRG1 as a novel regulator of nonsense-mediated mRNA decay (NMD), showing that FRG1 inversely modulates NMD efficiency by controlling UPF1 abundance. Using cell-based models and a frg1 knockout zebrafish, the authors show that FRG1 promotes UPF1 ubiquitination and proteasomal degradation, independently of DUX4. The work further positions FRG1 as a structural component of the spliceosome and exon junction complex whose loss does not compromise the integrity of these complexes. Overall, the manuscript provides mechanistic insight into FRG1-mediated post-transcriptional regulation and expands understanding of NMD homeostasis. The authors should address the following issues to improve the quality of their manuscript.

      (1) In Figure 7A-D, appropriate positive controls for the nuclear fraction (e.g., Histone H3) and the cytoplasmic fraction (e.g., GAPDH or α-tubulin) should be included to validate the efficiency and purity of the subcellular fractionation.

      (2) To strengthen the conclusion that FRG1 broadly impacts the NMD pathway, qRT-PCR analysis of additional core NMD factors (beyond UPF1) in the frg1⁻/⁻ zebrafish at 48 hpf would be informative.

      (3) Figure labels should be standardized throughout the manuscript (e.g., consistent use of "Ex" instead of mixed terms such as "Oex") to improve clarity and readability.

      (4) The methods describing the generation of the frg1 knockout zebrafish could be expanded to include additional detail, and a schematic illustrating the CRISPR design, genotyping workflow, and validation strategy would enhance transparency and reproducibility.

      (5) As FRG1 is a well-established tumor suppressor, additional cell-based functional assays under combined FRG1 and UPF1 perturbation (e.g., proliferation, migration, or survival assays) could help determine whether FRG1 influences cancer-associated phenotypes through modulation of the NMD pathway.

      (6) Given the claim that FRG1 inversely regulates NMD efficacy via UPF1, an epistasis experiment such as UPF1 overexpression in an FRG1-overexpressing background followed by an NMD reporter assay would provide stronger functional validation of pathway hierarchy.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dixit and colleagues investigate the role of FRG1 in modulating nonsense-mediated mRNA decay using human cell lines and zebrafish embryos. They present data from experiments that test the effect of normal, reduced or elevated levels of FRG1 on NMD of a luciferase-based NMD reporter and on endogenous mRNA substrates of NMD. They also carry out experiments to investigate FRG1's influence on UPF1 mRNA and protein levels, with a particular focus on the possibility that FRG1 regulates UPF1 protein levels through ubiquitin-mediated proteolysis of UPF1. The experiments described also test whether DUX4's effect on UPF1 protein levels and NMD could be mediated through FRG1. Finally, the authors also present experiments that test for physical interaction between UPF1, the spliceosome and components of the exon junction complex.

      Strengths:

      A key strength of the work is its focus on an intriguing model of NMD regulation by FRG1, which is of particular interest as FRG1 is positively regulated by DUX4, which has been previously implicated in subjecting UPF1 to proteasome-mediated degradation and thereby causing NMD inhibition. The data showing that the DUX4-mediated effect on UPF1 levels is diminished upon FRG1 depletion suggest that DUX4's regulation of NMD could be mediated by FRG1.

      Weaknesses:

      A major weakness and concern is that many of the key conclusions drawn by the authors are not supported by the data, and there are also some significant concerns with experimental design. More specific comments below describe these issues:

      (1) Multiple issues lower the confidence in the experiments testing the effect of FRG1 on NMD.

      (a) All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small. This assay is the key experimental approach throughout the manuscript. However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control for the reporter that can account for the effects of experimental manipulations on transcriptional versus post-transcriptional effects. A control reporter lacking a 3'UTR intron is described in Baird et al., where the authors got their NMD reporter from. Due to small effects observed on luciferase activity upon FRG1 depletion, it is necessary to not only measure NMD reporter mRNA steady state levels, but it will be equally important to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      We thank the reviewer for raising these points and for the careful evaluation of our experimental approach. Here we provide our response to comment (a) in three parts.

      Reliance on luciferase-based reporter assays

      While luciferase-based NMD reporter assays represent an important experimental component of this study, our conclusions do not rely exclusively on this approach. The reporter-based findings are independently supported by RNA sequencing analyses of FRG1-perturbed cells, which demonstrate altered abundance of established PTC-containing NMD target transcripts. This genome-wide analysis provides an unbiased and physiologically relevant validation of FRG1 involvement in NMD regulation.

      All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small.

      We respectfully disagree with the comment that the magnitude of the luciferase effects is low. Increased expression of FRG1, which leads to reduced UPF1 levels, results in a ~3.5-fold increase in relative luciferase activity (Fig. 1C), indicating a robust effect. Furthermore, in the in vivo zebrafish model, FRG1 knockout causes a pronounced decrease in relative luciferase activity (Fig. 1H), consistent with elevated UPF1 levels and enhanced NMD activity.

      It is also important to note that FRG1 functions as a negative regulator of UPF1; therefore, its depletion is expected to increase UPF1 levels. However, excessive elevation of UPF1 is likely constrained by additional regulatory mechanisms, which may limit the observable effects of FRG1 knockdown or knockout. In line with this, our previous study (1) demonstrated that FRG1 positively regulates multiple NMD factors while exerting an inverse regulatory effect on UPF1. This dual role suggests that FRG1 may act as a compensatory modulator of the NMD machinery, which likely explains the relatively subtle net effects observed in FRG1 knockdown/knockout conditions in vitro (Fig. 1A and 1B). This interpretation is explicitly discussed in the manuscript (Discussion, paragraph 4).

      However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control for the reporter that can account for the effects of experimental manipulations on transcriptional versus post-transcriptional effects. A control reporter lacking a 3'UTR intron is described in Baird et al., where the authors got their NMD reporter from. Due to small effects observed on luciferase activity upon FRG1 depletion, it is necessary to not only measure NMD reporter mRNA steady state levels, but it will be equally important to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      Thank you for your suggestion. We will test decay rates of the beta-globin reporter mRNA.

      (b) It is unusual to use luciferase enzymatic activity as a measurement of RNA decay status. Such an approach can at least be justified if the authors can test how many-fold the luciferase activity changes when NMD is inhibited using a chemical inhibitor (e.g., SMG1 inhibitor) or knockdown of a core NMD factor.

      We respectfully disagree that the use of luciferase enzymatic activity as a readout for NMD is unusual. Multiple prior studies have successfully employed identical or closely related luciferase-based or fluorescence-based reporters to quantify NMD activity (2–5). Importantly, the goal of our study was not to measure RNA decay kinetics per se, but rather to assess how altered FRG1 levels influence the functional efficiency of the NMD pathway. Given that FRG1 is a structural component of the spliceosome C complex (6) and has previously been indirectly linked to NMD regulation (1,7), this approach was well-suited to address our central question.

      As suggested by the reviewer, we will also assess luciferase activity following pharmacological inhibition of NMD to further validate the reporter system's responsiveness.

      (c) The concern about the direct effect of FRG1 on NMD is further amplified by the small effects of FRG1 knockout on steady-state levels of endogenous NMD targets (Figure 1A and B: ~20% reduction in reporter mRNA in MCF7 cells; Figure 1M, only 18 endogenous NMD targets shared between FRG1_KO and FRG1_KD).

      The modest changes observed upon FRG1 loss do not preclude a direct role in NMD. As detailed in our response to comment (a) and discussed in paragraph 4 of the Discussion, limited effects on steady-state levels of endogenous NMD targets are expected given the buffering capacity of the NMD pathway and the contribution of compensatory regulatory mechanisms.

      (d) The question about transcriptional versus post-transcriptional effects is also important in light of the authors' previous work that FRG1 can act as a transcriptional regulator.

      We agree that distinguishing between transcriptional and post-transcriptional effects is important, particularly in light of our previous work demonstrating that FRG1 can function as a transcriptional regulator of multiple NMD genes (1). Consistent with this, the current manuscript shows that FRG1 influences the transcript levels of UPF1. In addition, we demonstrate that FRG1 regulates UPF1 at the protein level. We therefore conclude that FRG1 regulates UPF1 dually, at both transcriptional and post-transcriptional levels, supporting a dual role for FRG1 in the regulation of NMD.

      This conclusion is further supported by prior studies indicating post-transcriptional functions of FRG1. FRG1 is a nucleocytoplasmic shuttling protein (8), interacts with the NMD factor ROD1 (7), and has been identified as a component of the spliceosomal C complex (6). FRG1 has also been reported to associate with the hnRNPK family of proteins (8), which participate in extensive protein–protein interaction networks. Collectively, these observations are consistent with a role for FRG1 in regulating NMD components at multiple levels.

      (2) In the experiments probing the relationship between DUX4 and FRG1 in NMD regulation, there are some inconsistencies that need to be resolved.

      (a) Figure 3 shows that the inhibition of NMD reporter activity caused by DUX4 induction is reversed by FRG1 knockdown. Although levels of FRG1 and UPF1 in DUX4 uninduced and DUX4 induced + FRG1 knockdown conditions are similar (Figure 5A), why is the reporter activity in DUX4 induced + FRG1 knockdown cells much lower than DUX4 uninduced cells in Figure 3?

      We appreciate the reviewer’s comment. Figures 3 and 5A represent independent experiments in which FRG1 knockdown was achieved by transient transfection. As such, variability in transfection efficiency is expected and likely accounts for the quantitative difference. We want to highlight that, compared with the DUX4-induced lane (Fig. 5A, lane 2), knocking down FRG1 on the DUX4-induced background produces a clear increase in the UPF1 level (Fig. 5A, lane 3). We will add one more replicate to Figure 5A with more efficient FRG1 knockdown.

      (b) In Figure 3, it is important to know the effect of FRG1 knockdown in DUX4 uninduced conditions.

      We thank the reviewer for this thoughtful suggestion. The effect of FRG1 knockdown under DUX4-uninduced conditions is presented in Figure 1A, where FRG1 levels are reduced without altering DUX4 expression. In contrast, Figure 3 is specifically designed to assess the rescue effect—namely, how reduction of FRG1 expression under DUX4-induced conditions influences NMD efficiency. Therefore, inclusion of an FRG1 knockdown–only group in Figure 3 was not relevant to the objective of this experiment.

      (c) On line 401, the authors claim that MG132 treatment leads to "time-dependent increase in UPF1 protein levels" in Figure 5C. However, upon proteasome inhibition, UPF1 levels significantly increase only at 8h time point, while the change at 12 and 24 hours is not significantly different from the control.

      We thank the reviewer for this observation and agree that the statement of a “time-dependent increase in UPF1 protein levels” was inaccurate. A significant increase is observed only at the 8 h time point following MG132 treatment, with no significant changes at 12 h or 24 h. The text will be revised accordingly to reflect Figure 5C.

      (3) There are multiple issues with experiments investigating ubiquitination of UPF1:

      (a) Ubiquitin blots in Figure 6 are very difficult to interpret. There is no information provided either in the text or figure legends as to which bands in the blots are being compared, or about what the sizes of these bands are, as compared to UPF1. Also, the signal for Ub in most IP samples looks very similar to or even lower than the input.

      We agree that the ubiquitin blots in Figure 6 require clearer presentation. In the revised figure, we will annotate the ubiquitin immunoblots to indicate the region corresponding to UPF1 (~140 kDa), which is the relevant molecular weight for interpretation. Because UPF1 is polyubiquitinated, ubiquitinated species are expected to appear as multiple bands rather than a single discrete signal; therefore, ubiquitination was assessed across the full blot. Importantly, interpretation is based on comparisons between UPF1-immunoprecipitated samples within each panel (Fig. 6C–F), rather than between input and IP lanes. For example, in Figure 6C, UPF1 IP FRG1_KD is compared with UPF1 IP FRG1_Ex; in Figure 6D, UPF1 IP FRG1_WT with UPF1 IP FRG1_KO; in Figure 6E, UPF1 IP FRG1_KO with UPF1 IP FRG1_KO+FRG1_Ex; and in Figure 6F, UPF1 IP FRG1_Ex with UPF1 IP FRG1_Ex+MG132 TRT.

      (b) Western blot images in Figure 6D appear to be adjusted for brightness/contrast to reduce background, but in such a way that pixel intensities are not linearly altered. This image appears to be the most affected, although some others also show similar patterns (e.g., Figure 5C).

      We thank the reviewer for raising this point. The appearance noted in Figure 6D was not due to non-linear alteration of pixel intensities, but rather resulted from the poor quality of the ubiquitin antibody, which required prolonged exposure times. To address this, we replaced the antibody and repeated the ubiquitin immunoblots shown in Figures 6D, 6E, and 6F.

      For Figure 5C, only uniform contrast adjustment was applied for clarity. Importantly, all adjustments were performed linearly and applied to the entire image. Raw, unprocessed images for all blots are provided in the Supplementary Information. Updated versions of Figures 5 and 6 will be included in the revised manuscript.

      (4) The experiments probing physical interactions of FRG1 with UPF1, spliceosome and EJC proteins need to consider the following points:

      (a) There is no information provided in the results or methods section on whether immunoprecipitations were carried out in the absence or presence of RNases. Each RNA can be bound by a plethora of proteins that may not be functionally engaged with each other. Without RNase treatment, even such interactions will lead to co-immunoprecipitation. Thus, experiments in Figure 6 and Figure 7A-D should be repeated with and without RNase treatment.

      We thank the reviewer for this important point. The co-immunoprecipitation experiments shown in Figures 6 and 7A–D were performed in the absence of RNase treatment; this information was inadvertently omitted and will be added to the Methods section and the relevant figure legends. To directly assess whether the observed interactions are RNA-dependent, we will repeat the key co-immunoprecipitation experiments in the presence of RNase treatment and include these results in the revised manuscript.

      (b) Also, the authors' claim that FRG1 is a "structural component" of EJC and NMD complexes seems to be an overinterpretation. As noted in the previous comment, these interactions could be mediated by a connecting RNA molecule.

      We thank the reviewer for this insightful comment. As noted, previous studies have suggested that FRG1 interacts with components of the EJC and NMD machinery. Specifically, Bertram et al. (6) identified FRG1 as a component of the spliceosomal C complex via Cryo-EM structural analysis, and pull-down studies have shown direct interaction between FRG1 and ROD1, a known EJC component (7). These findings support a protein-protein interaction rather than one mediated solely by RNA. To further address the reviewer’s concern, we will perform key co-immunoprecipitation experiments in the presence of RNase treatment to distinguish RNA-dependent from RNA-independent interactions.

      (c) A negative control (non-precipitating protein) is missing in Figure 7 co-IP experiments.

      We agree that including a non-precipitating protein as a negative control is important, and we will perform the co-IP experiment incorporating this control.

      (d) Polysome analysis is missing important controls. FRG1 and EIF4A3 co-sedimentation with polysomes could simply be due to their association with another large complex (e.g., spliceosome), which will also co-sediment in these gradients. This possibility can at least be tested by Western blotting for some spliceosome components across the gradient fractions. More importantly, a puromycin treatment control needs to be performed to confirm that FRG1 and EIF4A3 are indeed bound to polysomes, which are separated into ribosome subunits upon puromycin treatment. This leads to a shift of the signal for ribosomal proteins and any polysome-associated proteins to the left.

      As recommended, we will examine the distribution of a spliceosome component across the gradient fractions to assess potential co-sedimentation. Additionally, we will perform a puromycin treatment control to confirm that FRG1 and EIF4A3 are genuinely associated with polysomes.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Palo et al. present a novel role for FRG1 as a multifaceted regulator of nonsense-mediated mRNA decay (NMD). Through a combination of reporter assays, transcriptome-wide analyses, genetic models, protein-protein interaction studies, ubiquitination assays, and ribosome-associated complex analyses, the authors propose that FRG1 acts as a negative regulator of NMD by destabilizing UPF1 and associating with spliceosomal, EJC, and translation-related complexes. Overall, the data, while consistent with the authors' central conclusions, are undermined by several claims, particularly those regarding structural roles and mechanistic exclusivity. To support the claims as presented, further experimental evidence would be required.

      Strengths:

      (1) The integration of multiple experimental systems (zebrafish and cell culture).

      (2) Attempts to reach a mechanistic understanding of the relationship between FRG1 and UPF1.

      Weaknesses:

      (1) Overstatement of FRG1 as a structural NMD component.

      Although FRG1 interacts with UPF1, eIF4A3, PRP8, and CWC22, core spliceosomal and EJC interactions (PRP8-CWC22 and eIF4A3-UPF3B) remain intact in FRG1-deficient cells. This suggests that, while FRG1 associates with these complexes, this interaction is not required for their assembly or structural stability. Without further functional or reconstitution experiments, the presented data are more consistent with an interpretation of FRG1 acting as a regulatory or accessory factor rather than a core structural component.

      We thank the reviewer for this clarification. We would like to emphasize that we do not claim FRG1 to be a core structural component of either the spliceosome or the EJC. Consistent with the reviewer’s interpretation, our data indicate that FRG1 deficiency does not disrupt the structural integrity of these complexes. Our intended conclusion is that FRG1 functions as a regulatory or accessory factor in NMD rather than being required for complex assembly or stability. We will carefully revise the manuscript to remove any language that could be interpreted as an overstatement. In addition, we are currently performing further experiments to better define the association of FRG1 with the EJC.

      (2) Causality between UPF1 depletion and NMD inhibition is not fully established.

      While reduced UPF1 levels provide a plausible explanation for decreased NMD efficiency, the manuscript does not conclusively demonstrate that UPF1 depletion drives all observed effects. Given FRG1's known roles in transcription, splicing, and RNA metabolism, alterations in transcript isoform composition and apparent NMD sensitivity may arise from mechanisms independent of UPF1 abundance. To directly link UPF1 depletion to altered NMD efficiency, rescue experiments testing whether UPF1 re-expression restores NMD activity in FRG1-overexpressing cells would be important.

      As suggested, to directly test causality, we will perform rescue experiments to determine whether UPF1 re-expression restores NMD activity in FRG1-overexpressing MCF7 cells.

      (3) Mechanism of FRG1-mediated UPF1 ubiquitination requires clarification.

      The ubiquitination assays support a role for FRG1 in promoting UPF1 degradation; however, the underlying mechanism remains unexplored. The relationship between FRG1 and UPF1, and the role FRG1 plays in this process, is unclear (does it function as an adaptor, recruit an E3 ubiquitin ligase, or influence UPF1 ubiquitination indirectly through transcriptional or signaling pathways?).

      We agree with the reviewer that the precise mechanism by which FRG1 promotes UPF1 ubiquitination remains to be defined. Our ubiquitination assays support a role for FRG1 in facilitating UPF1 degradation; however, whether FRG1 functions directly as an adaptor or E3 ligase, or instead influences UPF1 stability indirectly, is currently unclear. Notably, a prior study by Geng et al. reported that DUX4 expression alters the expression of numerous genes involved in protein ubiquitination, including multiple E3 ubiquitin ligases (9), and FRG1 itself has been reported to be upregulated upon DUX4 expression in muscle cells. We will expand the Discussion to address these potential mechanisms and place our findings in the context of indirect transcriptional or signaling pathways that may regulate UPF1 proteolysis. A detailed mechanistic dissection of FRG1-mediated ubiquitination is beyond the scope of the present study.

      (4) Limited transcriptome-wide interpretation of RNA-seq data.

      The RNA-seq data analysis relies heavily on a small subset of "top 10" genes. Additionally, the criteria used to define NMD-sensitive isoforms are unclear. A more comprehensive transcriptome-wide summary, indicating how many NMD-sensitive isoforms are detected and how many are significantly altered, would substantially strengthen the analysis.

      We thank the reviewer for this comment and agree that the current presentation may place a disproportionate emphasis on a limited subset of genes. These genes were selected as illustrative examples from an isoform-level analysis performed using IsoformSwitchAnalyzeR (ISAR) (10); however, we acknowledge that this approach does not fully convey the transcriptome-wide scope of the analysis.

      Using quantified RNA-seq data, ISAR was employed to identify significant isoform switches and transcripts predicted to be NMD-sensitive. Isoforms were annotated using GENCODE v47, and NMD sensitivity was assigned based on the established 50-nucleotide rule, as described in the Materials and Methods. To address the reviewer’s concern, we will revise the Results section to include a transcriptome-wide summary derived from the ISAR analysis.
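
      For readers unfamiliar with the 50-nucleotide rule mentioned above, the following is a minimal illustrative sketch, not the ISAR implementation. It assumes the commonly stated form of the rule: a transcript is predicted to be NMD-sensitive when its stop codon ends more than 50 nt upstream of the last exon-exon junction. The function name, the transcript-coordinate convention, and the example exon lengths are hypothetical.

```python
from itertools import accumulate


def is_nmd_sensitive(exon_lengths, stop_codon_end, min_distance=50):
    """Apply the 50-nucleotide rule to one spliced transcript (illustrative).

    exon_lengths   -- exon lengths in 5'->3' order (hypothetical input format)
    stop_codon_end -- position, in spliced-transcript coordinates, of the last
                      base of the stop codon (assumed convention)
    min_distance   -- distance threshold in nt (50 in the rule as usually stated)
    """
    if len(exon_lengths) < 2:
        # A single-exon transcript has no exon-exon junction, so the rule
        # cannot flag it.
        return False
    # Cumulative exon ends give junction positions; the final cumulative sum
    # is the transcript end, not a junction, so it is dropped.
    junctions = list(accumulate(exon_lengths))[:-1]
    last_junction = junctions[-1]
    # Positive distance means the stop codon lies upstream of the junction.
    return (last_junction - stop_codon_end) > min_distance


# Hypothetical three-exon transcript: junctions at 200 and 350.
example_exons = [200, 150, 300]
premature = is_nmd_sensitive(example_exons, stop_codon_end=250)
normal = is_nmd_sensitive(example_exons, stop_codon_end=500)
```

      In this sketch a stop codon ending at position 250 lies 100 nt upstream of the last junction and is flagged, whereas one in the last exon (position 500) is not. Real annotation pipelines additionally handle strand, genomic coordinates, and ORF prediction, none of which is modelled here.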

      (5) Clarification of NMD sensor assay interpretation.

      The logic underlying the NMD sensor assay should be explained more clearly early in the manuscript, as the inverse relationship between luciferase signal and NMD efficiency may be counterintuitive to readers unfamiliar with this reporter system. Inclusion of a schematic or brief explanatory diagram would improve accessibility.

      We agree with the reviewer and will provide a schematic as well as an experimental setup diagram to improve accessibility for readers.

      (6) Potential confounding effects of high MG132 concentration.

      The MG132 concentration used (50 µM) is relatively high and may induce broad cellular stress responses, including inhibition of global translation (it is known that proteasome inhibition shuts down translation). Controls addressing these secondary effects would be essential to strengthen the conclusion that UPF1 stabilization specifically reflects proteasome-dependent degradation.

      We acknowledge the reviewer’s concern regarding the relatively high concentration of MG132 used in this study. While proteasome inhibition can indeed induce global translation inhibition, our interpretation is based on the specific stabilization of UPF1 observed under these conditions. Since inhibition of global translation would generally reduce protein levels rather than cause selective accumulation, the observed increase in UPF1 is unlikely to result from translational effects. To address this point, we plan to repeat selected experiments using a lower MG132 concentration to further confirm that UPF1 stabilization reflects proteasome-dependent degradation.

      (7) Interpretation of polysome co-sedimentation data.

      While the co-sedimentation of FRG1 with polysomes is intriguing, this approach does not distinguish between direct ribosomal association and co-migration with ribosome-associated complexes. This limitation should be explicitly acknowledged in the interpretation.

      We acknowledge that polysome co-sedimentation alone cannot definitively distinguish between direct ribosomal binding and co-migration with ribosome-associated complexes. Importantly, our interpretation does not rely solely on this assay; when combined with co-immunoprecipitation and proximity ligation assay results, the data consistently support an association of FRG1 with the exon junction complex. We are also conducting additional experiments with appropriate controls to further validate the specificity of FRG1’s association with ribosomes and to address the possibility of nonspecific co-migration.

      (8) Limitations of PLA-based interaction evidence.

      The PLA data convincingly demonstrate close spatial proximity between FRG1 and eIF4A3; however, PLA does not provide definitive evidence of direct interaction and is known to be susceptible to artefacts. Moreover, a distance threshold of ~40 nm still allows for proteins to be in proximity without being part of the same complex. These limitations should be clearly acknowledged, and conclusions should be framed accordingly.

      We thank the reviewer for highlighting this important point. We agree that PLA indicates close spatial proximity but does not constitute definitive evidence of direct interaction and can be susceptible to artefacts. We will explicitly acknowledge this limitation in the revised manuscript. Importantly, our conclusions are not solely based on PLA data; they are supported by complementary co-immunoprecipitation and polysome co-sedimentation assays, which provide biochemical evidence consistent with an association between FRG1 and eIF4A3.

      Reviewer #3 (Public review):

      The manuscript by Palo and colleagues reports the identification of FRG1 as a novel regulator of nonsense-mediated mRNA decay (NMD), showing that FRG1 inversely modulates NMD efficiency by controlling UPF1 abundance. Using cell-based models and a frg1 knockout zebrafish, the authors show that FRG1 promotes UPF1 ubiquitination and proteasomal degradation, independently of DUX4. The work further positions FRG1 as a structural component of the spliceosome and exon junction complex without compromising its integrity. Overall, the manuscript provides mechanistic insight into FRG1-mediated post-transcriptional regulation and expands understanding of NMD homeostasis. The authors should address the following issues to improve the quality of their manuscript.

      (1) In Figure 7A-D, appropriate positive controls for the nuclear fraction (e.g., Histone H3) and the cytoplasmic fraction (e.g., GAPDH or α-tubulin) should be included to validate the efficiency and purity of the subcellular fractionation.

      We thank the reviewer for the suggestion. We will include appropriate positive controls for the nuclear fraction (Histone H3) and the cytoplasmic fraction (GAPDH or α-tubulin) in Figure 7A–D to validate the efficiency and purity of the subcellular fractionation.

      (2) To strengthen the conclusion that FRG1 broadly impacts the NMD pathway, qRT-PCR analysis of additional core NMD factors (beyond UPF1) in the frg1⁻/⁻ zebrafish at 48 hpf would be informative.

      We appreciate the reviewer’s insightful comment. We will perform qRT-PCR analysis of additional core NMD factors in the frg1⁻/⁻ zebrafish at 48 hpf to further strengthen the conclusion that FRG1 broadly impacts the NMD pathway.

      (3) Figure labels should be standardized throughout the manuscript (e.g., consistent use of "Ex" instead of mixed terms such as "Oex") to improve clarity and readability.

      We thank the reviewer for noticing the inconsistency. We will ensure that all figure labels are standardized throughout the manuscript (e.g., using “Ex” consistently) to improve clarity and readability.

      (4) The methods describing the generation of the frg1 knockout zebrafish could be expanded to include additional detail, and a schematic illustrating the CRISPR design, genotyping workflow, and validation strategy would enhance transparency and reproducibility.

      We appreciate the reviewer’s suggestion and will expand the Methods section to provide additional detail on the generation of the frg1 knockout zebrafish. A schematic illustrating the CRISPR design, genotyping workflow, and validation strategy will also be included to enhance transparency and reproducibility.

      (5) As FRG1 is a well-established tumor suppressor, additional cell-based functional assays under combined FRG1 and UPF1 perturbation (e.g., proliferation, migration, or survival assays) could help determine whether FRG1 influences cancer-associated phenotypes through modulation of the NMD pathway.

      We thank the reviewer for this thoughtful and constructive suggestion. While FRG1 is indeed a well-established tumor suppressor, incorporating additional cell-based functional assays under combined FRG1 and UPF1 perturbation would significantly broaden the scope of the current study. The present work is focused on elucidating the molecular relationship between FRG1 and the NMD pathway. Investigation of downstream cancer-associated phenotypes represents an important and interesting direction for future studies, but is beyond the scope of the current manuscript.

      (6) Given the claim that FRG1 inversely regulates NMD efficacy via UPF1, an epistasis experiment such as UPF1 overexpression in an FRG1-overexpressing background followed by an NMD reporter assay would provide stronger functional validation of pathway hierarchy.

      We agree with the reviewer’s suggestion. To strengthen the functional validation of the proposed pathway hierarchy, we will perform an epistasis experiment by overexpressing UPF1 in an FRG1-overexpressing background and assess NMD activity using an established NMD reporter assay. The results of this experiment will be included in the revised manuscript.

      References

      (1) Palo A, Patel SA, Shubhanjali S, Dixit M. Dynamic interplay of Sp1, YY1, and DUX4 in regulating FRG1 transcription with intricate balance. Biochim Biophys Acta Mol Basis Dis. 2025 Mar;1871(3):167636.

      (2) Sato H, Singer RH. Cellular variability of nonsense-mediated mRNA decay. Nat Commun. 2021 Dec 10;12(1):7203.

      (3) Baird TD, Cheng KCC, Chen YC, Buehler E, Martin SE, Inglese J, et al. ICE1 promotes the link between splicing and nonsense-mediated mRNA decay. eLife. 2018 Mar 12;7:e33178.

      (4) Chu V, Feng Q, Lim Y, Shao S. Selective destabilization of polypeptides synthesized from NMD-targeted transcripts. Mol Biol Cell. 2021 Dec 1;32(22):ar38.

      (5) Udy DB, Bradley RK. Nonsense-mediated mRNA decay uses complementary mechanisms to suppress mRNA and protein accumulation. Life Sci Alliance. 2022 Mar;5(3):e202101217.

      (6) Bertram K, El Ayoubi L, Dybkov O, Agafonov DE, Will CL, Hartmuth K, et al. Structural Insights into the Roles of Metazoan-Specific Splicing Factors in the Human Step 1 Spliceosome. Mol Cell. 2020 Oct 1;80(1):127-139.e6.

      (7) Brazão TF, Demmers J, van IJcken W, Strouboulis J, Fornerod M, Romão L, et al. A new function of ROD1 in nonsense-mediated mRNA decay. FEBS Lett. 2012 Apr 24;586(8):1101–10.

      (8) Sun CYJ, van Koningsbruggen S, Long SW, Straasheijm K, Klooster R, Jones TI, et al. Facioscapulohumeral muscular dystrophy region gene 1 is a dynamic RNA-associated and actin-bundling protein. J Mol Biol. 2011 Aug 12;411(2):397–416.

      (9) Geng LN, Yao Z, Snider L, Fong AP, Cech JN, Young JM, et al. DUX4 activates germline genes, retroelements, and immune mediators: implications for facioscapulohumeral dystrophy. Dev Cell. 2012 Jan 17;22(1):38–51.

      (10) Vitting-Seerup K, Sandelin A. The Landscape of Isoform Switches in Human Cancers. Mol Cancer Res MCR. 2017 Sep;15(9):1206–20.

    1. eLife Assessment

      This study presents a valuable finding on maternal SETDB1 as a key chromatin repressor that shuts down the 2C gene program and enables normal mouse embryonic development. The evidence supporting the claims of the authors is solid, although the inclusion of a causality test, a mechanistic understanding of SETDB1 targeting, and phenotypic quantification would have greatly strengthened the study. The work will be of broad interest to biologists working on embryonic development, stem cells and gene regulation.

    2. Reviewer #1 (Public review):

      Summary:

      During the earliest stages of mouse development, the zygote and 2-cell (2C) embryo are totipotent, capable of generating all embryonic and extra-embryonic lineages, and they transiently express a distinctive set of "2C-stage" genes, many driven by MERVL long terminal repeat (LTR) promoters. Although activation of these transcripts is a normal feature of totipotency, they must be rapidly silenced as development proceeds to the 4-cell and 8-cell stages; failure to shut down the 2C program results in developmental arrest. This study examines the role of maternal SETDB1, a histone H3K9 methyltransferase, in suppressing the 2C transcriptional network. Using an oocyte-specific conditional knockout that removes maternal Setdb1 while leaving the paternal allele intact, the authors demonstrate that embryos lacking maternal SETDB1 arrest during cleavage, with very few progressing beyond the 8-cell stage and no morphologically normal blastocysts forming. Transcriptomic analyses reveal persistent expression of MERVL-LTR-driven transcripts and other totipotency markers, indicating a failure to terminate the totipotent state. Together, the data demonstrate that maternally deposited SETDB1 is required to silence the MERVL-driven 2C program and enable the transition from totipotency to pluripotency. More broadly, the work identifies maternal SETDB1 as a key chromatin repressor that deposits repressive H3K9 methylation to shut down the transient 2C gene network and to permit normal preimplantation development.

      Strengths:

      (1) Closes a key knowledge gap.

      The study tackles a central open question - how embryos exit the totipotent 2-cell (2C) state - and provides direct in vivo evidence that epigenetic repression is required to terminate the 2C program for development to proceed. By identifying maternal SETDB1 as the responsible factor, the work substantially advances our understanding of the maternal-to-zygotic transition and early lineage specification.

      (2) Clean genetics paired with rigorous genomics.

      An oocyte-specific Setdb1 knockout cleanly isolates a maternal-effect requirement, ensuring that early phenotypes arise from loss of maternal protein. The resulting cleavage-stage arrest is unambiguous (most embryos stall before or around the 8-cell stage). State-of-the-art single-embryo RNA-seq across stages - well-matched to low-cell-number constraints - captures genome-wide mis-expression, including persistent 2C transcripts in mutants, strongly supporting the conclusions.

      (3) Compelling molecular linkage to phenotype.

      Transcriptome data show that without maternal SETDB1, embryos fail to repress a suite of 1-cell/2C-specific genes by the 8-cell stage. The tight correlation between continued activation of the MERVL-driven totipotency network and developmental arrest provides a specific molecular explanation for the observed failure to progress.

      (4) Mechanistic insight grounded in chromatin biology.

      SETDB1, a H3K9 methyltransferase classically linked to heterochromatin and transposon repression, targets MERVL LTRs and MERVL-driven chimeric transcripts in early embryos. Bioinformatic evidence indicates that these loci normally acquire H3K9me3 during the 2C→4C transition. The data articulate a coherent mechanism: maternal SETDB1 deposits repressive H3K9me3 at 2C gene loci to shut down the totipotency network, extending observations from ESC systems to bona fide embryos.

      (5) Broad implications for development and stem-cell biology.

      By pinpointing a maternal gatekeeper of the totipotent-to-pluripotent transition, the work suggests that some cases of cleavage-stage arrest (e.g., in IVF) may reflect faulty epigenetic silencing of transposon-driven genes. It also informs stem-cell efforts to control totipotent-like states in vitro (e.g., 2C-like cells), linking epigenetic reprogramming, transposable-element regulation, and developmental potency.

      Weaknesses:

      (1) Causality not directly demonstrated.

      The link among loss of SETDB1, persistence of 2C transcripts, and developmental arrest is compelling but remains correlative. No rescue experiments test whether dampening the 2C/MERVL program restores development. Targeted interventions (e.g., knocking down key 2C drivers such as Dux, or pharmacologically curbing MERVL-linked transcription in maternal Setdb1 mutants) would strengthen the claim that unchecked 2C activity is causal rather than a by-product of other SETDB1 functions.

      (2) Limited mechanistic resolution of SETDB1 targeting.

      The study establishes a requirement for maternal SETDB1 but does not define how it is recruited to MERVL loci. Given SETDB1's canonical cooperation with TRIM28/KAP1 and KRAB-ZNFs, upstream sequence-specific factors and/or pre-existing chromatin features likely guide targeting. Direct occupancy and mark-placement evidence (e.g., SETDB1/TRIM28 CUT&RUN or ChIP, and H3K9me3 profiling at MERVL LTRs during the 2C→4C window) would convert inferred mechanisms into demonstrated ones.

      (3) Narrow scope on MERVL; broader epigenomic consequences underexplored.

      Maternal SETDB1 may restrain additional repeat classes or genes beyond the 2C network. A systematic repeatome analysis (LINEs/SINEs/ERV subfamilies) would clarify specificity versus a general loss of heterochromatin control. Moreover, potential effects on imprinting or DNA methylation balance are not examined; perturbations there could also contribute to arrest. Bisulfite-based DNA methylation maps at imprinted loci and allele-specific expression analyses would help rule in/out these mechanisms.

      (4) Phenotype quantitation and transcriptomic breadth could be clearer.

      The developmental phenotype is described qualitatively ("very few beyond 8-cell") without precise stage-wise arrest rates or representative morphology. Tabulated counts (2C/4C/8C/blastocyst), images, and statistics would increase clarity. On the RNA-seq side, the narrative emphasizes known 2C markers; reporting novel/unannotated misregulated transcripts, as well as downregulated pathways (e.g., failure to activate normal 8-cell programs, metabolism, or early lineage markers), would present a fuller portrait of the mutant state.

    3. Reviewer #2 (Public review):

      Zeng et al. report that Setdb1-/- embryos fail to extinguish the 1- and 2-cell embryo transcriptional program and have permanent expression of MERVL transposable elements. The manuscript is technically sound and well performed, but, in my opinion, the results lack conceptual novelty.

      (1) The manuscript builds on previous observations that: 1, Setdb1 is necessary for early mouse development, with knockout embryos rarely reaching the 8-cell stage; 2, SETDB1 mediates H3K9me3 deposition at transposable elements in mouse ESCs; 3, SETDB1 silences MERVLs to prevent 2CLC-state acquisition in mouse ESCs. The strength of the current work is the demonstration that this is not due to a general transcriptional collapse; but otherwise, the findings are not surprising. The well-known crosstalk between m6A RNA modification and H3K9me3 in preventing 2CLC generation (reported in several Nature papers years ago) also partly compromises the novelty of this work.

      (2) The conclusions regarding H3K9me3 deposition are inferred based on previously reported datasets, but there is no direct demonstration.

      (3) The detection of chimeric transcripts is somewhat unreliable using short-read sequencing.

    4. Author response:

      eLife Assessment 

      This study presents a valuable finding on maternal SETDB1 as a key chromatin repressor that shuts down the 2C gene program and enables normal mouse embryonic development. The evidence supporting the claims of the authors is solid, although the inclusion of a causality test, a mechanistic understanding of SETDB1 targeting, and phenotypic quantification would have greatly strengthened the study. The work will be of broad interest to biologists working on embryonic development, stem cells and gene regulation.

      Thank you for this positive evaluation of our work. Please find the point-by-point responses to the Reviewers’ comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      During the earliest stages of mouse development, the zygote and 2-cell (2C) embryo are totipotent, capable of generating all embryonic and extra-embryonic lineages, and they transiently express a distinctive set of "2C-stage" genes, many driven by MERVL long terminal repeat (LTR) promoters. Although activation of these transcripts is a normal feature of totipotency, they must be rapidly silenced as development proceeds to the 4-cell and 8-cell stages; failure to shut down the 2C program results in developmental arrest. This study examines the role of maternal SETDB1, a histone H3K9 methyltransferase, in suppressing the 2C transcriptional network. Using an oocyte-specific conditional knockout that removes maternal Setdb1 while leaving the paternal allele intact, the authors demonstrate that embryos lacking maternal SETDB1 arrest during cleavage, with very few progressing beyond the 8-cell stage and no morphologically normal blastocysts forming. Transcriptomic analyses reveal persistent expression of MERVL-LTR-driven transcripts and other totipotency markers, indicating a failure to terminate the totipotent state. Together, the data demonstrate that maternally deposited SETDB1 is required to silence the MERVL-driven 2C program and enable the transition from totipotency to pluripotency. More broadly, the work identifies maternal SETDB1 as a key chromatin repressor that deposits repressive H3K9 methylation to shut down the transient 2C gene network and to permit normal preimplantation development. 

      Strengths: 

      (1) Closes a key knowledge gap. 

      The study tackles a central open question - how embryos exit the totipotent 2-cell (2C) state - and provides direct in vivo evidence that epigenetic repression is required to terminate the 2C program for development to proceed. By identifying maternal SETDB1 as the responsible factor, the work substantially advances our understanding of the maternal-to-zygotic transition and early lineage specification. 

      (2) Clean genetics paired with rigorous genomics. 

      An oocyte-specific Setdb1 knockout cleanly isolates a maternal-effect requirement, ensuring that early phenotypes arise from loss of maternal protein. The resulting cleavage-stage arrest is unambiguous (most embryos stall before or around the 8-cell stage). State-of-the-art single-embryo RNA-seq across stages - well-matched to low-cell-number constraints - captures genome-wide mis-expression, including persistent 2C transcripts in mutants, strongly supporting the conclusions. 

      (3) Compelling molecular linkage to phenotype. 

      Transcriptome data show that without maternal SETDB1, embryos fail to repress a suite of 1-cell/2C-specific genes by the 8-cell stage. The tight correlation between continued activation of the MERVL-driven totipotency network and developmental arrest provides a specific molecular explanation for the observed failure to progress. 

      (4) Mechanistic insight grounded in chromatin biology. 

      SETDB1, a H3K9 methyltransferase classically linked to heterochromatin and transposon repression, targets MERVL LTRs and MERVL-driven chimeric transcripts in early embryos. Bioinformatic evidence indicates that these loci normally acquire H3K9me3 during the 2C→4C transition. The data articulate a coherent mechanism: maternal SETDB1 deposits repressive H3K9me3 at 2C gene loci to shut down the totipotency network, extending observations from ESC systems to bona fide embryos. 

      (5) Broad implications for development and stem-cell biology. 

      By pinpointing a maternal gatekeeper of the totipotent-to-pluripotent transition, the work suggests that some cases of cleavage-stage arrest (e.g., in IVF) may reflect faulty epigenetic silencing of transposon-driven genes. It also informs stem-cell efforts to control totipotent-like states in vitro (e.g., 2C-like cells), linking epigenetic reprogramming, transposable-element regulation, and developmental potency.

      We thank Reviewer 1 for recognizing the strengths in our work and for the suggestions below.

      Weaknesses: 

      (1) Causality not directly demonstrated. 

      The link among loss of SETDB1, persistence of 2C transcripts, and developmental arrest is compelling but remains correlative. No rescue experiments test whether dampening the 2C/MERVL program restores development. Targeted interventions (e.g., knocking down key 2C drivers such as Dux, or pharmacologically curbing MERVL-linked transcription in maternal Setdb1 mutants) would strengthen the claim that unchecked 2C activity is causal rather than a by-product of other SETDB1 functions.

      We agree that rescue experiments might strengthen causality. Those experiments, however, would be extremely challenging technically because the knockdowns would need to be precisely timed to follow (and not prevent) the wave of 2c-specific activation. Knocking down 2c drivers in the zygote, for example, may prevent switching on the totipotency program. In addition, while sustained MERVL expression—such as that induced by forced DUX expression—disrupts totipotency exit and embryo development (1, 2), derepression of transcription is very broad in Setdb1<sup>mat-/+</sup> embryos and knocking down individual 2C drivers may not be sufficient to rescue development or restore the exit from totipotency.

      (2) Limited mechanistic resolution of SETDB1 targeting. 

      The study establishes a requirement for maternal SETDB1 but does not define how it is recruited to MERVL loci. Given SETDB1's canonical cooperation with TRIM28/KAP1 and KRAB-ZNFs, upstream sequence-specific factors and/or pre-existing chromatin features likely guide targeting. Direct occupancy and mark-placement evidence (e.g., SETDB1/TRIM28 CUT&RUN or ChIP, and H3K9me3 profiling at MERVL LTRs during the 2C→4C window) would convert inferred mechanisms into demonstrated ones.

      We do show H3K9me3 patterns at MERVL LTRs during the early 2c, late 2c, 2c, 4c, 8c and morula window, using a published dataset. Please see the genome browser images in Figures 4C, 4D, 4E, 6D, 6E and Figure S6. We agree that mapping SETDB1/TRIM28 to those locations would strengthen the mechanistic insight. However, ChIP-seq or CUT&RUN for those proteins in preimplantation embryos is not technically feasible. We do provide genetic evidence for the collaboration between SETDB1 and DUXBL, a DNA-binding factor, by showing that DUXBL cannot switch off its top targets without SETDB1 (Figure 6). Future studies will characterize the molecular mechanisms underlying this (likely indirect) collaboration. We do not think that DUXBL and SETDB1 directly interact, because such an interaction was not detected by DUXBL IP-MS (3).

      (3) Narrow scope on MERVL; broader epigenomic consequences underexplored. 

      Maternal SETDB1 may restrain additional repeat classes or genes beyond the 2C network. A systematic repeatome analysis (LINEs/SINEs/ERV subfamilies) would clarify specificity versus a general loss of heterochromatin control. Moreover, potential effects on imprinting or DNA methylation balance are not examined; perturbations there could also contribute to arrest. Bisulfite-based DNA methylation maps at imprinted loci and allele-specific expression analyses would help rule in/out these mechanisms.

      We did examine genes and repeat elements beyond the 2c network. We evaluated gene and TE expression changes using four-way comparisons. Please find the results regarding gene expression in Figure 1C-J, Figure S2, Figure S3, Figure S4, Table S2, Table S3, and Table S4. Please find the results on TE expression in Figure S5, Table S6, Table S7, and Table S8, and in the text. We agree that DNA methylation may be altered in Setdb1<sup>mat-/+</sup> embryos. In our hands, evaluating this possibility using bisulfite sequencing requires a larger number of embryos than we can feasibly obtain (the number of mutant embryos obtained is very small). Regarding imprinted gene expression, one cannot fully assess and interpret imprinted gene expression in preimplantation-stage embryos before the maternally deposited transcripts are gone. We reported earlier that clear somatic parental-specific patterns of imprinted gene expression may only start later in development, around 8.5 dpc (4).

      (4) Phenotype quantitation and transcriptomic breadth could be clearer. 

      The developmental phenotype is described qualitatively ("very few beyond 8-cell") without precise stage-wise arrest rates or representative morphology. Tabulated counts (2C/4C/8C/blastocyst), images, and statistics would increase clarity. On the RNA-seq side, the narrative emphasizes known 2C markers; reporting novel/unannotated misregulated transcripts, as well as downregulated pathways (e.g., failure to activate normal 8-cell programs, metabolism, or early lineage markers), would present a fuller portrait of the mutant state.

      Tabulated counts are displayed in Figure 1A, and morphology is shown in Figure S1A. We do state that 4% of Setdb1<sup>mat-/+</sup> embryos reached the 8-cell stage by 2.5 dpc. We recovered zero Setdb1<sup>mat-/+</sup> blastocysts at 4.5 dpc (not shown). On the RNA-seq side, we do report a more global assessment of transcription of genes and TEs (please see point 3 above), including novel chimeric transcripts (Table S6). Developmental pathways are shown in Figure S3 and Figure S4. Metabolic pathways are displayed in Figure S2.

      Reviewer #2 (Public review): 

      Zeng et al. report that Setdb1-/- embryos fail to extinguish the 1- and 2-cell embryo transcriptional program and have permanent expression of MERVL transposable elements. The manuscript is technically sound and well performed, but, in my opinion, the results lack conceptual novelty.

      (1) The manuscript builds on previous observations that: 1, Setdb1 is necessary for early mouse development, with knockout embryos rarely reaching the 8-cell stage; 2, SETDB1 mediates H3K9me3 deposition at transposable elements in mouse ESCs; 3, SETDB1 silences MERVLs to prevent 2CLC-state acquisition in mouse ESCs. The strength of the current work is the demonstration that this is not due to a general transcriptional collapse; but otherwise, the findings are not surprising. The well-known crosstalk between m6A RNA modification and H3K9me3 in preventing 2CLC generation (reported in several Nature papers years ago) also partly compromises the novelty of this work.

      We thank the Reviewer for appreciating the technical quality of our work. Regarding novelty, please consider that prior work in ES cells included contradictory findings (please see our Introduction). Prior embryology work (please see our Introduction) did not explain the preimplantation-stage phenotype. We highly appreciate those earlier works. Our work here answers the expectations drawn from prior studies and unequivocally shows that SETDB1 carries out the developmentally essential function of suppressing MERVLs and the 2-cell program in the mouse embryo.

      (2) The conclusions regarding H3K9me3 deposition are inferred based on previously reported datasets, but there is no direct demonstration.

      Dynamic H3K9me3 deposition at MERVL LTRs during the early 2c, late 2c, 2c, 4c, 8c and morula window is displayed in Figures 4C, 4D, 4E, 6D, 6E and Figure S6, using very high-quality data from a published work. We agree that demonstrating loss of H3K9me3 in Setdb1<sup>mat-/+</sup> embryos would confirm that the H3K9me3 histone methyltransferase function of SETDB1 (as opposed to any as-yet-unidentified, non-HMT activity of SETDB1) is responsible for shutting down MERVL LTRs. However, ChIP-seq, CUT&RUN, or similar assays are not feasible due to the rarity of Setdb1<sup>mat-/+</sup> embryos.

      (3) The detection of chimeric transcripts is somewhat unreliable using short-read sequencing.

      We used single-embryo total RNA-seq, which is considered more reliable than mRNA-seq for detecting chimeric transcripts because many of them are not polyadenylated, and we report the chimeric transcripts detected (Table S6). We acknowledge, however, that long-read sequencing, which has only recently become available and is still very expensive, is currently the most powerful method for detecting chimeric transcripts. This does not affect the major conclusions or the significance of our work.

    1. eLife Assessment

      This study presents a method for expressing single-stranded DNA fluorescent aptamers in E. coli using a retron-based strategy. The evidence supporting the successful expression and folding of DNA aptamers is solid, with clear demonstration of fluorescence after extraction, though the aptamers do not function in living cells. The method represents an important technical advance that will likely become standard for DNA aptamer expression in bacterial systems.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use an interesting expression system called a retron to express single-stranded DNA aptamers. Expressing DNA as a single-stranded sequence is very hard - DNA is naturally double-stranded. However, the successful demonstration by the authors of expressing Lettuce, which is a fluorogenic DNA aptamer, allowed visual demonstration of both expression and folding, though only after extraction from cells and not in vivo (possibly because of the low fluorescence of Lettuce, or perhaps more likely, some factor in cells preventing Lettuce fluorescence). This method will likely be the main method for expressing and testing DNA aptamers of all kinds, including fluorogenic aptamers like Lettuce and future variants/alternatives.

      Strengths:

      This has an overall simplicity which will lead to ready adoption. I am very excited about this work. People will be able to express other fluorogenic aptamers or DNA aptamers tagged with Lettuce with this system.

      Weaknesses:

      Some things could be addressed/shown in more detail, e.g. half-lives of different types of DNA aptamers and ways to extend this to mammalian cells.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript explores a DNA fluorescent light up aptamer (FLAP) with the specific goal of comparing activity in vitro to that in bacterial cells. In order to achieve expression in bacteria, the authors devise an expression strategy based on retrons and test four different constructs with the aptamer inserted at different points in the retron scaffold.

      The initial version of this manuscript made several claims about the fluorescence activity of the aptamers in cells, and the observed fluorescence signal has now been found to result from cellular auto-fluorescence. Thus, all data regarding the function of the aptamers in cells have been removed.

      Negative data are important to the field, especially when it comes to research tools that may not work as many people think that they will. Thus, there would have been an opportunity here for the authors to dig into why the aptamers don't seem to work in cells.

      In the absence of insight into the negative result, the manuscript is now essentially a method for producing aptamers in cells. If this is the main thrust, then it would be beneficial for the authors to clearly outline why this is superior to other approaches for synthesizing aptamers.

    4. Author response:

      The following is the authors’ response to the original reviews

      Comment to both reviewers:

      We are very grateful for the thoughtful and constructive comments from both reviewers. During the revision, and in direct response to these comments, we performed additional control experiments for the cellular fluorescence measurements. These new data revealed that the weak increase in green fluorescence reported in our original submission does not depend on retron-expressed Lettuce RT-DNA or the DFHBI-1T fluorophore, but instead reflects stress-induced autofluorescence of E. coli (e.g. upon inducer and antibiotic treatment).

      We also benchmarked the fluorogenic properties of Lettuce against the RNA FLAP Broccoli and found that Lettuce is ~100-fold less fluorogenic under optimal in vitro conditions. Consequently, with the currently available Lettuce variants, which were optimized in vitro but not in vivo, intracellular fluorescence cannot be reliably detected by microscopy or flow cytometry. We have therefore removed the original flow cytometry and in-culture fluorescence data and no longer claim detectable intracellular Lettuce fluorescence.

      In the revised manuscript, we now directly demonstrate, using a gel-based fluorophore-binding assay, that retron-produced Lettuce RT-DNA can be purified from cells and remains functional ex vivo. Together, these data clarify the current limitations of DNA-based FLAPs for in vivo imaging, while still establishing retrons as a viable platform for intracellular production of functional DNA aptamers.

      Reviewer #1 (Public Review):

      Summary:

      The authors use an interesting expression system called a retron to express single-stranded DNA aptamers. Expressing DNA as a single-stranded sequence is very hard - DNA is naturally double-stranded. However, the successful demonstration by the authors of expressing Lettuce, which is a fluorogenic DNA aptamer, allowed visual demonstration of both expression and folding. This method will likely be the main method for expressing and testing DNA aptamers of all kinds, including fluorogenic aptamers like Lettuce and future variants/alternatives.

      Strengths:

      This has an overall simplicity which will lead to ready adoption. I am very excited about this work. People will be able to express other fluorogenic aptamers or DNA aptamers tagged with Lettuce with this system.

      We thank the reviewer for their thoughtful assessment and appreciate their encouraging remarks.

      Weaknesses:

      Several things are not addressed/shown:

      (1) How stable are these DNA in cells? Half-life?

      We thank the reviewer for this insightful question.

      Retron RT-DNA forms a phage surveillance complex with the associated RT and effector protein[1-4]. Moreover, considering the unique ‘closed’ structure of RT-DNA[5] (with the ends of msr and msd joined by a 2’-5’ linkage and a base-paired region) and its noncoding function, we hypothesized that the RT-DNA must be exceptionally stable. Nevertheless, we attempted to determine the half-life of the RT-DNA by qPCR, using Eco2 RT-DNA. To this end, we designed an assay in which we first induce RT-DNA expression and then use the induced cells to start a fresh culture without the inducers. We then take aliquots from this fresh culture at different time points and determine RT-DNA abundance by qPCR.

      We induced RT-DNA expression of retron Eco2 in BL21AI cells as described in the Methods. After overnight induction, cells were washed to remove IPTG and arabinose, diluted to OD<sub>600</sub> = 0.2 into fresh LB without inducers, and grown at 37°C. At the indicated time points, aliquots corresponding to OD<sub>600</sub> = 0.1 were boiled (95°C, 5 min), and 1 µL of the lysate was used as template in 20 µL qPCR reactions (see revised Methods for details).

      Assuming RT-DNA degradation would occur by active degradation mechanisms (nuclease-mediated degradation) and dilution (cell growth and division), we determined the rate of degradation by the following equation

      where the fitted parameter is the degradation rate constant and the ratio is the dilution factor, which accounts for dilution by cell division. OD<sub>600</sub>(t) was determined by fitting the OD<sub>600</sub> measurements with the following equation describing logistic growth:

      This fit yields the plots shown in Figure 2–figure supplement 1.

      After substituting OD<sub>600</sub>(t) by the function in equation (2), we fit the experimental data for the fold-change of the RT-DNA to equation (1). Interestingly, the best fit (red) was obtained with a degradation rate constant converging towards zero, suggesting that the half-life of the RT-DNA is beyond the detection limit of our assay. To illustrate the behavior expected for typical RNA half-lives, which are in the range of minutes in growing E. coli cells[6], we refitted the data using fixed half-lives of 15 and 30 minutes. In both cases, the simulated curve deviated significantly from the experimental data, further confirming that the half-life of the RT-DNA is probably orders of magnitude longer than the doubling time of E. coli under these optimal conditions. We cannot exclude that RT-DNA is still produced as a result of promoter leakiness, but we expect this effect to be low, as the expression of RT-DNA in E. coli BL21AI cells requires the presence of both IPTG and arabinose, which were thoroughly removed before inoculating the growth media with the starter culture. Overall, our data therefore argue for an exceptional stability of the RT-DNA in growing bacterial cells.

      We have now included this new experimental data in the supplementary information.
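
      The decay-plus-dilution reasoning in this response can be illustrated with a minimal sketch. It is not the authors' fitting code: it assumes first-order degradation multiplied by a dilution factor OD<sub>600</sub>(0)/OD<sub>600</sub>(t), with OD<sub>600</sub>(t) following a standard logistic curve. The function names and the growth parameters (`od_max`, `growth_rate`) are hypothetical and chosen only for illustration.

```python
import math


def logistic_od(t, od0, od_max, growth_rate):
    """Logistic growth curve: OD600 at time t, starting at od0 and saturating at od_max."""
    return od_max / (1.0 + (od_max / od0 - 1.0) * math.exp(-growth_rate * t))


def rate_from_half_life(half_life):
    """Convert a half-life into a first-order rate constant (same time units)."""
    return math.log(2) / half_life


def rt_dna_fold_change(t, k_deg, od0, od_max, growth_rate):
    """Assumed model: fold-change = exp(-k_deg * t) * OD(0) / OD(t).

    The exponential term is active degradation; the OD ratio is dilution
    by cell growth and division.
    """
    dilution = od0 / logistic_od(t, od0, od_max, growth_rate)
    return math.exp(-k_deg * t) * dilution


if __name__ == "__main__":
    # Hypothetical growth parameters (time in minutes); od0 = 0.2 as in the assay.
    od0, od_max, growth_rate = 0.2, 4.0, 0.03
    for label, k in [("stable (k = 0)", 0.0),
                     ("t1/2 = 30 min", rate_from_half_life(30)),
                     ("t1/2 = 15 min", rate_from_half_life(15))]:
        curve = [rt_dna_fold_change(t, k, od0, od_max, growth_rate)
                 for t in (0, 60, 120, 180)]
        print(label, ["%.4f" % v for v in curve])
```

      Under these assumptions, a rate constant of zero leaves dilution as the only source of loss, whereas half-lives of 15 or 30 minutes drive the curve down much faster, which is the contrast the refitting with fixed half-lives is meant to expose.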

      (2) What concentration do they achieve in cells/copy numbers? This is important since it relates to the total fluorescence output and, if the aptamer is meant to bind a protein, it will reveal if the copy number is sufficient to stoichiometrically bind target proteins. Perhaps the gels could have standards with known amounts in order to get exact amounts of aptamer expression per cell?

      The copy number of RT-DNA can be estimated from the qPCR experiments. We use a pET28a plasmid, which is a low-copy plasmid with a typical copy number of 15-20 per cell[7]. Upon induction, we determined the abundance of RT-DNA relative to the plasmid to be about 8-fold, indicating a copy number of Eco2 RT-DNA of roughly 100-200. Assuming an average aqueous volume of an E. coli cell of 1 femtoliter[6], the concentration of RT-DNA is ~250-500 nM. We have added this information to the revised version of the manuscript.
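
      The conversion from copies per cell to molar concentration can be sketched as a back-of-the-envelope calculation. The inputs are the figures quoted above (15-20 plasmid copies, an 8-fold ratio, a 1 fL cell volume); the function name is illustrative. One molecule in 1 fL corresponds to about 1.66 nM, so the result lands in the low hundreds of nanomolar, the same order of magnitude as the estimate quoted above.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copies_to_nanomolar(copies, cell_volume_litres=1e-15):
    """Convert a per-cell copy number to a concentration in nM."""
    moles = copies / AVOGADRO
    return moles / cell_volume_litres * 1e9

FOLD_OVER_PLASMID = 8  # RT-DNA abundance relative to plasmid, as quoted

for plasmid_copies in (15, 20):
    rt_dna_copies = plasmid_copies * FOLD_OVER_PLASMID
    print(rt_dna_copies, "copies ->", round(copies_to_nanomolar(rt_dna_copies)), "nM")
```

      The estimate is sensitive to the assumed cell volume: halving the volume doubles the concentration, so it is best read as an order-of-magnitude figure.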

      (3) Microscopic images of the fluorescent E. coli - why are these not shown (unless I missed them)? It would be good to see that cells are fluorescent rather than just showing flow sorting data.

      In the original submission, we used flow cytometry as an orthogonal method to quantify the fluorescence output of intracellularly expressed Lettuce aptamer, anticipating that it would provide high-throughput, quantitative information on a large population of cells. During the revision, additional controls revealed that the weak increase in fluorescence we had previously attributed to Lettuce expression was in fact a stress-induced autofluorescence signal that occurred independently of retron RT-DNA and DFHBI-1T. We have therefore removed these data from the manuscript and no longer claim detectable intracellular Lettuce fluorescence.

      To understand this limitation, we compared the in vitro fluorescence of Lettuce with that of the RNA FLAP Broccoli, which is commonly used for RNA live-cell imaging. Under optimal in vitro conditions, Lettuce shows ~100-fold lower fluorescence output than Broccoli (new Figure 3–figure supplement 5). Given this poor fluorogenicity and the low intracellular concentration of retron RT-DNA (now derived from the qPCR experiments), we conclude that the current Lettuce variants are below the detection threshold for in vivo imaging in our system. We now explicitly discuss this limitation and the need for further (in vivo) evolution of DNA-based FLAPs in the revised manuscript.

      (4) I would appreciate a better Figure 1 to show all the intermediate steps in the RNA processing, the subsequent beginning of the RT step, and then the final production of the ssDNA. I did not understand all the processing steps that lead to the final product, and the role of the 2'OH.

      We thank the referee for this comment. We have now made changes to Figure 1, showing the intermediate steps as well as a better illustration of the 2’-5’ linkage.

      (5) I would like a better understanding or a protocol for choosing insertion sites into MSD for other aptamers - people will need simple instructions.

      We thank the reviewer for bringing up this important point. We predicted the ssDNA secondary structure using RNAfold from the ViennaRNA package with DNA parameters. Based on the resulting structure, we inserted the Lettuce sequence into single-stranded and/or loop regions to minimise interference with the native msd fold. We have now included this information in the description of Figure 3.
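
      The site-selection step can be sketched in a few lines once a predicted structure is available in dot-bracket notation. This is a minimal illustration, not the procedure used in the study: the structure string is hypothetical, the minimum run length of four nucleotides is an arbitrary assumption, and the folding itself is assumed to have been done separately.

```python
def unpaired_regions(dot_bracket, min_len=4):
    """Return (start, end) pairs (0-based, end exclusive) of unpaired runs.

    Runs of '.' at least min_len long are reported as candidate
    insertion regions; paired positions are '(' or ')'.
    """
    regions = []
    start = None
    # A trailing sentinel closes a run that reaches the end of the string.
    for i, ch in enumerate(dot_bracket + "("):
        if ch == ".":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions

# Hypothetical predicted structure, for illustration only.
structure = "((((....))))..((((......))))...."
print(unpaired_regions(structure))
```

      The sketch does not distinguish hairpin loops from free single-stranded ends, and it says nothing about whether an insert would disturb the fold; re-folding each candidate construct would still be needed to check that the native msd stems are preserved.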

      (6) Can the gels be stained with DFHBI/other dyes to see the Lettuce as has been done for fluorogenic RNAs?

      Yes. We have now included experiments where we performed in-gel staining with DFHBI-1T for both chemically synthesized Eco2-Lettuce surrogates as well as the heterologously expressed Eco2-Lettuce RT-DNA. We have added this data to the revised Figure 3 (panel C and E).

      (7) Sometimes FLAPs are called fluorogenic RNA aptamers - it might be good to mention both terms initially since some people use fluorogenic aptamer as their search term.

      We thank the referee for this useful suggestion. We have now included both terms in the introduction of the revised version.

      (8) What E coli strains are compatible with this retron system?

      Experimental and bioinformatic analyses have shown that retron abundance varies drastically across different strains of E. coli[8-10]. For example, in an experimental investigation of 113 independent clinical isolates of E. coli, only 7 strains contained RT-DNA[8]. In our experiments, we have found that the BL21AI strain is compatible with plasmid-borne Eco2. The fact that this strain has a native retron system (Eco1) allowed us to use it as an internal standard. However, we were also able to express Eco2 RT-DNA in conventional lab strains such as E. coli Top 10 (data not shown), indicating that the ncRNA and the RT alone are sufficient for intracellular RT-DNA synthesis.

      (9) What steps would be needed to use in mammalian cells?

      We appreciate the reviewer’s thoughtful inquiry. Expression of retrons has been demonstrated in mammalian cells by Mirochnitchenko et al[11] and Lopez et al[12]. For example, Lopez et al demonstrate expression of retrons in mammalian cell lines using the Lipofectamine 3000 transfection protocol (Invitrogen) and a PiggyBac transposase system[12]. We also mention this in the discussion section of the revised manuscript. Expression of retron-encoded DNA aptamers in mammalian cells should be possible with these systems.

      (10) Is the conjugated RNA stable and does it degrade to leave just the DNA aptamer?

      We are grateful to the reviewer for their perceptive question. This depends on the specific retron system. For example, in certain retron systems such as retron Sen2, Eco4 and Eco7, the RNA is cleaved off, leaving behind just the ssDNA. In our case, with retron Eco2, the RNA remains stably bound to the ssDNA, thereby maintaining a stable hybrid RNA-DNA structure[10,13]. During the extraction of RT-DNA, the conjugated RNA is degraded in the RNase digestion step and is therefore not visible in the gel images.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores a DNA fluorescent light-up aptamer (FLAP) with the specific goal of comparing activity in vitro to that in bacterial cells. In order to achieve expression in bacteria, the authors devise an expression strategy based on retrons and test four different constructs with the aptamer inserted at different points in the retron scaffold. They only observe binding for one scaffold in vitro, but achieve fluorescence enhancement for all four scaffolds in bacterial cells. These results demonstrate that aptamer performance can be very different in these two contexts.

      Strengths:

      Given the importance of FLAPs for use in cellular imaging and the fact that these are typically evolved in vitro, understanding the difference in performance between a buffer and a cellular environment is an important research question.

      The retron strategy utilized by the authors is thoughtful and well-described.

      The observation that some aptamers fail to show binding in vitro but do show enhancement in cells is interesting and surprising.

      We appreciate the reviewer’s thorough assessment.

      Weaknesses:

      This study hints toward an interesting observation, but would benefit from greater depth to more fully understand this phenomenon. Particularly challenging is that FLAP performance is measured in vitro by affinity and in cells by enhancement, and these may not be directly proportional. For example, it may be that some constructs have much lower affinity but a greater enhancement and this is the explanation for the seemingly different performance.

      We thank the reviewer for this insightful comment. In response, we conducted a series of additional control experiments to better understand the apparent discrepancy between the in vitro and in vivo data. These experiments revealed that the previously reported increase in intracellular green fluorescence is independent of retron-expressed Lettuce RT-DNA and DFHBI-1T, and instead reflects stress-induced autofluorescence of E. coli upon inducer and antibiotic treatment. Our original negative controls (empty wild-type Eco2, uninduced cells in the presence of DFHBI-1T) were therefore not sufficient to rule out this effect.

      As a consequence, we have removed the earlier FACS data from the manuscript and no longer claim detectable intracellular Lettuce fluorescence. The reviewer’s comment prompted us to re-examine the fluorogenicity of our constructs in vitro. We found that the 4Lev4 construct folds poorly and produces very low signal in in-gel staining assays with DFHBI-1T. In contrast, the 8LE variant (8-nt P1 stem at position v4) shows the highest fluorescence in these in-gel assays (new Figure 3C). Nevertheless, even this construct remains 100-fold less fluorogenic than the RNA-based FLAP Broccoli (new Figure 3–figure supplement 5), and we were unable to detect its intracellular fluorescence above background (new Figure 3–figure supplement 4).

      To still directly demonstrate that retron-embedded Lettuce domains that are synthesized under intracellular conditions are functional, we modified our strategy in the revision and purified the expressed RT-DNA from E. coli, followed by in-gel staining with DFHBI-1T (new Figure 3E). Despite the challenge of obtaining sufficient amounts of ssDNA, this ex vivo approach clearly shows that the retron-produced Lettuce RT-DNA retains fluorogenic activity.

      The authors only test enhancement at one concentration of fluorophore in cells (and this experimental detail is difficult to find and would be helpful to include in the figure legend). This limits the conclusions that can be drawn from the data and limits utility for other researchers aiming to use these constructs.

      We appreciate this excellent suggestion. In the original experiments, the DFHBI-1T concentration in cells was chosen based on published conditions for live-cell imaging of the Broccoli RNA aptamer[14], which is substantially more fluorogenic than Lettuce. Motivated by the reviewer’s comment, we explored different fluorophore concentrations and additional controls to optimize the in vivo readout. These experiments showed that the weak intracellular fluorescence signal is dominated by stress-induced autofluorescence[15] (possibly due to the weaker antitoxin activity of the modified msd) and does not depend on the presence of Lettuce RT-DNA or DFHBI-1T.

      Given the combination of low Lettuce fluorogenicity and low intracellular RT-DNA levels, we concluded that varying the fluorophore concentration alone does not provide a meaningful way to deconvolute these confounding factors in cells. Instead, we shifted our focus to a more direct assessment of Lettuce activity: we now demonstrate that retron-produced Lettuce RT-DNA can be purified from E. coli and retains fluorogenic activity in an in-gel staining assay with DFHBI-1T (new Figure 3E). We believe this revised strategy provides a clearer and more quantitative characterization of the system’s capabilities and limitations than the initial in vivo fluorescence measurements.

      The FLAP that is used seems to have a relatively low fluorescence enhancement of only 2-3 fold in cells. It would be interesting to know if this is also the case in vitro. This is lower than typical FLAPs and it would be helpful for the authors to comment on what level of enhancement is needed for the FLAP to be of practical use for cellular imaging.

      In the revised manuscript, we directly address this point by comparing the in vitro fluorescence of Lettuce (DNA) and Broccoli (RNA) under optimized buffer conditions. These experiments show that Broccoli is nearly two orders of magnitude more fluorogenic than Lettuce (new Figure 3-figure supplement 5). Thus, the low enhancement observed for Lettuce in cells is consistent with its intrinsically poor fluorogenicity in vitro.

      Based on this comparison and on reported properties of RNA FLAPs such as Broccoli, we conclude that robust cellular imaging typically requires substantially higher fluorogenicity and dynamic range than currently provided by DNA-based Lettuce. In other words, under our conditions, Lettuce is close to or below the practical detection limit for in vivo imaging, whereas Broccoli performs well. We now explicitly state in the Discussion that further evolution and optimization of DNA FLAPs will be required to achieve fluorescence enhancements that are suitable for routine cellular imaging, and we position our work as a first demonstration that functional DNA aptamers can be produced in cells via retrons, while also delineating the current sensitivity limits.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Addgene accession numbers are not listed - how is this plasmid obtained?

      The sequence was obtained from Millman et al[16], and ordered as gblock from IDT. The gblock was then cloned into a pET28a vector by Gibson assembly. We have now included this in the methods section.

      Reviewer #2 (Recommendations For The Authors):

      Page 2, line 40 - FLAPS should be FLAPs

      We have corrected this typo in the revised version.

      References

      (1) Rousset, F. & Sorek, R. The evolutionary success of regulated cell death in bacterial immunity. Curr. Opin. Microbiol. 74, 102312; 10.1016/j.mib.2023.102312 (2023).

      (2) Gao, L. et al. Diverse enzymatic activities mediate antiviral immunity in prokaryotes. Science 369, 1077–1084; 10.1126/science.aba0372 (2020).

      (3) Carabias, A. et al. Retron-Eco1 assembles NAD+-hydrolyzing filaments that provide immunity against bacteriophages. Mol. Cell 84, 2185-2202.e12; 10.1016/j.molcel.2024.05.001 (2024).

      (4) Wang, Y. et al. DNA methylation activates retron Ec86 filaments for antiphage defense. Cell Rep. 43, 114857; 10.1016/j.celrep.2024.114857 (2024).

      (5) Wang, Y. et al. Cryo-EM structures of Escherichia coli Ec86 retron complexes reveal architecture and defence mechanism. Nat. Microbiol. 7, 1480–1489; 10.1038/s41564-022-01197-7 (2022).

      (6) Milo, R. & Phillips, R. Cell biology by the numbers (Garland Science Taylor & Francis Group, New York NY, 2016).

      (7) Sathiamoorthy, S. & Shin, J. A. Boundaries of the origin of replication: creation of a pET-28a-derived vector with p15A copy control allowing compatible coexistence with pET vectors. PLOS ONE 7, e47259; 10.1371/journal.pone.0047259 (2012).

      (8) Sun, J. et al. Extensive diversity of branched-RNA-linked multicopy single-stranded DNAs in clinical strains of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 86, 7208–7212; 10.1073/pnas.86.18.7208 (1989).

      (9) Rice, S. A. & Lampson, B. C. Bacterial reverse transcriptase and msDNA. Virus Genes 11, 95–104; 10.1007/BF01728651 (1995).

      (10) Simon, A. J., Ellington, A. D. & Finkelstein, I. J. Retrons and their applications in genome engineering. Nucleic Acids Res. 47, 11007–11019; 10.1093/nar/gkz865 (2019).

      (11) Mirochnitchenko, O., Inouye, S. & Inouye, M. Production of single-stranded DNA in mammalian cells by means of a bacterial retron. J. Biol. Chem. 269, 2380–2383; 10.1016/S0021-9258(17)41956-9 (1994).

      (12) Lopez, S. C., Crawford, K. D., Lear, S. K., Bhattarai-Kline, S. & Shipman, S. L. Precise genome editing across kingdoms of life using retron-derived DNA. Nat. Chem. Biol. 18, 199–206; 10.1038/s41589-021-00927-y (2022).

      (13) Lampson, B. C. et al. Reverse transcriptase in a clinical strain of Escherichia coli: production of branched RNA-linked msDNA. Science 243, 1033–1038; 10.1126/science.2466332 (1989).

      (14) Filonov, G. S., Moon, J. D., Svensen, N. & Jaffrey, S. R. Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299–16308; 10.1021/ja508478x (2014).

      (15) Renggli, S., Keck, W., Jenal, U. & Ritz, D. Role of autofluorescence in flow cytometric analysis of Escherichia coli treated with bactericidal antibiotics. J. Bacteriol. 195, 4067–4073; 10.1128/jb.00393-13 (2013).

      (16) Millman, A. et al. Bacterial Retrons Function In Anti-Phage Defense. Cell 183, 1551-1561.e12; 10.1016/j.cell.2020.09.065 (2020).

    1. eLife Assessment

      This study offers important insights into how entorhinal and hippocampal activity support human thinking in feature spaces. It replicates hexagonal symmetry in entorhinal cortex, reports a novel three-fold symmetry in both behavior and hippocampal signals, and links these findings with a computational model. The task and analyses are sophisticated, and the results appear convincing and of broad interest to neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine neural representations underlying abstract navigation in entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Interestingly, the three-fold pattern identified in the hippocampus explains quirks in participants' behavior where navigation performance follows a three-fold periodicity. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. The wide array and creativity of the techniques used is impressive, but because of their unique nature, the paper would benefit from more details on how some of these techniques were implemented.

    3. Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses seem thoroughly done, and the results and simulations are very interesting.

      [Editors' note: this version was assessed by the editors without consulting the reviewers further.]

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine neural representations underlying abstract navigation in entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Interestingly, the three-fold pattern identified in the hippocampus explains quirks in participants' behavior where navigation performance follows a three-fold periodicity. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. The wide array and creativity of the techniques used is impressive, but because of their unique nature, the paper would benefit from more details on how some of these techniques were implemented.

      Comments on revisions:

      Most of my concerns were adequately addressed, and I believe the paper is greatly improved. I have two more points. I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure. I also think the paper would benefit from more details regarding some of the analyses.

      Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed.

      (1) “…I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure…”.

      Thank you for pointing this out. We have revised the legend of Figure 4 by removing the significance notation “***: p < 0.001”, which referred to elements from a previous version of the figure.

      (2) “…I also think the paper would benefit from more details regarding some of the analyses. Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed”.

      We agree and appreciate the reviewer’s helpful suggestion. We have added a dedicated subsection entitled “Phase–amplitude coupling” to the Materials and Methods, in which we provide a detailed description of how the EC and HPC BOLD signals were reconstructed and how the coupling analysis was implemented. Correspondingly, we refined the description of this analysis in the Results section under “Phase synchronization between the HPC and EC activity”. The revised sections have been included below for your convenience. 

      Materials and Methods: Phase–amplitude coupling

      To quantify the spatial peak relationship between EC and HPC BOLD activity, we implemented a cross-frequency amplitude–phase coupling analysis in the directional space (Canolty et al., 2006). Rather than analyzing raw BOLD signals, we reconstructed 6-fold EC activity and 3-fold HPC activity in each voxel using sinusoidal modulation weights (β<sub>sine</sub> and β<sub>cosine</sub>) estimated from the raw BOLD signals. Specifically, activity was modeled as β<sub>cosine</sub>cos(kθ) + β<sub>sine</sub>sin(kθ), where k denotes the rotational symmetry. This approach selectively captures the hypothesized spatial symmetries of neural activity (e.g., 6-fold or 3-fold periodicity) as a function of movement direction. For this coupling analysis, we used participants’ original movement directions (i.e., without applying orientation calibration). The reconstructed 6-fold EC and 3-fold HPC activity were then converted into analytic representations using the Hilbert transform, yielding the instantaneous phase of the HPC (ϕ<sub>HPC</sub>) and the amplitude envelope of the EC (A<sub>ERC</sub>). HPC phases were classified into nine bins. The composite analytic signal, defined as z = A<sub>ERC</sub>e<sup>iϕ<sub>HPC</sub></sup>, was used to compute the modulation index M (Canolty et al., 2006), defined as the absolute value of the mean of z values, quantifying the scalar coupling strength between EC amplitude and HPC phase within each bin. A surrogate dataset, a null distribution of the modulation indices (M<sup>-</sup>), was generated by spatially offsetting the EC amplitude relative to the HPC phase across all possible spatial lags. The mean of this surrogate distribution was used as the baseline reference against which the observed coupling strength was compared.
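
      Two building blocks of this description can be illustrated with a minimal sketch: the sinusoidal reconstruction from the two β weights, and the modulation index as the absolute value of the mean of z. The sketch deliberately omits the Hilbert transform, the nine-bin classification and the surrogate procedure, and all numerical values are hypothetical. The function names are illustrative and are not taken from the authors' code.

```python
import cmath
import math

def reconstruct(theta, beta_cos, beta_sin, k):
    """k-fold periodic activity at movement direction theta (radians)."""
    return beta_cos * math.cos(k * theta) + beta_sin * math.sin(k * theta)

def amplitude_and_orientation(beta_cos, beta_sin, k):
    """Equivalent form A*cos(k*(theta - phi)): return (A, phi)."""
    amplitude = math.hypot(beta_cos, beta_sin)
    orientation = math.atan2(beta_sin, beta_cos) / k
    return amplitude, orientation

def modulation_index(amplitudes, phases):
    """|mean(A * exp(i*phi))| over paired amplitude and phase samples."""
    z = [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]
    return abs(sum(z) / len(z))

# Hypothetical beta weights for a 6-fold component.
amp, phi = amplitude_and_orientation(0.3, 0.4, 6)
print(amp, phi)

# Toy phases spread evenly over one cycle.
phases = [2 * math.pi * j / 8 for j in range(8)]
flat = modulation_index([1.0] * 8, phases)                       # no coupling
peaked = modulation_index([1 + math.cos(p) for p in phases], phases)
print(flat, peaked)
```

      In the toy data, a constant amplitude gives an index near zero, while an amplitude that peaks at phase 0 gives a clearly non-zero index, which is the contrast the modulation index is meant to capture.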

      Results: Phase synchronization between the HPC and EC activity

      To examine whether the spatial phase structure in one region could predict that in another, we tested whether the orientations of the 6-fold EC and 3-fold HPC periodic activities, estimated from odd-numbered sessions using sinusoidal modulation with rotationally symmetric parameters, were correlated across participants. A cross-participant circular correlation was conducted between the spatial phases of the two areas to quantify the spatial correspondence of their activity patterns (EC: purple dots; HPC: green dots) (Jammalamadaka & Sengupta, 2001). The analysis revealed a significant circular correlation (Fig. 4a; r = 0.42, p < 0.001), as reflected by the continuous color progression across the participants (i.e., the colored lines connecting each pair of the EC and HPC dots in Fig. 4a), suggesting that participants with smaller hippocampal phases (green, outer ring) tended to have smaller entorhinal phases (purple, inner ring), and vice versa.
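
      A circular correlation of this kind can be sketched with the standard sine-based coefficient. This is a generic illustration under stated assumptions, not the authors' implementation: the phase values are hypothetical, and any rescaling of k-fold orientations onto the full circle (for example, multiplying by k) is assumed to have been done beforehand.

```python
import math

def circular_mean(angles):
    """Mean direction of a list of angles in radians."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))

def circular_correlation(alpha, beta):
    """Sine-based circular correlation between paired angles."""
    a_bar, b_bar = circular_mean(alpha), circular_mean(beta)
    sa = [math.sin(a - a_bar) for a in alpha]
    sb = [math.sin(b - b_bar) for b in beta]
    numerator = sum(x * y for x, y in zip(sa, sb))
    denominator = math.sqrt(sum(x * x for x in sa) * sum(y * y for y in sb))
    return numerator / denominator

# Hypothetical per-participant phases (radians), for illustration only.
ec_phase = [0.1, 0.5, 1.2, 2.0, 2.9]
hpc_phase = [a + 0.7 for a in ec_phase]   # same ordering, constant offset
print(circular_correlation(ec_phase, hpc_phase))
```

      A constant offset between the two sets of phases leaves the coefficient unchanged, so the measure reflects whether participants are ordered similarly in both regions rather than whether the phases coincide.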

      In addition to the across-participant phase correlation, we further examined the spatial alignment between the 6-fold EC and 3-fold HPC activity patterns. Given that the spatial phase of the HPC is hypothesized to depend on EC projections, particularly along the three primary axes of the hexagonal code, we examined whether the periodic activities of the EC and HPC were spatially peak-aligned. Notably, unlike previous studies that focused on temporal coherence of neural oscillations (Buzsaki, 2006; Maris et al., 2011; Friese et al., 2013), our analysis focused on periodic coupling between brain areas in the directional space. To test spatial peak alignment between EC and HPC, a cross-frequency spatial coupling analysis (adapted from the amplitude–phase coupling framework; Canolty et al., 2006) was employed to identify at which HPC phase the EC exhibited maximal amplitude modulation. If the activities of both areas were peak-aligned (i.e., no peak offset), a strong coupling at phase 0 of the HPC would be expected as shown by the one-cycle-based schema in Fig. 4b. In doing so, the instantaneous phase of the HPC and the amplitude envelope of the EC were extracted from the reconstructed activity using the Hilbert transform (see methods for details). HPC phases were classified into nine bins, and the modulation index (M), quantifying the scalar coupling strength between EC amplitude and HPC phase, was computed within each bin. As a result, significant coupling was observed in the bin centered at phase 0 of the HPC (Fig. 4c; t(32) = 2.57, p = 0.02, Bonferroni-corrected across tests; Cohen’s d = 0.45). In contrast, no significant coupling was found in other bins (p > 0.05). To rule out the possibility that the observed coupling was driven by a potential harmonic (integer multiple) relationship between the 3-fold and 6-fold periodicities, we additionally conducted control analyses using 9-fold and 12-fold EC components. However, no significant coupling was observed in these controls (Fig. 4c; p > 0.05). Together, these results confirmed selective alignments of spatial peaks between the 6-fold EC and 3-fold HPC periodicity in the conceptual direction domain.

      Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses seem thoroughly done, and the results and simulations are very interesting.

      We thank the reviewer for the positive assessment of our work.

      We thank both reviewers again for their constructive and insightful feedback, which has substantially strengthened the manuscript.

    1. eLife Assessment

      This valuable study introduces a model to help researchers understand how multivariate processes affect observed relationships in genetic data. The authors provide a tool to estimate model parameters. Overall, the authors provide solid evidence that their tool can obtain median-unbiased estimates of the true parameters when using simulated data under the model.

    2. Reviewer #1 (Public review):

      Summary:

      The authors develop a multivariate extension of SEM models incorporating transmitted and non-transmitted polygenic scores to disentangle genetic and environmental intergenerational effects across multiple traits. Their goal is to enable unbiased estimation of cross-trait vertical transmission, genetic nurture, gene-environment covariance, and assortative mating within a single coherent framework. By formally deriving multivariate path-tracing rules and validating the model through simulation, they show that ignoring cross-trait structure can severely bias both cross- and within-trait estimates. The proposed method provides a principled tool for studying complex gene-environment interplay in family genomic data.

      Strengths:

      It has become apparent in recent years that multivariate processes play an important role in genetic effects that are studied (e.g., Border et al., 2022), and these processes can affect the interpretation of these studies. This paper develops a comprehensive framework for polygenic score studies using trio data. Their model allows for assortative mating, vertical transmission, gene-environment correlation, and genetic nurture. Their study makes it clear that within-trait and cross-trait influences are important considerations. While their exposition and simulation focus on a bivariate model, the authors point out that their approach can be easily extended to higher-dimensional applications.

      Weaknesses:

      (1) My primary concern is that the paper is very difficult to follow. Perhaps this is inevitable for a model as complicated as this one. Admittedly, I have limited experience working with SEMs, so that might be partly why I really struggled with this paper, but I ultimately still have many questions about how to interpret many aspects of the path diagram, even after spending a considerable amount of time with it. Below, I will try to point out the areas where I got confused (and some where I still am confused). If the authors choose to revise the paper, clarifying some of these points would substantially broaden the paper's accessibility and impact.

      (1a) Figure 1 contains a large number of paths and variable names, and it is not always apparent which variables correspond to which paths. For example, at a first glance, the "k + g_c" term next to the "T_m" box could arguably correspond to any of the four paths near it. Disentangling this requires finding other, more reasonable variables for the other lines and sifting through the 3 pages of tables describing the elements of the figure.

      (1b) More hand-holding, describing the different parameters in the model, would help readers who don't have experience with SEMs. For example, many parameters show up several times (e.g., delta, a, g_c, i_c, w) and describing what these parameters are and why they show up several times would help. Some of this information is found in the tables (e.g., "Note: [N]T denotes either NT or T, as both share the same matrix content"), though I don't believe it is explained what it means to "share the same matrix content."

      (1c) Relatedly, descriptions of the path tracing were very confusing to me. I was relieved to see the example on the bottom of page 10 and top of page 11, but then as I tried to follow the example, I was again confused. Because multiple paths have the same labels, I was not able to follow along which exact path from Figure 1 corresponded to the elements of the sum that made up Theta_{Tm}. Also, based on my understanding of the path-tracing rules described, some paths seemed to be missing. After a while, I think I decided that these paths were captured by the (1/2)*w term since that term didn't seem to be represented by any particular path in the figure, but I'm still not confident I'm right. In this example, rather than referring to things like "four paths through the increased genetic covariance from AM", it might be useful to identify the exact paths represented by indicating the nodes those paths go through. If there aren't space constraints, the authors might even consider adding a figure which just contains the relevant paths for the example.

      (1d) The paper has many acronyms and variable names that are defined early in the paper and used throughout. Generally, I would limit acronyms wherever possible in a setting like this, where readers are not necessarily specialists. For the variables, while the definitions are technically found in the paper, it would be useful to readers if they were reminded what the variables stood for when they are referred to later, especially if that particular variable hasn't been mentioned for a while. As I read, I found myself constantly having to scroll back up to the several pages of figures and tables to remind myself of what certain variables meant. Then I would have to find where I was again. It really made a dense paper even harder to follow.

      (1e) Relatedly, on page 13, the authors make reference to a parameter eta, and I don't see it in Figure 1 or any of the tables. What is that parameter?

      (2) This point may be related to me misunderstanding the model, but if LT_p represents the actual genetic factors for the two traits for variants that are transmitted to the child, and T_p represents the PGS for transmitted variants, shouldn't there be a unidirectional arrow from LT_p to T_p (since the genetic factor affects the PGS and not the other way around) and shouldn't there be no arrow from T_p to Y_0 (since the entire effect of the transmitted SNPs is represented by the arrow from LT_p to Y_0)? If I'm mistaken here, it would be useful to explain why these arrows are necessary.

      (3) Some explanation of how the interpretation of the coefficients differs in a univariate model versus a bivariate model would be useful. For example, in a univariate model, the delta parameter represents the "direct effect" of the PGI on the offspring's outcome (roughly corresponding to a regression of the offspring's outcome onto the offspring's PGI and each parent's PGI). Does it have the same interpretation in the bivariate case, or is it more closely related to a regression of one of the outcomes onto the PGIs for both traits?
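
      The univariate interpretation mentioned in point (3) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' SEM: PGIs are standardized, mating is random, and the names `delta`, `nurture`, `simulate_trios`, and `ols` are hypothetical and introduced only for illustration. It regresses the offspring outcome on the offspring PGI alone and then on the offspring PGI plus both parental PGIs.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def simulate_trios(n, delta, nurture, seed=0):
    """Hypothetical trio data: random mating, unit-variance parental PGIs."""
    rng = random.Random(seed)
    mother = [rng.gauss(0, 1) for _ in range(n)]
    father = [rng.gauss(0, 1) for _ in range(n)]
    # Offspring PGI: midparent value plus Mendelian segregation noise.
    child = [0.5 * (m + f) + rng.gauss(0, 0.5 ** 0.5)
             for m, f in zip(mother, father)]
    # Outcome: direct effect of the child's PGI plus a parental (nurture) effect.
    y = [delta * c + nurture * (m + f) + rng.gauss(0, 1)
         for c, m, f in zip(child, mother, father)]
    return child, mother, father, y


child, mother, father, y = simulate_trios(n=20000, delta=0.3, nurture=0.1)
trio_fit = ols([child, mother, father], y)   # controls for parental PGIs
naive_fit = ols([child], y)                  # offspring PGI only

print("trio regression, coefficient on offspring PGI:", round(trio_fit[1], 3))
print("naive regression, coefficient on offspring PGI:", round(naive_fit[1], 3))
```

      In this toy setting the coefficient on the offspring PGI from the trio regression tracks the simulated `delta`, while the naive coefficient also absorbs the parental effect. Whether the same reading carries over to the bivariate model is the question raised above and is not settled by this sketch.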

      (4) It appears from the model that the authors are assuming away population stratification since the path coefficient between T_m and T_m is delta (the same as the path coefficient between T_m and Y_0). Similarly, I believe the effect of NT_m on Y_0 only has a genetic nurture interpretation if there is no population stratification. Some discussion of this would be valuable.

      References:

      Border, R., Athanasiadis, G., Buil, A., Schork, A.J., Cai, N., Young, A.I., ... & Zaitlen, N.A. (2022). Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science, 378(6621), 754-761.

    3. Reviewer #2 (Public review):

      (1) Summary and overall comments:

      This is an impressive and carefully executed methodological paper developing an SEM framework with substantial potential. The manuscript is generally very well written, and I particularly appreciated the pedagogical approach: the authors guide the reader step by step through a highly complex model, with detailed explanations of the structure and the use of path tracing rules. While this comes at the cost of length, I think the effort is largely justified given the technical audience and the novelty of the contribution.

      The proposed SEM aims to estimate cross-trait indirect genetic effects and assortative mating, using genotype and phenotype data from both parents and one offspring, and builds on the framework introduced by Balbona et al. While I see the potential interest of the model, it is still a bit unclear in which conditions I could use it in practice. However, this paper made a clear argument for the need for cross-traits models, which changed my mind on the topic (I would have accommodated myself with univariate models and only interpreted in the light of likely pleiotropy, but I am now excited by the potential to actually disentangle cross-traits effects).

      The paper is written in a way that makes me trust the authors' thoroughness and care, even when I do not fully understand every step of the model. I want to stress that I am probably not well-positioned to identify technical errors in the implementation. My comments should therefore be interpreted primarily from the perspective of a potential user of the method: I focus on what I understand, what I do not, and where I see (or fail to see) the practical benefits.

      For transparency, here is some context on my background. I have strong familiarity with the theoretical concepts involved (e.g., genetic nurture, gene-environment covariance, dynastic effects), and I have worked on those with PGS regressions and family-based comparison designs. My experience with SEM is limited to relatively simple models, and I have never used OpenMx. Reading this paper was therefore quite demanding for me, although still a better experience than many similarly technical papers, precisely because of the authors' clear effort to explain the model in detail. That said, keeping track of all moving parts in such a complex framework was difficult, and some components remain obscure to me.

      (2) Length, structure, and clarity:

      I do not object in principle to the length of the paper. This is specialized work, aimed at a relatively narrow audience, and the pedagogical effort is valuable. However, I think the manuscript would benefit from a clearer and earlier high-level overview of the model and its requirements. I doubt that most readers can realistically "just skim" the paper, and without an early hook clearly stating what is estimated and what data are required, some readers may disengage.

      In particular, I would suggest clarifying early on:

      • What exactly is estimated?

      For example, in the Discussion, the first two paragraphs seem to suggest slightly different sets of estimands: "estimate the effects of both within- and cross-trait AM, genetic nurture, VT, G-E covariance, and direct genetic effects." versus "model provides unbiased estimates of direct genetic effects (a and δ), VT effects (f), genetic nurture effects (ϕ and ρ), G-E covariance w and v, AM effects (μ), and other parameters when its assumptions are met." A concise and consistent summary of parameters would be helpful.

      • What data are strictly required?

      At several points, I thought that phenotypes for both parents were required, but later in the Discussion, the authors consider scenarios where parental phenotypes are unavailable. I found this confusing and would appreciate a clearer statement of what is required, what is optional, and what changes when data are missing.

      • Which parameters must be fixed by assumption, rather than estimated from the data?

      Relatedly, in the Discussion, the authors mention the possibility of adding an additional latent shared environmental factor across generations. It would help to clearly distinguish:

      • the baseline model,

      • the model actually tested in the paper, and

      • possible extensions.

      Making these distinctions explicit would improve accessibility.

      This connects to a broader concern I had when reading Balbona et al. (2021): at first glance, the model seemed readily applicable to commonly available data, but in practice, this was not the case. I wondered whether something similar applies here. A clear statement of what data structures realistically allow the model to be fitted would be very useful.

      I found the "Suggested approach for fitting the multivariate SEM-PGS model" in the Supplementary Information particularly helpful and interesting. I strongly encourage highlighting this more explicitly in the main manuscript. If the authors want the method to be widely used, a tutorial or at least a detailed README in the GitHub repository would greatly improve accessibility.

      Finally, while the pedagogical repetition can be helpful, there were moments where it felt counterproductive. Some concepts are reintroduced several times with slightly different terminology, which occasionally made me question whether I had misunderstood something earlier. Streamlining some explanations and moving more material to the SI could improve clarity without sacrificing rigor.

      (3) Latent genetic score (LGS) and the a parameter

      I struggled to understand the role of the latent genetic score (LGS), and I think this aspect could be explained more clearly. In particular, why is this latent genetic factor necessary? Is it possible to run the model without it?

      My initial intuition was that the LGS represents the "true" underlying genetic liability, with the PGS being a noisy proxy. Under that interpretation, I expected the i matrix to function as an attenuation factor. However, i is interpreted as assortative-mating-induced correlation, which suggests that my intuition is incorrect. Or should the parameter be interpreted as an attenuation factor?

      Relatedly, in the simulation section, the authors mention simulating both PGS and LGS, which confused me because the LGS is not a measured variable. I did not fully understand the logic behind this simulation setup.

      Finally, I was unsure whether the values simulated for parameter a in Figures 8-9 are higher than what would typically be expected given the current literature, though this uncertainty may reflect my incomplete understanding of a itself. I appreciated the Model assumptions section of the discussion, and I wonder if this should not be discussed earlier.

      (4) Vertical transmission versus genetic nurture

      I am not sure I fully understand the distinction between vertical transmission (VT) and genetic nurture as defined in this paper. From the Introduction, I initially had the impression that these concepts were used almost interchangeably, but Table 3 suggests they are distinct.

      Relatedly:

      • Why are ϕ and ρ not represented in the path diagram?

      • Are these parameters estimated in the model?

      The authors also mention that these parameters target different estimands compared to other approaches. It would be helpful to elaborate on this point. Relatedly, where would the authors expect dynastic effects to appear in this framework?

      (5) Univariate model and misspecification

      In the simulations where a univariate model is fitted to data generated under a true bivariate scenario, I have a few clarification questions.

      What is the univariate model used (e.g., Table 5)? Is it the same as the model described in Balbona et al. (2025)? Does it include an LGS?

      If the genetic correlation in the founder generation is set to zero, does this imply that all pleiotropy arises through assortative mating? If so, is this a realistic mechanism, and does it meaningfully affect the interpretation of the results?

      (6) Simulations

      Overall, I found the simulations satisfying to read; they largely test exactly the kinds of issues I would want them to test, and the rationale for these tests is clear.

      That said, I was confused by the notation Σ and did not fully understand what it represents.

      In the Discussion, the authors mention testing the misspecification of social versus genetic homogamy, but I do not recall this being explicitly described in the simulation section. They also mention this issue in the SI ("Suggested approach for fitting..."). I think it would be very helpful to include an example illustrating this form of misspecification.

      (7) Cross-trait specific limitations

      I am wondering - and I don't think this is addressed - what is the impact of the difference in the noisiness and the heritability of the traits used for this multivariate analysis?

      Using the authors' example of BMI and EA, one could imagine that these two traits have different levels of noise (maybe BMI is self-reported and EA comes from a registry), and similarly for the GWAS of these traits: say one GWAS is less powered than the other. Does it matter? Should I select the traits I look at carefully according to these criteria? Should I interpret the estimates differently if one GWAS is more powered than the other?

    1. eLife Assessment

      This manuscript provides valuable insight into how genome organization changes as cells progress through the cell cycle after mitotic exit. The conclusions are supported by solid, rigorous data, and the use of sorted unsynchronized cells rather than cells treated with drugs is a particular strength. Two sharp genome remodeling events are identified at G1-S and to a lesser extent, at S-G2 transitions. A discussion on the limitations of Hi-C and a broader interpretation of results in the context of other mechanistic models would strengthen the overall rigor.

    2. Reviewer #1 (Public review):

      This work convincingly shows that, rather than gradually "evolving" throughout interphase, global chromatin architecture undergoes unexpectedly sharp remodeling at G1-S (and to a lesser extent, S-G2) transitions. By applying "standard" Hi-C analyses on carefully sorted cells, the authors provide an excellent temporal view of how global chromatin architecture is changed throughout the cell cycle. They show a surprisingly abrupt increase in compartmentation strength (particularly interactions between the "active" A compartments) at G1-S transition, which is slightly weakened at S-G2 transition. Follow-up experiments show convincingly that the compartment "maturation" does not require the DNA synthesis accompanying S phase per se, but the authors have not identified the responsible factors (work for future publications). The possible biological ramifications of these architectural changes (setting up potential replication "factories", and/or facilitating transcription-replication conflict resolution, both more pertinent for the active A compartments, which are most affected) have been well discussed in the article, but still remain speculative at this stage.

      My major criticism of this article is aimed more at the state of the field in general, rather than this specific article, but it should be discussed to give a more balanced view: what actually is a chromatin compartment? Chromosomal tracing and live tracking experiments have shown that the majority of "structures" identified from Hi-C experiments are statistical phenomena, with even "strong" interactions only being infrequent and transient. A-B compartments are "built up" from multiple very low-frequency "interactions", so ascribing causal effects for genome functions is even tougher. As a result, I have very little confidence in the results of the authors' polymer simulations and their inferred "peninsula" A compartment structures without any other supporting experimental data.

      Specific minor points:

      (1) A better explanation for how Figure 1E was generated is required, because this figure could be very misleading. Figure 1F and all other cis-decay plots (and the Hi-C maps themselves) show that the strongest interactions are always at smaller genomic separations, so why should there be more "heat" at the megabase ranges in Figure 1E?

      (2) An ultra-high-resolution Hi-C study (Harris et al., Nat Commun, 2023) identified very small A and B compartments, including distinctions between gene promoters and gene bodies, raising further questions as to what the nature of a compartment really is beyond a statistical phenomenon. It is unreasonable to expect the authors to generate maps as deep as this prior study, but how much do their conclusions change according to the resolution of their compartment calling? The authors should include a balanced discussion on the "meaning" of A/B compartments.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Choubani et al presents a technically strong analysis of A/B compartment dynamics across interphase using cell-cycle-resolved Hi-C. By combining the elegant Fucci-based staging system with in situ Hi-C, the authors achieve unusually fine temporal resolution across G1, S, and G2, particularly within the short G1 phase of mESCs. The central finding that A/B compartment strength increases abruptly at the G1/S transition, stabilizes during S phase, and subsequently weakens toward G2 challenges the prevailing view that compartmentalization strengthens monotonically throughout interphase. The authors further propose that this "compartment maturation" is triggered by S-phase entry but occurs independently of active DNA synthesis, and that it involves a consolidation and large-scale reorganization of A-compartment domains.

      Strengths:

      Overall, this is a thoughtfully executed study that will be of broad interest to the 3D genome community. The data are of high quality, and the analyses are extensive, albeit not completely novel. In particular, previous work (Nagano et al. 2017 and Zhang et al. 2019) has shown that compartments are re-established after mitosis and strengthened during early interphase, and single-cell Hi-C studies have reported changes in compartment association across S phase. Nagano et al. also show that DNA replication correlates with a build-up of compartments, similar to what is presented here and consistent with the authors' conclusion that compartment strength peaks in early S. The idea that it weakens toward G2, rather than continuing to strengthen, appears to be novel and differs from the prevailing framing in the literature.

      Weaknesses:

      That said, several aspects of the conceptual framing and interpretation would also benefit from further clarification, and the mechanistic interpretation of the reported compartment dynamics requires more careful positioning relative to established models of genome organization. Specific concerns are outlined below:

      (1) One of the major conclusions of the study is that compartment maturation does not require ongoing DNA replication. However, the interpretation would benefit from more precise wording. Thymidine arrest still permits licensing, replisome assembly, and other S-phase-associated chromatin changes upstream of bulk DNA synthesis. Therefore, their data, as presented, demonstrate independence from DNA synthesis per se, but not necessarily from the broader replication program. Please clarify this distinction in the text and interpretations throughout the manuscript.

      (2) A major conceptual issue that is not addressed at all is the well-established anti-correlation between cohesin-mediated loop extrusion and A/B compartmentalization. Numerous studies have shown that loss of cohesin or reduced loop extrusion leads to stronger compartment signals, whereas increased cohesin residence or enhanced extrusion weakens compartmentalization. Given this framework, an obvious alternative explanation for the authors' observations is that the abrupt increase in compartment strength at G1/S, and its decline toward G2, could reflect cell-cycle-dependent modulation of cohesin activity rather than a compartment-intrinsic "maturation" program.

      The manuscript does not explicitly consider this possibility, nor does it examine loop extrusion-related features (such as loop strength, insulation, or stripe patterns) across the same cell-cycle stages. Without discussing or analyzing this widely accepted model, it is difficult to distinguish whether the reported compartment dynamics represent a novel architectural mechanism or an indirect consequence of known changes in extrusion behavior during the cell cycle. I strongly encourage the authors to analyze their data to determine if they observe anti-correlated loop changes at the same time they observe compartment changes. Ideally, the authors would remove loop extrusion during interphase using well-established cohesin degrons available in mESCs and determine if the relative differences in compartment dynamics persist.

      (3) The proposed "peninsula-like" A-domain structures are inferred from ensemble Hi-C data and polymer modeling, rather than directly observed physical conformations. That is, single-cell imaging data clearly have shown that Hi-C (especially ensemble Hi-C) cannot uniquely specify physical conformations and that different underlying structures can produce similar contact patterns. The "peninsula" language, as written, risks being interpreted as a literal structural model rather than a conceptual visualization. Instead of risking this as just another nuanced Hi-C feature in the field, the authors could strengthen the manuscript by either (i) explicitly framing the peninsula model as a heuristic description of contact redistribution rather than a definitive physical architecture, or (ii) discussing alternative structural scenarios that could give rise to similar Hi-C patterns. Clarifying this distinction would improve the rigor and help readers better understand what aspects of A-compartment consolidation are directly supported by the data versus model-based extrapolations. For example, it would be useful to clarify whether the observed increase in long-range A-A contacts reflects spatial extension of internal A regions, changes in loop extrusion dynamics, increased compartment mixing within the A state, or population-averaged heterogeneity across alleles.

      (4) The extension of the analysis to additional cell types using HiRES single-cell data is a valuable addition and supports the idea that compartment maturation is not unique to mESCs. However, the limitations of these data, in particular the limited phase resolution, the pseudo-bulk aggregation, and the variable coverage, should be emphasized more clearly in the main text. Framing these results as evidence for conservation in principle, rather than definitive proof of identical dynamics across tissues, would be more appropriate.

    1. eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. There is an intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes, which could be examined more rigorously with additional evolutionary analyses.

    2. Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about this manuscript and offer a few minor comments below that may help to further strengthen the study.

      (1) Page 4

      PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Figure 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not in other fully-engaged PIC structures.

      (2) Page 8

      Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as the free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function of the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3. Because the yeast strains used in Figure 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      (3) Page 11

      Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

    3. Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at Ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Weaknesses:

      Additional data that should be easily obtainable and analysis of existing data would enable an additional test of the models presented and extract additional mechanistic insights.

    4. Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module and of Ser5 phosphorylation on the CTD of Pol II is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled, and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      Weaknesses:

      (1) The work is limited in scope and does not provide any major insights into the mechanism of transcription. One indication of this limitation is that in the Discussion, published structural and functional results on transcription are used to support the interpretations of the results here more than current results inform previous models or findings.

      (2) The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3, is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript.

      (3) Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. This idea is supported by a single example from the literature (T. brucei). A more thorough evolutionary analysis could have tested this idea more rigorously.

    1. eLife Assessment

      This manuscript uses adaptive-bandit simulations to describe the dynamics of the Pseudomonas-derived cephalosporinase PDC-3 β-lactamase and its mutants to better understand antibiotic resistance. The finding, that clinically observed mutations alter the flexibility of the Ω- and R2-loops, reshaping the cavity of the active site, is valuable to the field. The evidence is considered incomplete, however, with the need for analysis to demonstrate equilibrium weighting of adaptive trajectories and related measures of statistical significance.

    2. Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC-3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the authors conclude that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67, and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-lagged independent component analysis (tICA) showed different profiles for the mutant enzymes as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well suited for the problem under evaluation.

      Weaknesses:

      In the revised version, the authors addressed my concerns regarding their use of the MSM, and in my view, their conclusions are now much more robust and well-supported by the data. While it would be very interesting to see a quantitative correlation between the effects of the mutations observed in the MD data and relevant experimental findings, I understand that this may be beyond the scope of the manuscript.

    3. Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. Some greater consideration of the uncertainties and of how the method choice affects the ability to compare equilibrium properties would strengthen the quantitative conclusions. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Comments on revised version:

      I am concerned that the authors state in the response to reviews that it is not possible to get error bars on values due to the use of the AB-MD protocol that guides the simulations to unexplored basins. Yet the authors want to compare these values between the WT and mutants. This relates to RMSD, RMSF, % H-bond and volume calculations. I don't accept that you cannot calculate an uncertainty on a time-averaged property calculated across the entire simulation. In these cases you can either run repeat simulations to get multiple values on which to do statistical analysis, or you can break the simulation into blocks to both check convergence and calculate uncertainties.
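
      The block-averaging approach suggested above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the AR(1) series is a synthetic stand-in for a time-correlated per-frame observable, and `block_average` and `ar1_series` are hypothetical helper names. The estimated standard error typically grows with block length until blocks are longer than the correlation time, and that plateau is the value to report.

```python
import math
import random


def block_average(series, n_blocks):
    """Return (mean, standard error) computed from non-overlapping block means."""
    block_len = len(series) // n_blocks
    if n_blocks < 2 or block_len < 1:
        raise ValueError("need at least 2 non-empty blocks")
    means = [
        sum(series[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    grand = sum(means) / n_blocks
    # Sample variance of the block means, then standard error of their mean.
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return grand, math.sqrt(var / n_blocks)


def ar1_series(n, phi, seed=0):
    """Synthetic correlated signal (AR(1)); a stand-in for an MD observable."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    data = ar1_series(30000, phi=0.95)  # 30,000 frames, strongly correlated
    for n_blocks in (3000, 300, 30, 10):
        mean, sem = block_average(data, n_blocks)
        print(f"{n_blocks:5d} blocks: mean={mean:+.3f}  sem={sem:.3f}")
```

      For a strongly correlated series, many short blocks understate the standard error; an estimate that keeps rising as the blocks get longer indicates that the property is not yet converged.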

      I note that the authors do provide error bars on the volumes, but the statistics given for these need closer scrutiny (I can't test this without the raw data). For example, the authors have p<0.0001 for the following pair of volumes, 1072 ± 158 and 1115 ± 242, and for SASA p<0.0001 is given for two identical numbers, 155 ± 3.

      I also remain concerned about comparisons between simulations run with the AB-MD scheme. While each simulation is an equilibrium simulation run without biasing forces, new simulations are seeded to expand the conformational sampling of the system. This means that by definition the ensemble of simulations does not represent an equilibrium ensemble. For example, the frequency at which conformations are sampled would not be the same as in a single much longer equilibrium simulation. While you may be able to see trends in the differences between conditions run in this way, I still don't understand how you can compare quantitative information without some method of reweighting the ensemble. It is not clear that such a reweighting exists for this method, in which case I advise some more caution in the wording of the comparisons made from this data.

      At this stage I don't feel the revision has directly addressed the main comments I raised in the earlier review, although there is a stronger response to the comments of Reviewer #2.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the authors conclude that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67 and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-independent component analysis (tiCA) showed different profiles for the mutant enzyme as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well suited for the problem under evaluation.

      Weaknesses:

      In the revised version, the authors addressed my concerns regarding their use of the MSM, and in my view, their conclusions are now much more robust and well-supported by the data. While it would be very interesting to see a quantitative correlation between the effects of the mutations observed in the MD data and relevant experimental findings, I understand that this may be beyond the scope of the manuscript.

      Thank you for the careful evaluation and constructive comments. Regarding the suggestion of a more quantitative correlation with experimental observables, we agree that this would be valuable, and we have noted it as an important direction for future work.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. Some greater consideration of the uncertainties and of how the method choice affects the ability to compare equilibrium properties would strengthen the quantitative conclusions. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Comments on revised version:

      I am concerned that the authors state in the response to reviews that it is not possible to get error bars on values due to the use of the AB-MD protocol that guides the simulations to unexplored basins. Yet the authors want to compare these values between the WT and mutants. This relates to RMSD, RMSF, % H-bond and volume calculations. I don't accept that you cannot calculate an uncertainty on a time-averaged property calculated across the entire simulation. In these cases you can either run repeat simulations to get multiple values on which to do statistical analysis, or you can break the simulation into blocks to both check convergence and calculate uncertainties.

      We thank the reviewer for raising this point. We would like to clarify that we did not intend to state that error bars are impossible to obtain under AB-MD. In fact, we reported error bars for several quantities derived from the AB-MD trajectories (we also broke the trajectories into blocks and calculated uncertainties for RMSF in our first-round response as you suggested). However, these data are closely related to your concern about comparing quantitative information without an appropriate reweighting of the ensemble. Therefore, in the revised manuscript, we removed quantitative analyses that were calculated directly from the raw AB-MD trajectories. Instead, the quantitative comparisons are now obtained from MSM analysis. We report pocket volumes and key interaction metrics for MSM metastable states, with corresponding error bars for these MSM-based quantities (Figure 6 and its supplementary figure).

      I note that the authors do provide error bars on the volumes, but the statistics given for these need closer scrutiny (I can't test this without the raw data). For example, the authors have p<0.0001 for the following pair of volumes, 1072 ± 158 and 1115 ± 242, and for SASA p<0.0001 is given for two identical numbers, 155 ± 3.

      Thank you for this comment. As noted above, we have removed the table from the manuscript, and the pocket-volume results together with their error bars are now shown in Figure 6. To address the concern raised here and to avoid making the same mistake in future analyses, we re-examined how the statistics were computed. We believe the very small p-values were caused by treating per-frame MD values as independent observations in two-sample t-tests. Because consecutive MD frames are strongly time-correlated, they do not satisfy the independence assumption, which can greatly overestimate the effective sample size and lead to artificially small p-values. For the SASA, a p < 0.0001 is reported even though both values are shown as 155 ± 3. This is due to rounding, which can hide subtle underlying differences.
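
      The effect described above can be illustrated with the summary statistics quoted in the review. This is a sketch under stated assumptions, not a reconstruction of the authors' test: it assumes all 30,000 frames per system mentioned elsewhere in the response entered the t-test, the autocorrelation time `tau` is a hypothetical value, and the effective sample size uses the common approximation n / (2τ). The helper names are illustrative.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic computed from summary statistics of two samples."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / se


def effective_n(n_frames, tau_frames):
    """Approximate number of independent samples, n / (2 * tau)."""
    return max(1.0, n_frames / (2.0 * tau_frames))


if __name__ == "__main__":
    # Pocket-volume means and standard deviations quoted in the review.
    mean_a, sd_a = 1072.0, 158.0
    mean_b, sd_b = 1115.0, 242.0
    n_frames = 30000   # assumed: one frame per ns over 30 microseconds
    tau = 500.0        # HYPOTHETICAL autocorrelation time, in frames

    t_naive = welch_t(mean_a, sd_a, n_frames, mean_b, sd_b, n_frames)
    n_eff = effective_n(n_frames, tau)
    t_corr = welch_t(mean_a, sd_a, n_eff, mean_b, sd_b, n_eff)
    print(f"naive:     n={n_frames}, t={t_naive:.2f}")
    print(f"corrected: n_eff={n_eff:.0f}, t={t_corr:.2f}")
```

      With every frame counted as independent, the t statistic is large enough to give a vanishing p-value; with a few tens of effectively independent samples, the same means and standard deviations are no longer distinguishable.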

      I also remain concerned about comparisons between simulations run with the AB-MD scheme. While each simulation is an equilibrium simulation run without biasing forces, new simulations are seeded to expand the conformational sampling of the system. This means that by definition the ensemble of simulations does not represent an equilibrium ensemble. For example, the frequency at which conformations are sampled would not be the same as in a single much longer equilibrium simulation. While you may be able to see trends in the differences between conditions run in this way, I still don't understand how you can compare quantitative information without some method of reweighting the ensemble. It is not clear that such a reweighting exists for this method, in which case I advise some more caution in the wording of the comparisons made from this data.

      At this stage I don't feel the revision has directly addressed the main comments I raised in the earlier review, although there is a stronger response to the comments of Reviewer #2.

      We thank the reviewer for reiterating this important point, and we agree with the underlying concern. Although AB-MD generates unbiased trajectories, the ensemble of simulations does not represent an equilibrium ensemble. As a result, statistics computed by simply concatenating all AB-MD trajectories should not be used for quantitative comparisons. We acknowledge that, in the original version, we reported several quantitative descriptors directly from concatenated AB-MD frames, including (i) distributions of χ1 torsions, (ii) mean pocket volumes and SASA, and (iii) percentages of some key interactions. We agree that this was not appropriate given the adaptive sampling protocol. In the revised manuscript, we have removed these quantitative analyses.

      We retained RMSD and RMSF analyses, but we have revised their wording and clarified their purpose. RMSD and RMSF are used only to summarize the structural variability and residue-level mobility observed across the collected trajectory segments and to motivate the selection of structural features for MSM construction. The manuscript now states: “Because AB-MD adaptively seeds new unbiased trajectories to expand conformational sampling, RMSD and RMSF are used here to summarize the structural variability and per-residue mobility observed across the collected trajectories.”

      Regarding the reviewer’s question about reweighting, the Markov state model (MSM) provides a principled framework to obtain the stationary distribution π from the transition probability matrix T<sub>τ</sub>. The resulting π<sub>i</sub> gives the equilibrium weight of each microstate i, and the corresponding discrete free energy can be written as F<sub>i</sub> = −k<sub>B</sub>T ln(π<sub>i</sub>). PCCA then coarse-grains the microstate space into a small number of metastable states. In the revised manuscript, quantitative comparisons are therefore derived from the MSM at the level of these metastable states, rather than from unweighted counts of concatenated AB-MD frames.
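
      As a minimal sketch of the reweighting idea described above, the following computes the stationary distribution of a small row-stochastic transition matrix by power iteration, converts it to free energies, and compares an MSM-weighted average with an unweighted frame average. The 3-state matrix, the per-state observable values and the frame counts are all hypothetical, and the function names are illustrative; this is not the authors' MSM pipeline.

```python
import math


def stationary_distribution(T, n_iter=10000, tol=1e-12):
    """Left eigenvector of row-stochastic T with eigenvalue 1, by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [x / total for x in new]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def free_energies(pi, kT=1.0):
    """Discrete free energy of each state, F_i = -kT * ln(pi_i), in units of kT."""
    return [-kT * math.log(p) for p in pi]


if __name__ == "__main__":
    # HYPOTHETICAL transition matrix at some lag time tau.
    T = [[0.90, 0.10, 0.00],
         [0.05, 0.90, 0.05],
         [0.00, 0.20, 0.80]]
    pi = stationary_distribution(T)
    print("pi =", [round(p, 4) for p in pi])
    print("F  =", [round(f, 3) for f in free_energies(pi)])

    volumes = [1000.0, 1100.0, 1300.0]  # HYPOTHETICAL per-state mean observable
    frames = [100, 100, 800]            # HYPOTHETICAL adaptive-sampling frame counts
    msm_avg = sum(p * v for p, v in zip(pi, volumes))
    raw_avg = sum(c * v for c, v in zip(frames, volumes)) / sum(frames)
    print(f"MSM-weighted average: {msm_avg:.1f}")
    print(f"unweighted average:   {raw_avg:.1f}")
```

      In this toy case the adaptive protocol oversamples the least populated state, so the unweighted average is pulled towards that state's value while the MSM-weighted average is not.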

      Accordingly, we have revised the sections “E219K and Y221A mutations facilitate proton transfer” and “Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams”, and we have added new figures in Figure 6 and its figure supplement. The adjustments to the quantitative analyses do not affect our original conclusions.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses adaptive sampling simulations to understand the impact of mutations on the specificity of the enzyme PDC-3 β-lactamase. The authors argue that mutations in the Ω-loop can expand the active site to accommodate larger substrates.

      Strengths:

      The authors simulate an array of variants and perform numerous analyses to support their conclusions. The use of constant pH simulations to connect structural differences with likely functional outcomes is a strength.

      Weaknesses:

      I would like to have seen more error bars on quantities reported (e.g., % populations reported in the text and Table 1).

      We appreciate this point. Here, the population we analyze is intended to showcase conformational differences across variants rather than to estimate equilibrium occupancies. Although each system includes 100 trajectories, they were generated using an adaptive-bandit protocol. The protocol deliberately guides sampling towards underexplored basins; conformational heterogeneity between trajectories is therefore expected by design. For example, in E219K the MSM decomposition shows that in states 1, 6, and 7 the K67(NZ)–S64(OG) distance is almost entirely > 6 Å, whereas in states 2 and 3 it is almost entirely < 3.5 Å (Figure 5—figure supplement 12). These distances suggest that the hydrogen bond fraction is approximately zero in states 1, 6, and 7, and close to one in states 2 and 3. In addition, the mean first passage time of the Markov state models suggests that the formation and disruption of this hydrogen bond occur on the microsecond timescale, which is far longer than the length of each individual trajectory (300 ns). Consequently, across the 100 replicas, some trajectories exhibit very low fractions, while others display the opposite trend. Under such bimodal, protocol-induced heterogeneity, computing an error bar across trajectories mainly visualizes the protocol’s dispersion and risks being misread as thermodynamic uncertainty, which is not central to our aim of comparing conformational differences between wild-type PDC-3 and variants. We therefore do not include the error bars.
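
      The bimodality argument above can be made concrete with a small sketch. The distances below are hypothetical, and the distance-only 3.5 Å cutoff is a simplifying assumption taken from the thresholds quoted in the text rather than the authors' stated hydrogen-bond criterion; `hbond_fraction` is an illustrative helper name.

```python
import statistics


def hbond_fraction(distances, cutoff=3.5):
    """Fraction of frames whose donor-acceptor distance (in Å) is below the cutoff."""
    if not distances:
        raise ValueError("empty trajectory")
    return sum(d < cutoff for d in distances) / len(distances)


if __name__ == "__main__":
    # HYPOTHETICAL short trajectories, each trapped in one basin because the
    # interconversion is slower than the trajectory length.
    formed = [3.0, 3.1, 2.9, 3.2]   # bond present throughout
    broken = [6.5, 7.0, 6.8, 6.2]   # bond absent throughout
    trajectories = [formed, formed, broken, broken, broken]

    fractions = [hbond_fraction(t) for t in trajectories]
    print("per-trajectory fractions:", fractions)
    print("mean across trajectories:", statistics.mean(fractions))
    print("std across trajectories: ", round(statistics.stdev(fractions), 3))
```

      Every trajectory reports a fraction of either 0 or 1, so the spread across trajectories mostly reflects how many trajectories were seeded in each basin, not the uncertainty of an equilibrium occupancy.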

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the authors conclude that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67, and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-independent component analysis (tiCA) showed different profiles for the mutant enzyme as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem, and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well-suited for the problem under evaluation.

      Weaknesses:

      In the description of many of their results, the authors do not provide enough information for a deep understanding of the biochemistry/biophysics involved. Without these issues addressed, the strength of the evidence is of concern.

      We thank the reviewer for pointing out the need for deeper discussion of the biochemical and biophysical implications of our results. In our manuscript, we begin by examining basic structural metrics (e.g., RMSD and RMSF) which clearly indicate that the major conformational changes occur in the Ω-loop and the R2 loop. We have now added a paragraph to describe the importance of the Ω-loop and highlighted it in the revised manuscript on lines 142-166 of page 6. This observation guided our subsequent focus on these regions, as well as on the catalytic site. Our analysis revealed notable alterations in the hydrogen bonding network, especially in interactions involving the K67-S64, K67-N152, K67-G220, Y150-A292, and N287-N314 pairs. These observations led us to conclude that:

      (1) Mutations E219K and Y221A facilitate the proton transfer of catalytic residues. This is consistent with prior experimental data showing that these substitutions produce the most pronounced increase in sensitivity to cephalosporin antibiotics (lines 210-212 in page 8 of the revised manuscript). 

      (2) Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams. This is in line with MIC measurements reported by Barnes et al. (2018), which showed that mutants with larger active-site pockets exhibit markedly greater sensitivity to cephalosporins with bulky side chains than others (lines 249-259 on page 10).

      Furthermore, we applied Markov state models (MSMs) to explore the timescales of the transitions between these different conformational states. We believe that these methodological steps support our conclusions.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. However, the study doesn't clearly describe the way the data is generated. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Weaknesses:

      The methods used to gain the results are not explained clearly, meaning it was hard to determine exactly how some data was obtained. The convergence and uncertainties in the data were not adequately quantified. The text is also a little long, which obscures the main findings.

      We thank the reviewer for the suggestion. We respectfully ask the reviewer to specify which aspects of the data-generation methods are unclear so that we can include the necessary details in the next revision. Moreover, all statistics that are reported in the manuscript are obtained from extensive analyses of 300,000 simulation frames. The Markov state models have been validated by the ITS plots and Chapman-Kolmogorov (CK) test. The two-sample t-tests were also carried out for the volume and SASA.
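
      For readers unfamiliar with the two validation checks named above, the sketch below shows what each one tests on a toy 2-state model. The matrices and lag times are hypothetical and the function names are illustrative; real analyses use many states and statistical error estimates. An implied timescale is computed as t = −τ / ln|λ(τ)| and should be roughly independent of the lag τ, and the Chapman-Kolmogorov check compares the k-th power of the model estimated at lag τ with a model estimated directly at lag kτ.

```python
import math


def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(T, k):
    """T raised to the integer power k >= 1."""
    out = T
    for _ in range(k - 1):
        out = matmul(out, T)
    return out


def implied_timescale_2state(T, lag):
    """Slowest implied timescale of a 2x2 row-stochastic matrix estimated at `lag`."""
    lam2 = T[0][0] + T[1][1] - 1.0  # eigenvalues of such a matrix are 1 and trace - 1
    return -lag / math.log(abs(lam2))


def ck_max_deviation(T_lag, T_klag, k):
    """Largest absolute difference between T(lag)^k and the model estimated at k*lag."""
    pred = matpow(T_lag, k)
    n = len(pred)
    return max(abs(pred[i][j] - T_klag[i][j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    T1 = [[0.90, 0.10], [0.20, 0.80]]       # HYPOTHETICAL model at lag 1
    T2_est = [[0.84, 0.16], [0.33, 0.67]]   # HYPOTHETICAL model estimated at lag 2
    print("ITS at lag 1:", implied_timescale_2state(T1, 1))
    print("ITS at lag 2:", implied_timescale_2state(T2_est, 2))
    print("CK deviation:", ck_max_deviation(T1, T2_est, 2))
```

      Both checks probe whether the dynamics are Markovian at the chosen lag time; neither one, on its own, quantifies the uncertainty of an observable averaged over the states.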

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1D focuses on the PDC3 catalytic site. However, the authors mentioned before that the enzyme has two domains, an alpha domain and an alpha/beta domain. The reader would benefit from a more detailed description of the enzyme, its active site, AND the location of the mutants under investigation in the figure.

      We have updated Figure 1D and marked the positions of all mutations (V211A/G, G214A/R, E219A/G/K and Y221A/H), which have now been highlighted as spheres.

      (2) Since, in the journal format, the results come before the methods, it would be helpful to add a brief description of where the results came from. For example, in the first section of the results, the authors describe the flexibility of the omega loop and the R2 loop. However, the reader won't know what kind of simulation was used and for how long, for example. A sentence would add the required context for a deeper understanding here.

      At the beginning of the Results and Discussion section we now state: “To investigate how the mutations in the Ω-loop affect PDC-3 dynamics, adaptive-bandit molecular dynamics (AB-MD) simulations were carried out for each system. 100 trajectories of 300 ns each (totaling 30 μs per system) were run.”

      (3) Still in the same section, the authors don't define what change in RMSF is considered significant. For example, I can't see a relevant change in the RMSF for the omega loop between the wt enzyme and the E219 mutants in Figure 2D. A more objective definition would be of benefit here.

      Our analysis reveals that while the wild-type PDC-3 and the G214A, G214R, E214G, and Y221A variants exhibit an average per-residue RMSF of around 4 Å in the Ω-loop, the V211A and V211G variants show markedly lower values (around 1.5 Å), and the E219K and Y221H variants exhibit intermediate values between 2 and 2.5 Å. In addition, the fluctuations around the binding site should be seen collectively along with the fluctuations in the R2-loop. Importantly, we urge the reviewer to focus on the MDLovofit analysis in Figure 2C, where the dynamic differences between the core and the fluctuating loops are clearly evident.

      (4) In line 138, the authors state that "Therefore, the flexibility of these proteins is mainly caused by the fluctuations in the Ω-loops and R2-loop". This is quite a bold statement to be drawn at this point. First of all, there is no mention of it in the manuscript, but is there any domain movement? Figure 2C clearly shows that there is some mobility in omega and R2 loops. But there is no evidence shown in the manuscript that shows that "the flexibility of these proteins is mainly caused by the fluctuations in the" loops. Please consider rephrasing this sentence or adding more data, if available.

      We have revised the wording to take the reviewer’s concern into account. The sentence now states: “Therefore, flexibility of PDC-3 is predominantly localized to the Ω- and R2-loops, whereas the remainder of the structure is comparatively rigid.” To explain further, β-lactamase enzymes are fairly rigid structures in which no large-scale domain motions occur. Instead, the enzyme communicates structurally via cross-correlation of loop dynamics (https://doi.org/10.7554/eLife.66567).

      (5) I guess, the most relevant question for the scope of the paper is not answered in this section. The authors show that the mobility of the omega- and R2-loops is altered by some mutations. Why is that? I wish I could see a figure showing where the mutations are and where the loops are. This question will come back in other sections.

      We have updated Figure 1D to mark the positions of all mutations (V211A/G, G214A/R, E219A/G/K and Y221A/H) as spheres. The Ω- and R2-loops are also highlighted. All mutations map to the Ω-loop, indicating that these substitutions directly perturb this region. Notably, K67 forms a hydrogen bond with the backbone of G220 within the Ω-loop and another with the phenolic hydroxyl of Y150. Y150, in turn, hydrogen-bonds with A292 in the R2 loop. Together, the residue interaction network (G220–K67–Y150–A292) suggests a pathway by which Ω-loop mutations propagate their effects to the R2 loop.

      (6) The authors then analyze the network of polar residues in the active site and the hydrogen bonds observed there. For the K67-N152 hydrogen bond, for example, there is a reduction in the occupancy from ~70% in the wild-type enzyme to ~30% and 40% in the mutants E219K and Y221, respectively. This finding is interesting. The question that remains is "why is that"? From the structural point of view, how does the replacement of E219 with a Lysine alter the hydrogen bond formation between K67 and N152? Is it due to direct competition? Solvent rearrangement? The reader is left without a clue in this section. Also, Figure 3B won't help the reader, since the mutated residues are not shown there. Please consider adding some information about why the authors believe that the mutations are disrupting the active site hydrogen bond network and showing it in Figure 3B.

      We appreciate the comment and have updated Figures 1D and 3B to highlight the mutation sites. The change from ~70% in the wild type to ~30–40% in the E219K and Y221T variants reported in Table 1 refers to the S64–K67 hydrogen bond. In the wild type, K67 forms an additional hydrogen bond with G220 on the Ω-loop, which helps anchor the K67 side chain in a geometry that favors the S64–K67 interaction. In the variants, the mutations reshape the Ω-loop and frequently disrupt the K67–G220 contact. The loss of this local anchor increases the conformational dispersion of K67, which is consistent with the observed reduction of the S64–K67 occupancy. Furthermore, our observation that the mutations are disrupting the active-site hydrogen-bond network is a data-driven conclusion rather than a subjective inference. Across ten systems, our AB-MD simulations provided 30 µs of sampling per system. Saving one frame every nanosecond yielded 30,000 conformations per system and 300,000 in total. All hydrogen-bond and salt-bridge statistics were computed over this full ensemble. Thus, the conclusion that the mutations disrupt the active-site hydrogen-bond network follows directly from these ensemble statistics. 

      (7) The pKa calculations and the pocket volume calculations show that the mutations expand the volume of the catalytic site and alter the microenvironment. Is there any change in the solvation associated with these changes? If the volume expands and the environment becomes more acidic, are there more water molecules in the mutants as compared to the wt enzyme? If so, can changes in solvation be associated with the changes in the hydrogen bond network? Would a simulation in the presence of a substrate be meaningful here? (I guess it would!)

      Regarding solvation, we observe a modest increase in transient water occupancy associated with the increase in volume of the pocket. The conserved deacylation water molecule is the most important and is always present throughout the simulation. Additional waters enter and leave the pocket but do not form persistent interactions that measurably perturb the hydrogen-bond network of the Ω- and R2-loops. We agree that simulations with a bound substrate would be informative. However, our study focuses on how Ω-loop mutations modulate the active site of apo PDC-3 and its variants. Within this scope, we find: (i) Amino acid substitutions change the flexibility of Ω-loops and R2-loops; (ii) E219K and Y221A mutations facilitate the proton transfer; (iii) Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams.

      (8) I have some concerns regarding the Markov State Modeling as shown here. After a time-independent component analysis, the authors show the projections on the components, which are different between the wild-type enzyme and the mutants, and draw some conclusions from these changes. For example, the authors state that "From the metastable state results, we observe that E219K adopts a highly stable conformation in which all the tridentate hydrogen-bonding interactions (K67(NZ)-S64(OG), K67(NZ)-N152(OD1) and K67(NZ)-G220(O)) mentioned above are broken". This conclusion is very difficult to draw from Figure 5 alone. Unless the macrostates observed in the MSM can be shown (their structures) and could confirm the broken interactions, I really don't believe that the reader can come to the same conclusion as drawn by the authors here. I would recommend that the authors map the macrostates back to the coordinates and show them (what structure corresponds to what macrostate). After showing that, it makes sense to discuss what macrostate is being favored by what mutation. Drawing conclusions from tiCA projections alone is not recommended. I very strongly suggest that the authors revisit this entire section, adding more context so that the reader can draw conclusions from the data that is shown.

      We appreciate the reviewer’s concern. In the Markov state modeling section, our objective is to quantify the timescales (via mean first passage times) associated with the formation and disruption of the critical hydrogen bonds (K67(NZ)-S64(OG), K67(NZ)-N152(OD1), K67(NZ)-G220(O), Y150(N)-A292(O), N287(ND2)-N314(OD1)) mentioned above. Representative structures illustrating these interactions are shown in Figures 3B and 4A. We agree that the main Figure 5 alone does not convey structural information. Accordingly, we provide Figure 5—figure supplements 12–16. Together, Figure 5B and Figure 5—figure supplements 12–16 map structures to metastable states, whereas Figures 3B and 4A supply atomistic detail of the interactions. Author response image 1 presents selected subplots from Figure 5—figure supplements 12–14. Together with the free-energy landscape in Figure 5A, these data indicate that E219K adopts a highly stable conformation in which all three K67-centered hydrogen bonds (K67(NZ)–S64(OG), K67(NZ)–N152(OD1), and K67(NZ)–G220(O)) are broken.

      Author response image 1.

      TICA plots illustrating the distribution of E219K conformations, with colour indicating the K67(NZ)-S64(OG), K67(NZ)-N152(OD1) and K67(NZ)-G220(O) distances.

      (9) As a very minor issue, there are a few typos in the manuscript text. The authors might want to take some time to revisit their entire text. Examples in lines 70, 197, etc.

      Thank you for your comment. We have corrected these typos.

      Reviewer #3 (Recommendations for the authors):

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket.

      However, the study doesn't clearly describe the way the data is generated and potentially lacks statistical rigour, which makes it uncertain if the key results are significant. As such, it is difficult to judge if the conclusions made are supported by data.

      All necessary data-acquisition methods are described in the Methods section. The Markov state models have been validated by the ITS plot and the Chapman-Kolmogorov (CK) test (Figure 5—figure supplement 2–11). The two-sample t-tests were also carried out for the volume and SASA (Table 2).

      The results section jumps straight to reporting RMSD and RMSF values; however, it is not clear what simulations are used to generate this information. Indeed, the main text does not mention the simulations themselves at all. The methods section mentions that 10 independent MD simulations were set up for each system, but no information is given as to how long these were run or the equilibration protocol used. Then it says that AB-MD simulations were run, but it is not clear what starting coordinates were used for this or how the 10 replicates were fed into these simulations. Most importantly, are the RMSD and RMSF calculations and later distance distribution information derived from the equilibrium MD runs or from the AB-MD simulations?

      Thank you for pointing this out. We have added “To investigate how the mutations in the Ω-loop affect PDC-3 dynamics, adaptive-bandit molecular dynamics (AB-MD) simulations were carried out for each system. 100 trajectories of 300 ns each (totaling 30 μs per system) were run.” to the Results and Discussion section. We didn’t run 10 independent MD simulations per system. We regret the typo in the Methods section that confused the reviewer. The sentence should have read – ‘All-atom MD simulations of wild-type PDC-3 and its variants were performed.’ Each system was equilibrated for 5 ns at a pressure of 1 atm using the Berendsen barostat. AB-MD simulations were initiated from these equilibrated structures. All analyses, apart from CpHMD, are based on the AB-MD trajectories.

      If these are taken from the equilibrium simulations, then it is critical that the reproducibility and statistical significance of the simulations is established. This can be done by calculating the RMSD and RMSF values independently for each replicate and determining the error bars. From this, the significance of differences between WT and mutant simulations can be determined. Without this, I have no data to judge if the main conclusions are supported or not. If these are derived from the AB-MD simulations, then I want to know how the independent simulations were combined and reweighted to generate overall RMSD, RMSF, and distance distributions. Unless I misunderstand the approach, the individual simulations no longer sample all regions of conformational space the same relative amount you would see in a standard MD simulation - specific conformational regions are intentionally run more to enhance sampling, then the overall conformational distributions cannot be obtained from these simulations without some form of reweighting scheme. But no such scheme is described. In addition, convergence of the data is required to ensure that the RMSD, RMSF, and distances have reached stable values. It is possible that I am misunderstanding the approach here. But in that case, I hope the authors can clarify the method and provide a means of ensuring that the data presented is converged. Many of the differences are clear by eye, but it is important to know they are not random differences between simulations and rather reflect differences between them.

      Thank you for raising this important point. In our AB-MD workflow, the adaptive bandit is used only for starting-structure selection (adaptive seeding). After each epoch, it chooses new starting snapshots from previously sampled conformations and launches the next runs. Each trajectory itself is standard, unbiased MD with no biasing potentials and no modification of the Hamiltonian. In other words, AB decides where we start, but does not alter the physics or sampling dynamics within an individual trajectory. In addition, our goal in this work is to compare variants under the same adaptive-bandit (AB) protocol, rather than to estimate equilibrium (Boltzmann) populations. Hence, we did not apply equilibrium reweighting to RMSD, RMSF, or distance distributions. However, the MSM section provides reweighted reference results based on the MSM stationary distribution.
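
      The difference between pooling frames from adaptively seeded trajectories and weighting by an MSM stationary distribution can be illustrated with a toy calculation. The following is only a sketch under stated assumptions, not the authors' pipeline: the two-state transition matrix, the per-state observable values, and the frame counts are all invented for illustration.

```python
import numpy as np

def stationary_distribution(T):
    """Stationary distribution of a row-stochastic transition matrix T."""
    evals, evecs = np.linalg.eig(T.T)       # left eigenvectors of T
    idx = np.argmin(np.abs(evals - 1.0))    # eigenvalue closest to 1
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()                    # normalise (also fixes the sign)

# Hypothetical 2-state MSM and a per-state observable (e.g. a mean distance).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
observable = np.array([1.0, 4.0])

# Hypothetical frame counts: adaptive seeding oversampled state 1.
frame_counts = np.array([300, 700])

pi = stationary_distribution(T)
pooled_average = np.average(observable, weights=frame_counts)
msm_average = float(pi @ observable)

print("stationary distribution:", pi)
print("pooled (unweighted) average:", pooled_average)
print("MSM-weighted average:", msm_average)
```

      In this toy case the pooled average is pulled towards the oversampled state, whereas the MSM-weighted value depends only on the transition probabilities.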

      In the response to reviews, the authors state that the "RMSF is a statistical quantity derived from averaging the time series of atomic displacements, resulting in a fixed value without an inherent error bar." But normally we would run multiple replicates and get an error bar from the different values in each. To dismiss the request for uncertainties and error bars seems to miss the point. I strongly agree with the prior reviewer that comparisons between RMSF or other values should be accompanied by uncertainties and estimates of statistical significance.

      Regarding the reviewers’ suggestion to present the data as a bar graph with error bars, we would like to note that RMSF is calculated as the time average of the fluctuations of each residue’s Cα atom over the entire simulation. As such, RMSF is a statistical quantity derived from averaging the time series of atomic displacements, resulting in a fixed value without an inherent error bar. We believe that our current presentation clearly and accurately reflects the local flexibility differences among the variants. Nearly all published studies report RMSF in this way, as indicated by the following examples:

      Figure 3a in DOI: https://doi.org/10.1021/jacsau.2c00077

      Figure 2 in DOI: https://doi.org/10.1021/acs.jcim.4c00089

      Supplementary Fig. 1, 2, 5, 9, 12, 20, 22, 24, and 26 in DOI: https://doi.org/10.1038/s41467-022-293313

      However, in response to the reviewers’ strong request, we present RMSF plots with error bars in our response letter. 

      Author response image 2.

      The root-mean-square fluctuation (RMSF) profiles of wild-type PDC-3 and its variants. Blue lines show the mean RMSF across 100 independent MD trajectories for each system; red translucent bands denote the standard deviation across trajectories. The Ω-loop (residues G183 to S226) is highlighted in yellow, and the R2-loop (residues L280 to Q310) is highlighted in blue.
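
      One way to obtain the trajectory-to-trajectory spread described in this caption is to compute an RMSF profile separately for each trajectory and then take the mean and standard deviation across those profiles. The sketch below assumes pre-aligned Cα coordinates stored as NumPy arrays; the function names and the synthetic data are hypothetical and are not taken from the authors' analysis scripts.

```python
import numpy as np

def rmsf_per_trajectory(coords):
    """RMSF per atom for one aligned trajectory.

    coords has shape (n_frames, n_atoms, 3).
    """
    mean_pos = coords.mean(axis=0)                     # average structure
    sq_disp = ((coords - mean_pos) ** 2).sum(axis=-1)  # squared displacement per frame and atom
    return np.sqrt(sq_disp.mean(axis=0))               # time average, then square root

def rmsf_mean_and_std(trajectories):
    """Mean and sample standard deviation of RMSF across trajectories."""
    per_traj = np.stack([rmsf_per_trajectory(t) for t in trajectories])
    return per_traj.mean(axis=0), per_traj.std(axis=0, ddof=1)

# Synthetic stand-in for real data: 10 trajectories, 200 frames, 5 atoms.
rng = np.random.default_rng(0)
trajectories = [rng.normal(scale=0.5, size=(200, 5, 3)) for _ in range(10)]
mean_rmsf, std_rmsf = rmsf_mean_and_std(trajectories)

# Sanity check: one atom alternating between x = +1 and x = -1 has RMSF 1.
toy = np.array([[[1.0, 0.0, 0.0]], [[-1.0, 0.0, 0.0]]])
toy_rmsf = rmsf_per_trajectory(toy)
```

      Note that for adaptively seeded runs the trajectories are not strictly independent replicas, so a spread computed this way is a descriptive measure rather than a formal confidence interval.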

      It was good to see that convergence of the constant-pH simulations was shown. While it can be challenging to get absolute pH values from the implicit solvent-based simulations, the differences between the systems are large and the trends appear significant. I was not clear how the starting coordinates were chosen for these simulations. Is the end point of the classical simulations, or is a representative snapshot chosen somehow?

      To ensure comparison, all systems used the X-ray crystal structure (PDB ID: 4HEF) with T79A substitution as the initial structure. The E219K and Y221A mutants were generated in silico using the ICM mutagenesis module. We have added the clarification in Methods section: “The starting structures were identical to those used for AB-MD.”

      Significant figures: Throughout the text and tables, the authors present data with more figures than are significant. 1071.81 ± 157.55 should be reported as 1100 ± 160 or 1070 ± 160. See the eLife guidelines for advice on this.

      Thank you for your suggestion. We have amended these now. 
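
      As an illustration of the rounding convention the reviewer describes, a value can be rounded to the decimal place implied by its uncertainty. The helper below is a hypothetical sketch that keeps two significant figures in the uncertainty; it is not a rule taken from the eLife guidelines.

```python
import math

def round_with_uncertainty(value, uncertainty, sig=2):
    """Round uncertainty to `sig` significant figures and value to the same place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty))  # position of the leading digit
    decimals = sig - 1 - exponent                   # negative means round to tens, hundreds, ...
    return round(value, decimals), round(uncertainty, decimals)

rounded_value, rounded_uncertainty = round_with_uncertainty(1071.81, 157.55)
print(f"{rounded_value:.0f} ± {rounded_uncertainty:.0f}")
```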

      The manuscript is very long for the results presented, and I feel that a clearer story would come across if the authors shortened the text so that the main conclusions and results were not lost.

      We appreciate the suggestion. We examined the twenty most recent research articles published in eLife and found that they are either longer than or comparable in length to our manuscript.

    1. eLife Assessment

      This study presented valuable findings regarding the basic molecular pathways leading to the cystogenesis of Autosomal Dominant Polycystic Kidney Disease, suggesting BICC1 functions as both a minor causative gene for PKD and a modifier of PKD severity. Solid data were supplied to demonstrate the functional and structural interactions between BICC-1, PC1 and PC2, respectively. The characterization of such interactions remains to be developed further, which renders the specific relevance of these findings for the etiology of relevant diseases unclear.

    2. Reviewer #1 (Public review):

      In this manuscript, Tran et al. investigate the interaction between BICC1 and ADPKD genes in renal cystogenesis. Using biochemical approaches, they reveal a physical association between Bicc1 and PC1 or PC2 and identify the motifs in each protein required for binding. Through genetic analyses, they demonstrate that Bicc1 inactivation synergizes with Pkd1 or Pkd2 inactivation to exacerbate PKD-associated phenotypes in Xenopus embryos and potentially in mouse models. Furthermore, by analyzing a large cohort of PKD patients, the authors identify compound BICC1 variants alongside PKD1 or PKD2 variants in trans, as well as homozygous BICC1 variants in patients with early-onset and severe disease presentation. They also show that these BICC1 variants repress PC2 expression in cultured cells.

      Overall, the concept that BICC1 variants modify PKD severity is plausible, the data are robust, and the conclusions are largely supported.

      Comments on revision:

      My comments have been mostly addressed.

    3. Reviewer #2 (Public review):

      Tran and colleagues report evidence supporting the expected yet undemonstrated interaction between the Pkd1 and Pkd2 gene products Pc1 and Pc2 and the Bicc1 protein in vitro, in mice, and collaterally, in Xenopus and HEK293T cells. The authors go on to convincingly identify two large and non-overlapping regions of the Bicc1 protein important for each interaction and to perform gene dosage experiments in mice that suggest that Bicc1 loss of function may compound with Pkd1 and Pkd2 decreased function, resulting in PKD-like renal phenotypes of different severity. These results led to examining a cohort of very early onset PKD patients to find three instances of co-existing mutations in PKD1 (or PKD2) and BICC1. Finally, preliminary transcriptomics of edited lines gave variable and subtle differences that align with the theme that Bicc1 may contribute to the PKD defects, yet are mechanistically inconclusive.

      These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed.

      The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been.

      Comments on revision:

      My comments have been addressed.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the role of BICC1 in the regulation of PKD1 and PKD2 and its impact on cystogenesis in ADPKD. By utilizing co-IP and functional assays, the authors demonstrate physical, functional, and regulatory interactions between these three proteins.

      Strengths:

      (1) The scientific principles and methodology adopted in this study are excellent, logical, and reveal important insights into the molecular basis of cystogenesis.

      (2) The functional studies in animal models provide tantalizing data that may lead to a further understanding and may consequently lead to the ultimate goal of finding a molecular therapy for this incurable condition.

      (3) In describing the patients from the Arab cohort, the authors have provided excellent human data for further investigation in large ADPKD cohorts. Even though there was no patient material available, such as HUREC, the authors have studied the effects of BICC1 mutations and demonstrated its functional importance in a Xenopus model.

      Weaknesses:

      This is a well-conducted study and could have been even more impactful if primary patient material was available to the authors. A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      Conclusion:

      The authors achieve their aims. The results reliably demonstrate the physical and functional interaction between BICC1 and PKD1/PKD2 genes and their products.

      The impact is hopefully going to be manifold:

      (1) Progressing the understanding of the regulation of the expression of PKD1/PKD2 genes.

      Comments on revision:

      My comments have been addressed and sorted.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the Bicc1/Pkd1/Pkd2 complex retains Bicc1 in the cytoplasm and thus restricts its activity in posttranscriptional regulation (see Author response image 1). We, however, do not yet have data to support this and thus have not included this model in the manuscript. Yet, we have updated the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      We have updated the discussion to include a discussion on the potential consequences on posttranscriptional regulation by Bicc1.

      Author response image 1.

      Model of BICC1, PC1 and PC2 self-regulation. In this model Bicc1 acts as a positive regulator of PKD gene expression. In the presence of ‘sufficient’ amounts of PC1/PC2 complex, it is tethered to the complex and remains biologically inactive (Fig. 1A). However, once the levels of the PC1/PC2 complex are reduced, Bicc1 is now present in the cytoplasm to promote expression of the PKD proteins, thereby raising their levels (Fig. 4B), which then in turn will ‘shutdown’ Bicc1 activity by again tethering it to the plasma membrane.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. Vishal Patel published that PKD1 is regulated by a mir-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require utilization of the mice described in above reference, which is beyond the scope of this manuscript. We, however, have revised the discussion to elaborate on this potential mechanism. 

      We have updated the discussion to include a statement on the potential direct regulation of Pkd1 mRNA by Bicc1.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the complete Bicc1 knockout that we previously reported (PMID 20215348), focusing on compound heterozygotes. Yet, similar to the Pkd1/Pkd2 compound heterozygotes (PMID 12140187), no cyst development was observed when we sacrificed the mice as late as P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that the heterozygous mice exhibit glomerular cysts as adults (PMID: 7723240). This suggestion is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney, e.g. when crossed to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are beyond the timeframe for this revision. 

      No changes were made in the revised manuscript. 

      Reviewer #2 (Public review):

      (1) These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed. 

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and the PKD1/PKD2 and whether this interaction is functionally important. How this translates into the clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      (2) The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been. 

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. As presented below, most of the criticisms raised by the reviewer have been easily addressed in the revised version of the manuscript. Yet, none of the critiques seems to directly impact the overall interpretation of the data. 

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript requires further editing. For example, figure panels and legends are mismatched in Figure 1

      We have corrected the labeling of Figure 1. 

      (2) Y-axis units and values are inconsistent in Figures 4b-4g, Supplementary Figures S2e and S2f are not referenced in the text, genotypes are missing in Supplementary Figure S3f, and numerous typographical errors are present.

      In respect to the y-axis in Figure 4b-g, the scale is different for each of them, but that is intentional as one would lose the differences if they were all scaled identically. But we have now mentioned this in the figure legend to make the reader aware of it. In respect to the Supplemental Figure S2e,f, we included the panels in the description of the mutant BICC1 lines, but unfortunately forgot to reference them. This has now been done.

      We have updated the labeling of the Y-axis for the cystic indices adding “[%]” as the unit and updated the figure legend of Figure 4. We have included the genotypes in Supplementary Figure S3f. The Supplementary Figure S2e,f is now mentioned in the supplemental material (page 9, 2<sup>nd</sup> paragraph). 

      Reviewer #2 (Recommendations for the authors):

      (1) "Previous data from mouse, Xenopus, and zebrafish suggest a crucial role for the RNA-binding protein Bicc1 in the pathogenesis of PKD, although BICC1 mutations in human PKD have not been previously reported." The cited sources (and others that were not cited) link Bicc1 mutations to renal cysts, similar to a report by Kraus (PMID: 21922595) that the authors cite later. However, a more direct link to PKD was reported by Lian and colleagues using whole Pkd1 mice (PMID: 20219263) and by Gamberi and colleagues using Pkd1 kidneys and human microarrays (PMID: 28406902). Although relevant, neither is cited here, and only the former is cited later in the manuscript.

      Thanks for pointing this out. We have added these three citations.

      We have added these three citations (PMID: 21922595, PMID: 20219263 and PMID: 28406902) in the indicated sentence.

      (2) In Figure 1B, the lanes do not seem to correspond among panels, particularly evident in the panel with myc-mBicc1. Hence, it is difficult to agree with the presented conclusions.

      We have corrected the labeling of the lanes in Figure 1b.

      (3) In the Figure 1 legend: "(g) Western blot analysis following co-IP experiments, using an anti-mouse Bicc1 or anti-goat PC2 antibody as bait, identified protein interactions between endogenous PC2 and BICC1 in UCL93 cells. Non-immune goat and mouse IgG were included as a negative control." There is no mention of panel H, although this reviewer can imagine what the authors meant. The capitalization differs in the figure and legend. More troublingly, in panel G, a non-defined star indicates a strong band present in both immune and non-immune control.

      We have corrected the figure legend of Figure 1 and clarified the non-specific band in the figure legend.

      (4) In Figure 4, the authors do not show the matched control for the Bicc1 Pkd1 interaction in panel d, nor do they show a scale bar in either a) or d). Thus, the phenotypic severity cannot be properly assessed.

      Thanks for pointing out the missing scale bars, which have now been added. In respect to the two kidneys shown in Figure 4d, the two kidneys shown are from littermates to illustrate the kidney size in agreement with the cumulative data shown in Figure 4e. Unfortunately, this litter did not have a wildtype control. As the data analysis in Figure 4e is based on littermates, mixing and matching kidneys of different litters does not seem appropriate. Thus, we have omitted showing a wildtype control in this panel. However, the size of the wildtype kidney can be seen in Figure 4a.

      We have added the scale bar to both panels and have updated the figure legend to emphasize that the kidneys shown are from littermates and that no wildtype littermate was present in this litter.

      (5) "Surprisingly, an 8-fold stronger interaction was observed between full-length PC1 and myc-mBicc1-ΔKH compared to myc-mBicc1 or myc-mBicc1-ΔSAM." Assuming all the controls for protein folding and expression levels have been carried out and not shown/mentioned, this sentence seems to contradict the previous statement that Bicc1deltaSAM reduced the interaction with PC1 by 55%. Because the full length and SAM deletion have different interaction strengths, the latter sentence makes no sense.

      The reduction in the levels of myc-mBicc1-ΔSAM compared to wildtype myc-mBicc1 in respect to PC1 binding was not significant. We have clarified this in the text.

      We have corrected the sentence and modified the Figure accordingly. 

      (6) Imprecise statements make a reader wonder how to interpret the data: "More than three independent experiments were analyzed." Stating the sample size or including it in the figure would save space and improve confidence in the data presented.

      We have stated the exact number of animals per conditions above each of the bars.

      (7) "Next, we performed a similar mouse study for Pkd1 by reducing the gene dose of Pkd1 postnatally in the collecting ducts using a Pkhd1-Cre as previously described40" What did the authors mean?

      The reference was included to cite the mouse strain, but we realize that it could be misinterpreted as meaning that the exact experiment had been performed previously. We have clarified this in the text.

      We have reworded the sentence to avoid misinterpretation. 

      (8) The authors examined the additive effects of knocking down Bicc1, Pkd1, and Pkd2 with morpholinos in Xenopus and, genetically, in mice. While the Bicc1[+/-] Pkd1 or 2[+/-] double heterozygote mice did not show phenotypes, the authors report that the Bicc1[-/-] Pkd1 or 2 [+/-] did instead show enlarged kidneys. What is the phenotype of a Bicc1[+/-] Pkd1 or 2 [-/-]? What we learn from the author's findings among the PKD population suggests that the latter situation would be potentially translationally relevant.

      The mouse experiments were designed to address a cooperativity between Bicc1 and either Pkd1 or Pkd2 and whether removal of one copy of Pkd1 or Pkd2 would further worsen the Bicc1 cystic kidney phenotype. Thus, the parental crosses were chosen to maximize the number of animals obtained for these genotypes. Unfortunately, these crosses did not yield the genotypes requested by the reviewer. To address the contribution of Bicc1 towards the PKD population, we will need to perform a different cross, where we eliminate Pkd1 or Pkd2 in a floxed background of Bicc1 postnatally in adult mice. While we are gearing up to perform such an experiment, this is timewise beyond the scope of the manuscript. In addition, please note that we have addressed the question about the translation towards the PKD population already in the discussion of the original submission (page 13/14, last/first paragraph).

      No changes have been made to the revised version of the manuscript.

      (9) How do the authors interpret the milder effects of the Bicc1[-/-] Pkd1[+/-] compared to Bicc1[-/-] Pkd2[+/-] relative to the respective protein-protein interactions?

      The milder effects are due to the nature of the crosses. While the Pkd2 mutant is a germline mutation, the Pkd1 mutant is a conditional allele eliminating Pkd1 only in the collecting ducts of the kidney. As such, we spare other nephron segments such as the proximal tubules, which also significantly contribute to the cyst load. As such these mouse data support the interaction between Pkd1 and Pkd2 with Bicc1, but do not allow us to directly compare the outcomes. While this was mentioned in the previous version of the manuscript, we have expanded on this in the revised version of the manuscript.

      We have expanded the results section in the revised version of the manuscript highlighting that the two different approaches cannot be directly compared.

      (10) How do the authors interpret that the strong Bicc1[Bpk] Pkd1 or Pkd2 double heterozygote mice did not have defects and "kidneys from Bicc1+/-:Pkd2+/- did not exhibit cysts (data not shown)", when the VEO PKD patients and - although not a genetic reduction - also the morpholino-treated Xenopus did?

      VEO PKD patients are characterized by a loss of function of PKD1 or PKD2 and – as we propose in this manuscript - that BICC1 further aggravates the phenotype. Yet, we do not address either in the mouse or Xenopus experiments whether BICC1 is a genetic modifier. We are simply addressing whether the two genes show a genetic interaction. In the mouse studies, we eliminate one copy of Pkd1 or Pkd2 in the background of a hypomorphic allele of Bicc1. Similarly, in the Xenopus experiments, we employ suboptimal doses of the morpholino oligomers, i.e., concentrations that did not yield a phenotypic change and then asked whether removing both together show cooperativity. It is important to state that this is based on a biological readout and not defined based on the amount of protein. While we have described this already in the original manuscript (page 7, first paragraph), we have amended our description of the Xenopus experiment to make this even clearer. 

      Finally, we agree with the reviewer that if we were to address whether Bicc1 is a modifier of the PKD phenotype in mouse, we would need to reduce Bicc1 function in Pkd1 or Pkd2 mutants. Yet, we have recognized this already in the initial version of the manuscript in the discussion (page 14, first paragraph).

      We have expanded the results section when discussing the suboptimal amounts of the morpholino oligos (Page 6, 1<sup>st</sup> paragraph).

      (11) Unclear: "While variants in BICC1 are very rare, we could identify two patients with BICC1 variants harboring an additional PKD2 or PKD1 variant in trans, respectively." Shortly after, the authors state in apparent contradiction that "the patients had no other variants in any of other PKD genes or genes which phenocopy PKD including PKD1, PKD2, PKHD1, HNF1s, GANAB, IFT140, DZIP1L, CYS1, DNAJB11, ALG5, ALG8, ALG9, LRP5, NEK8, OFD1, or PMM2."

      The reviewer is correct. This should have been phrased differently. We have now added “Besides the variants reported below” to clarify this more adequately.

      The sentence was changed to start with “Besides the variants reported below, […].”

      (12) "The demonstrated interaction of BICC1, PC1, and PC2 now provides a molecular mechanism that can explain some of the phenotypic variability in these families." How do the authors reconcile this statement with their reported ultra-rare occurrence of the BICC1 mutations?

      As mentioned in the manuscript and also in response to the other two reviewers, Bicc1 has been shown to regulate Pkd2 gene expression in mice and frogs via an interaction with the miR-17 family of microRNAs. Moreover, the miR-17 family has been demonstrated to be critical in PKD (PMID: 30760828, PMID: 35965273, PMID: 31515477, PMID: 30760828). In fact, both other reviewers have pointed out that we should stress this more since Bicc1 is part of this regulatory pathway. Future experiments are needed to address whether Bicc1 contributes to the variability in ADPKD onset/severity. Yet, this is beyond the scope of this study. 

      Based on the comments of the two other reviewers we have further addressed the Bicc1/miR-17 interaction.

      (13) The manuscript should use correct genetic conventions of italicization and capitalization. This is an issue affecting the entire manuscript. Some exemplary instances are listed below.

      (a) "We also demonstrate that Pkd1 and Pkd2 modifies the cystic phenotype in Bicc1 mice in a dose-dependent manner and that Bicc1 functionally interacts with Pkd1, Pkd2 and Pkhd1 in the pronephros of Xenopus embryos." Genes? Proteins?

      The data presented in this section show that a hypomorphic allele of Bicc1 in mouse and a knockdown in Xenopus yields this. As both affect the proteins, the spelling should reflect the proteins.

      No changes have been made in the revised manuscript.

      (b) The sentence seems to use both the human and mouse genetic capitalization, although it refers to experiments in the mouse system “to define the Bicc1 interacting domains for PC2 (Fig. 2d,e). Full-length PC2 (PC2-HA) interacted with full-length myc-mBICC1.”

      We agree with the reviewer that stating the species of the molecules used is critical. We have adopted a spelling of Bicc1 in which BICC1 is the human homologue, mBicc1 is the mouse homologue, and xBicc1 is the Xenopus one.

      We have highlighted the species spelling in the methods section and labeled the species accordingly throughout the manuscript and figures. 

      (14) “Together these data supported our biochemical interaction data and demonstrated that BICC1 cooperated with PKD1 and PKD2.” Are the authors implying that these results in mice will translate to the human protein?

      We agree that we have not formally shown that the same applies to the human proteins. Thus, we have changed the spelling accordingly.

      We have revised the capitalization of the proteins. 

      (15) The text is often unclear, terse, or inconsistent.

      (a) “These results suggested that the interaction between PC1 and Bicc1 involves the SAM but not the KH/KHL domains (or the first 132 amino acids of Bicc1). It also suggests that the N-terminus could have an inhibitory effect on PC1-BICC1 association.” How do the authors define the N-terminus? The first 132 aa? KH/KHL domains?

      This was illustrated in the original Figure 2A. The ΔKH constructs lack the first 351 amino acids. 

      To make this more evident, we have specified this in the text as well.

      (b) Similarly, the authors state below, "Unlike PC1, PC2 interacted with myc-mBICC1-ΔSAM, but not myc-mBICC1-ΔKH suggesting that PC2 binding is dependent on the N-terminal domains but not the SAM domain." It is unclear if the authors refer to the KH/KHL domains or others. Whatever the reference to the N-terminal region, it should also be consistent with the section above.

      This is now specified in the text.

      (c) Unclear: "We have previously demonstrated that Pkd2 levels are reduced in a complete Bicc1 null mice,22 performing qRT-PCR of P4 kidneys (i.e. before the onset of a strong cystic phenotype), revealed that Bicc1, Pkd1 and Pkd2 were statistically significantly down-regulated (Fig. 4h-j)".

      We have changed the text to clarify this. 

      (d) “Utilizing recombinant GST domains of PC1 and PC2, we demonstrated that BICC1 binds to both proteins in GST-pulldown assays (Fig. 1a, b)." GST-tagged domains? Fusions?

      We have changed the text to clarify this. 

      (e) "To study the interaction between BICC1, PKD1 and PKD2 we combined biochemical approaches, knockout studies in mice and Xenopus, genetic engineered human kidney cells" > genetically engineered.

      We have changed the text to clarify this.

      (f) Capitalization (e.g., see Figure S3, ref. the Bpk allele) and annotation (e.g., Gly821Glu and G821E) are inconsistent.

      We have homogenized the labeling of the capitalization and annotations throughout the manuscript. 

      (g) What do the authors mean by "homozygous evolutionarily well-conserved missense variant"?

      We have changed this in the revised version of the manuscript.

      Reviewer #3 (Public review/Recommendations to the authors):

      (1) A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      (2) This study should ideally include experiments in HUREC material obtained from patients/families with BICC1 mutations and studying its effects on the PKD1/2 complex in primary cell lines.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected once the two patients with the BICC1 p.Ser240Pro variant passed away.

      No changes to the revised manuscript have been made to address this point.

      (3) Please remove repeated words in the following sentence in paragraph 2 of the introduction: "BICC1 encodes an evolutionarily conserved protein that is characterized by 3 K-homology (KH) and 2 KH-like (KHL) RNA-binding domains at the N-terminus and a SAM domain at the C-terminus, which are separated by a by a disordered intervening sequence (IVS).23-28".

      This has been changed.

    1. Author response:

      Reviewer #1 (Public review):

      The authors analysed large-scale brain-state dynamics while humans watched a short video. They sought to identify the role of thalamocortical interactions.

      Major concerns

      (1) Rationale for using the naturalistic stimulus

      In terms of brain state dynamics, previous studies have already reported large-scale neural dynamics by applying some data-driven analyses, like energy landscape analysis and Hidden Markov Model, to human fMRI/EEG data recorded during resting/task states. Considering such prior work, it'd be critical to provide sufficient biological rationales to perform a conceptually similar study in a naturalistic condition, i.e., not just "because no previous work has been done". The authors would have to clarify what type of neural mechanisms could be missed in conventional resting-state studies using, say, energy landscape analysis, but could be revealed in the naturalistic condition.

      We appreciate your insightful comments regarding the need for a biological rationale in our study. As you mentioned, similar studies exist: van der Meer et al. used Hidden Markov Models to identify various activation modes of brain networks that included subcortical regions[1], and Song et al. linked brain states to narrative understanding and attentional dynamics[2, 3]. These studies motivate our use of naturalistic stimulus datasets. Moreover, there is evidence that the thalamus plays a crucial role in processing information in more naturalistic contexts, which points to the importance of thalamocortical communication[4, 5]. We therefore aimed to bridge thalamic activity and cortical state transitions using the energy landscape description.

      To address these gaps in conventional resting-state studies, we explored an alternative method—maximum entropy modeling based on the energy landscape. This allowed us to validate how the thalamus responds to cortical state transitions. To enhance clarity, we will update our introduction to emphasize the motivations behind our research and the significance of examining these neural mechanisms in a naturalistic setting.

      (2) Effects of the uniqueness of the visual stimulus and reproducibility

      One of the main drawbacks of the naturalistic condition is the unexpected effects of the stimuli. That is, this study looked into the data recorded from participants who were watching Sherlock, but what would happen to the results if we analyzed the brain activity data obtained from individuals who were watching different movies? To ensure the generalizability of the current findings, it would be necessary to demonstrate qualitative reproducibility of the current observations by analysing different datasets that employed different movie stimuli. In fact, it'd be possible to find such open datasets, like www.nature.com/articles/s41597-023-02458-8.

      We appreciate your concern regarding the reproducibility of our findings. The dataset from the "Sherlock" study is of high quality and has shown good generalizability in various research contexts. We acknowledge the importance of validating our results with different datasets to enhance the robustness of our conclusions. While we are open to exploring additional datasets, we intend to pursue this validation once we identify a suitable alternative. Currently, we are considering a comparison with the dataset from "Forrest Gump" as part of our initial plan.

      (3) Spatial accuracy of the "Thalamic circuit" definition

      One of the main claims of this study heavily relies on the accuracy of the localization of two different thalamic architectures: matrix and core. Given the conventional or relatively low spatial resolution of the fMRI data acquisition (3x3x3 mm^3), it appears to be critically essential to demonstrate that the current analysis accurately distinguished fMRI signals between the matrix and core parts of the thalamus for each individual.

      We acknowledge the importance of accurately localizing the different thalamic architectures, specifically the matrix and core regions. To address this, we downsampled the atlas of matrix and core cell populations from the previous study from a resolution of 2x2x2 mm^3 to 3x3x3 mm^3, which aligns with our fMRI data acquisition. We will report the atlas as Supplementary Figures in our revision.

      (4) More detailed analysis of the thalamic circuits

      In addition, if such thalamic localisation is accurate enough, it would be greatly appreciated if the authors perform similar comparisons not only between the matrix and core architectures but also between different nuclei. For example, anterior, medial, and lateral groups (e.g., pulvinar group). Such an investigation would meet the expectations of readers who presume some microscopic circuit-level findings.

      We appreciate your suggestion regarding a more detailed analysis of thalamic circuits. We have touched upon this in the discussion section as a forward-looking consideration. However, we believe that performing nuclei segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. That said, we are interested in exploring these nuclei-pathway connections to cortical areas in future studies with a proper 7T fMRI naturalistic dataset.

      (5) Rationale for different time window lengths

      The authors adopted two different time window lengths to examine the neural dynamics. First, they used a 21-TR window for signal normalisation. Then, they narrowed down the window length to 13-TR periods for the following statistical evaluation. Such a seemingly arbitrary choice of the shorter time window might be misunderstood as a measure to relax the threshold for the correction of multiple comparisons. Therefore, it'd be appreciated if the authors stuck to the original 21-TR time window and performed statistical evaluations based on the setting.

      Thank you for your valuable feedback regarding the choice of time window lengths. We aimed to maintain consistency in window lengths across our analyses. In light of your comments and suggestions from other reviewers, we plan to test our results using different time window lengths and report findings that generalize across these variations. Should the results differ significantly, we will discuss the implications of this variability in our revised manuscript.

      (6) Temporal resolution

      After identifying brain states with energy landscape analysis, this study investigated the brain state transitions by directly looking into the fMRI signal changes. This manner seems to implicitly assume that no significant state changes happen in one TR (=1.5sec), which needs sufficient validation. Otherwise, like previous studies, it'd be highly recommended to conduct different analyses (e.g., random-walk simulation) to address and circumvent this problem.

      Thank you for raising this important point regarding temporal resolution. Many fMRI studies, such as those examining event boundaries during movie watching, operate under similar assumptions concerning state changes within one TR. For example, Barnett et al. computed dynamic functional connectivity (dFC) with a window of 20 TRs (24.4s). We therefore regard this not as a limitation specific to our study but as a common issue related to fMRI scanning parameters. To strengthen our analysis of state transitions and ensure they are not merely coincidental, we plan to conduct random-walk simulations, as suggested, to validate our findings in accordance with methodologies used in previous research.

      Reviewer #2 (Public review):

      Summary:

      In this study, Liu et al. investigated cortical network dynamics during movie watching using an energy landscape analysis based on a maximum entropy model. They identified perception- and attention-oriented states as the dominant cortical states during movie watching and found that transitions between these states were associated with inter-subject synchronization of regional brain activity. They also showed that distinct thalamic compartments modulated distinct state transitions. They concluded that cortico-thalamo-cortical circuits are key regulators of cortical network dynamics.

      Strengths:

      A mechanistic understanding of cortical network dynamics is an important topic in both experimental and computational neuroscience, and this study represents a step forward in this direction by identifying key cortico-thalamo-cortical circuits. The analytical strategy employed in this study, particularly the LASSO-based analysis, is interesting and would be applicable to other data types, such as task- and resting-state fMRI.

      We thank the reviewer for this comment and encouragement.

      Weaknesses:

      Due to issues related to data preprocessing, support for the conclusions remains incomplete. I also believe that a more careful interpretation of the "energy" derived from the maximum entropy model would greatly clarify what the analysis actually revealed.

      Thank you for your valuable suggestions, and we apologize for any misunderstandings regarding the interpretation of the energy landscape in our study. To address this issue, we will include a dedicated paragraph in both the methods and results sections to clarify our use of the term "energy" derived from the maximum entropy model. This addition aims to eliminate any ambiguity and provide a clearer understanding of what our analysis reveals.

      (1) I think the method used for binarization of BOLD activity is problematic in multiple ways.

      a) Although the authors appear to avoid using global signal regression (page 4, lines 114-118), the proposed method effectively removes the global signal. According to the description on page 4, lines 117-122, the authors binarized network-wise ROI signals by comparing them with the cross-network BOLD signal (i.e., the global signal): at each time point, network-wise ROI signals above the cross-network signal were set to 1, and the rest were set to −1. If I understand the binarization procedure correctly, this approach forces the cross-network signal to be zero (up to some noise introduced by the binarization of network-wise signals), which is essentially equivalent to removing the global signal. Please clarify what the authors meant by stating that "this approach maintained a diverse range of binarized cortical states in data where the global signal was preserved" (page 4, lines 121-122).

      Thank you for highlighting the potential issue with our binarization method. We appreciate your insights regarding the comparison of network-wise ROI signals with the cross-network BOLD signal, as this may inadvertently remove the global signal. To address this, we will conduct a comparative analysis of results obtained from both our current approach and the original pipeline. If we decide to retain our current method, we will carefully reconsider the rationale and rephrase our descriptions to ensure clarity regarding the preservation of the global signal and the diversity of binarized cortical states.

      b) The authors might argue that they maintained a diverse range of cortical states by performing the binarization at each time point (rather than within each network). However, I believe this introduces another problem, because binarizing network-wise signals at each time point distorts the distribution of cortical states. For example, because the cross-network signal is effectively set to zero, the network cannot take certain states, such as all +1 or all −1. Similarly, this binarization biases the system toward states with similar numbers of +1s and −1s, rather than toward unbalanced states such as (+1, −1, −1, −1, −1, −1). These constraints and biases are not biological in origin but are simply artifacts of the binarization procedure. Importantly, the energy landscape and its derivatives (e.g., hard/easy transitions) are likely to be affected by these artifacts. I suggest that the authors try a more conventional binarization procedure (i.e., binarization within each network), which is more robust to such artifacts.

      Related to this point, I have a question regarding Figure S1, in which the authors plotted predicted versus empirical state probabilities. As argued above, some empirical state probabilities should be zero because of the binarization procedure. However, in Figure S1, I do not see data points corresponding to these states (i.e., there should be points on the y-axis). Did the authors plot only a subset of states in Figure S1? I believe that all states should be included. The correlation coefficient between empirical and predicted probabilities (and the accuracy) should also be calculated using all states.

      Thank you for your thoughtful examination of our data processing pipeline. We agree that a comparison between the conventional binarization method and our current approach is warranted, and we appreciate your suggestion. Upon reviewing Figure S1, we discovered that there was indeed an error related to the plotting style set to "log10." As you correctly pointed out, the data should reflect that the probabilities for states where all networks are either activated or deactivated are zero. We are very interested in exploring the state distributions obtained from both the original and current approaches, as your comments highlight important considerations. We sincerely appreciate your insightful feedback and will make sure to address these points thoroughly in our first revision.
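      As an illustration only (not the manuscript's pipeline), the following minimal sketch contrasts the two binarization schemes discussed in points (a) and (b) on synthetic data. The number of networks, the strength of the shared "global" component, and all function names are assumptions made for the example. It shows that thresholding at the cross-network mean at each time point cannot produce the all +1 state (and, for continuous data, effectively never the all −1 state), whereas thresholding each network at its own temporal mean can.

```python
import numpy as np


def binarize_per_timepoint(x):
    """Threshold each time point at the cross-network mean.

    x has shape (n_networks, n_timepoints); returns +1/-1 of the same shape.
    """
    return np.where(x > x.mean(axis=0, keepdims=True), 1, -1)


def binarize_per_network(x):
    """Threshold each network at its own temporal mean (conventional scheme)."""
    return np.where(x > x.mean(axis=1, keepdims=True), 1, -1)


def state_counts(s):
    """Count how often each binarized activation pattern (column of s) occurs."""
    patterns, counts = np.unique(s.T, axis=0, return_counts=True)
    return {tuple(int(v) for v in p): int(c) for p, c in zip(patterns, counts)}


# Synthetic data: a shared global fluctuation plus network-specific noise.
# All values here are hypothetical and chosen only for illustration.
rng = np.random.default_rng(0)
n_networks, n_timepoints = 6, 2000
global_signal = rng.standard_normal(n_timepoints)
x = 0.8 * global_signal + rng.standard_normal((n_networks, n_timepoints))

counts_tp = state_counts(binarize_per_timepoint(x))
counts_net = state_counts(binarize_per_network(x))

all_up = (1,) * n_networks
all_down = (-1,) * n_networks
print("per-time-point:", counts_tp.get(all_up, 0), counts_tp.get(all_down, 0))
print("per-network:   ", counts_net.get(all_up, 0), counts_net.get(all_down, 0))
```

      Under these assumptions, the empirical probabilities of the two uniform states are zero for the per-time-point scheme by construction, which is why they would be expected to appear on the axis of a plot of predicted versus empirical probabilities.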

      c) The current binarization procedure likely inflates non-neuronal noise and obscures the relationship between the true BOLD signal and its binarized representation. For example, consider two ROIs (A and B): both (+2%, +1%) and (+0.01%, −0.01%) in BOLD signal changes would be mapped to (+1, −1) after binarization. This suggests that qualitatively different signal magnitudes are treated identically. I believe that this issue could be alleviated if the authors were to binarize the signal within each network, rather than at each time point.

      Thank you for your important observation regarding the potential inflation of non-neuronal noise in our current binarization procedure. We recognize that this process could lead to qualitatively different signal magnitudes being treated similarly after binarization, as you illustrated with your example. While we acknowledge your point, we believe that conventional binarization pipelines may also encounter this issue, albeit by comparing signals to a network's temporal mean activity. To address this concern and maintain consistency with previous studies, we will discuss this limitation in our revised manuscript. Additionally, if deemed necessary, we will explore implementing a percentile-based threshold above the baseline to further refine our binarization approach. Your suggestion provides a valuable perspective, and we appreciate your insights.

      (2) As the authors state (page 5, lines 145-148), the "energy" described in the energy landscape is not biological energy but rather a statistical transformation of probability distributions derived from the Boltzmann distribution. If this is the case, I believe that Figure 2A is potentially misleading and should be removed. This type of schematic may give the false impression that cortical state dynamics are governed by the energy landscape derived from the maximum entropy model (which is not validated).

      Thank you for your valuable feedback regarding Figure 2A. We apologize for any confusion it may have created. While we recognize that similar figures are commonly used in literature involving energy landscapes (maximum entropy model), we agree that Figure 2A may mislead readers into thinking that cortical state dynamics are directly governed by the energy landscape derived from the maximum entropy model, which has not been validated. In light of your comments, we will remove Figure 2A and instead emphasize the analytical strategy presented in Figure 2B. Additionally, we will provide a simplified line graph as an illustrative example to clarify the concepts without the potential for misinterpretation.
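      To make the statistical meaning of "energy" concrete, the sketch below builds a toy pairwise maximum entropy model and checks that an energy difference between two states is simply the log ratio of their probabilities. It assumes the standard form E(s) = −Σ h_i s_i − ½ Σ J_ij s_i s_j with P(s) ∝ exp(−E(s)); the parameters h and J are random placeholders rather than fitted values, and the four-network size is chosen only so that all states can be enumerated.

```python
import itertools

import numpy as np


def energy(s, h, J):
    """Energy of a +1/-1 state s under a pairwise maximum entropy model.

    E(s) = -sum_i h_i s_i - 1/2 * sum_{i != j} J_ij s_i s_j,
    with J symmetric and zero on the diagonal.
    """
    return -h @ s - 0.5 * s @ J @ s


# Hypothetical parameters for a 4-network toy model (not fitted to any data).
rng = np.random.default_rng(1)
n = 4
h = rng.normal(scale=0.3, size=n)
J = rng.normal(scale=0.3, size=(n, n))
J = (J + J.T) / 2          # make the couplings symmetric
np.fill_diagonal(J, 0.0)   # no self-coupling

# Enumerate all 2^n states and convert energies to Boltzmann probabilities.
states = np.array(list(itertools.product([-1, 1], repeat=n)))
E = np.array([energy(s, h, J) for s in states])
P = np.exp(-E) / np.exp(-E).sum()

# The lowest-energy state is the most probable one, and an energy gap
# is exactly a log-probability ratio: E(a) - E(b) = log(P(b) / P(a)).
a, b = 0, 5
print("most probable state:", states[np.argmax(P)])
print("energy gap:", E[a] - E[b], "log ratio:", np.log(P[b] / P[a]))
```

      Read this way, a "valley" is a pattern that occurs more often than its single-flip neighbours, and a "barrier" is a less frequent pattern lying between two such valleys; no metabolic or physical quantity is implied.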

      Reviewer #3 (Public review):

      Summary:

      In this study, Liu et al. analyze fMRI data collected during movie watching, applying an energy landscape method with pairwise maximum entropy models. They identify a set of brain states defined at the level of canonical functional networks and quantify how the brain transitions between these states. Transitions are classified as "easy" or "hard" based on changes in the inferred energy landscape, and the authors relate transition probabilities to inter-subject correlation. A major emphasis of the work is the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions.

      Strengths:

      The study is methodologically complex and technically sophisticated. It applies advanced analytical methods to high-dimensional fMRI data. The application of energy landscape analysis to movie-watching data appears to be novel as well. The finding that the thalamus is involved in energy state transitions provides a strong link to several theories on thalamic control functions, which is a notable strength.

      Thanks for your comments on the novelty of our study.

      Weaknesses:

      The main weakness is the conceptual clarity and advances that this otherwise sophisticated set of analyses affords. A central conceptual ambiguity concerns the energy landscape framework itself. The authors note that the "energy" in this model is not biological energy but a statistical quantity derived from the Boltzmann distribution. After multiple reads, I still have major trouble mapping this measure onto any biological and cognitive operations. BOLD signal is a measure of oxygenation as a proxy of neural activity, and correlated BOLD (functional connectivity) is thought to measure the architecture of information communication of brain systems. The energy framework described in the current format is very difficult for most readers to map onto any neural or cognitive knowledge base on the structure and function of brain systems. Readers unfamiliar with maximum entropy models may easily misinterpret energy changes as reflecting metabolic cost, neural effort, or physiological variables, and it is just very unclear what that measure is supposed to reflect. The manuscript does not clearly articulate what conceptual and mechanistic advances the energy formalism provides beyond a mathematical and statistical report. In other words, beyond mathematical description, it is very hard for most readers to understand the process and function of what this framework is supposed to tell us in regards to functional connectivity, brain systems, and cognition. The brain is not a mathematical object; it is a biological organ with cognitive functions. The impact of this paper is severely limited until connections can be made.

      Thank you for your insightful and constructive comments regarding the conceptual clarity of our energy landscape framework. We appreciate your perspective on the challenges of mapping the statistical measure of "energy" derived from the Boltzmann distribution onto biological and cognitive operations. To address these concerns, we will revise our manuscript to clarify our expressions surrounding "energy" and emphasize its probabilistic nature. Additionally, we will incorporate a series of analyses that explicitly relate the features of the energy landscape to cognitive processes and key parameters, such as brain integration and functional connectivity. We believe these changes will help bridge the gap between our mathematical framework and its relevance to understanding brain systems and cognitive functions.

      Relatedly, the use of metaphors such as "valleys," "hills," and "routes" in multidimensional measures lacks grounding. Valleys and hills of what is not intuitive to understand. Based on my reading, these features correspond to local minima and barriers in a probability distribution over binarized network activation patterns, but similar to the first point, the manuscript does not clearly explain what it means conceptually, neurobiologically, or computationally for the brain to "move" through such a landscape. The brain is not computing these probabilities; they are measurement tools of "something". What is it? To advance beyond mathematical description, these measurements must be mapped onto neurobiological and cognitive information.

      Thank you for your valuable feedback. In our revisions, we will aim to link the concept of rapid transition routes in the energy landscape to cognitive processes, such as narrative understanding and related features. By exploring these connections, we hope to provide a clearer context for how our framework can enhance understanding of cognitive functions and their neural correlates.

      This conceptual ambiguity goes back to the Introduction. At the level of motivation, the purpose and deliverables of the study are not defined in the Introduction. The stated goal is "Transitions between distinct cortical brain states modulate the degree of shared neural processing under naturalistic conditions". I do not know if readers will have a clear answer to this question at the end. Is the claim that state transitions cause changes in inter-subject correlation, that they index moments of narrative alignment, or that they reflect changes in attentional or cognitive mode? This level of explanation is largely dissociated from the methods in their current form.

      Thank you for highlighting this important point regarding the conceptual clarity in our Introduction. We appreciate your feedback about the motivation and objectives of the study. To clarify the stated goal of investigating how transitions between distinct cortical brain states modulate shared neural processing under naturalistic conditions, we will revise the manuscript to explicitly define the specific claims we aim to address. We will ensure that these explanations are closely tied to the methods employed in our study, providing a clearer framework for our readers.

      Several methodological choices can use clarification. The use of a 21-TR window centered on transition offsets is unusually long relative to the temporal scale of fMRI dynamics and to the hypothesized rapidity of state transitions. On a related note, what is the temporal scale of state transition? Is it faster than 21 TRs?

      Thank you for your insightful questions regarding our methodological choices. Our focus on specific state transitions necessitated the use of a 21-TR window. While it’s true that other transitions may occur within this window, averaging across the same transitions at different times allows us to identify distinctive thalamic BOLD patterns that precede cortical state transitions. This methodology enables us to capture relevant dynamics while ensuring that we focus on the transitions of interest. We appreciate your feedback, and this clarification will be included in our revised manuscript. We will also add a figure that describes the dwell time of cortical states.
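      For reference, dwell time can be read directly from a sequence of state labels as the length of each uninterrupted run. The sketch below is one simple way to compute it and is not taken from the manuscript; the function name and the integer state labels are hypothetical, and the TR of 1.5 s is the value quoted in the temporal-resolution comment above.

```python
import numpy as np


def dwell_times(state_seq, tr_seconds=1.5):
    """Return {state: [dwell durations in seconds]} for a 1-D label sequence.

    A dwell is a maximal run of identical consecutive labels. Runs touching
    the start or end of the scan are included, although their true duration
    may be longer than what was observed.
    """
    seq = np.asarray(state_seq)
    if seq.size == 0:
        return {}
    # Indices where the label differs from the previous time point.
    change = np.flatnonzero(seq[1:] != seq[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [seq.size]))
    out = {}
    for start, end in zip(starts, ends):
        out.setdefault(int(seq[start]), []).append(float(end - start) * tr_seconds)
    return out


# Hypothetical label sequence: state 0 for 2 TRs, state 1 for 3 TRs, state 0 for 1 TR.
example = dwell_times([0, 0, 1, 1, 1, 0])
print(example)
```

      Comparing the resulting distribution with the 21-TR window length would indicate how many state changes a typical window spans.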

      The choice of movie-watching data is a strength. But, many of the analyses performed here, energy landscape estimation, clustering of states, could in principle be applied to resting-state data. The manuscript does not clearly articulate what is gained, mechanistically or cognitively, by using movie stimuli beyond the availability of inter-subject correlation.

      Thank you for your question, which closely aligns with a concern raised by Reviewer #1. Our core hypothesis posits that naturalistic stimuli yield a broader set of brain states compared to those observed during resting-state conditions. To support this assertion, we will clearly articulate the findings from previous studies that relate to this hypothesis. Additionally, if appropriate, we will provide a comparative analysis between our data and resting-state data to highlight the differences and emphasize the uniqueness of the brain states elicited by naturalistic stimuli.

      Because of the above issues, a broader concern throughout the results is the largely descriptive nature of the findings. For example, the LASSO analysis shows that certain state transitions predict ISC in a subset of regions, with respectable R² values. While statistically robust, the manuscript provides little explanation of why these particular transitions should matter, what computations they might reflect, or how they relate to known cognitive operations during movie watching. Similar issues arise in the clustering analyses. Clustering high-dimensional fMRI-derived features will almost inevitably produce structure, whether during rest, task, or naturalistic viewing. What is missing is an explanation of why these specific clusters are meaningful in functional or mechanistic terms.

      Thank you for your questions. In our revisions, we will perform additional analyses aimed at linking state transitions to cognitive processes more explicitly. Regarding clustering, we will provide a thorough discussion in the revised manuscript.

      Finally, the treatment of the thalamus, while very exciting, could use a bit more anatomical and circuit-level specificity. The manuscript largely treats the thalamus as a unitary structure, despite decades of work demonstrating big functional and connectivity differences across thalamic nuclei. A whole-thalamus analysis without more detailed resolution is increasingly difficult to justify. The subsequent subdivision into PVALB- and CALB-associated regions partially addresses this, but these markers span multiple nuclei with overlapping projection patterns.

      This suggestion aligns with the feedback from Reviewer #1. We believe that performing nuclei segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. Therefore, investigating core and matrix cell projections across different thalamic nuclei using 7T fMRI presents a promising avenue for further study.

      (1) Van Der Meer J N, Breakspear M, Chang L J, et al. Movie viewing elicits rich and reliable brain state dynamics [J]. Nature Communications, 2020, 11(1): 5004.

      (2) Song H, Park B Y, Park H, et al. Cognitive and Neural State Dynamics of Narrative Comprehension [J]. Journal of Neuroscience, 2021, 41(43): 8972-8990.

      (3) Song H, Shim W M, Rosenberg M D. Large-scale neural dynamics in a shared low-dimensional state space reflect cognitive and attentional dynamics [J]. eLife, 2023, 12.

      (4) Shine J M, Lewis L D, Garrett D D, et al. The impact of the human thalamus on brain-wide information processing [J]. Nature Reviews Neuroscience, 2023, 24(7): 416-430.

      (5) Yang M Y, Keller D, Dobolyi A, et al. The lateral thalamus: a bridge between multisensory processing and naturalistic behaviors [J]. Trends in Neurosciences, 2025, 48(1): 33-46.

    2. Reviewer #3 (Public review):

      Summary:

      In this study, Liu et al. analyze fMRI data collected during movie watching, applying an energy landscape method with pairwise maximum entropy models. They identify a set of brain states defined at the level of canonical functional networks and quantify how the brain transitions between these states. Transitions are classified as "easy" or "hard" based on changes in the inferred energy landscape, and the authors relate transition probabilities to inter-subject correlation. A major emphasis of the work is the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions.

      Strengths:

      The study is methodologically complex and technically sophisticated. It applies advanced analytical methods to high-dimensional fMRI data. The application of energy landscape analysis to movie-watching data appears to be novel as well. The finding that the thalamus is involved in energy state transitions provides a strong link to several theories on thalamic control functions, which is a notable strength.

      Weaknesses:

      The main weakness is the conceptual clarity and advances that this otherwise sophisticated set of analyses affords. A central conceptual ambiguity concerns the energy landscape framework itself. The authors note that the "energy" in this model is not biological energy but a statistical quantity derived from the Boltzmann distribution. After multiple reads, I still have major trouble mapping this measure onto any biological and cognitive operations. BOLD signal is a measure of oxygenation as a proxy of neural activity, and correlated BOLD (functional connectivity) is thought to measure the architecture of information communication of brain systems. The energy framework described in the current format is very difficult for most readers to map onto any neural or cognitive knowledge base on the structure and function of brain systems. Readers unfamiliar with maximum entropy models may easily misinterpret energy changes as reflecting metabolic cost, neural effort, or physiological variables, and it is just very unclear what that measure is supposed to reflect. The manuscript does not clearly articulate what conceptual and mechanistic advances the energy formalism provides beyond a mathematical and statistical report. In other words, beyond mathematical description, it is very hard for most readers to understand the process and function of what this framework is supposed to tell us in regards to functional connectivity, brain systems, and cognition. The brain is not a mathematical object; it is a biological organ with cognitive functions. The impact of this paper is severely limited until connections can be made.

      Relatedly, the use of metaphors such as "valleys," "hills," and "routes" for multidimensional measures lacks grounding. It is not intuitive what these are valleys and hills of. Based on my reading, these features correspond to local minima and barriers in a probability distribution over binarized network activation patterns, but, as with the first point, the manuscript does not clearly explain what it means conceptually, neurobiologically, or computationally for the brain to "move" through such a landscape. The brain is not computing these probabilities; they are measurement tools of "something". What is it? To advance beyond mathematical description, these measurements must be mapped onto neurobiological and cognitive information.

      This conceptual ambiguity goes back to the Introduction. At the level of motivation, the purpose and deliverables of the study are not defined in the Introduction. The stated goal is "Transitions between distinct cortical brain states modulate the degree of shared neural processing under naturalistic conditions". I do not know if readers will have a clear answer to this question at the end. Is the claim that state transitions cause changes in inter-subject correlation, that they index moments of narrative alignment, or that they reflect changes in attentional or cognitive mode? This level of explanation is largely dissociated from the methods in their current form.

      Several methodological choices could use clarification. The use of a 21-TR window centered on transition offsets is unusually long relative to the temporal scale of fMRI dynamics and to the hypothesized rapidity of state transitions. On a related note, what is the temporal scale of state transitions? Is it faster than 21 TRs?

      The choice of movie-watching data is a strength. However, many of the analyses performed here (energy landscape estimation, clustering of states) could in principle be applied to resting-state data. The manuscript does not clearly articulate what is gained, mechanistically or cognitively, by using movie stimuli beyond the availability of inter-subject correlation.

      Because of the above issues, a broader concern throughout the results is the largely descriptive nature of the findings. For example, the LASSO analysis shows that certain state transitions predict ISC in a subset of regions, with respectable R² values. While these results are statistically robust, the manuscript provides little explanation of why these particular transitions should matter, what computations they might reflect, or how they relate to known cognitive operations during movie watching. Similar issues arise in the clustering analyses. Clustering high-dimensional fMRI-derived features will almost inevitably produce structure, whether during rest, task, or naturalistic viewing. What is missing is an explanation of why these specific clusters are meaningful in functional or mechanistic terms.

      Finally, the treatment of the thalamus, while very exciting, could use more anatomical and circuit-level specificity. The manuscript largely treats the thalamus as a unitary structure, despite decades of work demonstrating substantial functional and connectivity differences across thalamic nuclei. A whole-thalamus analysis without more detailed resolution is increasingly difficult to justify. The subsequent subdivision into PVALB- and CALB-associated regions partially addresses this, but these markers span multiple nuclei with overlapping projection patterns.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Liu et al. investigated cortical network dynamics during movie watching using an energy landscape analysis based on a maximum entropy model. They identified perception- and attention-oriented states as the dominant cortical states during movie watching and found that transitions between these states were associated with inter-subject synchronization of regional brain activity. They also showed that distinct thalamic compartments modulated distinct state transitions. They concluded that cortico-thalamo-cortical circuits are key regulators of cortical network dynamics.

      Strengths:

      A mechanistic understanding of cortical network dynamics is an important topic in both experimental and computational neuroscience, and this study represents a step forward in this direction by identifying key cortico-thalamo-cortical circuits. The analytical strategy employed in this study, particularly the LASSO-based analysis, is interesting and would be applicable to other data types, such as task- and resting-state fMRI.

      Weaknesses:

      Due to issues related to data preprocessing, support for the conclusions remains incomplete. I also believe that a more careful interpretation of the "energy" derived from the maximum entropy model would greatly clarify what the analysis actually revealed.

      (1) Major Comment 1:

      I think the method used for binarization of BOLD activity is problematic in multiple ways.

      a) Although the authors appear to avoid using global signal regression (page 4, lines 114-118), the proposed method effectively removes the global signal. According to the description on page 4, lines 117-122, the authors binarized network-wise ROI signals by comparing them with the cross-network BOLD signal (i.e., the global signal): at each time point, network-wise ROI signals above the cross-network signal were set to 1, and the rest were set to −1. If I understand the binarization procedure correctly, this approach forces the cross-network signal to be zero (up to some noise introduced by the binarization of network-wise signals), which is essentially equivalent to removing the global signal. Please clarify what the authors meant by stating that "this approach maintained a diverse range of binarized cortical states in data where the global signal was preserved" (page 4, lines 121-122).

      b) The authors might argue that they maintained a diverse range of cortical states by performing the binarization at each time point (rather than within each network). However, I believe this introduces another problem, because binarizing network-wise signals at each time point distorts the distribution of cortical states. For example, because the cross-network signal is effectively set to zero, the network cannot take certain states, such as all +1 or all −1. Similarly, this binarization biases the system toward states with similar numbers of +1s and −1s, rather than toward unbalanced states such as (+1, −1, −1, −1, −1, −1). These constraints and biases are not biological in origin but are simply artifacts of the binarization procedure. Importantly, the energy landscape and its derivatives (e.g., hard/easy transitions) are likely to be affected by these artifacts. I suggest that the authors try a more conventional binarization procedure (i.e., binarization within each network), which is more robust to such artifacts.

      Related to this point, I have a question regarding Figure S1, in which the authors plotted predicted versus empirical state probabilities. As argued above, some empirical state probabilities should be zero because of the binarization procedure. However, in Figure S1, I do not see data points corresponding to these states (i.e., there should be points on the y-axis). Did the authors plot only a subset of states in Figure S1? I believe that all states should be included. The correlation coefficient between empirical and predicted probabilities (and the accuracy) should also be calculated using all states.

      c) The current binarization procedure likely inflates non-neuronal noise and obscures the relationship between the true BOLD signal and its binarized representation. For example, consider two ROIs (A and B): both (+2%, +1%) and (+0.01%, −0.01%) in BOLD signal changes would be mapped to (+1, −1) after binarization. This suggests that qualitatively different signal magnitudes are treated identically. I believe that this issue could be alleviated if the authors were to binarize the signal within each network, rather than at each time point.
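      The concerns in (a) to (c) can be illustrated with a small simulation. The sketch below is not the authors' pipeline: the six networks, the noise levels, and the function names are assumptions chosen only for illustration. It compares thresholding each time point at the cross-network mean with thresholding each network at its own temporal mean, and counts how often the uniform states (all +1 or all −1) appear under each scheme.

```python
import random

def binarize_across_networks(data):
    """Per-time-point scheme: threshold each row at its cross-network mean."""
    out = []
    for row in data:
        m = sum(row) / len(row)
        out.append(tuple(1 if x > m else -1 for x in row))
    return out

def binarize_within_network(data):
    """Conventional scheme: threshold each network at its own temporal mean."""
    n_t, n_net = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_t for j in range(n_net)]
    return [tuple(1 if row[j] > means[j] else -1 for j in range(n_net))
            for row in data]

def simulate(n_t=2000, n_net=6, seed=0):
    """Toy signals: a shared (global) fluctuation plus independent noise.
    All parameters here are hypothetical."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_t):
        g = rng.gauss(0, 1)  # global component shared by all networks
        data.append([g + rng.gauss(0, 0.5) for _ in range(n_net)])
    return data

def count_uniform_states(states):
    """Number of time points in which all networks share the same sign."""
    return sum(1 for s in states if len(set(s)) == 1)

data = simulate()
across = binarize_across_networks(data)
within = binarize_within_network(data)
print("uniform states, per-time-point threshold:", count_uniform_states(across))
print("uniform states, within-network threshold:", count_uniform_states(within))

# Point (c): very different magnitudes collapse onto the same pattern.
print(binarize_across_networks([[2.0, 1.0], [0.01, -0.01]]))
```

      Under the per-time-point threshold, a row with distinct values always has at least one entry above and one below its own mean, so the uniform states cannot occur regardless of how strong the shared component is.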

      (2) Major Comment 2:

      As the authors state (page 5, lines 145-148), the "energy" described in the energy landscape is not biological energy but rather a statistical transformation of probability distributions derived from the Boltzmann distribution. If this is the case, I believe that Figure 2A is potentially misleading and should be removed. This type of schematic may give the false impression that cortical state dynamics are governed by the energy landscape derived from the maximum entropy model (which is not validated).
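      To make the statistical nature of this "energy" concrete, the following sketch assumes the standard pairwise maximum entropy form, E(s) = −Σ h_i s_i − Σ_{i<j} J_ij s_i s_j with P(s) = exp(−E(s))/Z. The three-network parameters h and J are hypothetical values, not estimates from the manuscript. It shows that the energy of a state is simply −ln P(s) − ln Z, i.e., a re-expression of the state probability.

```python
import itertools
import math

def energy(state, h, J):
    """Pairwise maximum entropy energy: -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j."""
    n = len(state)
    e = -sum(h[i] * state[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= J[i][j] * state[i] * state[j]
    return e

def boltzmann(h, J):
    """Enumerate all 2^n states and return energies, probabilities, and Z."""
    n = len(h)
    states = list(itertools.product((-1, 1), repeat=n))
    energies = {s: energy(s, h, J) for s in states}
    Z = sum(math.exp(-e) for e in energies.values())
    probs = {s: math.exp(-e) / Z for s, e in energies.items()}
    return energies, probs, Z

# Hypothetical parameters for three networks (illustration only).
h = [0.1, -0.2, 0.05]
J = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.2],
     [-0.3, 0.2, 0.0]]

energies, probs, Z = boltzmann(h, J)
for s in sorted(energies, key=energies.get):
    # The last two columns agree: energy is a transform of probability.
    print(s, round(energies[s], 4), round(-math.log(probs[s]) - math.log(Z), 4))
```

      In this formulation a low-energy state is by construction a frequently observed state; nothing in the calculation refers to metabolic or physiological quantities.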

    4. Reviewer #1 (Public review):

      The authors analysed large-scale brain-state dynamics while humans watched a short video. They sought to identify the role of thalamocortical interactions.

      Major concerns

      (1) Rationale for using the naturalistic stimulus

      In terms of brain state dynamics, previous studies have already reported large-scale neural dynamics by applying data-driven analyses, such as energy landscape analysis and hidden Markov models, to human fMRI/EEG data recorded during resting/task states. Considering such prior work, it'd be critical to provide a sufficient biological rationale for performing a conceptually similar study in a naturalistic condition, i.e., not just "because no previous work has been done". The authors would have to clarify what type of neural mechanisms could be missed in conventional resting-state studies using, say, energy landscape analysis, but could be revealed in the naturalistic condition.

      (2) Effects of the uniqueness of the visual stimulus and reproducibility

      One of the main drawbacks of the naturalistic condition is the unexpected effects of the stimuli. That is, this study looked into the data recorded from participants who were watching Sherlock, but what would happen to the results if we analyzed the brain activity data obtained from individuals who were watching different movies? To ensure the generalizability of the current findings, it would be necessary to demonstrate qualitative reproducibility of the current observations by analysing different datasets that employed different movie stimuli. In fact, it'd be possible to find such open datasets, like www.nature.com/articles/s41597-023-02458-8.

      (3) Spatial accuracy of the "Thalamic circuit" definition

      One of the main claims of this study relies heavily on the accuracy of the localization of two different thalamic architectures: matrix and core. Given the conventional or relatively low spatial resolution of the fMRI data acquisition (3 × 3 × 3 mm³), it appears essential to demonstrate that the current analysis accurately distinguished fMRI signals between the matrix and core parts of the thalamus for each individual.

      (4) More detailed analysis of the thalamic circuits

      In addition, if such thalamic localisation is accurate enough, it would be greatly appreciated if the authors performed similar comparisons not only between the matrix and core architectures but also between different nuclei, for example, the anterior, medial, and lateral groups (e.g., pulvinar group). Such an investigation would meet the expectations of readers who presume some microscopic circuit-level findings.

      (5) Rationale for different time window lengths

      The authors adopted two different time window lengths to examine the neural dynamics. First, they used a 21-TR window for signal normalisation. Then, they narrowed down the window length to 13-TR periods for the following statistical evaluation. Such a seemingly arbitrary choice of the shorter time window might be misunderstood as a measure to relax the threshold for the correction of multiple comparisons. Therefore, it'd be appreciated if the authors stuck to the original 21-TR time window and performed statistical evaluations based on the setting.

      (6) Temporal resolution

      After identifying brain states with energy landscape analysis, this study investigated the brain state transitions by directly looking into the fMRI signal changes. This approach seems to implicitly assume that no significant state changes happen within one TR (= 1.5 s), which needs sufficient validation. Otherwise, like previous studies, it'd be highly recommended to conduct different analyses (e.g., random-walk simulation) to address and circumvent this problem.
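      For readers unfamiliar with the suggestion in (6), one common form of random-walk simulation on an energy landscape is a Metropolis-style walk in which a single network is flipped at each step and the move is accepted with probability min(1, exp(E_current − E_proposed)). The sketch below assumes a pairwise maximum entropy energy and uses hypothetical parameters h and J; it is an illustration of the general technique, not a reconstruction of any specific published pipeline.

```python
import math
import random

def energy(state, h, J):
    """Pairwise maximum entropy energy: -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j."""
    n = len(state)
    e = -sum(h[i] * state[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= J[i][j] * state[i] * state[j]
    return e

def random_walk(h, J, n_steps, seed=0):
    """Metropolis-style walk: propose one flip per step, accept with
    probability min(1, exp(E_current - E_proposed))."""
    rng = random.Random(seed)
    n = len(h)
    state = [rng.choice((-1, 1)) for _ in range(n)]
    e = energy(state, h, J)
    path = [tuple(state)]
    for _ in range(n_steps):
        i = rng.randrange(n)          # pick one network at random
        state[i] = -state[i]          # propose flipping it
        e_new = energy(state, h, J)
        if e_new <= e or rng.random() < math.exp(e - e_new):
            e = e_new                 # accept the move
        else:
            state[i] = -state[i]      # reject: revert the flip
        path.append(tuple(state))
    return path

# Hypothetical parameters for three networks (illustration only).
h = [0.1, -0.2, 0.05]
J = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.2],
     [-0.3, 0.2, 0.0]]

path = random_walk(h, J, n_steps=1000)
print(path[:5])
```

      Because each step changes at most one network, the simulated trajectory resolves transitions at a finer grain than the TR-sampled data, which is why such simulations are used to estimate transition frequencies between basins.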

    5. eLife Assessment

      This study investigated the dynamics of human cortical network activity with functional magnetic resonance imaging during movie watching and studied the modulation of these dynamics by subcortical areas using an energy landscape mapping method. The authors identified a set of brain states defined at the level of canonical functional networks, quantified how the brain transitions between these states, and related transition probabilities to inter-subject correlations in evoked brain activity. A major emphasis of the work concerns the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions. The analytical strategy developed in this study is applicable to other task- and resting-state fMRI data and would be useful for many researchers in the field; however, the evidence supporting the overall conclusions remains incomplete due to limitations associated with fMRI data preprocessing, analysis, and cross-validation.

    1. eLife Assessment

      This study investigates whether heavy metal stress can induce maize-like phenotypic and molecular responses in teosinte and whether these responses overlap with genomic regions implicated in domestication. By combining copper and cadmium treatments with quantitative phenotyping, gene-expression analyses, and expanded assessments of nucleotide diversity across a key chromosome 5 interval, the authors provide an integrated view of how abiotic stress responses intersect with domestication-related traits. The significance of the findings is valuable, as the work offers meaningful insights for the subfield of maize evolution and stress biology by extending heavy-metal response analyses to teosinte and linking them to domestication-associated loci, although the evolutionary implications remain indirect. The strength of evidence is solid, with appropriately designed and quantitatively supported experiments that broadly support the claims, but do not yet establish a causal or historical role for heavy metal stress in domestication.

    2. Reviewer #1 (Public review):

      In this study, Acosta-Bayona et al. investigate whether heavy metal (HM) stress can induce phenotypic and molecular responses in teosinte parviglumis that resemble traits associated with domestication, and whether genes within a domestication-linked region show patterns consistent with reduced genetic diversity and signatures of selection. The authors exposed both maize and teosinte parviglumis to a fixed dose of copper and cadmium, representing an essential and a non-essential element, respectively. They assessed shoot and root phenotypic traits at a defined developmental stage in plants exposed to HM stress versus control. They then integrated these phenotypic results with expanded analyses of genetic diversity across a broader chromosome 5 interval, which was previously associated with domestication-related traits. Overall, the revisions improve the clarity and the robustness of the analyses, as well as make the conclusions better aligned with the evidence.

      The revised manuscript is strengthened by several additions.

      (1) The authors broaden the genetic analysis beyond a small set of loci and evaluate nucleotide variability across several genes within the linked chromosome 5 interval, which improves the interpretation of diversity patterns and reduces concerns about an overly narrow locus selection or regional linkage effects driving the conclusions.

      (2) The expression analyses are now presented with clearer methodological separation and stronger quantitative support. Now, tissue/developmental RT-PCR profiles are distinguished from real-time qPCR assays used to test HM-induced expression changes, with appropriate replication and statistical reporting.

      (3) The authors include a transcriptome-scale element by analyzing multiple published and publicly available HM-stress transcriptome datasets and reporting shared differentially expressed genes across studies, which supports the interpretation that the observed expression changes align with broader HM-responsive transcriptional programs.

      However, it remains challenging to distinguish which aspects of the HM responses observed here represent novel insight versus patterns already reported in maize HM-stress studies. In addition, the link between HM exposure and domestication history remains indirect: reduced diversity patterns and stress-responsive expression do not, on their own, demonstrate human-driven selection or a specific paleoenvironmental scenario, and alternative explanations related to general stress responses or regional evolutionary processes cannot be fully excluded.

    3. Reviewer #2 (Public review):

      Summary:

      This work explores the phenotypic developmental traits associated with Cu and Cd responses in teosinte parviglumis, a species evolutionarily related to extant maize crops. Cu and Cd could serve as a proxy for heavy metals present in the soils. The manuscript explores potential genetic loci associated with heavy metal responses and domestication. This includes heavy metal transporters which are upregulated during stress. To study that, the authors compare the plant architecture of maize defective in ZmHMA1 and speculate on the association of heavy metals with domestication.

      Strengths:

      Very few studies covered the responses of teosintes to heavy metal stress. The physiological function of ZmHMA1 in maize is also valuable. The idea and speculation section is interesting and well-implemented.

      Weaknesses:

      Some conclusions are still speculative, and future experiments could provide more clues about potential molecular mechanisms for the ideas proposed here.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Acosta-Bayona et al. aim to better understand how environmental conditions could have influenced specific gene functions that may have been selected for during the domestication of teosinte parviglumis into domesticated maize. The authors are particularly interested in identifying the initial phenotypic changes that led to the original divergence of these two subspecies. They selected heavy metal (HM) stress as the condition to investigate. While the justification for this choice remains speculative, paleoenvironmental data would add value; the authors hypothesize that volcanic activity near the region of origin could have played a role.

      The justification of choice to investigate the effects of heavy metal stress is not speculative. As mentioned now in the Abstract, the elucidation of the genome from the Palomero toluqueño maize landrace revealed heavy metal effects during domestication (Vielle-Calzada et al., Science 2009). Our aim was to test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte parviglumis to maize.

      (1) Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      We appreciated the depth and value of this comment.

      Maize phenotypic responses to sublethal concentrations of heavy metals, copper (Cu) and cadmium (Cd) in particular, are well characterized and published, and in agreement with our results. In the first section of the Results (pgs 7 and 8), we added pertinent references to clearly show which observations are already known. By contrast, teosinte parviglumis responses are in all cases novel. To our knowledge, this is the first study to analyze in detail the phenotypic response of teosinte to sublethal concentrations of heavy metals, specifically Cu and Cd. We have now emphasized the novelty of these observations (pg 8).

      To address the fact that we only focused on three known HM-related genes without discussing others in the statistically significant region identified via LOD score on chr.5, we have added a full section that reads as follows (pgs. 11 to 13 of the new version):

      “Large-scale genomic and transcriptomic comparisons indicate that many HM response genes were positively selected across the maize genome.

      To expand the results well beyond the analysis of the three genes previously described, we performed a detailed analysis of genetic diversity across the 11.47 Mb genomic region comprised between ZmSKUs5 and ZmHMA1. This additional analysis reveals general tendencies in the quantity and nature of loci that were affected by positive selection during the teosinte parviglumis to maize transition in a region identified via LOD score on chr.5. We compared nucleotide variability by using 100 bp bins covering loci composed of two 30 Kb segments up and downstream of coding sequences, respectively, and the coding sequence itself, for 173 genes present within the genomic region comprised between ZmSKUs5 and ZmHMA1 (Figure S1 and Supplementary File 6). Two types of statistical tests (ANOVA and Wilcoxon) were applied to nucleotide variability comparisons using the entirety of each locus. The Benjamini-Hochberg procedure allowed an estimation of the false discovery rate (FDR<0.05) to avoid type I errors (false positives). Although some individual loci appear as differently classified depending on the statistical test applied (22 out of 173 loci), the general differences in nucleotide variability are consistently maintained within the subregions described below. We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. The first six loci are consecutively ordered in a 402 Kb subregion that includes ZmSKUs5. A second group of 13 consecutive loci expands over a 1.44 Mb subregion that contains NRAMP ALUMINUM TRANSPORTER1, also involved in HM response through uptake of divalent ions. A third group of 17 consecutive loci expands over 1.28 Mb; eleven contain genes encoding for uncharacterized proteins. The fourth group is composed of 57 consecutive loci expanding over 3.22 Mb and contains genes encoding for DEFECTIVE KERNEL55, AUXIN RESPONSE FACTOR16, and peroxidases involved in responses to oxidative stress. The fifth group contains 12 consecutive loci expanding over 713 Kb and contains ZmHMA1. An additional segment of approximately 1.17 Mb and containing 25 consecutive loci that were positively selected expands away from the ZmSKUs5-ZmHMA1 segment; it also contains several genes encoding for peroxidases. Although multiple loci include genes that could be involved in abiotic stress and oxidative responses, these results suggest that multiple factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr.5 during the teosinte parviglumis to maize transition.

      To further analyze the possibility that HM response could have played a role in maize emergence and subsequent domestication, we analyzed large scale transcriptomic data corresponding to independent experiments aiming at understanding the response of maize roots to HM stress. Six available transcriptomes were selected for in-depth analysis because they presented a fold change strictly higher than 1, and their results were supported by false discovery rates (FDR<0.05). These six transcriptomes (Table S5) included HM response datasets corresponding to growth conditions that not only incorporated Cu, but also lead (Pb) and chromium (Cr) that were not included in the substrate of our experiments. Transcriptional profiles were obtained from roots of plants at different stages: maize seedlings (Shen et al., 2012; Gao et al., 2015; Zhang et al., 2024a), three-week-old plantlets (Yang et al., 2023), and plants at V2 stage (Zhang et al., 2024b; Fengxia et al., 2025). A total of 120 genes shared by all six transcriptomes were found to be differentially expressed under HM stress conditions (66 upregulated and 54 downregulated; Figure S3), including ZmSKUs5, ZmHMA1 and ZmHMA7; 52 of them (43.3%) are located in maize loci showing less than 70% of the nucleotide variability found in teosinte parviglumis, suggesting that they were affected by positive selection (Yamasaki et al., 2005; Supplementary File 7). Of 18 mapping in chr.5, twelve are within the 82 cM that fractionates into multiple QTLs under selection during the parviglumis to maize transition. Interestingly, five additional loci containing HM response genes completely lack SNPs within their total length in both parviglumis and maize, and 19 additional loci lack SNPs in at least one 30 Kb segment or their coding region (Supplementary File 7), suggesting the frequent presence of ultraconserved genomic regions in many loci containing HM response genes. When this same analysis was conducted in a set of loci comprising 63 genes previously identified as differentially expressed in response to abiotic stress not directly related to HM responses (hypoxia; nutritional deficiency; soil alkalinity; drought; soil salinity), 18 loci (28.6%) showed less than 70% of the nucleotide variability found in teosinte parviglumis. Only one of them maps in chr.5 and none contained segments or coding regions lacking SNPs in parviglumis or maize. These results suggest that in contrast to other types of abiotic stress response genes, loci comprising a large set of genes that unambiguously respond to HM stress caused by chemical elements of diverse nature were affected by positive selection during the parviglumis to maize transition, irrespective of their position in the genome.”

      The detailed analysis of genetic diversity across 11.47 Mb of chr.5 in the genomic region comprised between ZmSKUs5 and ZmHMA1 is presented as Supplementary File 6.

      The analysis of genetic diversity in loci encompassing heavy metal response genes shared by six transcriptomes and abiotic stress controls is described in Supplementary File 7.
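      For readers who want to see how the Benjamini-Hochberg step mentioned in the added Results section operates, a minimal sketch follows. The p-values are hypothetical and the function is a generic implementation of the step-up procedure, not the code used to produce Supplementary File 6.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which tests are rejected at FDR alpha.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * alpha,
    then reject the k tests with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical per-locus p-values (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```

      With ten tests, the thresholds for ranks 1 and 2 are 0.005 and 0.010, so only the two smallest p-values pass in this example even though five are below 0.05 individually.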

      In the Discussion (pgs. 21 and 22), we added a paragraph that reads as follows:

      “Although loss of genetic diversity is usually the result of human selection during domestication, it can also represent a consequence of natural selective pressures favoring fitness of specific teosinte parviglumis allelic variants better adapted to environmental changes and subsequently affected by human selection during the domestication process. This possibility is reflected by widely spread selective sweeps affecting a large portion of chr.5 that contains hundreds of genes showing signatures of positive selection. The analysis of 11.47 Mb covering the ZmHMA1-ZmSKUs5 segment confirms the presence of large but discrete genomic subregions that were positively selected during the teosinte parviglumis to maize transition. Although several contain genes involved in HM response and oxidative stress, the diversity of gene functions does not necessarily favor abiotic stress over other factors that could be at the origin of selective forces affecting these regions. By contrast, a large scale transcriptomic survey indicates that genes consistently responding to HMs (Cu, Cd, Pb and Cr) show signatures of positive selection at unusually high frequencies (43.3%) as compared to loci containing genes responding to other types of abiotic stress (28.6%). Our identification of HM response genes affected by positive selection is far from being exhaustive. Nevertheless, it agrees with the expected effects of a widespread selective sweep caused by environmental changes that influenced the parviglumis to maize transition at the genetic level. Of intriguing interest are 24 loci that partially or completely lack SNPs in both teosinte parviglumis and maize, suggesting that possible genetic bottlenecks occurred before the teosinte to maize transition. Examples of other edaphological factors driving genetic divergence either in the teosintes or maize include local adaptation to phosphorus concentration in mexicana and parviglumis (Aguirre-Liguori et al. 2019), and fast maize adaptation to changing iron availability through the action of genes involved in its mobilization, uptake, and transport (Benke and Stich 2011). Our results reveal a teosinte parviglumis environmental plasticity that could be related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition. Previous studies have demonstrated that transposable elements (TEs) contribute to activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of some specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      The mutagenic analysis of ZmHMA7 and ZmSKUs5 will be included in a different publication.

      (2) The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      We think that the detailed analysis of genetic diversity across 11.46 Mb covering the ZmSKUs5 to ZmHMA1 genomic segment, together with its statistical validation, provides a precise understanding of the selective sweep dimensions in chr.5.

      We do agree that lower nucleotide diversity values in maize are not sufficient to infer human selection. Because many HM response loci show unusually low nucleotide variability in teosinte parviglumis (see the results of the transcriptomic analysis presented above), we cannot discard the possibility that natural selection forces related to environmental changes could have affected native populations of teosinte parviglumis.

      To further explore the link between environmental factors, natural or human-driven selection, and the paleoenvironmental context of the parviglumis to maize transition, we revised paleoenvironmental and geological records and added results in two sections that read as follows (pgs. 17 to 20):

      “Paleoenvironmental studies reveal periods of climatic instability in the presumed region of maize emergence during the early Holocene.

      It is well accepted that temperature fluctuations, volcanism and anthropogenic impact shaped the distribution and abundance of plant species in the Transmexican Volcanic Belt (TMVB) during the last 14,000 years (Torrescano-Valle et al. 2019). The TMVB has produced close to 8000 volcanic structures (Ferrari et al., 2011), transforming the relief multiple times, and causing hydrographic and soil changes that actively modified the distribution and composition of plant communities in Central Mexico. Detailed paleoenvironmental data for the Pleistocene and Holocene are available for several lacustrine zones located within the 50 to 100 km range of the region currently considered the cradle of maize domestication (Matsuoka et al. 2002; Figure 5a). In Lake Zirahuén (102°44′ W; 19°26′ N and approximately 2075 meters above sea level; index [i] in Figure 5a), pollen, microcharcoal and magnetic susceptibility analyses of two sedimentary sequences reveal three periods of major ecological change during the early and middle Holocene.

      Between 9500 and 9000 calibrated years before present (cal yr BP), pine forests seem to have been associated with summer insolation increases. A second peak of forest change occurred at around 8200 cal yr BP, coinciding with cold oscillations documented in the North Atlantic. Finally, events that occurred between 7500 and 7100 cal yr BP show an abrupt change in the plant community related to humid Holocene climates and a presumed volcanic event (Lozano-García et al., 2013). The environmental history of the central Balsas watershed has also been documented by pollen, charcoal, and sedimentary analysis conducted in three lakes and a swamp of the Iguala valley (Piperno et al. 2007). Paleoecological records of lake Ixtacyola (18°20N, 99°35W and approximately 720 meters above sea level; index [ii] in Figure 5a) and lake Ixtapa (18°21N, 99°26W) indicate that an important increase in temperature and precipitation occurred between 13000 and 10000 cal yr BP. The pollen record of Ixtacyola showed that members of the genus Zea were already part of the vegetation coverage by 12900 to 13000 cal yr BP, suggesting that some teosintes – likely including parviglumis – were commonly found at elevation areas where they do not presently occur. Lake Almoloya (also named Chignahuapan; 19°05N, 99°20W and approximately 2575 meters above sea level; index [iii] in Figure 5a) in the upper Lerma basin is only 20 km from the crater of the Nevado de Toluca that is responsible for creating the late Pleistocene Upper Toluca Pumice layer over which the Lerma basin is deposited. Pollen records indicate the presence of Zea species by 11080 to 10780 cal yr BP. As for other locations, an important period of climatic instability prevailed between 11500 and 8500 cal yr BP (Ludlow-Wiechers et al., 2005). Humidity fluctuations occurred until 8000 cal yr BP, with a stable temperate climate between 8500 and 5000 cal yr BP.
      Although pollen and diatom studies are often difficult to interpret at a regional scale, the overall results presented above suggest that Zea plants were consistently present during periods of environmental and climatic instability that correlate with the history of volcanic activity during the early Holocene, as described in the next section.

      Temporal and geographical convergence between volcanic eruptions and maize emergence during the Holocene.

      Current evidence indicates that the emergence and domestication of maize began in Mesoamerica some time around 9,000 yr BP (Matsuoka et al. 2002). The teosinte parviglumis populations that are phylogenetically most closely allied with maize are currently distributed in a region located between the Michoacan-Guanajuato Volcanic Field (MGVF) at their northwest, and the Nevado de Toluca and Popocatépetl volcanoes at their east and northeast (Figure 5a; Matsuoka et al. 2002). Precise records of field data indicate that ten accessions were collected in the Balsas river drainage near Teloloapan and Sierra de Huautla (Guerrero), at approximately 100 km south of the Nevado de Toluca crater. Three other accessions were collected near Tejupilco de Hidalgo and Zacazonapan (Estado de México), at approximately 50 to 60 km from the Nevado de Toluca crater (8762, JSG y LOS-161, and JSG-391). And four other accessions were located in Michoacan, at a location within the MGVF (accession 8763), or at mid-distance between the MGVF and the Nevado de Toluca crater (accessions JSG y LOS-130, 8761, and 8766).

      The most important source of HMs in ancient soils of Mesoamerica is TMVB-dependent volcanic activity through short- and long-term effects related to lava deposits, ores, hydrothermal flow, and ash (Torrescano-Valle et al. 2019). The Nevado de Toluca volcano produced one of the most powerful eruptions from central Mesoamerica in the Holocene, giving rise to the Upper Toluca Pumice deposit at 12621 to 12025 cal yr BP (Arce et al., 2003; Figure 5b). The pumice fallout blanketed the Lerma and Mexico basins with 40 cm of coarse ash (Bloomfield and Valastro 1977; Arce et al. 2003). A second eruption dated by 36Cl exposure occurred at 9700 cal yr BP (Arce et al. 2003; Figure 5b), and the most recent eruption occurred at 3580 to 3831 cal yr BP (Macías et al. 1997). During the early and middle Holocene, the Popocatépetl volcano produced at least four eruptions dated 13037–12060, 10775–9564, 8328–7591, and 6262–5318 cal yr BP (Siebe et al. 1997); three other important eruptions occurred during the late Holocene, between 2713 and 733 cal yr BP (Siebe and Macías, 2006). In addition, the MGVF is a monogenetic volcanic field for which 23 independent eruptions have been documented during the Holocene, 21 of them located towards the southern part of the field, in close proximity to the region harboring some of the teosinte parviglumis populations most closely related to maize. Three of these eruptions occurred in the early Holocene (El Huanillo 1130 to 9688 cal yr BP; La Taza 10649 to 10300 cal yr BP; Cerro Grande 10173 to 9502 cal yr BP; Figure 5b), and three others during the initial period of the middle Holocene, between 8400 and 7696 cal yr BP (La Mina, Los Caballos, and Cerro Amarillo; Figure 5b). On average, a new volcano forms every ~435 years in the MGVF (Macías and Arce, 2019). No less than 16 other eruptions occurred between 7159 cal yr BP and the present time (Figure 5b).
      Soils of volcanic origin (andosols) are currently distributed in regions north-west from the Nevado de Toluca and Popocatépetl craters, in close proximity with teosinte parviglumis populations most closely related to maize (Figure S5). Although modern distribution of teosinte populations may differ from their distribution around 9000 yr BP, and unknown populations more closely related to maize may yet be discovered, these data indicate that the date and region where maize emerged are convergent with the dates and locations of several volcanic eruptions that occurred during the Holocene in that same region.”

      (3) Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. We have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot discard the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression of ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      On the other hand, the transcriptional analysis allowed the identification of 52 additional HM response genes showing signatures of positive selection that occurred during the parviglumis to maize transition; 12 of them map to the short arm of chr.5, within the region having linked QTLs. So far, genes involved in HM response and oxidative stress represent the most prevalent class of genes identified within the genomic region showing pleiotropic effects on domestication and multiple linked QTLs in chr.5.

      Reviewer #2 (Public review):

      Summary:

      This work explores the phenotypic developmental traits associated with Cu and Cd responses in teosinte parviglumis, a species evolutionarily related to extant maize crops. Cu and Cd could serve as a proxy for heavy metals present in the soils. The manuscript explores potential genetic loci associated with heavy metal responses and domestication identified in previous studies. This includes heavy metal transporters, which are upregulated during stress. To study that, the authors compare the plant architecture of maize defective in ZmHMA1 and speculate on its association with domestication.

      Strengths:

      Very few studies covered the responses of teosintes to heavy metal stress. The physiological function of ZmHMA1 in maize also gives some novelty in this study. The idea and speculation section is interesting and well-implemented.

      Weaknesses:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes in response to heavy metal have phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize and not a potential driving factor. Similar to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to make conclusions based on the provided results to understand its importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb-1 is not very clear either.

      Thank you for these comments. We have now emphasized our hypothesis in the abstract and the last paragraph of the Introduction (pg. 6):

      “To test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte to maize, we exposed both subspecies to sublethal concentrations of copper and cadmium etc…”

      A comprehensive panel of heavy metals would not be more accurate in terms of simulating the composition of soils evolving across 9,000 years in the region where maize presumably emerged. Copper (Cu) and cadmium (Cd) each correspond to a different affinity group for proteins of the ZmHMA family. ZmHMA1 has preferential affinity for Cu and Ag (silver), whereas ZmHMA7 has preferential affinity for Cd, Zn (zinc), Co (cobalt), and Pb (lead). Since these P1b-ATPase transporters mediate the movement of divalent cations, their function remains consistent regardless of the specific metal tested, provided it belongs to the respective affinity group. By applying sublethal concentrations of Cd (16 mg/kg) and Cu (400 mg/kg), we caused a measurable physiological response while allowing plants to complete their life cycle, including the reproductive phase, facilitating a comprehensive analysis of metal stress adaptation. Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023).

      Based on comments by both reviewers, we now present a large transcriptional analysis that incorporates HM responses to lead (Pb) and chromium (Cr), in addition to Cu. Results show that many genes responding to Pb and Cr were also positively selected across the maize genome, suggesting that HM stress led to a ubiquitous rather than a specific evolutionary response to heavy metals (please see our response to Reviewer#1 and sections in pgs. 11 to 13).

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      There are two phenotypic changes clearly connected with the genetic mechanisms involved in the parviglumis to maize transition: plant height and the number of seminal roots (not nodal roots). These changes have now been emphasized in the Abstract and the description of the results.

      Regarding the possibility for HM stress to represent a confounding factor in the selection of maize and not a driving factor, we expanded the genomic analysis of genetic diversity well beyond the analysis of the three genes under initial study, to cover a segment of 11.47 Mb comprised between ZmSKUs5 and ZmHMA1. We compared nucleotide variability by using 100 bp bins covering loci composed of two 30 Kb segments up and downstream of coding sequences, respectively, and the coding sequence itself, for 173 genes present within the genomic region comprised between ZmSKUs5 and ZmHMA1 (Figure S1 and Supplementary File 6). The full analysis is presented in a new section in pgs. 11 and 12. We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. Four out of five subregions contain more than one HM or oxidative stress response gene within loci showing signatures of positive selection. Although multiple factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr.5, large scale transcriptomic data corresponding to independent experiments aiming at understanding the response of maize roots to HM stress allowed the identification of 49 additional HM response genes within loci showing positive selection across the genome, a proportion (43.3%) far greater than the proportion of loci containing response genes to other types of abiotic stress not related to HMs (28.6%). These results are described in detail in pgs. 12 and 13 (Figure S3 and Supplementary File 7). These results provide strong evidence in favor of HM stress and not another factor driving positive selection.
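
      The binned comparison of nucleotide variability described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy alignment, and the simple per-site normalization (dividing by window length rather than by the number of callable sites) are assumptions made only for illustration.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def pairwise_differences(seq_a, seq_b):
    """Count aligned sites where both sequences carry A/C/G/T and differ."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def windowed_pi(sequences, window=100):
    """Nucleotide diversity (pi) in non-overlapping windows of an alignment.

    Returns a list of (start, end, pi) tuples. pi is the mean number of
    pairwise differences per site; as a simplification, each window is
    normalized by its length, not by the number of callable sites.
    """
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    pairs = list(combinations(range(len(sequences)), 2))
    results = []
    for start in range(0, length, window):
        end = min(start + window, length)
        total = sum(
            pairwise_differences(sequences[i][start:end], sequences[j][start:end])
            for i, j in pairs
        )
        results.append((start, end, total / (len(pairs) * (end - start))))
    return results


if __name__ == "__main__":
    # Hypothetical toy alignment; real loci would span tens of kilobases.
    toy_alignment = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTTC"]
    for start, end, pi in windowed_pi(toy_alignment, window=5):
        print(f"{start}-{end}: pi = {pi:.4f}")
```

      Running the same function on maize and teosinte alignments of one locus would give per-bin values that can be compared window by window; any test of significance would be a separate step.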

      We now provide precise and pertinent paleoenvironmental data on the potential influence of heavy metals in the field. In sections pgs. 17 to 20 we review paleoenvironmental studies revealing periods of climatic instability in the presumed region of maize emergence during the early Holocene, and data indicating that the date and region where maize emerged are convergent with the dates and locations of several volcanic eruptions that occurred during the early and middle Holocene in that same region. Please see responses to Reviewer#1 for details.

      We agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot discard the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression of ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the dataset generated provides an interesting foundation for hypothesis testing on HM stress and domestication, the current data do not sufficiently support the conclusions of the manuscript.

      (1) The description of maize and teosinte architecture under HM stress is well presented.

      However, traits like shoot height, leaf size reduction, and biomass loss also occur under other environmental stresses such as drought and salinity. Additional evidence beyond shoot and root architecture would help validate the link between tb1 expression and specific ZmHMA genes under HM stress, or whether it reflects a more generalized stress response.

      We have already addressed in detail this point in the public response to Reviewer#1.

      (2) The nucleotide variability analysis is interesting, but I would have liked to see additional information to clarify the choice of the data selection and the strength of the conclusions with human selection.

      We have already addressed in detail this point in the public response to Reviewer#1.

      a) The choice of Tripsacum dactyloides as the outgroup to determine nucleotide variability seems to be distant, and I wonder whether other combinations with a closer outgroup or multiple outgroups were tried to provide a more accurate context.

      Nucleotide variability in Tripsacum dactyloides is used to graphically illustrate an external reference and not as an outgroup in the extended analysis of genetic diversity at the locus and genomic level. We did not use Tripsacum dactyloides as an outgroup in our statistical analysis. We could indeed have used a closer teosinte subspecies as an outgroup, but at this stage no data warrants that environmentally related selective pressures could have affected genetic diversity in other teosintes. This possibility is currently being investigated.

      b) Evolutionary differences not related to human influence could affect the results. The phrase "order of magnitude difference in π values" needs statistical validation (e.g., confidence intervals, p-values).

      We agree and have eliminated the sentence, as it is no longer relevant in light of the detailed genomic analysis of genetic diversity presented in Supplementary File 6.

      c) The comparison with ZmGLB1, a neutral control locus, suggests that domestication-related changes in nucleotide variability are specific to the three candidate genes. However, the concept of neutrality is complex, and while ZmGLB1 may be considered neutral in this case, the argument does not address the possibility of other factors, such as linked selection, that could influence variability in these genes. Referencing Hufford et al. is insufficient and would require a deeper argument.

      We also agree with this comment. We think that the influence and consequences of linked selection are now well documented for the 11.46 Mb analyzed in chr.5 (pgs. 11 and 12 in the main text and Supplementary File 6).

      (3) The statement: "Our evidence indicates that HM stress revealed a teosinte parviglumis environmental plasticity that is directly related to the function of specific HM response genes that were affected by domestication through human selection" is not supported by the presented data. The rationale for the specific Cd/Cu dosage used is unclear. A dose-response gradient would better demonstrate the nature and strength of the plastic response.

      Previous reports support the rationale for the specific HM dosage in this study; Cu/Cd dosage response gradients have been conducted in maize (AbdElgawad et al. 2020; Atta et al., 2023), but since no studies have been conducted in teosinte, we reasoned that it was important to apply the same treatment to both subspecies. We have now emphasized this rationale by adding the following in pg XX: “Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023)”.

      We agree that the statement raised by the reviewer needed revision in light of our results. We revised the statement to accurately reflect our current evidence as follows: “Our results reveal a teosinte parviglumis environmental plasticity that is likely related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition.”

      (4) In maize, TEs are known to influence gene expression under abiotic stress, including for tb1 (PMID: 25569788). Since the author appears to make a causative conclusion between ZmHMA1, TB1, and HM stress, I would have liked to see a whole-transcriptome analysis and not a curation of two genes to determine whether other factors, such as TEs, could lead to similar outcomes.

      We agree that this is definitely a possibility that we have not investigated at this stage. However, we added a paragraph to reflect this pertinent suggestion:

      “Previous studies have demonstrated that transposable elements (TEs) contribute to activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of some specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      (5) I would suggest that the authors carefully review the tables, figures, and the corresponding legends. For example :

      a) Table 2 is called before Table 1, I would therefore suggest changing the numbering to reflect the paragraph order.

      Thank you for your help, we did change the order of the Tables in the new version.

      b) In Table 2, it is not clear whether the P value applies to the mean difference between WT and the mutant zmhma1, either in the presence or the absence of heavy metals. In addition, the authors need to use the P-value to estimate the differences between WT in the absence vs presence of HM, and WT in the absence of HM versus the mutant in the absence of HM (idem for presence).

      We did address this issue in detail and added P-values and specific pairwise comparisons to that Table (now Table 1). Data are presented as mean ± standard deviation and were tested by a paired Student’s T-Test. When the effects were significant according to T-Test, the treatments were compared with the Welch two sample T-Test at P < 0.05.

      c) Table 1 and Table 2: Indicate what type of statistical test was used and the number of plants used for each experiment (n). Also, I recommend the use of scientific notation for the P-values.

      The statistical tests have now been indicated, scientific notation has been added to the P-values; the number of plants and biological replicates are indicated in the Methods section.

      d) Lines 202 and 204: I assume Table 1 should be called instead of Table 2.

      This error has been corrected.

      e) General: In the text, when significance is highlighted along with measurements, the p-value needs to be added.

      We have added the P-value along the measurement for all significant differences.

      f) In the text, it is also mentioned that "the expression of ZMHMA1 was significantly increased in the presence of HMs (Figure 3c)". We are looking here at an RT-PCR, which is qualitative and without a robust quantitative comparison and statistics, I cannot conclude this assessment based on the presented evidence. No statistical measure is indicated here.

      Panel 3c is not RT-PCR but real-time qPCR, showing relative fold-change normalized to actin (three technical replicates for each of three biological replicates). We have added error bars (SD) and P-values represented by asterisks (calculated with Student's t statistic) to support significant differences (P<0.05 and P<0.01). ZmHMA1 expression was significantly increased in the presence of HMs only in teosinte; there was no significant difference in maize.

      g) Figure 3 should at least have the gene name in the figure to quickly understand the figure panel. The key conserved domains should also be identified.

      We agree and apologize for the omission. The gene names have been added adjacent to the structures.

      h) Sentence at lines 459-460 lacks words and punctuation.

      This unfortunate error has also been corrected.

      i) Figure S1, the reference Lemmon and Doebley, 2024 should be Lemmon and Doebley, 2014 to harmonize with the text.

      The correct year is 2014. We have corrected this error.
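
      For readers following the response to point (f) above, the sketch below shows one common way to turn real-time qPCR cycle thresholds into a relative fold-change normalized to a reference gene such as actin. The responses do not state which calculation was used, so the 2^-ΔΔCt formula, the assumption of 100% amplification efficiency, the function name, and the Ct values are all hypothetical.

```python
from statistics import mean


def relative_fold_change(target_ct_treated, ref_ct_treated,
                         target_ct_control, ref_ct_control):
    """Relative expression by the 2^-ddCt method (assumes 100% efficiency).

    Each argument is a list of Ct values (for example technical replicates)
    for the target gene or the reference gene, in treated or control samples.
    """
    # Normalize the target gene to the reference gene within each condition.
    delta_ct_treated = mean(target_ct_treated) - mean(ref_ct_treated)
    delta_ct_control = mean(target_ct_control) - mean(ref_ct_control)
    # Compare the treated condition with the control condition.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values, not taken from the manuscript.
    fold = relative_fold_change(
        target_ct_treated=[22.1, 21.9, 22.0],
        ref_ct_treated=[18.0, 18.1, 17.9],
        target_ct_control=[24.0, 24.2, 23.8],
        ref_ct_control=[18.0, 17.9, 18.1],
    )
    print(f"Relative fold-change (treated vs control): {fold:.2f}")
```

      In practice the calculation would be repeated for each biological replicate, and the resulting fold-changes would then be summarized (mean and SD) and tested statistically.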

      Reviewer #2 (Recommendations for the authors):

      (1) The narrative should be clearer, starting with a clearer hypothesis that is later sustained or not in the results, and then discussed in the idea and speculation section.

      Thank you for the comment. We have clarified the hypothesis; it is included in the abstract and the last paragraph of the Introduction. We hope it is now clear that the evidence presented supports our hypothesis.

      (2) Focus more on traits that are relevant, for example, nodal and seminal roots.

      We modified the text to emphasize three relevant traits. In the case of teosinte under HM stress, absence of tillering and increase in the number of female inflorescences. In the case of the zmhma1 mutant under HM stress, differences in the number of nodal roots, and differences in height.

      (3) RNA-seq in Cu/Cd stress could make the work much more useful and complete.

      As previously mentioned, we have incorporated a large scale transcriptional analysis on the basis of six transcriptomes statistically validated (Table S5). Please see sections pgs. 11 to 13 for details.

    1. eLife Assessment

      This is an important work implementing data mining methods on IMC data to discover spatial protein patterns related to the triple-negative breast cancer patients' chemotherapy response. The evidence supporting the claims of the authors is solid, although more detailed methodology clarification and validation are needed. While the accuracy of the methods is not very high, the work shows potential for translational application.

    2. Reviewer #1 (Public review):

      Summary:

      The study presents a computational pipeline for Imaging Mass Cytometry (IMC) analysis in triple-negative breast cancer (TNBC). Analyzing over 4 million cells from 63 patients, it uncovers a distinct spatial organization of cell types between chemotherapy responders and non-responders. Using graph neural networks, the framework predicts treatment response from pre-treatment samples and identifies key predictive protein markers and cell types associated with therapeutic outcomes.

      Strengths:

      (1) The study presents a novel framework leveraging Imaging Mass Cytometry (IMC) to investigate spatial patterns and differences among patient groups, which has been rarely explored.

      (2) It uncovers several compelling biological insights, providing a deeper understanding of the complex interactions within the tumor microenvironment.

      (3) The analysis pipeline is comprehensive, incorporating batch correction, cell type clustering, and a graph neural network based on cell-cell interactions to predict chemotherapy response, demonstrating methodological innovation and thoughtful design.

      Weaknesses:

      (1) Some figure references are inconsistent. For example, Figure 4C is cited on Page 11, but it does not appear in the manuscript.

      (2) Several explanations and methodological details related to the figures remain unclear. For instance, it is not explained how the overall abundance of cell types in Figures 3D and 3E was calculated, how relative abundance was derived, or how these calculations were adjusted when split by proliferation status. In Table 2, it seems that model performance is reported using different node features (protein abundance or cell type), but the text in the second paragraph suggests that both were used simultaneously. This inconsistency is confusing. Additionally, the process for constructing the cell-cell contact graph, including how edges are defined, should be described more clearly.

      (3) The GNN performance appears modest. An AUROC of 0.71 can indicate meaningful predictive power for chemotherapy response, but it remains moderate. Including a baseline comparison would help contextualize the model's effectiveness. Furthermore, the reported value of 0.58 in Table 2 is relatively low, and its meaning or implication is not clearly explained.

      (4) Some methodological choices are not well justified. For example, the rationale for selecting the Self-Organizing Map (SOM) for clustering over other clustering methods is not discussed.

      (5) The manuscript would benefit from a more explicit discussion of how studies using IMC-based spatial analysis relate to or differ from those employing spatial transcriptomics, particularly in terms of their interpretability.

    3. Reviewer #2 (Public review):

      Summary:

      The current research presents an end-to-end computational workflow for large-scale Imaging Mass Cytometry (IMC) data and applies it to 813 regions of interest (ROIs) comprising over 4 million cells from 63 TNBC patients. The study integrates image preprocessing (IMC-Denoise and CLAHE), cell segmentation (Mesmer), phenotyping (Pixie), spatial neighborhood analysis (SquidPy), collagen feature extraction, and graph neural network (GNN) modeling to identify spatial-molecular determinants of chemotherapy response. The major observations include T-cell exclusion in non-responders, persistent fibroblast-macrophage co-localization post-therapy, and the identification of B7H4, CD11b, CD366, and FOXP3 as predictive markers via GNN explainability analysis. The work has been implemented on a rich dataset and integrated with spatial and molecular information. The manuscript is well written and addresses an important clinical question.

      Strengths:

      (1) The study analyzes 813 ROIs and over 4 million cells, which is an exceptionally large IMC dataset, and allows the authors to investigate spatial determinants of chemotherapy response in TNBC with considerably more statistical power than prior studies. It clearly shows an integrated spatial-proteomic analysis on a large IMC dataset.

      (2) The work reveals robust, conceptually meaningful tissue patterns with CD8+ T-cell exclusion from tumor regions in non-responders and increased fibroblast-macrophage spatial proximity that align with existing biological understanding of immunosuppressive microenvironments in TNBC. These findings highlight spatial organization, rather than simple cell abundance, as a key differentiator of treatment response.

      (3) Novel use of GNNs for chemoresponse prediction in IMC data helps in demonstrating that spatial and molecular features captured simultaneously can provide predictive information about treatment response. The use of GNNExplainer adds interpretability of the selected features, identifying immune-regulatory markers such as B7H4, CD366, FOXP3, and CD11b as contributors to chemoresponse heterogeneity.

      (4) The work complements emerging spatial transcriptomic analyses from the same SMART cohort and provides a scalable computational framework likely to be useful to other IMC and spatial-omics researchers.

      Weaknesses:

      (1) Some analytical components lack quantitative validation, limiting confidence in specific claims. For example, the CLAHE-based batch correction applied before segmentation is evaluated primarily through qualitative visualization rather than quantitative metrics. Similarly, the cell-type annotations produced via Pixie and manual thresholds lack independent validation, making it harder to assess the accuracy of downstream spatial and predictive analyses.

      (2) Predictive modeling performance is moderate and may be influenced by dataset structure; the GNN achieves AUROC ~0.71, which is meaningful but still limited, and the absence of external validation or multiple cross-validation strategies raises questions about generalizability. The predictive insights are promising but not yet sufficiently strong to support clinical decision-making.

      (3) Pre- and post-treatment comparisons are constrained to non-responders and pathologist-selected ROIs.

    4. Reviewer #3 (Public review):

      Summary:

      Luque et al. proposed stratifying chemotherapy response in triple-negative breast cancer based on spatial protein patterns from IMC data. The proposed method combines a GNN with GNNExplainer to identify several important protein markers and cell types related to chemotherapy response. Because it addresses one of the most significant challenges in cancer research, this work holds great potential for translational medicine.

      Strengths:

      (1) Targeting the intervention decision-making of TNBC, one of the prominent challenges in the field.

      (2) Cutting-edge spatial proteomics data with a sufficiently large cohort and clinical outcome data.

      (3) Appropriate usage of cutting-edge machine learning models and comprehensive analysis.

      Weaknesses:

      (1) More scientific rigor is needed for machine learning benchmarking.

      (2) More depth is needed in comparing related works that use similar approaches.

    1. eLife Assessment

      This important study focuses on the molecular mechanisms underlying the generation of neuronal diversity. Taking advantage of a well-defined neuroblast lineage in Drosophila, the authors provide convincing evidence that two transcription factors of the conserved forkhead box (FOX) family offer a mechanistic link between transient spatial cues that specify neuroblast identity and terminal selector genes that define post-mitotic neuron identity. The findings will be of interest to developmental neurobiologists.

    2. Reviewer #1 (Public review):

      Summary:

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is both necessary for late-born NB7-1 neurons and sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: how cell identities defined by initial transient developmental cues can be maintained in the progeny cells, even if the molecular mechanism remains to be investigated. In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Strengths:

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progenies. They further demonstrate that Fd4 is required for later-born NB7-1 progenies and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g., in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for most experiments.

      Original weaknesses and potential extensions:

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4 overexpressing NB5-6 and NB7-3 progenies, as was done in Seroka and Doe (2019), would be an option.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4-overexpression is interesting. It might be worth discussing whether it is a Fd4 feature or a NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4-overexpression.

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

      Comments on latest version:

      We appreciate the thorough revision and the detailed point-by-point responses. Overall, the updated manuscript has addressed the main issues we raised previously, especially around the potential potency differences of Fd4 along the birth order axis and possible redundancy with Vnd in early-born neurons. The additional data are convincing and presented clearly, with figures and supplements that are informative and appropriately labeled.

      We noticed one remaining point that could be considered: the necessary-and-sufficient phrasing for Fd4 regulating NB7-1 fates. Given the possible redundancy among Fd4/5 and Vnd and the fact that early-born outputs (U1-3, Figure 3F) are not dependent on Fd4/5, we suggest revising this claim to either (a) limit the claim to necessary and sufficient for late-born NB7-1 progeny identity, or (b) frame Fd4 as sufficient for NB7-1 program induction while being required but redundant (e.g., with Vnd) for early-born features, rather than universally necessary/sufficient across the entire lineage output.

      Regarding the lack of phenotype of single Fd4/5 mutants and Fd5 gain of function, we still encourage the authors to include the fd4 and fd5 single-mutant negative results as a brief supplemental item (e.g., a representative panel plus a simple quantification on Eve and Dbx would be sufficient). This would strengthen transparency, remove "data not shown" statements that are not necessary when these data can be presented as supplementary data with no space limitation, and make it easier for readers to evaluate redundancy claims.

      Overall, we view the work as substantially complete and appreciate its contribution and conceptual framing. We have updated our public review to reflect the current version and the authors' efforts to address the major points raised in the prior round.

    3. Reviewer #3 (Public review):

      The goal of the work is to establish the linkage between the spatial transcription factors (STFs) that function transiently to establish the identities of the individual NBs and the terminal selector genes (typically homeodomain genes) that appear in the new-born post-mitotic neurons. How is the identity of the NB maintained and carried forward after the spatial genes have faded away? Focusing on a single neuroblast (NB 7-1), the authors present evidence that the fork-head transcription factor, fd4, provides a bridge linking the transient spatial cues that initially specified neuroblast identity with the terminal selector genes that establish and maintain the identity of the stem cell's progeny.

      The study is systematic, concise and takes full advantage of 40+ years of work on the molecular players that establish neuronal identities in the Drosophila CNS. In the embryonic VNC, fd4 is expressed only in the NB 7-1 and its lineage. They show that Fd4 appears in the NB while the latter is still expressing the Spatial Transcription Factors and continues after the expression of the latter fades out. Fd4 is maintained through the early life of the neuronal progeny but then declines as the neurons turn on their terminal selector genes. Hence, fd4 expression is compatible with it being a bridging factor between the two sets of genes.

      Experimental support for the "bridging" role of Fd4 comes from a set of loss-of-function and gain-of-function manipulations. The loss of function of fd4, and the partially redundant gene fd5, from lineage 7-1 does not affect the size of the lineage, but terminal markers of late-born neuronal phenotypes, like Eve and Dbx, are reduced or missing. By contrast, ectopic expression of fd4, but not fd5, results in ectopic expression of the terminal markers eve and dbx throughout diverse VNC lineages.

      A detailed test of fd4's expression was then carried out using lineages 7-3 and 5-6, two well characterized lineages in Drosophila. Lineage 7-3 is much smaller than 7-1 and continues to be so when subjected to fd4 misexpression. However, under the influence of ectopic fd4 expression, the lineage 7-3 neurons lost their expected serotonin and corazonin expression and showed Eve expression as well as motoneuron phenotypes that partially mimic the U motoneurons of lineage 7-1.

      Ectopic expression of Fd4 also produced changes in the 5-6 lineage. Expression of apterous, a feature of lineage 5-6, was suppressed, and expression of the 7-1 marker, Eve, was evident. Dbx expression was also evident in the transformed 5-6 lineages but extremely restricted as compared to a normal 7-1 lineage. Considering the partial redundancy of fd4 and fd5, it would have been interesting to express both genes in the 5-6 lineage. The anatomical changes that are exhibited by motoneurons in response to fd4 expression confirm that these cells do, indeed, show a shift in their cellular identity.

      Comments on revisions:

      The authors adequately addressed all of the issues that I had with the original submission.

      Their responses to the other reviewers are also well-reasoned.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is both necessary for late-born NB7-1 neurons, but also sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: How can cell identities defined by initial transient developmental cues be maintained in the progeny cells, even if the molecular mechanism remains to be investigated? In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Thanks for the accurate summary and positive comments!

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progenies. They further demonstrate that Fd4 is required for later-born NB7-1 progenies and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g., in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for all experiments.

      Thanks for the positive comments!

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      We quantified the percentage of Hb+ and Runt+ cells among Eve+ cells with sca-gal4, and the results are shown in Figure 4-figure supplement 1. We found that the proportion of early-born cells is slightly reduced but the proportion of later-born cells remains similar. Interestingly, we also found a subset of Eve+ cells with a mixed fate (Hb+Runt+) but the reason remains unclear.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      Because every hemisegment in an fd4 single mutant is normal, we just added it as the following text: “In fd4 mutants, we observe no change in the number of Eve+ neurons or Dbx+ neurons (n=40 hemisegments).”

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      We agree, thanks for raising this point. We add the following text to the Discussion. “Interestingly, the fd4 fd5 mutant maintains expression of fd4:gal4, suggesting that the fd4/fd5 locus may have established a chromatin state that allows “permanent” expression in the absence of Vnd, En, and Fd4/Fd5 proteins.”

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4 overexpressing NB5-6 and NB7-3 progenies, as was done in Seroka and Doe (2019), would be an option.

      We agree it is interesting that the NB7-3 and NB5-6 drivers remain on following Fd4 misexpression. To explore this, we used sca-gal4 to overexpress Fd4 and observed that Lbe expression persisted while Eg was largely repressed (Author response image 1). The results show that Lbe and Eg respond differently to Fd4. A non-mutually exclusive possibility is that the continued expression of lbe-Gal4 UAS-GFP or eg-Gal4 UAS-GFP may be due to the lengthy perdurance of both Gal4 and GFP.

      Author response image 1.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4-overexpression is interesting. It might be worth the authors discussing whether it is an Fd4 feature or an NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4-overexpression.

      In the NB7-3 lineages misexpressing Fd4, only 5 lineages generated Dbx+ cells (0.1±0.4, n=64 hemisegments), suggesting that the low penetrance of Dbx+ induction is an intrinsic feature of Fd4 rather than lineage context. We have added this information in the results section.

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly, so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      When we used the en-gal4 driver to express UAS-vnd in the fd4/fd5 mutant background, we found an average of 7.4±2.2 Eve+ cells per hemisegment (n=36), significantly higher than in the fd4/fd5 mutant alone (3.9±0.8 cells, n=52, p=2.6×10⁻¹¹) (Figure 3J). In addition, 0.2±0.5 Eve+ cells were ectopic Hb+ (excluding U1/U2), indicating that Vnd-En integration is sufficient to generate both early-born and late-born Eve+ cells in the fd4/fd5 mutants. We have added the results to the text.

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

      Thanks for the suggestion. Because the results are exactly the same as the wild type, we don’t think it is necessary to provide additional images or analysis as supplemental information.

      Reviewer #2 (Public review):

      Via a detailed expression analysis, they find that Fd4 is selectively expressed in embryonic NB7-1 and newly born neurons within this lineage. They also undertake a comprehensive genetic analysis to provide evidence that fd4 is necessary and sufficient for the identity of NB7-1 progeny.

      Thanks for the accurate summary!

      The analysis is both careful and rigorous, and the findings are of interest to developmental neurobiologists interested in molecular mechanisms underlying the generation of neuronal diversity. Great care was taken to make the figures clear and accessible. This work takes great advantage of years of painstaking descriptive work that has mapped embryonic neuroblast lineages in Drosophila.

      Thanks for the positive comments!

      The argument that Fd4 is necessary for NB7-1 lineage identity is based on a Fd4/Fd5 double mutant. Loss of fd4 alone did not alter the number of NB7-1-derived Eve+ or Dbx+ neurons. The authors clearly demonstrate redundancy between fd4 and fd5, and the fact that the LOF analysis is based on a double mutant should be better woven through the text. The authors generated an Fd5 mutant. I assume that Fd5 single mutants do not display NB7-1 lineage defects, but this is not stated. The focus on Fd4 over Fd5 is based on its highly specific expression profile and the dramatic misexpression phenotypes. But the LOF analysis demonstrates redundancy, and the conclusions in the abstract and through the results should reflect the existence of Fd5 in the conclusions of this manuscript.

      We agree, and have added new text to clarify the single mutant phenotypes (there are none) and the double mutant phenotype (loss of NB7-1 molecular and morphological features). The following text is added to the manuscript: “Not surprisingly, we found that fd4 single mutants or fd5 single mutants had no phenotype (Eve+ neurons were all normal). Thus, to assess their roles, we generated a fd4 and fd5 double mutant. Because many Eve+ and Dbx+ cells are generated outside of NB7-1 lineage, it was also essential to identify the Eve+ or Dbx+ cells within NB7-1 lineage in wild type and fd4 mutant embryos. To achieve this, we replaced the open reading frame of fd4 with gal4 (called fd4-gal4) (see Methods); this stock simultaneously knocked out both fd4 and fd5 (called fd4/fd5 mutant hereafter) while specifically labeling the NB7-1 lineage. For the remainder of this paper we use the fd4/fd5 double mutant to assay for loss of function phenotypes.”

      It is notable that Fd4 overexpression can rewire motor circuits. This analysis adds another dimension to the changes in transcription factor expression and, importantly, demonstrates functional consequences. Could the authors test whether U4 and U5 motor axon targeting changes in the fd4/fd5 double mutant? To strengthen claims regarding the importance of fd4/fd5 for lineage identity, it would help to address terminal features of U motorneuron identity in the LOF condition.

      Thanks for raising this important point. We examined the axon targeting on body wall muscles in both wild type and in fd4/fd5 mutant background and added the results in Figure 3-figure supplement 2. We found that the axon targeting in the late-born neuron region (LL1) is significantly reduced, suggesting that the loss of late-born neurons in fd4/fd5 mutant leads to the absence of innervation of corresponding muscle targets.

      Reviewer #3 (Public review):

      The goal of the work is to establish the linkage between the spatial transcription factors (STFs) that function transiently to establish the identities of the individual NBs and the terminal selector genes (typically homeodomain genes) that appear in the newborn postmitotic neurons. How is the identity of the NB maintained and carried forward after the spatial genes have faded away? Focusing on a single neuroblast (NB 7-1), the authors present evidence that the fork-head transcription factor, fd4, provides a bridge linking the transient spatial cues that initially specified neuroblast identity with the terminal selector genes that establish and maintain the identity of the stem cell's progeny.

      Thanks for the positive comments!

      The study is systematic, concise, and takes full advantage of 40+ years of work on the molecular players that establish neuronal identities in the Drosophila CNS. In the embryonic VNC, fd4 is expressed only in the NB 7-1 and its lineage. They show that Fd4 appears in the NB while the latter is still expressing the Spatial Transcription Factors and continues after the expression of the latter fades out. Fd4 is maintained through the early life of the neuronal progeny but then declines as the neurons turn on their terminal selector genes. Hence, fd4 expression is compatible with it being a bridging factor between the two sets of genes.

      Thanks for the accurate summary!

      Experimental support for the "bridging" role of Fd4 comes from a set of loss-of-function and gain-of-function manipulations. The loss of function of Fd4, and the partially redundant gene Fd5, from lineage 7-1 does not affect the size of the lineage, but terminal markers of late-born neuronal phenotypes, like Eve and Dbx, are reduced or missing. By contrast, ectopic expression of fd4, but not fd5, results in ectopic expression of the terminal markers eve and Dbx throughout diverse VNC lineages.

      Thanks for the accurate summary!

      A detailed test of fd4's expression was then carried out using lineages 7-3 and 5-6, two well-characterized lineages in Drosophila. Lineage 7-3 is much smaller than 7-1 and continues to be so when subjected to fd4 misexpression. However, under the influence of ectopic Fd4 expression, the lineage 7-3 neurons lost their expected serotonin and corazonin expression and showed Eve expression as well as motoneuron phenotypes that partially mimic the U motoneurons of lineage 7-1.

      Thanks for the positive comments!

      Ectopic expression of Fd4 also produced changes in the 5-6 lineage. Expression of apterous, a feature of lineage 5-6, was suppressed, and expression of the 7-1 marker, Eve, was evident. Dbx expression was also evident in the transformed 5-6 lineages, but extremely restricted as compared to a normal 7-1 lineage. Considering the partial redundancy of fd4 and fd5, it would have been interesting to express both genes in the 5-6 lineage. The anatomical changes that are exhibited by motoneurons in response to Fd4 expression confirm that these cells do, indeed, show a shift in their cellular identity.

      We appreciate the positive comments. We agree double misexpression of Fd4 and Fd5 might give a stronger phenotype (as the reviewer says) but the lack of this experiment does not change the conclusions that Fd4 can promote NB7-1 molecular and morphological aspects at the expense of NB5-6 molecular markers.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The title of Figure 4 may be intended to include the term "Widespread", not "Wild spread". (Though the expansion of the Eve and Dbx with Fd4 is quite remarkable…).

      Done!

      Reviewer #3 (Recommendations for the authors):

      (1) Line 138. Is part of the sentence missing? Did the authors mean to say "that fd5 is coexpressed with fd4 in NB7-1 and its .....".

      Done!

      (2) ln 237: In trying to explain the "U-like" phenotype of the transformed motoneurons in lineage 7-3, the authors speculate that "perhaps their late birth did not give them time to extend to the most distant dorsal muscles ". It is very difficult to convince a motoneuron to stop growing in the absence of a target! An alternate possibility is that since there is only one or two U neurons made instead of the normal five, the growing motoneuron has enough information to direct them to the dorsal domain, but they lack the specification that allows them to recognize a specific muscle target.

      We agree there are additional possibilities, and now update the text to say: “We observed that these transformed neurons did not innervate the dorsal muscles, perhaps their late birth did not give them time to extend to the most distant dorsal muscles, or they were incompletely specified.”

      (3) In the References, I think that the Anderson et al. reference should also include "BioRxiv" before the DOI.

      Done!

      (4) Figure 6A for wild-type 7-3 lineage. The corazonin expression appears to be expressed in EW2 as well as EW3. This should be explained.

      We agree it looks that way, due to the 3D rotation used; we now replace it with a more representative image. Note that our quantification always shows a single Cor+ neuron per hemisegment.

      (5) Figure 7: Issues of terminology. The designation of "longitudinal" for muscles is traditionally in reference to the body axis, such as the Dorsal Longitudinal Muscles (DLM) of the adult thorax. The "longitudinal" muscles in the figure are really "transverse" muscles. I also suggest using "axon" or "neurites" rather than "filament". For the middle and bottom parts of E and F, are these lateral and ventral views? They should be designated as such.

      Thanks, we agree and have made the changes, using Axon instead of Filament, and labeling the views (lateral and ventro-lateral).

    1. eLife Assessment

      This study presents experiments suggesting intriguing mesoscale reorganization of functional connectivity across distributed cortical and subcortical circuits during learning. The approach is technically impressive and the results are potentially of valuable significance. The authors have also made a clear effort to address concerns in revision. However, the strength of evidence remains incomplete. Acquisition of data from additional animals in the primary experiment could bolster these findings.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to address an important and timely question: how does the mesoscale architecture of cortical and subcortical circuits reorganize during sensorimotor learning? By using high-density, chronically implanted ultra-flexible electrode arrays, the authors track spiking activity across ten brain regions as mice learn a visual Go/No-Go task. The results indicate that learning leads to more sequential and temporally compressed patterns of activity during correct rejection trials, alongside changes in functional connectivity ranks that reflect shifts in the relative influence of visual, frontal, and motor areas throughout learning. The emergence of a more task-focused subnetwork is accompanied by broader and faster propagation of stimulus information across recorded regions.

      Strengths:

      A clear strength of this work is its recording approach. The combination of stable, high-throughput multi-region recordings over extended periods represents a significant advance for capturing learning-related network dynamics at the mesoscale. The conceptual framework is well motivated, building on prior evidence that decision-relevant signals are widely distributed across the brain. The analysis approach, combining functional connectivity rankings with information encoding metrics, is well motivated but needs refinement. These results provide some valuable evidence of how learning can refine both the temporal precision and the structure of interregional communication, offering new insights into circuit reconfiguration during learning.

      Weaknesses:

      Several important aspects of the evidence remain incomplete. In particular, it is unclear whether the reported changes in connectivity truly capture causal influences, as the rank metrics remain correlational and show discrepancies with the manipulation results. The absolute response onset latencies also appear slow for sensory-guided behavior in mice, and it is not clear whether this reflects the method used to define onset timing or factors such as task structure or internal state. Furthermore, the small number of animals, combined with extensive repeated measures, raises questions about statistical independence and how multiple comparisons were controlled. The optogenetic experiments, while intended to test the functional relevance of rank-increasing regions, leave it unclear how effectively the targeted circuits were silenced. Without direct evidence of reliable local inhibition, the behavioral effects or lack thereof are difficult to interpret.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. measure from 10 cortical and subcortical brain regions as mice learn a go/no-go visual discrimination task. They found that during learning, there is a reshaping of inter-areal connections, in which a visual-frontal subnetwork emerges as mice gain expertise. Visual stimulus decoding also became more widespread post-learning. They also perform silencing experiments and find that OFC and V2M are important for the learning process. The conclusion is that learning evoked a brain-wide dynamic interplay between different brain areas that together may promote learning.

      Strengths:

      The manuscript is written well and the logic is rather clear. I found the study interesting and of interest to the field. The recording method is innovative and requires exceptional skills to perform. The outcomes of the study are significant, highlighting that learning evokes a widespread and dynamic modulation between different brain areas, in which specific task-related subnetworks emerge.

      Weaknesses:

      I had some major concerns that make the claims of the study less convincing: Low number of mice, insufficient movement analysis, figure visualization and analytic methods.

      Nevertheless, I had several major concerns:

      (1) The number of mice was small for the ephys recordings. Although the authors start with 7 mice in Figure 1, they then reduce to 5 in panel F. And in their main analysis they minimize their analysis to 6/7 sessions from 3 mice only. I couldn't find a rationale for this reduction, but in the methods they do mention that 2 mice were used for fruitless training, which I found no mention in the results. Moreover, in the early case all of the analysis is from 118 CR trials taken from 3 mice. In general, this is a rather low number of mice and trial numbers. I think it is quite essential to add more mice.

      (2) Movement analysis was not sufficient. Mice learning a go/no-go task establish a movement strategy that is developed throughout learning and is also biased towards Hit trials. There is an analysis of movement in Fig. S4 but this is rather superficial. I was not even sure that the 3 mice in Figure S4 are the same 3 mice in the main figure. There should be also an analysis of movement as a function of time to see differences. Also for Hits and FAs. I give some more details below. In general, most of the results can be explained by the fact that as mice gain expertise, they move more (also in CR during specific times) which leads to more activation in frontal cortex and more coordination with visual areas. More needs to be done in terms of analysis, or at least a mention of this in the text.

      (3) Most of the figures are over-detailed and it is hard to understand the take home message. Although the text is written succinctly and rather short, the figures are mostly overwhelming, especially figures 4-7. For example, Figure 4 presents 24 brain plots! For rank input and output rank during early and late stim and response periods, for early and expert and their difference. All in the same colormap. No significance shown at all. The Δrank maps for all cases look essentially identical across conditions. The division into early and late time periods is not properly justified. But the main take home message is positive Δrank in OFC, V2M, V1 and negative Δrank in ThalMD and Str. In my opinion, one trio map is enough, and the rest could be bumped to the Supp, if at all. In general, the figures in several cases do not convey the main take home messages.

      (4) Analysis is sometimes not intuitive enough. For example, the rank analysis of input and output rank seemed a bit over complex. Figure 3 was hard to follow (although a lot of effort was made by the authors to make it clearer). Was there any difference between output and input analysis? Also, the time period seems redundant sometimes. Also, there are other network analyses that can be done which are a bit more intuitive. The use of rank within the 10 areas was not the most intuitive. Even a dimensionality reduction along with clustering can be used as an alternative. In my opinion, I don't think the authors should completely redo their analysis, but maybe mention the fact that other analyses exist.

      Reviewer comments to the authors' revision:

      Thank you for the extensive revision. Most of my concerns were answered and the manuscript is much improved. Still, there are some major issues that remain unconvincing:

      (1) The number of learning mice is only 3, which is substantially low as compared to other studies in the field. Thus, statistics are across trials and sessions pooled from all mice. This is a big limitation in supporting the authors' claims.

      (2) There is no measurement of movement during the task. Since there are already several studies showing that movement has a strong effect on brain-wide dynamics, and since it is well known that mice change their body movement during learning (at least some mice) the authors cannot disentangle between learning-related and movement-related dynamics. This issue is properly discussed in the paper and also partially addressed with a control group where movement was measured without neural recordings.

      (3) The authors do not know exactly where they recorded from, with emphasis on subcortical areas. The authors partially address this in a separate cohort where they regenerate the reproducibility rate of penetration locations, but still this is not a complete address to this concern.

      Given the issues above, I strongly recommend including additional mice with body movement measurement in the future. Great job and congratulations on this study!

    4. Reviewer #3 (Public review):

      Summary:

      In the manuscript "Dynamics of mesoscale brain network during decision-making learning revealed by chronic, large-scale single-unit recording", Wang et al. investigated mesoscale network reorganization during visual stimulus discrimination learning in mice using chronic, large-scale single-unit recordings across 10 cortical/subcortical regions. During learning, mice improved task performance mainly by suppressing licking on no-go trials. The authors found that learning induced restructuring of functional connectivity, with visual (V1, V2M) and frontal (OFC, M2) regions forming a task-relevant subnetwork during the acquisition of correct No-Go (CR) trials. Learning also compressed sequential neural activation and broadened stimulus encoding across regions. In addition, a region's network connectivity rank correlated with its timing of peak visual stimulus encoding. Optogenetic inhibition of orbitofrontal cortex (OFC) and high order visual cortex (V2M) impaired learning, validating their role in learning. The work highlights how mesoscale networks underwent dynamic structuring during learning.

      Strengths:

      The use of ultra-flexible microelectrode arrays (uFINE-M) for chronic, large-scale recordings across 10 cortical/subcortical regions in behaving mice represents a significant methodological advancement. The ability to track individual units over weeks across multiple brain areas will provide a rare opportunity to study mesoscale network plasticity.

      While limited in scope, optogenetic inhibition of OFC and V2M directly ties connectivity rank changes to behavioral performance, adding causal depth to correlational observations.

      Weaknesses:

      The weakness is also related to the strength provided by the method. While the method in principle enables chronic tracking of individual units, the authors have not shown chronically tracked neurons across learning. Without demonstrating that and taking advantage of analyzing chronically tracked neurons, this approach is not different from acute recording in individual days across learning, weakening the attractiveness of the methodology and this study.

      Another weakness is that major results are based on analyses of functional connectivity. Functional connection strength across areas is ranked 1-10 based on relative strength, and the regional input/output is compared across learning. This approach reveals differential changes in some cortical and subcortical areas. In my view, learning-related changes should be validated using complementary methods.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The technical approach is strong and the conceptual framing is compelling, but several aspects of the evidence remain incomplete. In particular, it is unclear whether the reported changes in connectivity truly capture causal influences, as the rank metrics remain correlational and show discrepancies with the manipulation results.

      We agree that our functional connectivity ranking analyses cannot establish causal influences. As discussed in the manuscript, besides learning-related activity changes, the functional connectivity may also be influenced by neuromodulatory systems and internal state fluctuations. In addition, the spatial scope of our recordings is still limited compared to the full network implicated in visual discrimination learning, which may bias the ranking estimates. In future, we aim to achieve broader region coverage and integrate multiple complementary analyses to address the causal contribution of each region.

      The absolute response onset latencies also appear slow for sensory-guided behavior in mice, and it is not clear whether this reflects the method used to define onset timing or factors such as task structure or internal state.

      We believe this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This might lead to later estimates than other studies, such as using the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-max response (Goldbach, Akitake, Leedy, & Histed, 2021).

      The estimation of response onset latency in our study may also be affected by potential internal state fluctuations of the mice. We used the period before visual stimulus onset as the baseline. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase the variability of the baseline and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation of the formation of compressed activity sequence in CR trials during learning.
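      The consecutive-window criterion described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `onset_latency_ms` is hypothetical, the per-window test outcomes (firing rate above baseline, t-test, p < 0.05) are assumed to have been computed beforehand, and reporting the start of the first window in the qualifying run as the onset is an assumption about the convention.

```python
from typing import Optional, Sequence


def onset_latency_ms(
    significant: Sequence[bool],
    window_ms: float = 25.0,
    min_consecutive: int = 3,
) -> Optional[float]:
    """Return the start time (ms after stimulus onset) of the first run of
    at least `min_consecutive` significant windows, or None if no run exists.

    significant[i] is True when the firing rate in window i, covering
    [i * window_ms, (i + 1) * window_ms), exceeded baseline (p < 0.05).
    """
    run_length = 0
    for i, is_sig in enumerate(significant):
        # Extend the current run, or reset it on a non-significant window.
        run_length = run_length + 1 if is_sig else 0
        if run_length == min_consecutive:
            first_window = i - min_consecutive + 1
            return first_window * window_ms
    return None


# Illustrative flags only: the isolated significant window at index 1 is
# ignored, and the first run of three starts at index 4.
flags = [False, True, False, False, True, True, True, False]
latency = onset_latency_ms(flags)
```

      Under this criterion a brief early response that reaches significance in only one or two windows is skipped, which is one way the estimate can come out later than a first-spike or half-max latency.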

      Furthermore, the small number of animals, combined with extensive repeated measures, raises questions about statistical independence and how multiple comparisons were controlled.

      We agree that a larger sample size would strengthen the robustness of the findings. However, as noted above, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      The optogenetic experiments, while intended to test the functional relevance of rank increasing regions, leave it unclear how effectively the targeted circuits were silenced. Without direct evidence of reliable local inhibition, the behavioral effects or lack thereof are difficult to interpret.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy.

      Details on spike sorting are limited.

      We have provided more details on spike sorting in the Methods section, including the exact parameters used in the automated sorting algorithm and the subsequent manual curation criteria.

      Reviewer #2 (Public review):

      Weaknesses:

      I had several major concerns:

      (1) The number of mice was small for the ephys recordings. Although the authors start with 7 mice in Figure 1, they then reduce to 5 in panel F. And in their main analysis, they minimize their analysis to 6/7 sessions from 3 mice only. I couldn't find a rationale for this reduction, but in the methods they do mention that 2 mice were used for fruitless training, which I found no mention in the results. Moreover, in the early case, all of the analysis is from 118 CR trials taken from 3 mice. In general, this is a rather low number of mice and trial numbers. I think it is quite essential to add more mice.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      As we noted in our response to Reviewer #1, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve high-quality unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. These improvements will enable us to collect data from a larger sample size and extract more precise insights into mesoscale dynamics during learning.

      (2) Movement analysis was not sufficient. Mice learning a go/no-go task establish a movement strategy that is developed throughout learning and is also biased towards Hit trials. There is an analysis of movement in Figure S4, but this is rather superficial. I was not even sure that the 3 mice in Figure S4 are the same 3 mice in the main figure. There should be also an analysis of movement as a function of time to see differences. Also for Hits and FAs. I give some more details below. In general, most of the results can be explained by the fact that as mice gain expertise, they move more (also in CR during specific times) which leads to more activation in frontal cortex and more coordination with visual areas. More needs to be done in terms of analysis, or at least a mention of this in the text.

      Due to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions, and we have added these results and related discussion in the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (3) Most of the figures are over-detailed, and it is hard to understand the take-home message. Although the text is written succinctly and rather short, the figures are mostly overwhelming, especially Figures 4-7. For example, Figure 4 presents 24 brain plots! For rank input and output rank during early and late stim and response periods, for early and expert and their difference. All in the same colormap. No significance shown at all. The Δrank maps for all cases look essentially identical across conditions. The division into early and late time periods is not properly justified. But the main take home message is positive Δrank in OFC, V2M, V1 and negative Δrank in ThalMD and Str. In my opinion, one trio map is enough, and the rest could be bumped to the Supplementary section, if at all. In general, the figures in several cases do not convey the main take home messages. See more details below.

      We thank the reviewer for this valuable critique. The statistical significance corresponding to the brain plots (Figure 4 and Figure 5) was presented in Figure S3 and S5 (now Figure S5 and S7 in the revised manuscript), but we agree that the figure can be simplified to focus on the key results.

      In the revised manuscript, we have condensed these figures to focus on the most important comparisons to make the visual presentation more concise and the take-home message clearer.

      (4) The analysis is sometimes not intuitive enough. For example, the rank analysis of input and output rank seemed a bit over complex. Figure 3 was hard to follow (although a lot of effort was made by the authors to make it clearer). Was there any difference between the output and input analysis? Also, the time period seems redundant sometimes. Also, there are other network analyses that can be done which are a bit more intuitive. The use of rank within the 10 areas was not the most intuitive. Even a dimensionality reduction along with clustering can be used as an alternative. In my opinion, I don't think the authors should completely redo their analysis, but maybe mention the fact that other analyses exist.

      We appreciate the reviewer’s comment. In brief, the input- and output-rank analyses yielded largely similar patterns across regions in CR trials, although some differences were observed in certain areas (e.g., striatum) in Hit trials, where the magnitude of rank change was not identical between input and output measures. We have condensed the figures to only show averaged rank results, and the colormap was updated to better convey the message.

      We did explore dimensionality reduction applied to the ranking data. However, the results were not intuitive either and required additional interpretation, which did not bring more insights. Still, we acknowledge that other analysis approaches might provide complementary insights.

      Reviewer #3 (Public review):

      Weaknesses:

      The weakness is also related to the strength provided by the method. It is demonstrated in the original method that this approach in principle can track individual units for four months (Luan et al, 2017). The authors have not shown chronically tracked neurons across learning. Without demonstrating that and taking advantage of analyzing chronically tracked neurons, this approach is not different from acute recording across multiple days during learning. Many studies have achieved acute recording across learning using similar tasks. These studies have recorded units from a few brain areas or even across brain-wide areas.

      We appreciate the reviewer’s important point. We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses. Concentrating probes in fewer regions would allow us to obtain enough units tracked across learning in future studies to fully exploit the advantages of this method.

      Another weakness is that major results are based on analyses of functional connectivity that is calculated using the cross-correlation score of spiking activity (TSPE algorithm). Functional connection strength across areas is then ranked 1-10 based on relative strength. Without ground truth data, it is hard to judge the underlying caveats. I'd strongly advise the authors to use complementary methods to verify the functional connectivity and to evaluate the mesoscale change in subnetworks. Perhaps the authors can use one key piece of anatomical information, i.e. the cortex projects to the striatum, while the striatum does not directly affect other brain structures recorded in this manuscript.

      We agree that the functional connectivity measured in this study relies on statistical correlations rather than direct anatomical connections. We plan to test the functional connection data with shorter cross-correlation delay criteria to see whether the results are consistent with anatomical connections and whether the original findings still hold.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The small number of mice, each contributing many sessions, complicates the interpretation of the data. It is unclear how statistical analyses accounted for the small sample size, repeated measures, and non-independence across sessions, or whether multiple comparisons were adequately controlled.

      We recognize the limitation imposed by the small number of animal subjects, yet the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size. We agree that a larger sample size would strengthen the robustness of the findings; however, as noted below, the current dataset has inherent limitations in both the scope of recorded regions and the behavioral paradigm.

      Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      (2) The ranking approach, although intuitive for visualizing relative changes in connectivity, is fundamentally descriptive and does not reflect the magnitude or reliability of the connections. Converting raw measures into ordinal ranks may obscure meaningful differences in strength and can inflate apparent effects when the underlying signal is weak.

      We agree with this important point. As stated in the manuscript, our motivation in taking the ranking approach was that differences in firing rates might bias the cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions, but we acknowledge that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We added the limitations of the ranking approach to the discussion section and emphasized the need in future studies for better analysis approaches that could provide a more accurate assessment of functional connection dynamics without bias from firing rates.
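      The concern about ordinal ranks can be made concrete with a small sketch. It is only illustrative: the helper `rank_regions`, the three region labels, and the strength values are hypothetical, and the convention that rank 1 denotes the weakest region is an assumption rather than the definition used in the manuscript.

```python
from typing import Dict


def rank_regions(strength: Dict[str, float]) -> Dict[str, int]:
    """Map each region to an ordinal rank (1 = weakest, N = strongest).

    Ties are broken by region name so that the result is deterministic.
    """
    ordered = sorted(strength, key=lambda region: (strength[region], region))
    return {region: position for position, region in enumerate(ordered, start=1)}


# Hypothetical strengths: identical ordering, very different separations.
well_separated = {"V1": 0.90, "OFC": 0.50, "Str": 0.10}
nearly_tied = {"V1": 0.31, "OFC": 0.30, "Str": 0.29}

ranks_a = rank_regions(well_separated)
ranks_b = rank_regions(nearly_tied)
```

      Both inputs yield the same ranks, so a rank map alone cannot distinguish a large, reliable difference from one that may lie within measurement noise.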

      (3) The absolute response onset latencies also appear quite slow for sensory-guided behavior in mice, and it remains unclear whether this reflects the method used to determine onset timing or factors such as task design, sensorimotor demands, or internal state. The approach for estimating onset latency by comparing firing rates in short windows to baseline using a t-test raises concerns about robustness, as it may be sensitive to trial-to-trial variability and yield spurious detections.

      We agree this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This might lead to later estimates than other studies, such as using the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-max response (Goldbach, Akitake, Leedy, & Histed, 2021).

      The estimation of response onset latency in our study may also be affected by potential internal state fluctuations of the mice. We used the period before visual stimulus onset as the baseline. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase the variability of the baseline and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation of the formation of compressed activity sequence in CR trials during learning.

      (4) Details on spike sorting are very limited. For example, defining single units only by an interspike interval threshold above one millisecond may not sufficiently rule out contamination or overlapping clusters. How exactly were neurons tracked across days (Figure 7B)?

      We have added more details on spike sorting, including the processing steps and important parameters used in the automated sorting algorithm. Only the clusters well isolated in feature space were accepted in manual curation.

      We attempted to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      This is now stated more clearly in the discussion section.

      (5) The optogenetic experiments, while designed to test the functional relevance of rank-increasing regions, also raise questions. The physiological impact of the inhibition is not characterized, making it unclear how effectively the targeted circuits were actually silenced. Without clearer evidence that the manipulations reliably altered local activity, the interpretation of the observed or absent behavioral effects remains uncertain.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy. 

      (6) The task itself is relatively simple, and the anatomical coverage does not include midbrain or cerebellar regions, limiting how broadly the findings can be generalized to more flexible or ethologically relevant forms of decision-making.

      We appreciate this advice and have expanded the existing discussion to more explicitly state that the relatively simple task design and anatomical coverage might limit the generalizability of our findings.

      (7) The abstract would benefit from more consistent use of tense, as the current mix of past and present can make the main findings harder to follow. In addition, terms like "mesoscale network," "subnetwork," and "functional motif" are used interchangeably in places; adopting clearer, consistent terminology would improve readability.

      We have changed several verbs in the abstract to the past tense, and we have adopted more consistent terminology by replacing “functional motif” with “subnetwork”. We still feel that “mesoscale network” and “subnetwork” emphasize different aspects of the results depending on the context, so these terms are kept as they are.

      (8) The discussion could better acknowledge that the observed network changes may not reflect task-specific learning alone but could also arise from broader shifts in arousal, attention, or motivation over repeated sessions.

      We have expanded the existing discussion to better acknowledge the possible effects from broader shifts in arousal, attention, or motivation over repeated sessions.

      (9) The figures would also benefit from clearer presentation, as several are dense and not straightforward to interpret. For example, Figure S8 could be organized more clearly to highlight the key comparisons and main message.

      We have simplified the over-detailed brain plots in Figure 4-5, and the plots in Figure 6 and S8 (now S10 in the revised manuscript).

      (10) Finally, while the manuscript notes that data and code are available upon request, it would strengthen the study's transparency and reproducibility to provide open access through a public repository, in line with best practices in the field.

      The spiking data, behavior data and code for the core analyses in the manuscript are now shared in a public repository (Dryad), and we have changed the description in the Data Availability section accordingly.

      Reviewer #2 (Recommendations for the authors):

      (A) Introduction:

      (1) "Previous studies have implicated multiple cortical and subcortical regions in visual task learning and decision-making". No references here, and also in the next sentence.

      The references were in the following introduction and we have added those references here as well.

      We also added one review on cortical-subcortical neural correlates in goal-directed behavior (Cruz et al., 2023).

      (2) Intro: In general, the citation of previous literature is rather minimal, too minimal.  There is a lot of studies using large scale recordings during learning, not necessarily  visual tasks. An example for brain-wide learning study in subcortical areas is Sych et  al. 2022 (cell reports). And for wide-field imaging there are several papers from the  Helmchen lab and Komiyama labs, also for multi-area cortical imaging.

      We appreciate this advice. We included mainly visual task learning literature to keep a more focused scope around the regions and task we actually explored in this study. We fear if we expand the intro to include all the large-scale imaging/recording studies in learning field, the background part might become too broad.

      We have included (Sych, Fomins, Novelli, & Helmchen, 2022) for its relevance and importance in the field.

      (3) In the intro, there is only a mention of a recording of 10 brain regions, with no  mention of which areas, along with their relevance to learning. This is mentioned in the  results, but it will be good in the intro.

      The area names are now added in intro.

      (B) Results:

      (1) Were you able to track the same neurons across the learning profile? This is not  stated clearly.

      We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      We have now stated this more clearly in the discussion section.

      (2) Figure 1 starts with 7 mice, but only 5 mice are in the last panel. Later it goes down  to 3 mice. This should be explained in the results and justified.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      (3) I can't see the electrode tracks in Figure 1d. If they are flexible, how can you make  sure they did not bend during insertion? I couldn't find a description of this in the  methods also.

      The electrode shanks were ultra-thin (1-1.5 µm) and it was usually difficult to recover observable tracks or electrodes in section.

      The ultra-flexible probes could not penetrate brain on their own (since they are flexible), and had to be shuttled to position by tungsten wires through holes designed at the tip of array shanks. The tungsten wires were assembled to the electrode array before implantation; this was described in the section of electrode array fabrication and assembly. We also included the description about the retraction of the guiding tungsten wires in the surgery section to avoid confusion.

      As a further attempt to verify the accuracy of implantation depth, we also measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end at a slightly deeper location in cortex (142.1 ± 55.2 μm, n = 7 shanks) and a slightly shallower location in subcortical structures (-122.6 ± 71.7 μm, n = 7 shanks). We added these results as new Figure S1 to accompany Figure 1.

      (4) In the spike raster in 1E, there seems to be ~20 cells in V2L, for example, but in 1F, the number of neurons doesn't go below 40. What is the difference here?

      We checked Figure 1F, the plotted dots do go below 40 to ~20. Perhaps the file that reviewer received wasn’t showing correctly?

      (5) The authors focus mainly on CR, but during learning, the number of CR trials is  rather low (because they are not experts). This can also be seen in the noisier traces  in Figure 2a. Do the authors account for that (for example by taking equal trials from  each group)? 

      We accounted for this by constructing bootstrap-resampled datasets with only 5 trials per session in both the early stage and the expert stage. The mean trace of the 500 datasets again showed an overall decrease in CR trial firing rate during task learning, with temporal dynamics highly similar to those of the original data.

      The figure is now added to supplementary materials (as Figure S3 in the revised manuscript).
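The resampling control described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the nested-list data layout, and the choice to sample trials with replacement are all assumptions; only the 5 trials per session and the 500 resampled datasets come from the response.

```python
import random

def bootstrap_mean_traces(sessions, n_trials=5, n_datasets=500, seed=0):
    """Build resampled datasets with a fixed number of trials per session.

    sessions: list of sessions; each session is a list of trials;
              each trial is a list of firing rates, one per time bin.
    Returns one mean trace (list of floats) per resampled dataset.
    Sampling with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_bins = len(sessions[0][0])
    traces = []
    for _ in range(n_datasets):
        sampled = []
        for trials in sessions:
            # Draw the same number of trials from every session,
            # so sessions with many trials do not dominate the average.
            sampled.extend(rng.choices(trials, k=n_trials))
        traces.append(
            [sum(t[b] for t in sampled) / len(sampled) for b in range(n_bins)]
        )
    return traces

def grand_mean(traces):
    """Average the resampled mean traces bin by bin."""
    n_bins = len(traces[0])
    return [sum(tr[b] for tr in traces) / len(traces) for b in range(n_bins)]
```

Equalizing the trial count per session in both learning stages means that a difference between the early-stage and expert-stage traces cannot be attributed simply to the early stage contributing fewer, noisier CR trials.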

      (6) From Figure 2a, it is evident that Hit trials increase response when mice become  experts in all brain areas. The authors have decided to focus on the response onset  differences in CRs, but the Hit responses display a strong difference between naïve  and expert cases.

      Judging from the learning curve in this task, the mice learned to inhibit their licking when the No-Go stimuli appeared, which is the main reason we focused on these types of trials.

      The movement effects and potential licking artefacts in Hit trials also restricted our interpretation of these trials.

      (7) Figure 3 is still a bit cumbersome. I wasn't 100% convinced of why there is a need  to rank the connection matrix. I mean when you convert to rank, essentially there could  be a meaningful general reduction in correlation, for example during licking, and this  will be invisible in the ranking system. Maybe show in the supp non-ranked data, or  clarify this somehow

      We agree with this important point. As stated in the manuscript and in our response to Reviewer #1, our motivation for taking the ranking approach was that differences in firing rates could bias cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions. We acknowledge, however, that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We added the limitations of ranking approach in the discussion section and emphasized the necessity in future studies for better analysis approaches that could provide more accurate assessment of functional connection dynamics without bias from firing rates.

      (8) Figure 4a x label is in manuscript, which is different than previous time labels,  which were seconds.

      We now changed all time labels from Figure 2 to milliseconds.

      (9) Figure 4 input and output rank look essentially the same.

      We have compressed the brain plots in Figures 4-5 to better convey the take-home message.

      (10) Also, what is the late and early stim period? Can you mark each period in panel A? Early stim period is confusing with early CR period. Same for early response and late response.

      The definition of time periods was in figure legends. We now mark each period out to avoid confusion.

      (11) Looking at panel B, I don't see any differences between delta-rank in early stim,  late stim, early response, and late response. Same for panel c and output plots.

      The rankings were indeed relatively stable across time periods. The plots are now compressed and show a mean rank value.

      (12) Panels B and C are just overwhelming and hard to grasp. Colors are similar both  to regular rank values and delta-rank. I don't see any differences between all  conditions (in general). In the text, the authors report only M2 to have an increase in  rank during the response period. Late or early response? The figure does not go well  with the text. Consider minimizing this plot and moving stuff to supplementary.

      The colormaps have now been changed to avoid confusion, and the brain plots have been compressed.

      (13) In terms of a statistical test for Figure 4, a two-way ANOVA was done, but over what? What are the statistics and p-values for the test? Is there a main effect of time also? Is there a significant interaction? Was this done on all mice together? How many mice? If I understand correctly, the post-hoc statistics are presented in the supplementary, but from the main figure, you cannot know what is significant and what is not.

      For these figures we were mainly concerned with the post-hoc statistics which described the changes in the rankings of each region across learning.

      We have changed the description to “t-test with Sidak correction” to avoid the confusion.

      (14) In the legend of Figure 4, it is reported that 610 expert CR trials from 6 sessions,  instead of 7 sessions. Why was that? Also, like the previous point, why only 3 mice?

      Behavior data for all the sessions used are shown in Figure S1. Only 3 mice were used for the learning group; the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size.

      (15) Body movement analysis: was this done in a different cohort of mice? Only now do I understand why there was a division into early and late stim periods. In supp 4, there should be a trace of each body part in CR expert versus naïve. This should also be done for Hit trials as a sanity check. I am not sure that the brightness difference between consecutive frames is the best measure. Rather try to calculate frame-to-frame correlation. In general, body movement analysis is super important and should be carefully analyzed.

      Due to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions, and we have added these results and related discussion to the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (16) For Hit trials, in the striatum, there is an increase in input rank around the  response period, and from Figure S6 it is clear that this is lick-related. Other than that,  the authors report other significant changes across learning and point out to Figure 5b,c. I couldn't see which areas and when it occurred.

      We did naturally expect the activity in striatum to be strongly related to movement.

      With Figure S6 (now S7) we wished to show that the observed rank increase for striatum could not simply be attributed to changes in time of lick initiation.

      As some readers may argue that during learning the mice might have learned to only intensely lick after response signal onset, causing the observed rise of input rank after response signal, we realigned the spikes in each trial to the time of the first lick, and a strong difference could still be observed between early training stage and expert training stage.

      We still cannot fully rule out the effects of more subtle movement changes, as the face motion energy did increase in the early response period. This result and related discussion have been added to the results section of the revised manuscript.

      (17) Figure 6, again, is rather hard to grasp. There are 16 panels, spread over 4 areas,  input and output, stim and response. What is the take home message of all this?  Visually, it's hard to differentiate between each panel. For me, it seems like all the  panels indicate that for all 4 areas, both in output and input, frontal areas increase in  rank. This take-home message can be visually conveyed in much less tedious ways.  This simpler approach is actually conveyed better in the text than in the figures  themselves. Also, the whole explanation on how this analysis was done, was not clear  from the text. If I understand it, you just divided and ranked the general input (or  output) into individual connections? If so, then this should be better explained.

      We appreciate this advice and have compressed the figures to better convey the main message. The rankings for Figure 6 and Figure S8 (now Figure S9) are explained in the left panel of Figure 3C. Each non-zero element in the connection matrix was ranked on a scale of 1-10, with a value of 10 representing the strongest 10% of non-zero elements in the matrix.

      We have updated the figure legends of Figure 3, and we have also updated the description in methods (Connection rank analyses) to give a clearer description of how the analyses were applied in subsequent figures.
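The ranking step described above (non-zero elements of the connection matrix mapped to values 1-10, with 10 marking the strongest 10%) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, connection strengths are assumed to be non-negative, zeros are assumed to mean "no connection" and are left as 0, and tied values receive the same rank.

```python
from bisect import bisect_right

def decile_rank_matrix(matrix, n_levels=10):
    """Replace each non-zero entry with a rank from 1 to n_levels.

    A rank of n_levels marks the strongest 1/n_levels of non-zero entries.
    Zero entries stay 0 (assumed to mean 'no connection').
    """
    values = sorted(v for row in matrix for v in row if v != 0)
    n = len(values)
    ranked = []
    for row in matrix:
        new_row = []
        for v in row:
            if v == 0:
                new_row.append(0)
                continue
            # Number of non-zero entries less than or equal to v.
            count = bisect_right(values, v)
            # Integer ceiling of count * n_levels / n avoids float rounding.
            new_row.append((count * n_levels + n - 1) // n)
        ranked.append(new_row)
    return ranked
```

Because ranks depend only on the ordering of entries within one matrix, a uniform scaling of all connection strengths leaves the ranks unchanged. That is what makes them comparable across conditions with different firing rates, and it is also why a global reduction in correlation would be invisible, as the reviewer notes.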

      (18) Figure 7: Here, the authors perform a ROC analysis between go and no-go stimuli. They balance between choice, but there is still an essential difference between a hit and a FA in terms of movement and licks. That is maybe why there is a big difference in selective units during the response period. For example, during a Hit trial the mouse licks and gets a reward, resulting in more licking and excitement. In FAs, the mouse licks, but gets punished, which causes a reduction in additional licking and movements. This could be a simple explanation why the ROC was good in the late response period. Body movement analysis of Hit and FA should be done as in Figure S4.

      We appreciate this insightful advice.

      Though we balanced the numbers of basic trial types, we could not rule out an intrinsic difference in the amount of movement between FA trials and Hit trials, which is likely the reason for the large proportion of encoding neurons in the response period.

      We have added this discussion both in result section and discussion section along with the necessity of more carefully designed behavior paradigm to disentangle task information.

      (19) The authors also find selective neurons before stimulus onset, and refer to trial  history effects. This can be directly checked, that is if neurons decode trial history.

      We attempted encoding analyses on trial history, but regrettably for our dataset we could not find enough trials to construct a dataset with fully balanced trial history, visual stimulus and behavior choice.

      (20) Figure 7e. What is the interpretation for these results? That areas which peaked  earlier had more input and output with other areas? So, these areas are initiating  hubs? Would be nice to see ACC vs Str traces from B superimposed on each other.  Having said this, the Str is the only area to show significant differences in the early  stim period. But is also has the latest peak time. This is a bit of a discrepancy.

      We appreciate this important point.

      The limitation in the anatomical coverage of brain regions restricted our interpretation about these findings. They could be initiating hubs or earlier receiver of the true initiating hubs that were not monitored in our study.

      The Str trace was in fact above the ACC trace, especially in the response period. This could be explained by point 18 above: since we could not rule out an intrinsic difference in the amount of movement between FA trials and Hit trials, and considering that striatal activity is strongly related to movement, the Str trace may reflect movement-related spike count differences between FA trials and Hit trials more than visual stimulus-related differences.

      This further shows the necessity of more carefully designed behavior paradigm to disentangle task information.

      The striatum trace also did not in fact show a true double-peak form like the traces in other regions; it ramped up in the stimulus period and only peaked in the response period. This description is now added to the results section.

      In the early stim period, the striatum did show significant differences in the average percentage of encoding neurons, as the percentage of encoding neurons was stably high in the expert stage. The striatum activity is more directly affected. Still, the percentage of encoding neurons only reached its peak in the late stimulus period.

      (21) For the optogenetic silencing experiments, how many mice were trained for each  group? This is not mentioned in the results section but only in the legend of Figure 8. This part is rather convincing in terms of the necessity for OFC and V2M

      We have included the mice numbers in results section as well.

      (C) Discussion

      (1) There are several studies linking sensory areas to frontal networks that should be mentioned, for example, Esmaeili et al., 2022, Matteucci et al., 2022, Guo et al., 2014, Gallero Salas et al., 2021, Jerry Chen et al., 2015. Sonja Hofer papers, maybe. Probably more.

      We appreciate this advice. We have now included one of the mentioned papers (Esmaeili et al., 2022) in the results section and discussion section for its direct characterization of the enhanced coupling between the somatosensory region and the frontal (motor) region during sensory learning. The other studies mentioned here seem to focus more on differences in encoding properties between regions along specific cortical pathways than on functional connection or interregional activity correlation, and we feel they are not directly related to the observations discussed.

      (2) The reported reorganization of brain-wide networks with shifts in time is best described also in Sych et al. 2021.

      We regret we didn’t include this important research and we have now cited this in discussion section.

      (3) Regarding the discussion about more widespread stimulus encoding after learning,  the results indicate that the striatum emerges first in decoding abilities (Figure 7c left  panel), but this is not discussed at all.

      We briefly discussed this in the result section. We tend to attribute this to trial history signal in striatum, but since the structure of our data could not support a direct encoding analysis on trial history, we felt it might be inappropriate to over-interpret the results.

      (4) An important issue which is not discussed is the contribution of movement which  was shown to have a strong effect on brain-wide dynamics (Steinmetz et al 2019;  Musall et al 2019; Stringer et al 2019; Gilad et al 2018) The authors do have some movement analysis, but this is not enough. At least a discussion of the possible effects of movement on learning-related dynamics should be added.

      We have included these studies in discussion section accordingly. Since the movement analyses were done in a separate cohort of mice, we have made our limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (D) Methods

      (1) How was the light delivery of the optogenetic experiments done? Via fiber  implantation in the OFC? And for V2M? If the red laser was on the skull, how did it get  to the OFC?

      The fibers were placed on cortex surface for V2M group, and were implanted above OFC for OFC manipulation group. These were described in the viral injection part of the methods section.

      (2) No data given on how electrode tracking was done post hoc

      As noted in our response to the advice 3 in results section, the electrode shanks were ultra-thin (1-1.5 µm) and it was usually difficult to recover observable tracks or electrodes in section.

      As an attempt to verify the accuracy of implantation depth, we measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end in slightly deeper location in cortex (142.1 ± 55.2 μm, n = 7 shanks), and slightly shallower location in subcortical structure (-122.6 ± 71.7 μm, n = 7 shanks). We added these results as new Figure S1 to accompany Figure 1.

      Reviewer #3 (Recommendations for the authors):

      (1) The manuscript uses decision-making in the title, abstract and introduction.  However, nothing is related to decision learning in the results section. Mice simply  learned to suppress licking in no-go trials. This type of task is typically used to study behavioral inhibition. And consistent with this, the authors mainly identified changes  related to network on no-go trials. I really think the title and main message is  misleading. It is better to rephrase it as visual discrimination learning. In the  introduction, the authors also reviewed multiple related studies that are based on  learning of visual discrimination tasks.

      We do view the Go/No-Go task as a specific genre of decision-making task, as there is literature that discusses it as a decision-making task under the framework of signal detection theory or the updating of item values (Carandini & Churchland, 2013; Veling, Becker, Liu, Quandt, & Holland, 2022).

      We do acknowledge the essential differences between the Go/No-Go task and the tasks that require the animal to choose between alternatives, and since we have now realized some readers may not accept this task as a decision task, we have changed the title to visual discrimination task as advised.

      (2) Learning induced a faster onset on CR trials. As the no-go stimulus was not  presented to mice during early stages of training, this change might reflect the  perceptual learning of relevant visual stimulus after repeated presentation. This further  confirms my speculation, and the decision-making used in the title is misleading. 

      We have changed the title to visual discrimination task accordingly.

      (3) Figure 1E, show one hit trial. If the second 'no-go stimulus' is correct, that trial  might be a false alarm trial as mice licked briefly. I'd like to see whether continuous  licking can cause motion artifacts in recording. 

      We appreciate this important point. There were indeed licking artifacts with continuous licking in Hit trials, which was part of the reason we focused our analyses on CR trials. Opto-based lick detectors may help to reduce the artefacts in future studies.

      (4) What is the rationale for using a threshold of d' < 2 as the early-stage data and d'>3  as expert stage data?

      The thresholds were chosen as a trade-off between the practical need to gather enough CR trials in the early training stage and the need to maintain a relatively low performance level.

      Assume the mice showed lick response in 95% of Go stimulus trials, then d' < 2 corresponded to the performance level at which the mouse correctly rejected less than 63.9% of No-Go stimulus trials, and d' > 3 corresponded to the performance level at which the mouse correctly rejected more than 91.2% of No-Go stimulus trials.
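The correspondence between these d' thresholds and correct-rejection rates can be checked with a short calculation. The sketch below assumes the standard signal detection definition d' = z(hit rate) − z(false alarm rate), where z is the inverse of the standard normal cumulative distribution; the function name is hypothetical and the 95% hit rate is the assumption stated in the response.

```python
from statistics import NormalDist

def correct_rejection_rate(d_prime, hit_rate):
    """Correct-rejection rate implied by a given d' and hit rate.

    Uses d' = z(hit rate) - z(false alarm rate), so
    z(false alarm rate) = z(hit rate) - d'.
    """
    norm = NormalDist()  # standard normal: mean 0, sd 1
    z_false_alarm = norm.inv_cdf(hit_rate) - d_prime
    false_alarm_rate = norm.cdf(z_false_alarm)
    return 1.0 - false_alarm_rate

# With a 95% hit rate, as assumed in the response:
early_cutoff = correct_rejection_rate(2.0, 0.95)   # about 0.639
expert_cutoff = correct_rejection_rate(3.0, 0.95)  # about 0.912
```

Under that assumption, sessions with d' < 2 correspond to correct rejection of fewer than about 63.9% of No-Go trials, and sessions with d' > 3 to more than about 91.2%, matching the figures quoted above.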

      (5) Figure 2A, there is a change in baseline firing rates in V2M, MDTh, and Str. There  is no discussion. But what can cause this change? Recording instability, problem in  spiking sorting, or learning?

      It is highly possible that the firing rates before visual stimulus onset are affected by previous reward history and the task engagement state of the mice. Notably, though recorded simultaneously in the same sessions, the changes in CR trial baseline firing rates in the V2M region were not observed in Hit trials.

      Thus, though we cannot completely rule out the possibility of recording instability, we see this as evidence of the effects on firing rates of changes in trial history or task engagement during learning.

      References:

      Carandini, M., & Churchland, A. K. (2013). Probing perceptual decisions in rodents. Nat Neurosci, 16(7), 824-831. doi:10.1038/nn.3410.

      Cruz, K. G., Leow, Y. N., Le, N. M., Adam, E., Huda, R., & Sur, M. (2023). Cortical-subcortical interactions in goal-directed behavior. Physiol Rev, 103(1), 347-389. doi:10.1152/physrev.00048.2021

      Esmaeili, V., Oryshchuk, A., Asri, R., Tamura, K., Foustoukos, G., Liu, Y., Guiet, R., Crochet, S., & Petersen, C. C. H. (2022). Learning-related congruent and incongruent changes of excitation and inhibition in distinct cortical areas. PLOS Biology, 20(5), e3001667. doi:10.1371/journal.pbio.3001667

      Goldbach, H. C., Akitake, B., Leedy, C. E., & Histed, M. H. (2021). Performance in even a simple perceptual task depends on mouse secondary visual areas. Elife, 10, e62156. doi:10.7554/eLife.62156.

      Siegle, J. H., Jia, X., Durand, S., Gale, S., Bennett, C., Graddis, N., Heller, G., Ramirez, T. K., Choi, H., Luviano, J. A., Groblewski, P. A., Ahmed, R., Arkhipov, A., Bernard, A., Billeh, Y. N., Brown, D., Buice, M. A., Cain, N., Caldejon, S., Casal, L., Cho, A., Chvilicek, M., Cox, T. C., Dai, K., Denman, D. J., de Vries, S. E. J., Dietzman, R., Esposito, L., Farrell, C., Feng, D., Galbraith, J., Garrett, M., Gelfand, E. C., Hancock, N., Harris, J. A., Howard, R., Hu, B., Hytnen, R., Iyer, R., Jessett, E., Johnson, K., Kato, I., Kiggins, J., Lambert, S., Lecoq, J., Ledochowitsch, P., Lee, J. H., Leon, A., Li, Y., Liang, E., Long, F., Mace, K., Melchior, J., Millman, D., Mollenkopf, T., Nayan, C., Ng, L., Ngo, K., Nguyen, T., Nicovich, P. R., North, K., Ocker, G. K., Ollerenshaw, D., Oliver, M., Pachitariu, M., Perkins, J., Reding, M., Reid, D., Robertson, M., Ronellenfitch, K., Seid, S., Slaughterbeck, C., Stoecklin, M., Sullivan, D., Sutton, B., Swapp, J., Thompson, C., Turner, K., Wakeman, W., Whitesell, J. D., Williams, D., Williford, A., Young, R., Zeng, H., Naylor, S., Phillips, J. W., Reid, R. C., Mihalas, S., Olsen, S. R., & Koch, C. (2021). Survey of spiking in the mouse visual system reveals functional hierarchy. Nature, 592(7852), 86-92. doi:10.1038/s41586-020-03171-x

      Sych, Y., Fomins, A., Novelli, L., & Helmchen, F. (2022). Dynamic reorganization of the cortico-basal ganglia-thalamo-cortical network during task learning. Cell Rep, 40(12), 111394. doi:10.1016/j.celrep.2022.111394

      Veling, H., Becker, D., Liu, H., Quandt, J., & Holland, R. W. (2022). How go/no-go training changes behavior: A value-based decision-making perspective. Current Opinion in Behavioral Sciences, 47, 101206. doi:10.1016/j.cobeha.2022.101206

    1. eLife Assessment

      This study provides valuable insight into the role of actin protrusions in mediating early pre-endocytic steps of human papillomavirus entry at the cell surface. Using state-of-the-art microscopy in an immortalized keratinocyte model, the authors present mostly solid evidence that filopodia actively promote the transfer of heparan sulfate-coated virions from the extracellular matrix to the viral entry factor CD151. Remaining gaps in the mechanistic model could be further supported by including a more expansive analysis of the fixed microscopy samples and live cell imaging to distinguish virion transfer from direct binding.

    2. Reviewer #1 (Public review):

      Summary:

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary. The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage has been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data needs to be provided.

      Note on revisions:

      The authors did an excellent job in their revision to include data from the effect of proteolytic priming on their observed virion transfer to the cell body. All other minor issues were addressed adequately.

      The work could be especially critical to understanding the process of in vivo infection.

    3. Reviewer #2 (Public review):

      The study design involves infecting HaCaT cells (immortalised keratinocytes mimicking basal cells of a target tissue) and observing virus localization with and without actin polymerization inhibition by cytochalasin D (cytoD) to analyze virion transfer from the ECM to the cell via filopodial structures, using cellular proteins as markers.

      In the context of the model system, the authors stress in the revised version the importance of using HaCaT cells as a relevant 'polarized' cell model for infection. The term 'polarized' is used in the cell biological literature for epithelial cells to describe a strict apical vs. basolateral demarcation of the plasma membrane with an established diffusion barrier of the tight junction. However, HaCaT cells do not form tight junctions. In squamous epithelia, such barriers are only found in granular layers of the epithelium. The published work cited in support of their claims either does not refer to polarity or does so only in the context of other cells such as CaCo-2 cells.

      Overall, the matter of polarity would be important, if indeed the virus could only access cell-associated HSPGs as primary binding receptor, or the elusive secondary receptor via the ECM in the used model system (HaCaT cells), if they would locate exclusively basolaterally. This is at least not the case for binding, as observed in several previous publications (just two examples: Becker et al, 2018, Smith et al., 2008). With only a rather weak attempt at experimental verification of their model system with regards to polarity of binding, the authors then go on to base their conclusions on this unverified assumption.

      This is one example of several in the manuscript, where claims for foundational premises, observations, and/or conclusions remain undocumented or not supported by experimental data.

      Another such example is the assumption of transfer of the virus from ECM to the tetraspanin CD151. Here, the conclusions are based on the poorly documented inability of the virus to bind to the cell body, which is in stark contrast to several previous publications and raises questions. Thus, association with CD151 likely occurs both from ECM-derived virus AND virus that binds to cells, so that any conclusions on the mode of association are possible only with live cell data (which are not provided). Overall, their proposed model thus remains largely unsubstantiated with regard to receptor switching.

      There are a number of important additional issues with the manuscript:

      First, none of the inhibitors has been tested in the authors' system for efficacy and specificity; instead, the authors rely on published work in other cell types. This considerably weakens confidence in the conclusions drawn by the authors.

      Second, the authors aim to study transfer from ECM to the cell body and effects thereof. However, there are still substantial amounts of viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells. This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. This remains an issue despite the added Suppl. Fig. 1, where also only subcellular regions are displayed. As a consequence, the data obtained from time point experiments are skewed and remain for the most part unconvincing, largely because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting the association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      Third, the use of fixed images in a time course series also does not allow one to assess a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin, or of cell spreading upon cytoD washout. The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      Fourth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established. For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original. But given the high density of objects on the plasma membrane, I am not convinced that flipping only the plasma membrane would not yield numbers similar to those of the original.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary.

      We stated in the introduction on line 65/66 ‘Two release mechanisms are discussed, that mutually are not exclusive’. This implies that we do not consider the shedding model as ‘the accepted model’. Furthermore, we do not state in the discussion that the shedding model is the preferred one. However, we referred to the shedding model in the discussion, because we find HS associated with transferred PsVs, which is in line with this model.

      The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided.

      Our findings are compatible with both models; we aim neither to verify the shedding model nor to disprove the priming model. However, as we understand it, the referee wishes for more visibility of the priming model. Therefore, using inhibitors previously used in the field, we tested whether inhibition of KLK8 or furin reduces PsV translocation to the cell body (after CytD wash-off). Leupeptin blocks transport, while Furin inhibitor I still allows some initial translocation. We incorporated these new data as Figure 2 (line 265): ‘…we would expect that inhibition of L1 processing during the CytD incubation prevents the recovery of PsV translocation from the ECM to the cell body (Figure 2A and D). To test for this possibility, as employed in earlier studies, the protease inhibitor leupeptin was used to inhibit proteases including KLK8 which is required for L1 cleavage (Cerqueira et al. 2015). Employing this inhibitor, the PCC between PsV-L1 and F-actin staining remains negative after CytD removal, showing that for translocation indeed the action of proteases is required (Figure 2B and D). In contrast, inhibition of L2 cleavage by a furin specific inhibitor has no effect on the PCC (Figure 2C and D). However, it should be noted that we occasionally observe PsVs not completely translocating but accumulating at the border of the F-actin stained area (for example see Figure 2C (60 min)). This results in an increase of the PCC almost equal to complete translocation, explaining why the PCC remains unaffected despite a furin inhibitory effect. Hence, furin inhibition may have some effect on translocation that, however, is undetected in this type of analysis.’

      Moreover, we have added a paragraph discussing how our data integrates into the established model of the HPV infection cascade (line 604): ‘HPV infection is the result of several steps, starting with the initial binding of virions via electrostatic and polar interactions (Dasgupta et al. 2011) to the primary attachment site HS (Richards et al. 2013), which induces capsid modification (Feng et al. 2024; Cerqueira et al. 2015) and HS cleavage (Surviladze et al. 2015), enabling the virion to be released from the ECM or the glycocalyx. Next, virions bind to the cell surface to a secondary receptor complex that forms over time, and become internalized via endocytosis, before they are trafficked to the nucleus (Ozbun and Campos 2021; Mikuličić et al. 2021). Regarding the transition from the primary attachment site to cell surface binding, as already outlined in the introduction, two models are discussed. In one model, proteases cleave the capsid proteins. After priming, the capsids are structurally modified and the virion can dissociate from its HS attachment site. It has been suggested that capsid priming is mediated by KLK8 (Cerqueira et al. 2015) and furin (Richards et al. 2006). In our system, KLK8 inhibition blocks PsV transport, while furin inhibition has some effect that, however, cannot be detected in this analysis (Figure 2) suggesting furin engagement at later steps in the infection cascade. This is in line with earlier in vitro studies on the role of cell surface furin (Surviladze et al. 2015; Day et al. 2008; Day and Schiller 2009). In any case, our results align with both models of ECM detachment: one involving HS cleavage (HS co-transfer) and another involving capsid modification (by e.g., KLK8).’

      The model should be fitted into established entry events,…

      Please see our reply above.

      or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowing secondary receptor engagements in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations from the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      We do not see any discrepancies; our observations are compatible with aspects of both the shedding and the priming model. That PsVs carry HS-cleavage products does not imply that HS cleavage is sufficient or required for infection, or that the priming model is wrong. We do not view our data as being in conflict with the priming model. Most of the above-mentioned papers are now cited.

      Altogether, we acknowledge that the study gains importance by directly testing the priming model within our experimental system. We are thankful for the above comments and addressed this issue.

      Other issues:

      (1) Line 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection.

      Not obligatory, but strongly supportive (Bienkowska-Haba et al., PLoS Pathog., 2018; Surviladze et al., J. Gen. Virol., 2015). As recently published by the Sapp lab (Bienkowska-Haba et al., PLoS Pathog., 2018), ‘Direct binding of HPV16 to primary keratinocytes yields very inefficient infection rates for unknown reasons.’ Moreover, the paper shows that HaCaT cell ECM binding of PsVs increases the infection of NHEK by 10-fold and of HFK by almost 50-fold.

      This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      The reviewing editor speculated that HaCaT cells may be a model system in which the in vivo relevant binding to the ECM can be better studied than in non-polarized cell types. This is because binding to the ECM cannot be bypassed by direct cell surface binding. The observation that only a few PsVs bind to the basal cell membrane indeed suggests restricted diffusional access of PsVs to binding receptors of the basal membrane. The reviewing editor asked for an experiment showing that more PsVs bind after cell detachment. We performed this experiment and indeed find more PsVs binding to the cell surface of detached cells. This point is very important for the understanding of the study, and we now mention it in several sections of the manuscript, as outlined in the following.

      Line 125: ‘Many PsVs that bind to the ECM may locate distal from the cell surface and are thus unable to establish direct contact with entry receptors. However, they are capable of migrating by an actin-dependent transport along cell protrusions towards the cell body (Smith et al. 2008; Schelhaas et al. 2008). We aimed for blocking this transport in HaCaT cells, a cell line that is widely used as a cell culture model for HPV infection. HaCaT cells closely resemble primary keratinocytes in key aspects: they are not virally transformed and produce large amounts of ECM that facilitates infection (Bienkowska-Haba et al. 2018; Gilson et al. 2020). In addition, HaCaT cells exhibit cellular polarity that enforces binding of virus particles to the ECM, as the virions cannot bind to receptors/entry components, such as CD151, Itgα6 and HSPGs that co-distribute on the basolateral membrane of polarized keratinocytes (Sterk et al. 2000; Cowin et al. 2006; Mertens et al. 1996), making them inaccessible by diffusion.’

      Line 205: ‘During the CytD incubation, PsVs bind to HSPGs of the basolateral membrane for 5 h. Still, in the cell body area hardly any PsVs are present (0.14 PsV/µm², Supplementary Figure 1B). In the control, the PsV density is several-fold larger (Supplementary Figure 1B). This is expected, as the PsVs bind to the ECM and translocate to the cell body. We wondered whether there are more binding sites at the basal membrane that remain inaccessible to PsVs by diffusion because of the insufficient space between glass-coverslip and basolateral membrane. For clarification, we incubated EDTA detached HaCaT cells in suspension with PsVs for 1 h at 4 °C, followed by re-attachment for 1 h. Under these conditions, we find a PsV density 12.4-fold larger than after 5 h of CytD incubation of adhered cells (Supplementary Figure 1B and D). However, it should be noted that these values cannot be directly compared. Aside from the different treatments, another difference lies in the size of the basal membrane, as re-attachment of cells is not complete after only 1 h (compare size of adhered membranes in Supplementary Figure 1A and C). Therefore, the imaged membranes are likely strongly ruffled, which results in the underestimation of the size of the adhered membrane. As a result, we overestimate the PsVs per µm² (please note that we cannot re-attach cells for longer times as we would then lose PsVs due to endocytosis). On the other hand, we would underestimate the PsV density at the basal membrane if after re-attachment we image in part also some apical membrane. In any case, the experiment suggests that PsVs bind more efficiently if membrane surface receptors are accessible by diffusion. This is in support of the above notion that the basal membrane may provide more entry receptors than one would expect from the low density of PsVs bound after 5 h CytD (Supplementary Figure 1B).
      This suggests that under our assay conditions, PsVs cannot easily bypass the translocation from the ECM to the cell body by diffusing directly to the basal membrane. Hence, the large majority of PsVs that enter the cell were previously bound to the ECM. Therefore, HaCaT cells serve as an ideal model for studying the transfer of ECM bound HPV particles to the cell surface, which is similar to in vivo infection of basal keratinocytes after binding to the basement membrane (Day and Schelhaas 2014; Kines et al. 2009; Schiller et al. 2010; Bienkowska-Haba et al. 2018).’

      Line 529: ‘Filopodia usage not only facilitates infection but also increases the likelihood of virions to reach their target cells during wound healing, namely the filopodia-rich basal dividing cells. In fact, several types of viruses exploit filopodia during virus entry (Chang et al. 2016), hinting at the possibility that for HPV and other types of viruses actin-driven virion transport may play a more important role than it is currently assumed. If this is the case, sub-confluent HaCaT cells, or even better single HaCaT cells, would be an ideal model system for the study of these very early infection steps that involve ECM attachment and subsequent filopodia-dependent transport. As shown in Supplementary Figure 1, HaCaT cells have many binding sites for the HPV16 PsVs. However, as they are polarized and the binding receptors are only at the basal membrane, they remain relatively inaccessible by diffusion. Therefore, the ECM binding that is also observed in vivo (Day and Schelhaas 2014) and subsequent transport via filopodia are used upon infection of HaCaT cells that locate at the periphery of cell patches. Here, PsVs bind to the ECM which strongly enhances infection of primary keratinocytes (Bienkowska-Haba et al. 2018). In contrast, HPV can readily bind to HSPGs on the cell surface of nonpolarized cells, and by this bypasses ECM mediated virus priming and the filopodia dependency. We propose that HaCaT cells are a valuable system for studying the very early events in HPV infection that allows for dissecting capsid interaction with ECM resident priming factors and cell surface receptors.’

      Finally, please note that in the previous version of the manuscript, we did not question that in many cellular systems PsVs interact with heparan sulfate proteoglycans (HSPGs) present on the cell surface, or both on the cell surface and the ECM. We stated on line 59 ‘While in cell culture virions bind to HS of the cell surface and the ECM, it has been suggested that in vivo they bind predominantly to HS of the extracellular basement membrane (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).’

      We hope that after adding the above explanations and the experiment requested by the reviewing editor it is now clear why only few PsVs bind directly (not via the ECM) to the cell surface. We appreciate the reviewer’s and the reviewing editor’s input that has significantly improved the manuscript.

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface.

      There is staining. However, as the staining at the periphery is stronger and images are shown at the same settings of brightness and contrast, the impression is given that the cell surface is not stained. We have added more images showing HS cell surface staining.

      (i) Supplementary Figure 4C shows an enlarged view of the CytD/0 min cell shown in Figure 6A. In the area stained by Itgα6, that marks the cell body, HS staining is present, although less abundant in comparison to the ECM.

      (ii) In Figure 8, CytD/30 min, a cell is shown with abundant HS in the cell body region (compare cyan and green LUT).

      (iii) In newly added Figure 3A, lower panel, another cell with HS in the cell body region is shown.

      Please note that the staining is highly variable. We indicate this by stating on Line 373: ‘The pattern of the HS staining (cyan LUT) and the overlap of HS with PsVs and Itgα6 are highly variable (Figure 6A).’

      Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (lines 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      The transient increase in the PCC at CytD/30 min can be interpreted as PsV/HS co-transport or as direct binding of PsVs to cell surface HSPGs. However, two arguments support co-transport.

      First, we find that CytD together with PsVs increases the HS intensity (see newly added Figure 3, confirming old Figure 5 that is now Figure 6). We state on line 290 ‘… that without actin-dependent PsV translocation HS cleavage products are retained in the ECM, consistent with the hypothesis that cleaved HS remains associated with PsVs (Ozbun and Campos 2021).’

      Second, the distance between HS and Itgα6 (the cell body marker) decreases over time after CytD removal, which suggests movement of HS to the cell body (Supplementary Figure 8D). We state on line 422: ‘The movement of HS towards the cell body after removal of CytD, which indirectly demonstrates that PsVs are coated with HS, is suggested by a shortening of the HS-Itgα6 distance over time (Supplementary Figure 8D).’

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels.

      Some areas are covered with confluent cells, to which hardly any PsVs are bound, because accessing their basolateral membrane is nearly impossible, and PsVs do not bind to the exposed apical membrane either. We assume this is a major difference from cultures of unpolarized cells, where PsVs should distribute more or less equally over cells. This means that in our experiments the vge/cell is not a suitable parameter for relating the magnitude of an effect to a defined number of PsVs. In the ECM, the PsV density is very high, enabling one cell to collect, in theory, several hundred PsVs, much more than expected from the 50 vge/cell.

      We state on line 135: ‘Frequently, we observe patches of confluent cells which are common to HaCaT cells. Cells at the center of these patches are dismissed during imaging, because there are no anterogradely migrating PsVs at these cells. A second reason for our dismissal of these cells is that hardly any PsVs are bound to them, possibly because their basal membranes are inaccessible by diffusion. Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. In these cells, we find more PsVs per cell than one would expect from the employed 50 viral genome equivalents (vge) per cell, indicating that PsVs are unequally distributed between the cells.’

      The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated.

      We did not claim that PsVs induce shedding; rather, we believe they retain shed HS. Without PsVs, the shed HS is washed off from the ECM. We have reproduced the observation made in old Figure 5 (now Figure 6) in the newly added Figure 3, which also shows that PsVs alone have no effect on the HS intensity, only when present together with CytD. We state on line 277: ‘As outlined above, during the 5 h incubation with CytD, proteases in the ECM are expected to cleave HS chains. These cleavage products should be able to diffuse out of the ECM, unless they remain associated with nontranslocating PsVs. In the control, PsV associated HS cleavage products would leave the ECM through PsV translocation…. Using an antibody that reacts with an epitope in native heparan sulfate chains, only after CytD and if PsVs are present, the level of HS staining is significantly increased (Figure 3B). As shown in Figure 3A, stronger HS staining at PsVs (open arrows) and as well in PsV free areas (closed arrows) was observed… Collectively, our findings indicate that without actin-dependent PsV translocation HS cleavage products are retained in the ECM, consistent with the hypothesis that cleaved HS remains associated with PsVs (Ozbun and Campos 2021).’

      If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

      We tested this antibody but obtained only very weak staining (Supplementary Figure 2), which does not allow us to differentiate between an increase in the cell periphery and one in the cell body area. We still include the experiment, as it suggests that CytD has no effect on HS processing. We state on line 286: ‘As additional control and shown in Supplementary Figure 2, we use an antibody that reacts with a HS neo-epitope generated by heparitinase-treated heparan sulfate chains (Yokoyama et al. 1999; for details see methods). This neo-epitope staining is independent of the presence of CytD and the incubation time, suggesting that CytD does not directly affect HS processing.’

      Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how Human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for later uptake, entry, and infection. The binding to ECM is key for getting close to the virus's host cell (basal keratinocytes) after a wounding scenario for later infection in a mouse vaginal challenge model, indicating that this is an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data is mostly descriptive besides the use of cytoD, and does not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred by binding to tetraspanins along filopodia to the cell body.

      The study identifies a rapid translocation step from the ECM to CD151 assemblies. We have no data that demonstrates a physical interaction between PsVs and CD151. In the model figure, we draw CD151 as part of the secondary receptor complex. We are sorry for having raised the impression that PsVs would bind directly to CD151 and have modified the model Figure accordingly. In the new model figure (Figure 9), the first contact established is to a CD151 free receptor.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      As we focus on early events, we are not concerned about CytD also blocking late steps in the infection cascade, such as endocytosis. However, we agree that a comparison between CytD and blebbistatin would be very interesting. We added Figure 8, showing that blebbistatin only partially stops migration.

      Line 429: ‘Actin retrograde transport, which underlies the here observed virion transport, is the integrative result of three components (Smith et al. 2008; Schelhaas et al. 2008)…. As CytD broadly interferes with F-actin dependent processes, we investigated the effects upon inhibition of only one of the three components, namely the myosin II mediated retrograde movement towards the cell body. Instead of CytD, we employed in the 5 h preincubation the myosin II inhibitor blebbistatin. For the control (0 min), we show in Figure 8A one example of a cell with comparatively many PsVs at the periphery (as mentioned above, the PsV pattern is highly variable) to better illustrate the difference to the PsV pattern occasionally seen with blebbistatin. After blebbistatin treatment (0 min), PsVs are still distal to the cell body but less dispersed than after CytD treatment, seemingly as if translocation started but stopped in the midst of the pathway (Figure 8A, blebbistatin). The PCC between PsVs and HS, like after CytD (Figure 6C), is elevated after blebbistatin, albeit the effect is not significant (Figure 8C). The cell body PCC, is not at 30 min (CytD) but already at 0 min elevated (compare Figure 6D to Figure 8D), which can be explained by partial translocation. This is further supported by the fact that only 8% of PsVs are closely associated with HS (Figure 8E; blebbistatin, 0 min) compared to 15% after CytD treatment (Figure 6E; 0 min). Furthermore, after 0 min PsV incubation with blebbistatin we observe no effect on the HS intensity (compare Figure 8B to Figure 3B and Figure 6B). Hence, in contrast to CytD, blebbistatin does not trap the PsVs in the ECM where they associate with HS, but ongoing actin polymerization pushes actin filaments along with PsVs towards the cell body.’

      Third, the authors aim to study transfer from ECM to the cell body and the effects thereof. However, there are substantial, if not the majority of, viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells.

      Please see our detailed reply to referee #1, who raised the same issue. In brief, we agree that in multiple cell culture systems viruses bind preferentially to the cell surface directly. However, in HaCaT cells, the majority of PsVs do not bind directly to the basal membrane but get there after initial binding to the ECM. Thus, we believe our system appropriately models the physiologically relevant scenario of ECM-to-cell transfer, as also speculated by the reviewing editor, who suggested an experiment showing that more PsVs bind to detached cells (please see above).

      This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the data obtained from time point experiments are skewed and remain for the most part unconvincing, because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      As already stated above, we observe massive binding of PsVs to the ECM, in contrast to very few PsVs that diffuse beneath the basolateral membrane of the polarized HaCaT cells and do bind directly to the cell surface. In other cellular systems, cells may hardly secrete ECM, are not polarized, and therefore virions can easily bypass ECM binding. Therefore, it is reasonable to assume that in HaCaT cells the large majority of PsVs found on the cell body originates from the ECM.

      Fourth, the use of fixed images in a time course series also does not allow for understanding the issue of a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin. Or, of cell spreading upon cytoD washout.

      The newly added blebbistatin experiment suggests that the initial translocation is exclusively dependent on retrograde actin flow. However, we agree that we are not able to unravel more details regarding the different possible contributions to the movement. Importantly, the lack of PCC increase after CytD/leupeptin removal (Figure 2D) suggests there is not much cell spreading into the area of accumulated PsVs. Please see our more detailed reply to the same issue raised by the same referee in the recommendations for the authors.

      The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      The dye TMA-DPH stains exclusively cellular membranes and not the ECM. The stain is actually used to delineate the cell body from the ECM area (please see Figure 1).

      Fifth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established.

      We agree that how randomization is done is very important. Regarding the association of PsVs with CD151 and HS, we corrected for random background association, which is now explained in more detail in the figure legend of Supplementary Figure 7: ‘On flipped images, we often find values more than half of the values of the original images, demonstrating that many PsVs have a distance ≤ 80 nm to CD151 merely by chance (background association)… (C) Each time point in (A) and (B) obtained from flipped images is the average of three biological replicates. We use these altogether 24 data points, plotting the fraction of closely associated PsVs against the CD151 maxima density. The fraction increases with the maxima density, as the chance of random association increases with the maxima density. The fitted linear regression line describes the dependence of the background association from the maxima density. As a result, the background association (y) can be calculated for any maxima density (x) in original images with the equation y = 2.04x. Please note that the CytD/0 min may be overcorrected as we subtract background association with reference to the CD151 maxima density of the entire ROI (for an example ROI see Supplementary Figure 6A), although the local maxima density at distal PsVs is lower. On the other hand, PsVs at the cell border may have a larger local CD151 maxima density and consequently are undercorrected.’
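
      The correction described in this legend can be sketched as follows. This is only a minimal illustration, not the authors' analysis code: the function names, the coordinate format (nanometres), and the choice to force the regression through the origin (inferred from the form y = 2.04x) are assumptions, and the units of the maxima density and of the associated fraction are not given in this response, so the slope must be used with whatever units the fit was made in.

```python
import numpy as np

def fraction_closely_associated(psv_xy, maxima_xy, cutoff_nm=80.0):
    """Fraction (0..1) of PsVs whose nearest marker maximum lies within cutoff_nm.

    psv_xy, maxima_xy: sequences of (x, y) positions in nm (hypothetical format).
    """
    psv = np.asarray(psv_xy, dtype=float).reshape(-1, 2)
    maxima = np.asarray(maxima_xy, dtype=float).reshape(-1, 2)
    if len(psv) == 0 or len(maxima) == 0:
        return 0.0
    # Pairwise distances between every PsV and every marker maximum.
    dist = np.linalg.norm(psv[:, None, :] - maxima[None, :, :], axis=2)
    return float((dist.min(axis=1) <= cutoff_nm).mean())

def fit_slope_through_origin(maxima_density, flipped_fraction):
    """Least-squares slope of y = slope * x, fitted on flipped-image data points."""
    x = np.asarray(maxima_density, dtype=float)
    y = np.asarray(flipped_fraction, dtype=float)
    return float((x * y).sum() / (x * x).sum())

def background_corrected(fraction, maxima_density, slope):
    """Subtract the expected chance association for the ROI-wide maxima density."""
    return fraction - slope * maxima_density
```

      Because the subtraction uses the maxima density of the entire ROI, a sketch like this inherits the over- and undercorrection the legend itself points out for distal PsVs and for PsVs at the cell border.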

      For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original.

      We are aware of this problem. For instance, it would produce ‘artificially’ low PCCs after flipping images of PsV/HS stainings (please see the negative PCC value after flipping in Supplementary Figure 8). In this case, we do not use as an argument that the PCC is lower in flipped images. Instead, we argue that the PCC changes over time in the original images. We still provide the PCC values of flipped images as additional information, showing that in most cases we obtain a PCC of zero after flipping, as expected.
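
      The effect discussed here, a negative PCC after flipping when the signal is confined to one half of the image, can be reproduced with a small sketch. It is an illustration under assumptions: the flip axis (horizontal) and the function names are hypothetical, and the response does not specify how the authors flipped their images.

```python
import numpy as np

def pearson_cc(img_a, img_b):
    """Pearson correlation coefficient between two equally sized images."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        # At least one channel has no variance, so the PCC is undefined.
        return float("nan")
    return float((a * b).sum() / denom)

def flipped_control_pcc(img_a, img_b):
    """PCC after mirroring the second channel left-right (assumed flip axis)."""
    return pearson_cc(img_a, np.flip(img_b, axis=1))

# Toy ROI: both channels are bright only in the left half ("cell body"),
# while the right half ("ECM") is empty.
channel = np.zeros((4, 4))
channel[:, :2] = 1.0
original = pearson_cc(channel, channel)
flipped = flipped_control_pcc(channel, channel)
```

      In this toy case the flipped control is strongly negative rather than zero, which is why a lower PCC after flipping is not by itself evidence of specific association when the ROI spans both cell body and ECM.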

      Hence, we fully agree that careful controls in image analysis is required, and used the above-described method for the correction of background association when the fraction of closely associated PsVs is analyzed. We do not use a lower PCC value in flipped images as argument if not appropriate.

      I am rather convinced that using randomisation only on the plasma membrane ROIs will not establish any clear significance of the correlating signals.

      Figure 6D and 8D show the PCC specifically of the cell body (only of plasma membrane ROIs). In flipped images (not shown in the previous version for clarity), we obtain significantly lower PCCs (Supplementary Figure 8F/G and Supplementary Figure 10C/D). We propose that in this case it would be appropriate to use a lower PCC of flipped images as an argument for specific association. Still, also in this experiment we argue with a change in the PCC over time, and not with a PCC of zero after flipping. As above, we still provide the PCC values of flipped images as additional information.
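
      The flipped-image control discussed in this exchange can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: the two 3×4 intensity arrays are invented, a horizontal flip of one channel is assumed as the randomization, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def flatten(image):
    """Turn a list of pixel rows into one flat list of intensities."""
    return [px for row in image for px in row]


def flip_horizontal(image):
    """Mirror every pixel row (assumed form of the 'flipped' control)."""
    return [list(reversed(row)) for row in image]


# Invented two-channel images: both signals are concentrated on the left.
psv = [[9, 1, 0, 0], [8, 2, 1, 0], [7, 1, 0, 1]]
hs = [[8, 2, 1, 0], [9, 1, 0, 0], [6, 2, 1, 0]]

pcc_original = pearson(flatten(psv), flatten(hs))
pcc_flipped = pearson(flatten(psv), flatten(flip_horizontal(hs)))
print(f"original: {pcc_original:.2f}, flipped: {pcc_flipped:.2f}")
```

      In this toy case both signals sit on one side of the image, so flipping moves one of them to the opposite side and the PCC becomes negative rather than zero. This is the situation the referee describes, in which a lower PCC after flipping does not by itself demonstrate specific association.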

      Also, there should be a higher n for the measurements.

      One replicate is based on the average of 14-15 cells for each condition (more for figure 4). Hence, in a typical experiment (Control and CytD with 4 time points) about 120 cells are analyzed, which is a broad basis for the averages of one replicate.

      We realize that with three biological replicates we find significant effects only if we have strong effects or moderate effects with very low variance.

      Recommendations for the authors:

      Reviewing Editor:

      The focus on the events of HPV infection between ECM binding and keratinocyte-specific receptor binding is unique and interesting. However, I agree with the reviewers that some of the conclusions could use more experimental support, as detailed in their comments. The failure to detect direct binding of the PsV to HSPGs on the cell surface in in vitro assays contradicts much of the published literature. For example, others have found that HPV capsids bind cultured cell lines in suspension, i.e., in the absence of ECM. Do EDTA-suspended HaCaT cells bind PsV? Is the binding HSPG dependent? If the authors think that failure to detect direct cell binding of HaCaTs is an unusual feature of these cell lines or culture conditions, then it would be helpful to provide an explanation. However, it is worth noting that an in vitro system where the cells do not directly bind capsids through HSPG interactions would be a much better model for studying the stages of HPV infection that are the focus of this study, since there is no direct binding of keratinocytes in vivo.

      We are thankful for this comment that had a strong influence on the revision. The suggested experiment has been incorporated as new Supplementary Figure 1. It shows that many more PsVs bind to the cell surface of cells in suspension than to adhered cells. As suggested by the reviewing editor, we explain now that HaCaT cells are a suitable model system for studying the in vivo transport from the ECM to the cell body that in these cells, due to their polarization, cannot be bypassed (for more details please see our replies above addressing these issues).

      Because conclusions drawn regarding HS interactions are largely based on experiments using a single HS mAb, it is important that the specificity of this mAb is described in more detail, either based on the literature or further experimentation.

      We provide now detailed information about the HS antibodies used in the study. We state on line 282 ‘Using an antibody that reacts with an epitope in native heparan sulfate chains…’ and on line 286 ‘we use an antibody that reacts with a HS neo-epitope generated by heparitinase-treated heparan sulfate chains…’ and in the methods section ‘For Heparan sulfate (HS) a mouse IgM monoclonal antibody (1:200) (amsbio, cat# 370255-S) was used that reacts with an epitope in native heparan sulfate chains and not with hyaluronate, chondroitin or DNA, and poorly with heparin (mAb 10E4 (David et al., 1992)). For HS neo-epitope (Yokoyama et al., 1999) detection, a mouse monoclonal antibody (1:200) (amsbio, cat#370260-S) was used that reacts only with heparitinase-treated heparan sulfate chains, proteoglycans, or tissue sections, and not with heparinase treated HSPGs. The antibody recognizes desaturated uronic acid residues (mAb 3G10 (David et al., 1992)).’

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "tight association" or similar is repeatedly used and is not acceptable for microscopic studies; use "close association", which has no affinity connotations.

      Has been changed as suggested by the referee.

      (2) Why are lysine-coated coverslips used for microscopy? HaCaT cells adhere tightly to untreated glass, and this coating could affect the distribution of ECM and extracellular PsV.

      We believe a tight association of the basal cell membrane to its substrate, as in vivo, where the basal membrane is tightly adhered to other cells, is important in these experiments. In weakly adherent cells more PsVs may bind to the cell surface, bypassing the transport step. Hence, although HaCaT cells may not require the coat and would be able to adhere to glass, the association may not be tight enough to mimic in vivo conditions.

      (3) What is the reason to use detection of the pseudogenome for some of the experiments instead of L1 detection throughout? The process of EdU detection is sufficiently denaturing to affect some protein epitopes. The introduction of this potential artifact doesn't seem warranted for capsid detection experiments.

      The L1 and the Itgα6 antibody are from the same species, which is why we used click-labeling of the reporter plasmid in Figures 4 and 6. We do not disagree with the referee's point that EdU detection may denature the epitopes of some proteins. For instance, we have observed a different staining pattern for CD151; for Itgα6 and HS we saw no obvious difference in the staining patterns. In double staining experiments using L1 antibody and click-labeling, both staining patterns overlapped very well, indicating that click-labeling is suitable to visualize PsVs.

      (4) What concentration of TMA-DPH was used?

      TMA-DPH is a poorly water-soluble dye that becomes strongly fluorescent upon insertion into a membrane. Because of its poor water solubility, a precise concentration cannot be given. We added 50 µl of a saturated TMA-DPH solution in PBS to 1 ml of PBS in the imaging chamber. We state this now in the methods section.

      (5) Line 419: This statement is misleading. Although PsV interaction with HSPG on the ECM is crucial for infectious transfer to cells, the majority of the PsV binding on the ECM has been attributed to interaction with laminin 332. Treatment of PsV with heparin causes sequestration to the ECM.

      We are sorry for the confusion and have removed the misleading statement.

      (6) Some reference choices are poor:

      Line 54: Ozbun and Campos, this is not the correct reference

      In the introduction of the review we cited, it is stated that PsVs establish infection via a break in the epithelial barrier. However, we have replaced this reference by a review that focuses more on epithelial wounding: ‘Ozbun, Michelle A. (2019): Extracellular events impacting human papillomavirus infections: Epithelial wounding to cell signaling involved in virus entry. In Papillomavirus research (Amsterdam, Netherlands) 7, pp. 188–192. DOI: 10.1016/j.pvr.2019.04.009.’

      Line 2012: Doorbar et al., this is not the correct reference.

      Thank you for pointing this out (we assume the referee refers to line 104 and not line 2012). We noticed this error during revision. As it is difficult to find a specialized review on this topic, we now cite Ozbun and Campos, 2021, which states that PsVs are ‘structurally and immunologically indistinguishable from lesion- and tissue-derived HPVs.’

      Minor issues:

      (1) It is difficult to appreciate the ECM and cell surface binding pattern from the provided images, which do not even contain an entire cell. We need to see a few representative field views with the ECM delineated with laminin 332 staining, as HS antibodies stain both the ECM and cell surface.

      We now provide overview images in Supplementary Figure 4. The only experiment requiring a clear delineation between ECM and cell surface is the experiment of Figure 4. Here, we do not use the HS as a reference staining because it stains both the ECM and the cell surface.

      (2) For Figure 1E, the cells were only infected for 24 hours. The half-time for infectious internalization of HaCaT cells was shown to be 8 hours for cell-associated PsV and closer to 20 hours for PsV that was associated with the ECM prior to cell association (Becker et al., 2018). Why was such a short infection time chosen?

      During assay establishment, we observed that the luciferase activity is optimal after 24 h.

      (3) Figure 5, the staining of uninfected cells +/- cyto treatment needs to be included.

      Now visible in new Figure 3.

      I am confused by lines 54-57. It seems as if the authors are claiming that HSPGs are not present on the ECM. This sentence, as written, is misleading.

      We agree, and state now on line 58 ‘Here, virions bind to the linear polysaccharide heparan sulfate (HS) that is present in the extracellular matrix (ECM) but as well on the plasma membrane surface. HS is attached to proteins forming so called heparan sulfate proteoglycans (HSPGs).’

      Reviewer #2 (Recommendations for the authors):

      There are further issues that are not pertaining to the study design that I find important.

      (1) It remains speculative whether the virions that are transferred from the ECM are actually structurally modified.

      The newly added Figure 2, showing that leupeptin blocks infection in our assay, suggests that virions indeed are primed.

      (2) The origin of HS correlated with virions on the cell body after transfer is also not clear: does the virus associate with cell surface HS, or does it bring HS from the ECM? Simply staining HS against N-sulfated moieties does not allow such conclusions.

      This issue has been already raised in the public review to which we replied above. In brief, we agree that the transient increase of the PCC between PsVs and HS in the cell body region can be also explained by PsVs coming from the ECM without HS and binding to cell surface HS, or from PsVs binding directly (not via the ECM) to cell surface HSPGs. However, there are two more arguments indicating that PsVs are coated with HS. Please see our detailed reply above.

      (3) Figure 1: There are few, if any, filopodia in untreated cells. It would be good to quantify their abundance to substantiate that resting HaCaT cells are indeed a good model for filopodial transport vs. membrane retraction / spreading. In HaCaT ECM, the virus also binds to laminin-332 for a good part. Would this not also confound the analysis?

      At first glance, the number of filopodia appears to be too low to account for such an efficient transport. However, please note that the formation of filopodia is very dynamic, and that they can form and disappear within minutes (see below). We also often observe many PsVs aligned at one filopodium. Moreover, not every cell periphery exhibits large accumulations of PsVs. Therefore, we believe it is in principle possible that filopodia are largely responsible for the transport. We cannot exclude that we overestimate the transport rate due to partial cell spreading after CytD removal, which, however, we consider as rather unlikely as in Figure 2 we observe no increase in the PCC when leupeptin was present during the CytD incubation. Under these conditions, PsVs do not translocate but cells could spread, and this would increase the PCC between PsVs and F-actin if cells would spread into the area of accumulated PsVs.

      We now state on line 304: ‘This suggests that the half-time of PsV translocation from the periphery to the cell body is about 15 min. In fact, the half-time may be longer, as we cannot exclude that cell spreading after CytD removal contributes to less PsVs measured in the cell periphery.’ and on line 477 ‘As mentioned above, the half-time could be longer if cell spreading is in part responsible for the translocation of PsVs onto the cell body. However, we assume that this is rather unlikely, as cell spreading would increase the PCC between PsVs and F-actin under a condition where filopodia mediated transport is blocked but not cell spreading, which is not the case (Figure 2B and D, CytD/leupeptin).’

      (4) Figure 2: This would benefit from live cell analysis. There are considerable amounts of virions on the cell body, which partially contradicts statements from Figure 1.

      Does the referee refer to the images shown in Figure 4 (old Figure 2)? Please note that at CytD/0 min there are hardly any PsVs in the cell body region, the fluorescence (magenta LUT) is autofluorescence (this is explained in the results section). Only at later time points PsVs are in the cell body region.

      The fast transfer to the cell body after cyto D washout is based on the assumption that filopodia formation and transport along them (and not membrane extension) occur quickly. Is this reasonable?

      We are no experts on filopodia, but one finds references suggesting that they grow at rates of several µm per minute and have lifetimes between a few seconds and several minutes. Hence, within the 15 min we determine for the transport, cells may need a few minutes to recover from CytD, a few minutes to form filopodia that reach out into the ECM, and a few minutes for the transport itself. However, we agree that we cannot exclude membrane extension contributing to our observed transport, although we consider this as rather unlikely (see above).

      (5) Figure 3: The rationale of claiming the existence of 'endocytic structures' needs to be better explained and quantified in the according supplementary figure.

      We now state in the legend ‘We propose that the agglomerated CD151 maxima close to PsVs feature the characteristics of endocytic structures, as CD151 has been shown to co-internalize with PsVs (Scheffer et al. 2013), and as these structures invaginate into the cell, like PsV filled tubular organelles previously described by electron microscopy (Schelhaas et al. 2012).’ For a proper quantification of these highly variable structures a much larger sample would be required.

      The formation of virus-filled tubules upon cytoD treatment has been previously reported. Are these viruses that come from the cell body or from the ECM?

      With the new data and explanations that have been added to the manuscript, it should be clear that it is reasonable to assume that they come largely from the ECM.

      (6) Figure 4: How are the subcellular ROIs chosen? Is there not a bias by not studying a full cell?

      We now explain better how we chose cells for analysis. We state on line 138 ‘Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. In these cells, we find more PsVs per cell than one would expect from the employed 50 viral genome equivalents (vge) per cell, as PsVs are unequally distributed between the cells. Moreover, these PsVs usually are not homogenously distributed around the cell but concentrate at one region. We investigate the translocation of PsVs from these regions, defining ROIs for analysis that cover PsVs at the periphery and the cell body (see Supplementary Figures 6A and 8A).’

      (7) Figure 5/6: The data needs a better analysis on correlation by using randomisation as explained above.

      Please see our reply to the same point of the public review raised by the same referee.

      (8) Figure 7: This model involves CD151 being a mediator in transfer, but this has not been functionally shown. There are HaCaT CD151 KO cells available (from the Sonnenberg lab), it would be good to use those to test the model and whether transfer indeed involves CD151.

      As already stated above, we are sorry for having raised the impression that PsVs bind directly to CD151. The model Figure has been modified. Please see our reply above.

      (9) The manuscript would benefit from a number of experiments addressing the most crucial issues:

      (a) As mentioned before, the use of blebbistatin, which blocks myosin II function and arrests actin retrograde flow within seconds of addition, would be a good inhibitor to control for transfer in at least some of the most crucial experiments.

      In Figure 8 we have tested blebbistatin. Please see our reply above.

      (b) Live cell analysis would allow for monitoring of whether membrane retraction upon cytoD treatment would have to be taken into account for the analysis of the data. The same is true for the cytoD washouts, upon which most cells exhibit pronounced membrane spreading. The latter is important to support filopodial transport rather than membrane ruffling and spreading, leading to the clearance of extracellular virions from the ECM.

      We agree that this would be desirable. As replied above, we now discuss the issue of possible membrane spreading and reason why we consider it as rather unlikely.

      (c) To rid oneself of the issue of plasma membrane-bound virions as a confounding factor, one could use cells treated by sodium chlorate, which leads to undersulfation of HS on the cell surface, and seed them onto ECM with functional HSPGs. This would then indeed establish that the HS and virus are transferred together.

      We agree that this would be a smart experiment. As the main focus of our study is not clarifying whether PsVs are coated with HS or not, we gave other experiments priority.

      (10) The manuscript is, while carefully and thoughtfully worded on the issue of microscopy analysis, for a good part, extrapolating too strongly from the authors' data and unsubstantiated assumptions to conclude on their model. It would be good if the authors would support their claims with previous or their own experimental work. Just two examples of several: the assumption that cell-bound virions are negligible should be substantiated, as the literature would indicate otherwise.

      We determined the PsV density in adhered, CytD treated cells, and find around 0.14 per µm² (Supplementary figure 1B), which is 4 to 5-fold less when compared to the PsV density quantified in an area covering the cell body and the periphery (Figure 1B, see line 174 for PsVs/µm² values). Quantifying the PsV density only in the periphery would yield a severalfold larger difference. However, due to the limited resolution of the microscope we would strongly underestimate the PsV density in the accumulations. We prefer not to discuss this in detail, as exact numbers are difficult to obtain.

      Line 129: Cyto D should not inhibit the enzymes modifying HS or proteins (including virions). This is true, but cytoD may limit their secretion and abundance.

      We show in Figure 3 that CytD does not reduce HS staining (e.g., by limiting HS secretion, as suggested by the referee), suggesting that CytD does not limit secretion.

      We thank the referees and the reviewing editor for their helpful comments!

    1. eLife Assessment

      This valuable work advances our understanding of the relationship between multimodal magnetic resonance imaging (MRI) measures, cognition, and mental health. Compelling use of statistical learning techniques in UK Biobank data shows that 48% of the variance between an 11-task derived g-factor and imaging data can be explained. Overall, this paper contributes to the study of brain-behaviour relations and will be of interest for both its methods and its findings on how much variance in g can be explained.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities the authors used a so-called stacking approach, which employs two levels of machine learning. First, they build a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they use predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.
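
      The two-level stacking idea summarised above can be sketched in a few lines. This is a toy example under stated assumptions, not the authors' implementation: the data are simulated, each "phenotype" is reduced to a single feature, plain least squares stands in for whatever algorithms the study used, and separate data splits are assumed for training the two levels so that the second level sees out-of-sample predictions.

```python
import random


def fit_line(x, y):
    """First-level model: ordinary least squares for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def predict_line(model, x):
    a, b = model
    return [a + b * xi for xi in x]


def fit_two_features(p1, p2, y):
    """Stacking model: OLS for y = c0 + c1 * p1 + c2 * p2 (2x2 normal equations)."""
    n = len(y)
    m1, m2, my = sum(p1) / n, sum(p2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in p1)
    s22 = sum((b - m2) ** 2 for b in p2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(p1, p2))
    s1y = sum((a - m1) * (t - my) for a, t in zip(p1, y))
    s2y = sum((b - m2) * (t - my) for b, t in zip(p2, y))
    det = s11 * s22 - s12 ** 2
    c1 = (s1y * s22 - s2y * s12) / det
    c2 = (s2y * s11 - s1y * s12) / det
    return my - c1 * m1 - c2 * m2, c1, c2


def r_squared(y, y_hat):
    my = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1 - ss_res / ss_tot


# Simulated data: two hypothetical phenotypes that each carry part of the signal.
rng = random.Random(0)
n = 300
pheno_a = [rng.gauss(0, 1) for _ in range(n)]
pheno_b = [rng.gauss(0, 1) for _ in range(n)]
target = [a + b + rng.gauss(0, 0.5) for a, b in zip(pheno_a, pheno_b)]

level1, level2, test = slice(0, 100), slice(100, 200), slice(200, 300)

# Level 1: one model per phenotype.
model_a = fit_line(pheno_a[level1], target[level1])
model_b = fit_line(pheno_b[level1], target[level1])

# Level 2: the first-level predictions become the features.
c0, c1, c2 = fit_two_features(
    predict_line(model_a, pheno_a[level2]),
    predict_line(model_b, pheno_b[level2]),
    target[level2],
)

# Evaluate all three models on held-out data.
pred_a = predict_line(model_a, pheno_a[test])
pred_b = predict_line(model_b, pheno_b[test])
pred_stacked = [c0 + c1 * a + c2 * b for a, b in zip(pred_a, pred_b)]

r2_a = r_squared(target[test], pred_a)
r2_b = r_squared(target[test], pred_b)
r2_stacked = r_squared(target[test], pred_stacked)
print(f"A: {r2_a:.2f}  B: {r2_b:.2f}  stacked: {r2_stacked:.2f}")
```

      In this simulation each phenotype explains only part of the target, so the stacked model should outperform either first-level model on the held-out split, mirroring the gain reported when phenotypes are combined across modalities.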

      Strengths:

      (1) Big study population (UK Biobank with 14000 subjects)

      (2) Description of methods (including Figure 1) is helpful in understanding the approach

      (3) Final manuscript improved after revision

      Weaknesses:

      (1) The relevance of the question is now better described, but the impact of the work is more of conceptual value than of direct clinical value.

      (2) The discussion on the interpretation of the positive and negative PLRS loadings is now further explained, but remains a bit counterintuitive.

      Note: the computational aspects of the methods fall beyond my expertise.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation. I have reviewed the paper before and the authors addressed my comments very well.

      Strengths:

      Large sample (UK biobank data) and clear description of advanced analyses.

      Weaknesses:

      My main concern in my previous review was that it was not completely clear to me what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI, and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities, they used a so-called stacking approach, which employs two levels of machine learning. First, they built a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they used predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.
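
      For readers unfamiliar with commonality analysis, the two-predictor-set case reduces to simple arithmetic on R² values. The sketch below uses the standard decomposition into unique and common effects. The R² values are invented and are not results from the study, the function name is hypothetical, and expressing the captured share as the common effect divided by the variance explained by mental health alone is an assumption about how such percentages are obtained.

```python
def commonality_two_sets(r2_a, r2_b, r2_ab):
    """Split the variance explained by two predictor sets into unique and common parts.

    r2_a  : R² of the model with set A only
    r2_b  : R² of the model with set B only
    r2_ab : R² of the model with both sets
    """
    return {
        "unique_a": r2_ab - r2_b,
        "unique_b": r2_ab - r2_a,
        "common": r2_a + r2_b - r2_ab,
    }


# Invented example: A = mental health features, B = stacked neuroimaging marker.
parts = commonality_two_sets(r2_a=0.10, r2_b=0.12, r2_ab=0.17)

# Share of the cognition-mental health relationship that is also captured
# by neuroimaging (assumed definition: common effect / R² of set A alone).
share_captured = parts["common"] / 0.10
print(parts, f"share captured = {share_captured:.0%}")
```

      The three parts always sum to the R² of the joint model, which is a convenient sanity check when the analysis is extended to more predictor sets such as age and sex.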

      Strengths:

      (1) A big study population (UK Biobank with 14000 subjects).

      (2) The description of the methods (including Figure 1) is helpful in understanding the approach.

      (3) This revised manuscript is much improved compared to the previous version.

      Weaknesses:

      (1) Although the background and reason for the study are better described in this version of the manuscript, the relevance of the question is, in my opinion, still questionable. The authors aimed to determine whether neural markers of cognition explain the covariance between cognition and mental health and which of the 72 MRI-based features contribute to explaining most of the covariance. I would like to invite the authors to make a stronger case for the relevance, keeping the clinical and scientific relevance in mind (what would you explain to the clinician, what would you explain to the people with lived experience, and how can this knowledge contribute to innovation in mental health care?).

      Thank you for this insightful observation. We agree that establishing the real-world significance of fundamental research is paramount, and we have revised our manuscript to better articulate this relevance.

      For clinicians, our work (a) corroborates the link between cognition and mental health, confirming the transdiagnostic role of cognition, and (b) demonstrates that current neuroimaging tools can capture the neurobiology underlying this relationship. These findings offer several implications for clinical practice. First, they support the development of interventions aimed at enhancing cognitive functioning as a pathway to improving mental health. Second, our work introduces neuroimaging as a potential tool for assessing the neurobiological basis of the cognition–mental health connection. With further research, clinicians may be able to use neuroimaging to track cognitive changes at the neural level, which could help monitor treatment efficacy for interventions (e.g., stimulant medications for ADHD) designed to boost cognitive functioning.

      Following your suggestions, we have expanded the Discussion (Line 684) to include future directions and clinical perspectives on the findings.

      Line 684: “Neuroimaging offers a unique window into the biological mechanisms underlying cognition–mental health overlap – insights unattainable from behavioural data alone. Our findings validate brain-based neural markers as a core unit of analysis for cognitive functioning, advancing mental health research through the lens of cognition. Beyond this conceptual contribution, the study has clinical implications. First, by demonstrating a transdiagnostic link between cognition and mental health, we support interventions that enhance cognition as a pathway to improving mental health. Second, we show neuroimaging as an effective tool for assessing the neurobiological basis of this link. Quantifying neuroimaging’s capacity to capture this relationship is essential for future research integrating imaging with cognitive testing to monitor treatment-related neural changes. Such work could enable personalised interventions, using neuroimaging to track cognitive changes and treatment efficacy (e.g., stimulant medications for ADHD) aimed at boosting cognitive functioning.”

      (2) The discussion on the interpretation of the positive and negative PLRS loadings is not very convincing, and the findings are partly counterintuitive. For example (1) how to explain that distress has a positive loading and anxiety/trauma has a negative loading?; (2) how to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma? From both a clinical and a neuroscientific perspective, this is hard to interpret.

      Thank you for pointing this out. We appreciate your concern regarding the interpretation of positive and negative PLSR loadings. To clarify:

      (1) The directions of PLSR loadings are broadly consistent with univariate correlations, suggesting that the somewhat counterintuitive relationships mentioned are shown even when we apply simple univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. It constructs new components – linear combinations of predictors – that simultaneously explain variance in the predictors and their covariance with the response.

      (2) The positive loading of distress likely reflects cohort-specific questionnaire design in the UK Biobank, where feeling of distress was tied to seeking medical help. Individuals with higher cognition and socioeconomic status may be more likely to seek professional support, which explains the counterintuitive direction.

      (3) The negative loadings of wellbeing and happiness may also reflect cohort-specific effects, such as older age, and align with prior work linking excessive optimism to poorer reasoning and cognitive performance. This suggests that realism or pessimism may sometimes be associated with better cognition, particularly in older adults.
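
      The construction of a PLSR component mentioned in point (1) can be made concrete with a minimal sketch of the first component for a single response. The data are invented, the feature labels are hypothetical, features are only mean-centred (standardisation and further components are omitted), and this is not the authors' code.

```python
from math import sqrt


def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def first_pls_component(feature_columns, response):
    """Return (weights, scores, loadings) of the first PLS component."""
    cols = [center(c) for c in feature_columns]
    y = center(response)
    n = len(y)

    # Weights are proportional to the covariance of each feature with the response.
    w = [sum(x * t for x, t in zip(col, y)) for col in cols]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Component scores: one linear combination of the predictors per subject.
    scores = [sum(w[j] * cols[j][i] for j in range(len(cols))) for i in range(n)]

    # Loadings: regression of each feature on the component scores.
    tt = sum(s * s for s in scores)
    loadings = [sum(col[i] * scores[i] for i in range(n)) / tt for col in cols]
    return w, scores, loadings


# Invented data: one feature rises with the response, the other falls.
g_factor = [1, 2, 3, 4, 5]
feature_up = [2, 1, 4, 3, 6]      # hypothetical "distress-like" score
feature_down = [5, 5, 3, 2, 1]    # hypothetical "anxiety-like" score

weights, scores, loadings = first_pls_component([feature_up, feature_down], g_factor)
print("weights:", [round(v, 2) for v in weights])
```

      For the first component the sign of each weight equals the sign of that feature's univariate covariance with the response, which is one reason the directions of PLSR loadings and univariate correlations tend to agree.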

      These points are discussed in detail in the manuscript (Lines 493–545). We have emphasised that some of these findings may be cohort-specific and cited supporting literature, as seen below.

      (1) How to explain that distress has a positive loading and anxiety/trauma has a negative loading?

      Line 493: “The directions of PLSR loadings were broadly consistent with univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. Consistently, both univariate correlations and factor loadings derived from the PLSR model indicated that scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.”

      Line 529: “Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].”

      (2) How to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma?

      Line 545: “Finally, both negative PLSR loadings and corresponding univariate correlations for features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (3) The analysis plan has not been preregistered (e.g. at OSF).

      Note: the computational aspects of the methods fall beyond my expertise.

      Thank you for pointing this out. We acknowledge that the analysis plan was not preregistered, as our approach was primarily data‑driven rather than hypothesis‑driven. We essentially applied the machine learning approach to quantify the strength of the cognition-mental health relationship in relation to neuroimaging. To ensure transparency and reproducibility, we have made all analysis code and intermediate outputs publicly available on our GitHub repository (https://github.com/HAM-lab-Otago-University/UKBiobank/) within the constraints of UK Biobank’s ethical policy and provided a detailed description of each methodological step in the Supplementary Materials.

      Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation.

      Strengths:

      The evidence supporting the conclusions is compelling. There is a large sample (UK biobank data) and a clear description of advanced analyses.

      Weaknesses:

      In the previous version of the paper, it was not completely clear what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

      Thank you for your positive feedback and for recognizing the strengths of our work. We appreciate your comments and are happy that the revisions addressed your concerns.

    1. eLife Assessment

      This study offers a valuable methodological advance by introducing a gene panel selection approach that captures combinatorial specificity to define cell identity. The findings address key limitations of current single-gene marker methods. The evidence is compelling, but would be strengthened by further validation of rare cell states and unexpected marker categories.

    2. Joint Public Review:

      In this study, the authors introduce CellCover, a gene panel selection algorithm that leverages a minimal covering approach to identify compact sets of genes with high combinatorial specificity for defining cell identities and states. This framework addresses a key limitation in existing marker selection strategies, which often emphasize individually strong markers while neglecting the informative power of gene combinations. The authors demonstrate the utility of CellCover through benchmarking analyses and biological applications, particularly in uncovering previously unresolved cell states and lineage transitions during neocorticogenesis.

      The major strengths of the work include the conceptual shift toward combinatorial marker selection, a clear mathematical formulation of the minimal covering strategy, and biologically relevant applications that underscore the method's power to resolve subtle cell-type differences. The authors' analysis of the Telley et al. dataset highlights intriguing cases of ribosomal, mitochondrial, and tRNA gene usage in specific cortical cell types, suggesting previously underappreciated molecular signatures in neurodevelopment. Additionally, the observation that outer radial glia markers emerge prior to gliogenic progenitors in primates offers novel insights into the temporal dynamics of cortical lineage specification.

      However, several aspects of the study would benefit from further elaboration. First, the interpretability of gene panels containing individually lowly expressed genes but high combinatorial specificity could be improved by providing clearer guidelines or illustrative examples. Second, the utility of CellCover in identifying rare or transient cell states should be more thoroughly quantified, especially under noisy conditions typical of single-cell datasets. Third, while the findings on unexpected gene categories are provocative, they require further validation - either through independent transcriptomic datasets or orthogonal methods such as immunostaining or single-molecule FISH - to confirm their cell-type-specific expression patterns.

      Specifically, the manuscript would benefit from further clarification and additional validation in the following areas:

      • A more in-depth explanation of marker panel applications is needed. Specifically, how should users interpret gene panels where individual genes show only moderate or low expression levels, but the combination provides high specificity? Providing a concrete example, along with guidelines for interpreting such combinatorial signatures, would enhance the practical utility of the method.

      • Further quantification of CellCover's sensitivity in detecting rare cell subtypes or states would strengthen the evaluation of its performance. Additionally, it would be helpful to assess how CellCover performs under noisy conditions, such as low cell numbers or read depths, which are common challenges in scRNA-seq datasets.

      • It is intriguing and novel that CellCover analysis of the dataset from Telley et al. suggests cell-type-specific expression of ribosomal, mitochondrial, or tRNA genes. These findings would be significantly strengthened by additional validation. For example, the reported radial glia-specific expression of Rps18-ps3 and Rps10-ps1, as well as the postmitotic neuron-specific expression of mt-Tv and mt-Nd4l, should be corroborated using independent scRNA-seq or spatial transcriptomic datasets of the developing neocortex. Alternatively, these expression patterns could be directly examined through immunostaining or single-molecule FISH analysis.

      • The observation that outer radial glia (oRG) markers are expressed in neural progenitors before the emergence of gliogenic progenitors in primates and humans is compelling. This could be further supported by examining the temporal and spatial expression patterns of early oRG-specific markers versus gliogenic progenitor markers in recent human spatial transcriptomic datasets - such as the one published by Xuyu et al. (PMID: 40369074) or Wang et al. (PMID: 39779846).

      Summary:

      Overall, this work provides a conceptually innovative and practically useful method for cell type classification that will be valuable to the single-cell and developmental biology communities. Its impact will likely grow as more researchers seek scalable, interpretable, and biologically informed gene panels for multimodal assays, diagnostics, and perturbation studies.

    3. Author response:

      A more in-depth explanation of marker panel applications is needed. Specifically, how should users interpret gene panels where individual genes show only moderate or low expression levels, but the combination provides high specificity? Providing a concrete example, along with guidelines for interpreting such combinatorial signatures, would enhance the practical utility of the method.

      We appreciate the need to explain and demonstrate how to use the novel combinatorial gene marker sets that CellCover generates. To be clear, individual genes expressed at low levels and in small numbers of cells, in general, have high specificity (the ability to mark cells of a particular type without erroneously marking other cells as this type) and are often used in combinations by CellCover to achieve a panel of genes with high sensitivity (the ability to mark all cells of a particular type). Low or sparsely expressed genes of this type may represent poorly measured genes (i.e. zero inflation known to occur in single-cell data, where genes are measured as zero in cells which actually express the gene) or may represent genes which are truly expressed only in a subset of the annotated class. Because CellCover can borrow strength across genes, it can harness the true information in either class of genes, even if affected by zero inflation. Further investigation of structure within the cell class (and across other cell classes) using the CellCover gene marker panel, as well as other genes, is necessary to clarify this issue in any particular analysis. In the manuscript, we evaluate the expression of individual genes within and across classes in this manner to understand deeper structure in Figures 1A, S6 and S8.

      To demonstrate how CellCover selects individual genes with high specificity and low sensitivity, but which are complementary to one another, in order to achieve high collective sensitivity, here we consider a hypothetical dataset of many cells where we focus on one cell class that contains 100 cells composed of four subtypes.

      - Subtype A: cells 1–20

      - Subtype B: cells 21–30

      - Subtype C: cells 31–50

      - Subtype D: cells 51–100

      To illustrate how CellCover evaluates marker gene panels, in this example, the genes under investigation have very different weights (i.e. the ratio of a gene’s expression in the cell class of interest versus its expression in other cells). Suppose we have two candidate marker panels:

      Panel 1 (coarse markers).

      - Gene A: covers cells 1–30 (weight = 0.4)

      - Gene B: covers cells 30–60 (weight = 0.3)

      - Gene C: covers cells 60–100 (weight = 0.2)

      Each gene in this panel covers a relatively large portion of the population (> 30%), but their weights are comparatively high, indicating limited specificity to the focal cell type. Although the panel {A,B,C} attains full coverage, its markers are coarse and nonspecific.

      Panel 2 (fine-grained, combinatorial markers).

      - Gene A’: covers cells 1–20 (weight = 0.05)

      - Gene B’: covers cells 20–30 (weight = 0.10)

      - Gene C’: covers cells 30–50 (weight = 0.05)

      - Gene D’: covers cells 50–100 (weight = 0.10)

      Each marker is expressed in a smaller fraction of the population (individually low sensitivity), but the weights are substantially lower, reflecting strong subtype specificity. Importantly, these genes are complementary: their union covers all 100 cells (high combinatorial sensitivity), even though no single gene spans more than 20–50% of the cells.

      Under a strict covering requirement (e.g., α = 0, requiring 100% coverage, i.e. perfect sensitivity), both panels satisfy the constraint. However, CellCover selects the second panel because its total weight (specificity) is smaller. This preference reflects the design of the objective function: the method favors markers that are highly cell-type-specific, even if they individually cover only a subset of the population, as long as their complements yield full coverage. As a result, CellCover can reveal refined subtype structure within what appears to be a single cell population.
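
      The comparison of the two hypothetical panels can be reproduced with a few lines of Python. This is only an illustrative sketch: it uses the cell ranges and weights from the example above, treats α as the allowed uncovered fraction (an assumption consistent with α = 0 meaning 100% coverage), and simply compares two fixed panels. It does not reproduce CellCover's actual optimization, and all function names are hypothetical.

```python
# Toy comparison of the two hypothetical panels from the worked example.
# Cell ranges and weights are the illustrative numbers from the text, not data.

def cells(first, last):
    """Return the set of cell indices first..last (inclusive)."""
    return set(range(first, last + 1))

CLASS_CELLS = cells(1, 100)  # the 100 cells of the focal class

# Each gene maps to (cells it marks, weight); a lower weight is read here
# as higher specificity, following the example in the text.
panel_1 = {
    "A": (cells(1, 30), 0.4),
    "B": (cells(30, 60), 0.3),
    "C": (cells(60, 100), 0.2),
}
panel_2 = {
    "A'": (cells(1, 20), 0.05),
    "B'": (cells(20, 30), 0.10),
    "C'": (cells(30, 50), 0.05),
    "D'": (cells(50, 100), 0.10),
}

def coverage(panel):
    """Fraction of class cells marked by at least one gene in the panel."""
    covered = set().union(*(marked for marked, _ in panel.values()))
    return len(covered & CLASS_CELLS) / len(CLASS_CELLS)

def total_weight(panel):
    """Sum of gene weights; the quantity being minimized in this toy setup."""
    return sum(weight for _, weight in panel.values())

def pick_panel(panels, alpha=0.0):
    """Among panels covering at least (1 - alpha) of the class,
    return the name of the one with the smallest total weight."""
    feasible = {name: p for name, p in panels.items()
                if coverage(p) >= 1.0 - alpha}
    return min(feasible, key=lambda name: total_weight(feasible[name]))

if __name__ == "__main__":
    panels = {"panel_1": panel_1, "panel_2": panel_2}
    for name, panel in panels.items():
        print(name, coverage(panel), round(total_weight(panel), 2))
    print("selected:", pick_panel(panels, alpha=0.0))
```

      Both panels reach full coverage, so the choice is decided by total weight alone (0.9 for the first panel versus 0.3 for the second), which is the behavior the example describes.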

      Interpretation guidelines. We explicitly note that CellCover marker panels should be interpreted as combinatorial signatures:

      - Individual genes may show localized, subtype-restricted expression.

      - The union of their expression defines the target cell type.

      - Low-weight genes are more specific; CellCover therefore prioritizes them whenever they provide complementary coverage.

      - The resulting panel may highlight latent heterogeneity or subpopulations within the cell type that express different subsets of the markers.

      In addition to these technical guidelines for interpreting gene panels, throughout the manuscript we use the transfer of CellCover marker gene panels to related datasets to assess the biological function of the gene sets. We propose this as a general tool in the examination of gene lists and have implemented methods to visualize the expression of any gene list (including gene lists uploaded by users) using the Projection Tool within NeMO Analytics.

      Further quantification of CellCover’s sensitivity in detecting rare cell subtypes or states would strengthen the evaluation of its performance. Additionally, it would be helpful to assess how CellCover performs under noisy conditions, such as low cell numbers or read depths, which are common challenges in scRNA-seq datasets.

      While CellCover is a method to define marker gene panels for cell classes that are already defined in a dataset, its performance on rare cell classes, small numbers of cells and low read depths is still a relevant issue. The analyses in the paper can speak to some of these concerns: The Telley dataset, which we use throughout the manuscript, used FlashTag labeling of cells prior to sequencing in order to ascertain the time since terminal division for each cell. This unique metadata linked to each cell’s expression data enabled many of the analyses we performed in the paper, but also limited the number of cells that were sequenced. For this reason, the number of cells in this dataset (total cells = 2756) is much lower than that seen in the vast majority of other single-cell sequencing studies, including those we use for the transfer of marker gene sets defined by CellCover in the Telley data. As a result, the cell classes for which we define marker gene panels in the paper contain relatively small numbers of cells. This is especially true in the 12-class analysis in Figures 4 and 5 where CellCover successfully defines gene panels for all 12 classes which transfer well to other datasets. Total cells per class range from 134 to 301. Figure S6 shows that the discriminative power of the 12 gene panels varied widely, with the most highly discriminative panel being from the E12.1H condition (with only 189 cells).

      In addition, we note that the behavior of CellCover on rare (or any) cell classes can be characterized deterministically under mild conditions. For a fixed cell class and a required covering rate of 1, a depth-k covering gene panel exists if and only if every cell in the class expresses at least k genes. Under this condition, CellCover is guaranteed to find a covering panel of depth k. Importantly, this guarantee does not impose any restriction on the panel size. Consequently, the compactness of the resulting panel reflects intrinsic properties of the data rather than algorithmic limitations: a small panel indicates that a subset of genes is robustly and consistently expressed across most cells in the class, even if the class itself is rare, whereas a large panel suggests highly heterogeneous expression patterns, where different genes are expressed in different cells. In this sense, the feasibility and structure of a covering panel are determined by the biological and technical characteristics of the dataset (e.g., read depth, expression sparsity, and the specificity of gene expression in the defined cell classes), rather than by the performance of CellCover itself.
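
      The feasibility condition stated above can be checked directly on a binarized expression matrix. The sketch below is illustrative only: the toy matrix and all function names are hypothetical, "expresses" is assumed to mean a nonzero entry, and the code tests whether a depth-k cover exists rather than searching for a compact one as CellCover does.

```python
# Feasibility of a depth-k cover for one cell class at covering rate 1.
# Rows are cells of the class, columns are genes, entries are 0/1
# (1 = gene detected in that cell). The matrix is a made-up toy example.

def genes_expressed(row):
    """Number of genes detected in one cell."""
    return sum(1 for value in row if value > 0)

def depth_k_feasible(expr, k):
    """A depth-k cover with covering rate 1 exists if and only if
    every cell in the class expresses at least k genes."""
    return all(genes_expressed(row) >= k for row in expr)

def max_feasible_depth(expr):
    """Largest k for which a full-coverage depth-k panel can exist."""
    return min(genes_expressed(row) for row in expr)

def covers_at_depth(expr, panel, k):
    """Check that every cell expresses at least k genes from `panel`,
    where `panel` is a collection of column indices."""
    return all(sum(1 for g in panel if row[g] > 0) >= k for row in expr)

toy_class = [
    [1, 1, 0, 0, 1],  # cell 1 expresses 3 genes
    [0, 1, 1, 0, 0],  # cell 2 expresses 2 genes
    [1, 0, 0, 1, 1],  # cell 3 expresses 3 genes
    [0, 0, 1, 1, 0],  # cell 4 expresses 2 genes
]

if __name__ == "__main__":
    all_genes = range(len(toy_class[0]))
    print(max_feasible_depth(toy_class))             # limit set by the sparsest cell
    print(depth_k_feasible(toy_class, 2))            # every cell has >= 2 genes
    print(depth_k_feasible(toy_class, 3))            # cells 2 and 4 have only 2
    print(covers_at_depth(toy_class, all_genes, 2))  # the full gene set is a cover
```

      The sparsest cell sets the limit: in this toy class a depth-2 cover exists because the panel of all genes already achieves it, while no panel of any size can reach depth 3.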

      It is intriguing and novel that CellCover analysis of the dataset from Telley et al. suggests cell-type-specific expression of ribosomal, mitochondrial, or tRNA genes. These findings would be significantly strengthened by additional validation. For example, the reported radial glia-specific expression of Rps18-ps3 and Rps10-ps1, as well as the postmitotic neuron-specific expression of mt-Tv and mt-Nd4l, should be corroborated using independent scRNA-seq or spatial transcriptomic datasets of the developing neocortex. Alternatively, these expression patterns could be directly examined through immunostaining or single-molecule FISH analysis.

      The main problem with such analysis is that most studies have omitted the expression of these genes (especially mitochondrial genes that are primarily viewed as QC metrics) from their datasets. We encourage researchers to retain the expression of these transcripts in their data so that their biological functions can be explored. Where available, the expression of these genes can be visualized in NeMO Analytics in the mouse where the enrichment of Rps18-ps3 expression in radial glia can be seen in the Di Bella 2021 dataset and in the human where the expression of mt-Tv can be seen in neurons in the Polioudakis 2019, Darmanis 2015, Camp 2015, and Liu 2016 datasets.

      Taking a broader perspective, a growing body of foundational work in developmental neurobiology supports the observation that mitochondrial state and metabolic programs undergo systematic changes during neuronal differentiation, consistent with our CellCover findings. For example, Khacho 2016 demonstrated that mitochondrial dynamics are essential regulators of neuronal fate commitment and that the maturation of the mitochondrial network is essential for the transition from the progenitor metabolic state to the neuronal state. Iwata 2020 further highlight cell type specific mitochondrial dynamics by showing that daughter cells with highly fragmented mitochondria tend to become neurons.

      The observation that outer radial glia (oRG) markers are expressed in neural progenitors before the emergence of gliogenic progenitors in primates and humans is compelling. This could be further supported by examining the temporal and spatial expression patterns of early oRG-specific markers versus gliogenic progenitor markers in recent human spatial transcriptomic datasets - such as the one published by Xuyu et al. (PMID: 40369074) or Wang et al. (PMID: 39779846).

      We have added the scRNA-seq data from Wang et al., as well as data from the Nano et al. 2025 meta-atlas to the NeMO Analytics data collection. oRG markers from Liu et al. 2023 can now be visualized across the Wang, Nano and many more human in vivo datasets. In the Nano data, these oRG markers can be seen increasing in expression in the human neocortex from GW7-12, leading into peak neurogenesis and prior to gliogenesis. Although with lower age resolution, the peaking of oRG markers in the 2nd trimester (during peak neurogenesis) and their precipitous drop in the 3rd trimester (during peak gliogenesis) can also be seen in the Wang data. At NeMO Analytics individual marker genes of oRGs can also be visualized in these datasets.

    1. eLife Assessment

      This manuscript presents a valuable methodological approach to investigating context-dependent activity of cis-regulatory activity within defined genomic loci. The authors combine a locus-specific massively parallel reporter assay, enabling unbiased and high-coverage profiling of enhancer activity across large genomic regions, with a degenerate reporter assay to identify nucleotides critical for enhancer function. The data supporting the conclusions are solid, highlighted by successful identification and characterization of both previously known and new regulatory elements across multiple developmental stages, cell types, and species. While the approach has inherent limitations in sensitivity, and indirect assignment of regulatory elements to target genes, it provides a flexible platform for nominating candidate cis-regulatory elements across defined loci.

    2. Reviewer #1 (Public review):

      MPRAs are a high-throughput and powerful tool for assaying the regulatory potential of genomic sequences. However, linking MPRA-nominated regulatory sequences to their endogenous target genes, and identifying the more specific functional regions within these sequences can be challenging. MPRAs that tile a genomic region, and saturation mutagenesis-based MPRAs can help to address these challenges. In this work, Tulloch et al. describe a streamlined MPRA system for the identification and investigation of the regulatory elements surrounding a gene of interest with high resolution. The use of BACs covering a locus of interest to generate MPRA libraries allows for an unbiased, and high-coverage assessment of a particular region. Follow up degenerate MPRAs, where each nucleotide in the nominated sequences is systematically mutated, then can point to key motifs driving their regulatory activity. The authors present this MPRA platform as straightforward, easily customizable, and less time- and resource-intensive than traditional MPRA designs. They demonstrate the utility of their design in the context of the developing mouse retina, where they first use the LS-MPRA to identify active regulatory elements for select retinal genes, followed by d-MPRA which allowed them to dissect the functional regions within those elements and nominate important regulatory motifs. These assays were able to recapitulate some previously known cis-regulatory modules (CRMs), as well as identify some new potential regulatory regions. Follow up experiments assessing co-localization of the gene of interest with the CRM-linked GFP reporter in the target cells, and CUT&RUN assays to confirm transcription factor binding to nominated motifs provided support linking these CRMs to the genes of interest. Overall, this method appears flexible and could be an easy to implement tool for other investigators aiming to study their locus of interest with high resolution.

      Strengths:

      (1) The method of fragmenting BACs allows for high, overlapping coverage of the region of interest.

      (2) The d-MPRA method was an efficient way to identify key functional transcription factor motifs, and nominate specific transcription factor-driven regulatory pathways that could be studied further.

      (3) Additional assays like co-expression analyses using the endogenous gene promoter, and use of the Notch inhibitor in the case of Olig2, helped correlate the activity of the CRMs to the expression of the gene of interest, and distinguish false positives from the initial MPRA.

      (4) The use of these assays across different time points, tissues, and even species demonstrated that they can be used across many contexts to identify both common and divergent regulatory mechanisms for the same gene.

      Weaknesses:

      (1) The LS-MPRA assay most strongly identified promoters, which are not usually novel regulatory elements you would try to discover, and the signal to noise ratio for more TSS-distal, non-promoter regulatory elements was usually high, making it difficult to discriminate lower activity CRMs, like enhancers, from the background. For example, NR2 and NR3 in Figure 3 have very minimal activity peaks (NR3 seems non-existent). The ex vivo data in Figure 2 is similarly noisy. Is there a particular metric or calculation that was or could be used to quantitatively or statistically call a peak above the background? The authors mention in the discussion some adjustments that could reduce the noise, such as increased sequencing depth, which I think is needed to make these initial LS-MPRA results and the benchmarking of this assay more convincing and impactful.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Tulloch et al. developed two modified massively parallel reporter assays (MPRAs) and applied them to identify cis-regulatory modules (CRMs) - genomic regions that activate gene expression - controlling retinal gene expression. These CRMs usually function at specific developmental stages and in distinct cell types to orchestrate retinal development. Studying them provides insights into how retinal progenitor cells give rise to various retinal cell types.

      The first assay, named locus-specific MPRA (LS-MPRA), tests all genomic regions within 150-300 kb of the gene of interest, rather than relying on previously predicted candidate regulatory elements. This approach reduces potential bias introduced during candidate selection, lowers the cost of synthesizing a library of candidate sequences, and simplifies library preparation. The LS-MPRA libraries were electroporated into mouse retinas in vivo or ex vivo. To benchmark the method, the authors first applied LS-MPRA near stably expressed retinal genes (e.g., Rho, Cabp5, Grm6, and Vsx2), and successfully identified both known and novel CRMs. They then used LS-MPRA to identify CRMs in embryonic mouse retinas, near Olig2 and Ngn2, genes expressed in subsets of retinal progenitor cells. Similar experiments were conducted in chick retinas and postnatal mouse retinas, revealing some CRMs with conserved activity across species and developmental stages.

      Although the study identified CRMs with robust reporter activity in Olig2+ or Ngn2+ cells, the data do not provide sufficient evidence to support the claims that these CRMs regulate Olig2 or Ngn2, rather than other nearby genes, in a cell type-specific manner. For example, the authors propose that three regions (NR1/2/3) regulate Olig2 specifically in retinal progenitor cells based on: 1) the three regions are close to Olig2, 2) increased Olig2 expression and NR1/2/3 activity upon Notch inhibition, and 3) reporter activity observed in Olig2+ cells (though also present in many Olig2- cells). While these are promising findings, they do not directly support the claims.

      The second assay, called degenerate MPRA (d-MPRA), introduces random point mutations into CRMs via error-prone PCR to assess the impact of sequence variations on regulatory activity. This approach was used on NR1/2/3 to identify mutations that alter CRM activity, potentially by influencing transcription factor binding. The authors inferred candidate transcription factors, such as Mybl1 and Otx2, through motif analysis, co-expression with Olig2 (based on single-cell RNA-seq), and CUT&RUN profiling. While some transcription factors identified in this way overlapped with the d-MPRA results, others did not. This raises questions about how well d-MPRA complements other methods for identifying TF binding sites.

      Strengths:

      The study introduces two technically robust MPRA protocols that offer advantages over standard methods, such as avoiding reliance on predefined candidate regions, reducing cost and labor, and minimizing selection bias.

      The identified regulatory elements and transcription factors contribute to our understanding of gene regulation in retinal development and may have translational potential for cell type-specific gene delivery into developing retinas.

      Weakness:

      Like other MPRA-based approaches, LS-MPRA mainly tests whether a sequence can drive expression of a reporter gene in given cell type(s). However, this type of assay generally does not show which endogenous gene the sequence regulates. In this study, the evidence supporting gene-specific CRMs is largely correlative. The evidence for cell-type-specific CRMs is also not fully supported (e.g., reporter expression is observed in the intended cell type as well as additional cell types). If further validation in the native genomic context (e.g., CRISPRi of the candidate element followed by RNA-seq across relevant cell types) is out of scope, the manuscript should avoid wording that implies definitive target gene assignment or cell-type specificity.

    4. Reviewer #3 (Public review):

      Summary:

      Use of reporter assays to understand the regulatory mechanisms controlling gene expression moves beyond simple correlations of cis-regulatory sequence accessibility, evolutionary sequence conservation, and epigenetic status with gene expression, instead quantifying regulatory sequence activity for individual elements. Tulloch et al. provide systematic characterization of two new reporter assay techniques (LS-MPRA and d-MPRA) to comprehensively identify cis-regulatory sequences contained within genomic loci of interest during retinal development. The authors then apply LS-MPRA and d-MPRA to identify putative cis-regulatory sequences controlling Olig2 and Ngn2 expression, including potential regulatory motifs that known retinal transcription factors may bind. Transcription factor binding to regulatory sequences is then assessed via CUT&RUN. The broader utility of the techniques is then highlighted by performing the assays across development, across species, and across tissues.

      Strengths:

      The authors validate the reporter assays on retinal loci for which the regulatory sequences are known (Rho, Vsx2, Grm6, Cabp5) mostly confirming known regulatory sequence activity but highlighting either limitations of the current technology or discrepancies of previous reporter assays and known biology. The techniques are then applied to loci of interest (Olig2 and Ngn2) to better understand the regulatory sequences driving expression of these transcription factors across retinal development within subsets of retinal progenitor cells, identifying novel regulatory sequences through comprehensive profiling of the region.

      LS-MPRA provides broad coverage of loci of interest.

      d-MPRA identifies sequence features that are important for cis-regulatory sequence activity.

      The authors take into account transcript and protein stability when determining the correlation of putative enhancer sequence activity with target gene expression.

      Overall, the manuscript highlights the utility of the techniques to identify novel cis-regulatory sequence contributions to gene expression, including systematic characterizations of sequence motifs conferring activating or repressive functions.

      Limitations:

      Barcoding strategies have the potential to induce high collision rates (see Table S3) that may lead to misinterpretation of the data and/or high false positive/negative rates.

      There are limited robust methods to distinguish differentially active versus inactive CRMs in the LS-MPRA.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      MPRAs are a high-throughput and powerful tool for assaying the regulatory potential of genomic sequences. However, linking MPRA-nominated regulatory sequences to their endogenous target genes and identifying the more specific functional regions within these sequences can be challenging. MPRAs that tile a genomic region, and saturation mutagenesis-based MPRAs, can help to address these challenges. In this work, Tulloch et al. describe a streamlined MPRA system for the identification and investigation of the regulatory elements surrounding a gene of interest with high resolution. The use of BACs covering a locus of interest to generate MPRA libraries allows for an unbiased and high-coverage assessment of a particular region. Follow-up degenerate MPRAs, where each nucleotide in the nominated sequences is systematically mutated, can then point to key motifs driving their regulatory activity. The authors present this MPRA platform as straightforward, easily customizable, and less time- and resource-intensive than traditional MPRA designs. They demonstrate the utility of their design in the context of the developing mouse retina, where they first use the LS-MPRA to identify active regulatory elements for select retinal genes, followed by d-MPRA, which allowed them to dissect the functional regions within those elements and nominate important regulatory motifs. These assays were able to recapitulate some previously known cis-regulatory modules (CRMs), as well as identify some new potential regulatory regions. Follow-up experiments assessing co-localization of the gene of interest with the CRM-linked GFP reporter in the target cells, and CUT&RUN assays to confirm transcription factor binding to nominated motifs, provided support linking these CRMs to the genes of interest. Overall, this method appears flexible and could be an easy-to-implement tool for other investigators aiming to study their locus of interest with high resolution.

      Strengths:

      (1) The method of fragmenting BACs allows for high, overlapping coverage of the region of interest.

      (2) The d-MPRA method was an efficient way to identify key functional transcription factor motifs and nominate specific transcription factor-driven regulatory pathways that could be studied further.

      (3) Additional assays like co-expression analyses using the endogenous gene promoter, and use of the Notch inhibitor in the case of Olig2, helped correlate the activity of the CRMs to the expression of the gene of interest, and distinguish false positives from the initial MPRA.

      (4) The use of these assays across different time points, tissues, and even species demonstrated that they can be used across many contexts to identify both common and divergent regulatory mechanisms for the same gene.

      Weaknesses:

      The LS-MPRA assay most strongly identified promoters, which are not usually novel regulatory elements you would try to discover, and the signal-to-noise ratio for more TSS-distal, non-promoter regulatory elements was usually low, making it difficult to discriminate lower-activity CRMs, like enhancers, from the background. For example, NR2 and NR3 in Figure 3 have very minimal activity peaks (NR3 seems non-existent). The ex vivo data in Figure 2 are similarly noisy. Is there a particular metric or calculation that was or could be used to quantitatively or statistically call a peak above the background? The authors mention in the discussion some adjustments that could reduce the noise, such as increased sequencing depth, which I think is needed to make these initial LS-MPRA results and the benchmarking of this assay more convincing and impactful.

      Much of the statistical and quantitative data asked for by the Reviewers have been provided in the Revision. However, it is important to note that the types of statistics using peak callers asked for regarding candidate choice will be of limited value. If one is testing a library in a single cell type in vitro, and/or running genome-wide assays, these statistics could aid in the choice of candidates. However, here we are electroporating a complex and dynamic set of cells, with each cell type present at what can be very different frequencies (e.g. Olig2-expressing cells are <2.4% of cells). This fact alone will give different apparent signal-to-noise values. In addition, at least for Olig2 and Ngn2, their expression is very transient, suggesting dynamic regulation by what are likely multiple positive and negative CRMs. An additional confound is that the level of expression of each gene that one might test is variable. All of these variables render a statistical prediction of candidates less valuable than one might hope, and might lead one to miss those CRMs of interest, particularly those in a small subset of cells. Instead, we suggest that one use one’s own level of interest and knowledge in choosing CRM candidates. We provide several examples of experimental, rather than purely statistical, approaches that might help in one’s choice of candidates. We used a functional read-out of CRM activity (Notch perturbation), carried out in the context of the entire LS-MPRA library, as one method. Co-expression in single cells of candidate regulators identified by the d-MPRA is another. One can of course use chromatin structure and sequence conservation, as used in many studies of regulatory regions, as other ways to narrow down candidates. The d-MPRA predictions also can be viewed in light of previous genetic studies, i.e. mutations in TFs that affect the cell type of interest or the regulation of the gene of interest, as we were able to do here for CRMs predicted to be regulated by Otx2.

      Reviewer #2 (Public review):

      Summary:

      In this study, Tulloch et al. developed two modified massively parallel reporter assays (MPRAs) and applied them to identify cis-regulatory modules (CRMs; genomic regions that activate gene expression) controlling retinal gene expression. These CRMs usually function at specific developmental stages and in distinct cell types to orchestrate retinal development. Studying them provides insights into how retinal progenitor cells give rise to various retinal cell types.

      The first assay, named locus-specific MPRA (LS-MPRA), tests all genomic regions within 150-300 kb of the gene of interest, rather than relying on previously predicted candidate regulatory elements. This approach reduces potential bias introduced during candidate selection, lowers the cost of synthesizing a library of candidate sequences, and simplifies library preparation. The LS-MPRA libraries were electroporated into mouse retinas in vivo or ex vivo. To benchmark the method, the authors first applied LS-MPRA near stably expressed retinal genes (e.g., Rho, Cabp5, Grm6, and Vsx2), and successfully identified both known and novel CRMs. They then used LS-MPRA to identify CRMs in embryonic mouse retinas, near Olig2 and Ngn2, genes expressed in subsets of retinal progenitor cells. Similar experiments were conducted in chick retinas and postnatal mouse retinas, revealing some CRMs with conserved activity across species and developmental stages.

      Although the study identified CRMs with robust reporter activity in Olig2+ or Ngn2+ cells, the data do not provide sufficient evidence to support the claims that these CRMs regulate Olig2 or Ngn2, rather than other nearby genes, in a cell-type-specific manner. For example, the authors propose that three regions (NR1/2/3) regulate Olig2 specifically in retinal progenitor cells based on: (1) the three regions are close to Olig2, (2) increased Olig2 expression and NR1/2/3 activity upon Notch inhibition, and (3) reporter activity observed in Olig2+ cells (though also present in many Olig2- cells). While these are promising findings, they do not directly support the claims.

      The second assay, called degenerate MPRA (d-MPRA), introduces random point mutations into CRMs via error-prone PCR to assess the impact of sequence variations on regulatory activity. This approach was used on NR1/2/3 to identify mutations that alter CRM activity, potentially by influencing transcription factor binding. The authors inferred candidate transcription factors, such as Mybl1 and Otx2, through motif analysis, co-expression with Olig2 (based on single-cell RNA-seq), and CUT&RUN profiling. While some transcription factors identified in this way overlapped with the d-MPRA results, others did not. This raises questions about how well d-MPRA complements other methods for identifying transcriptional regulators.

      Strengths:

      (1) The study introduces two technically robust MPRA protocols that offer advantages over standard methods, such as avoiding reliance on predefined candidate regions, reducing cost and labor, and minimizing selection bias.

      (2) The identified regulatory elements and transcription factors contribute to our understanding of gene regulation in retinal development and may have translational potential for cell-type-specific gene delivery into developing retinas.

      Weaknesses:

      (1) The claims for gene-specific and cell type-specific CRMs would benefit from further validation using complementary approaches, such as CRISPR interference or Prime editing.

      The methods that we developed were meant to provide candidates for regulatory elements for a gene of interest. These candidates could be used to further understand the regulation of a gene, a complex and difficult task, especially for dynamically regulated genes in the context of development. These candidates could also, or instead, be used to drive gene expression specifically in a target cell of interest for applications such as gene therapy or perturbations that need this type of specificity. In the first case, to use the candidates to understand the regulation of a gene, one would need to validate the candidates using the types of methods typically employed for this purpose, most rigorously in the in vivo genomic context. We did not pursue this level of validation as it would encompass a great deal of work outside the scope of the current study. However, by initially testing loci which have been studied by several groups (as cited in the manuscript, Rho, Grm6, Vsx2, and Cabp5), we were able to show that LS-MPRA can identify known CRMs. In the cases of Rho and Vsx2, previous data have shown the CRMs to be relevant in the genomic context in vivo. In addition, two Vsx2 CRMs identified by LS-MPRA are located at -37 kb and -17 kb, and the Grm6 CRM identified by LS-MPRA is at -8 kb. These are the same CRM locations identified previously using classical methods. These data show that the method is capable of identifying distal elements. When one has only one or a few loci of interest, i.e. one does not need to use genome-wide approaches, LS-MPRA is accurate enough to be worth the relatively small effort to identify potential CRMs, even those at some distance from the TSS. However, it is apparent that our methods are not perfect and that the LS-MPRA does not pick up all CRMs. We do not know of a method that has been shown to do so.

      Reviewer #3 (Public review):

      Summary:

      Use of reporter assays to understand the regulatory mechanisms controlling gene expression moves beyond simple correlations of cis-regulatory sequence accessibility, evolutionary sequence conservation, and epigenetic status with gene expression, instead quantifying regulatory sequence activity for individual elements. Tulloch et al. provide a systematic characterization of two new reporter assay techniques (LS-MPRA and d-MPRA) to comprehensively identify cis-regulatory sequences contained within genomic loci of interest during retinal development. The authors then apply LS-MPRA and d-MPRA to identify putative cis-regulatory sequences controlling Olig2 and Ngn2 expression, including potential regulatory motifs that known retinal transcription factors may bind. Transcription factor binding to regulatory sequences is then assessed via CUT&RUN. The broader utility of the techniques is then highlighted by performing the assays across development, across species, and across tissues.

      Strengths:

      (1) The authors validate the reporter assays on retinal loci for which the regulatory sequences are known (Rho, Vsx2, Grm6, Cabp5) mostly confirming known regulatory sequence activity but highlighting either limitations of the current technology or discrepancies of previous reporter assays and known biology. The techniques are then applied to loci of interest (Olig2 and Ngn2) to better understand the regulatory sequences driving expression of these transcription factors across retinal development within subsets of retinal progenitor cells, identifying novel regulatory sequences through comprehensive profiling of the region.

      (2) LS-MPRA provides broad coverage of loci of interest.

      (3) d-MPRA identifies sequence features that are important for cis-regulatory sequence activity.

      (4) The authors take into account transcript and protein stability when determining the correlation of putative enhancer sequence activity with target gene expression.

      Weaknesses:

      (1) In its current form, many important controls that are standard for other MPRA experiments are not shown or not performed, limiting the interpretations of the utility of the techniques. This includes limited controls for basal-promoter activity, limited information about sequence saturation and reproducibility of individual fragments across different barcode sequences, limitations in cloning and assay delivery, and sequencing requirements. Additional quantitative metrics, including locus coverage and number of barcodes/fragments, would be beneficial throughout the manuscript.

      We thank the reviewer for these comments and have provided detailed responses to the additional analyses in the subsequent Recommendations section.

      (2) There are no statistical metrics for calling a region/sequence 'active'. This is especially important given that NR3 for Olig2 seems to have a small 'peak' and has non-significant activity in Figure 4.

      See comments about peak calling in our response to Reviewer #1.

      (3) The authors present correlational data for identified cis-regulatory sequences with target gene expression. Additionally, the significance of transcription factor binding to the putative regulatory sequences is not currently tested, only correlated based on previous single-cell RNA-sequencing data. While putative regulatory sequences with potential mechanisms of regulation are identified/proposed, the lack of validation (and discrepancies with previous literature) makes it hard to decipher the utility of the techniques.

      See comments about further validation in our response to Reviewer #2.

      (4) While the interpretations that Olig2 mRNA/protein expression is dynamically regulated improved the proportions of cells that co-expressed CRM-regulated GFP and Olig2, alternate explanations (some noted) are just as likely. First, the electroporation isn't specific to Olig2+ progenitors. Also, the tested, short CRM fragments may have activating signals outside of Olig2 neurogenic cells because chromatin conformation, histone modifications, and DNA methylation are not present on plasmids to precisely control plasmid activity. Alternatively, repressive elements that control Olig2 expression are not contained in the reporter vectors.

      The electroporation of Olig2 minus and plus cells is an excellent way to determine if a CRM is active in all cells, or only a specific subset, and we therefore consider this the best way to answer the question of specificity. We agree that we were unable to show that all CRM active cells were indeed Olig2-expressing cells. As noted by the Reviewer, we went to some lengths to quantify RNA and protein co-expression, including of endogenous Olig2 protein and RNA. Even with the endogenous RNA and protein, there was a mismatch wherein one infrequently saw the two together in the same cell, which could be predicted from the short half-lives of these molecules. Regarding chromatin, etc., we are intrigued by the proper regulation that we have observed for CRMs that we have previously discovered by plasmid electroporation (e.g. Kim et al. 2008, Matsuda and Cepko, 2004, Wang et al. 2014, Emerson et al. 2013). It is indeed interesting that plasmids can recapitulate proper regulation, without the proper genomic context or chromatin modifications. We have expanded our discussion of these points in the Discussion.

      (5) It is unclear as to why the d-MPRA uses a different barcoding strategy, placing a second copy of the cis-regulatory sequence in the 3' UTR. As acknowledged by the author, this will change the transcript stability by changing the 3' UTR sequence. Because of this, comparisons of sequence activity between the LS-MPRA and d-MPRA should not be performed as the experiments are not equivalent.

      We had provided a rationale for the different strategies of barcoding in the original submission, and believe it is at the discretion of the experimenter to utilize either strategy for their specific purposes. We agree that comparing activity between different techniques would not be appropriate. The analysis of mutated CRMs using d-MPRA does not utilize data from the LS-MPRA, but is an analysis of relative activity among all mutated d-MPRA constructs.

      (6) Furthermore, details of the mutational burden in d-MPRA experiments are not provided, limiting the interpretations of these results.

      We have provided detailed responses to the additional analyses in the subsequent Recommendations section and included details of the mutational burden in Supplemental Document A.

      (7) Many figures are IGV screenshots that suffer from low resolution. Many figures could be consolidated.

      We have increased the resolution of all IGV genome tracks, but believe the content within all figures remains appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improving the clarity of the results in the figures:

      (1) The pie charts used to show the percentage of overlapping cells in the colocalization analyses were not especially intuitive to read, and although the percentages and any statistical significance were often written in the text, it would've been helpful to have them written in the figures. I would suggest displaying the results in stacked bar plots, possibly like the one shown in Figure 6A, to demonstrate the data more clearly.

      We thank the reviewer for the suggestions. Though adding the percentages directly to the pie charts would make the relevant panels too confusing to interpret, we added supplemental tables (Tables S5-S9) with the percentages displayed in all pie charts for readers interested in the precise quantifications.

      (2) The scRNA-seq UMAPs showing co-expression of Olig2 with the TFs of interest - it is very hard to see the cells that co-express. I would recommend either having a window zoomed in on the Olig2-expressing cell population to be able to see the co-expression more clearly visually, and/or including a graph demonstrating the percentages of co-expressing cells. These numbers were written in the text, but would be useful to see in the figure.

      The resolution of the scRNA-Seq plot has been improved for the visualization of co-expressing cells, which were also brought forward in all UMAP plots to improve clarity. Because of the higher quality images, insets should no longer be necessary. We have also included percentages of co-expression in the figures (Figs. 8 and 8S) and thank the reviewer for the suggestion.

      Other minor suggestions/corrections:

      (3) Figures 6B and 10S are missing the overlap quantification (in bar or pie charts) like in the other figures.

      The quantification for the image in 6B (i.e., GFP fluorescence and GFP RNA) is displayed in 6D for the four Olig2 CRM plasmid constructs. In Fig. 10S, the experiments in early chick ventral neural tube delivered constructs to a very limited number of cells, and quantification of cells would not necessarily represent an accurate number of cells with CRM activity. We therefore decided to show only representative images of CRM activity in this population of cells rather than present a biased count or increase the number of experiments/samples to obtain a robust quantification.

      (4) On the second-to-last line of page 10, in the sentence "The d-MPRA approach provided a robust, high resolution method for functionally relevant TF binding sites....", I think you're missing a word between "for" and "functionally". For example, it might be "for identifying..." or "for nominating...".

      We have revised the sentence accordingly.

      Reviewer #2 (Recommendations for the authors):

      Minor suggestions:

      (1) Please indicate which mouse reference genome (e.g., mm10) was used in plots such as Figure 2.

      We have added text to the relevant sections in the Results (the reference genome was already mentioned in Methods).

      (2) In Figures 2 and 2S, the CRMs discussed in the text are not labeled or highlighted, making it unclear which regions are being referenced.

      We have labeled peaks with Roman numerals in the figures, legends, and text for clarity and thank the reviewer for the suggestion.

      (3) Consider listing the genomic coordinates for the CRMs mentioned in the text, as this information would be especially useful for readers interested in exploring these regions further.

      This information was included in Table 2S in the original submission, with all relevant coordinates provided therein.

      (4) The d-MPRA plots (e.g., Figure 7C-E) do not clearly show the effects of different nucleotide substitutions. A more informative visualization style can be found in Kircher et al (PMID: 31395865, Fig. 1D) or Deng et al (PMID: 38781390, Fig. 5F).

      The precise nucleotide substitutions would be informative to visualize the effects of specific changes. However, we were more interested in how any nucleotide substitution influenced the CRM activity to home in on relevant TFBSs. We therefore believe the current visualization is the most appropriate to accomplish this. However, for some types of future applications, a more informative visualization as noted would be a valuable addition.

      (5) It would be extremely helpful to the community if the LS-MPRA data were uploaded to the UCSC genome browser and made accessible via a link.

      We have uploaded all LS-MPRA genome tracks to a Track Hub in the UCSC genome browser and provided the appropriate link to access the Hub (https://github.com/cattapre/ALAS00) in the methods section.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should address the following metrics to showcase the utility of the techniques:

      We thank the reviewer for requesting the detailed metrics outlined below. We have addressed all inquiries and included the majority of metrics in the resubmission.

      (a) Library size

      This should be shown for each library that is generated. It is acknowledged that the complete size of the library is limited by sequencing, and the comprehensiveness of the library will change every time the library is re-prepped. However, metrics of this are not currently provided in a robust manner for each library. "Libraries of at least 7x10^6 and as many as 9x10^7 fragments are made" - vague - how was library complexity established since this seems to be an estimation, how many reads were utilized to estimate library complexity?

      We created a new supplemental table (Table S3) that displays the complexity based on sequencing rather than the estimated complexity based on the serial dilutions prior to 3D culture (which was used for the estimates listed in the results). We updated the complexity range in the text as well and thank the reviewer for the suggestion.

      Does library size scale proportionally to the BACs of different sizes?

      The fragmentation of different BACs with differing sizes does not necessarily alter the size of the library. Library size is primarily determined by the library creation pipeline, with the size selection step of the fragmented BAC and the cloning step that inserts adapter-ligated fragments into the barcoded expression vector being the primary determinants of complexity of plasmid libraries.

      (b) Sequence saturation

      Can the authors please provide evidence that the libraries have been sequenced to saturation or estimates of the degree of under-sequencing? How many reads does it take to discover a new barcode associated with a new regulatory sequence?

      We have provided library characteristics for this in Table S3 and have also generated Sequence Saturation Curves for each association library in Supplemental Document A.
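      For readers unfamiliar with saturation curves, the idea can be sketched as follows. This is an illustration only and not the authors' pipeline: the function name `saturation_curve`, the toy reads, and the chosen depths are all hypothetical. Reads are shuffled once, and the number of distinct barcodes is counted at increasing subsampling depths; a curve that flattens suggests the library has been sequenced close to saturation.

```python
import random

def saturation_curve(read_barcodes, depths, seed=0):
    """Return (depth, n_unique_barcodes) pairs for nested random subsamples of reads."""
    rng = random.Random(seed)
    shuffled = list(read_barcodes)
    rng.shuffle(shuffled)  # one shuffle, so deeper subsamples contain shallower ones
    curve = []
    for depth in depths:
        depth = min(depth, len(shuffled))  # cannot sample more reads than exist
        curve.append((depth, len(set(shuffled[:depth]))))
    return curve

# Toy data (hypothetical): 6 reads covering 3 distinct barcodes.
reads = ["AAC", "AAC", "GGT", "AAC", "TTA", "GGT"]
curve = saturation_curve(reads, depths=[2, 4, 6])
```

      If the count of distinct barcodes is still rising steeply at the full read depth, the library is likely under-sequenced.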

      (c) Barcode saturation

      How many barcodes are present for each fragment in the libraries? Are most fragments only covered by 1 barcode? The barcoding strategy doesn't prevent the same barcode from being assigned to multiple different fragments, as barcodes are random. What is the incidence of barcode collisions?

      We have provided library characteristics for this in Table S3 and have also generated Barcode Saturation Curves for each association library in Supplemental Document A.

      Additionally, we tested whether the omission of barcode collisions would affect the output of our LS-MPRA. We reanalyzed one barcode abundance library (one replicate following 12h Notch inhibitor) and filtered the barcodes so that only unique barcodes were analyzed. We were able to replicate all previously identified peaks. Though it is not necessary to filter out barcode collisions, there may be an improvement in signal-to-noise if the sequencing depth of libraries was sufficient (see Supplemental Document B).
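      The collision filter described above can be sketched in a few lines. This is an assumed, simplified version rather than the analysis actually run: the function name `drop_colliding_barcodes` and the (barcode, fragment) pair format are hypothetical. A barcode is kept only if every association read links it to the same fragment.

```python
from collections import defaultdict

def drop_colliding_barcodes(associations):
    """Keep barcodes linked to exactly one fragment.

    associations: iterable of (barcode, fragment_id) pairs from the association library.
    Returns a dict mapping each unambiguous barcode to its fragment.
    """
    fragments_by_barcode = defaultdict(set)
    for barcode, fragment in associations:
        fragments_by_barcode[barcode].add(fragment)
    return {
        barcode: next(iter(fragments))
        for barcode, fragments in fragments_by_barcode.items()
        if len(fragments) == 1  # more than one fragment means a collision
    }

# Toy data (hypothetical): "GGT" is attached to two different fragments.
pairs = [("AAC", "frag1"), ("AAC", "frag1"), ("GGT", "frag2"),
         ("GGT", "frag3"), ("TTA", "frag3")]
unique_map = drop_colliding_barcodes(pairs)
```

      Comparing peaks called with and without this filter is one way to check how much collisions contribute to background.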

      (d) Normalization

      As performed, fragment activity is normalized by RNA expression compared to the presence of fragments in the library. While this is done for small libraries, for large libraries, this may not be appropriate. For large libraries, every sequence in the library will not be delivered to each cell, and many fragments contained in the library may not be electroporated at all. Ideally, the authors would have sequenced both the RNA and DNA from the electroporations to i) identify the fragment distribution of the library that was successfully electroporated and ii) provide an internal normalization factor across replicate samples. This is especially important if the libraries were ever re-prepped, as the jack-potting or asymmetries in fragment recovery can occur every time the library is re-derived.

      We agree with the reviewer’s comments about the variability in fragments delivered experimentally, though we also believe the normalization of the libraries is still appropriate. We never needed to re-prep the libraries as there was sufficient material for many more experiments than were performed. However, should one ever need to re-prep an LS-MPRA library, all experimental sequencing should be normalized to the respective sequenced association library to account for biased distributions, as the reviewer mentions.
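      A minimal sketch of normalizing RNA barcode abundance to the association library is given below. The exact formula used in the study is not stated here, so the ratio of read fractions, the function name `normalized_activity`, and the `min_library_count` cutoff are assumptions made for illustration.

```python
def normalized_activity(rna_counts, library_counts, min_library_count=1):
    """Activity = (fragment share of RNA reads) / (fragment share of library reads).

    rna_counts, library_counts: dicts mapping fragment id -> read count.
    Fragments below min_library_count in the library are skipped as unreliable.
    """
    rna_total = sum(rna_counts.values())
    lib_total = sum(library_counts.values())
    if rna_total == 0 or lib_total == 0:
        raise ValueError("both count tables must contain at least one read")
    activity = {}
    for fragment, lib_count in library_counts.items():
        if lib_count < min_library_count:
            continue
        rna_fraction = rna_counts.get(fragment, 0) / rna_total
        lib_fraction = lib_count / lib_total
        activity[fragment] = rna_fraction / lib_fraction
    return activity

# Toy data (hypothetical): equal representation in the library, unequal RNA output.
library = {"frag1": 50, "frag2": 50}
rna = {"frag1": 30, "frag2": 10}
activity = normalized_activity(rna, library)
```

      Values above 1 indicate fragments over-represented in RNA relative to their abundance in the library. If DNA recovered from electroporated tissue were sequenced, it could be passed in place of `library_counts`.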

      In the absence of these metrics (this would likely require the authors to repeat all experiments and is acknowledged to be outside the scope of revisions), the authors should provide information on the percentage of the library that is profiled in the RNA for each library.

      We have provided RNA profiles of all abundance libraries in Table S4. The overall fraction of fragments represented in the RNA pools was lower than that observed in other published MPRAs. This difference is expected given that most MPRA studies preselect fragments based on chromatin accessibility, transcription factor binding, sequence conservation, or bioinformatically predicted CRMs, thereby enriching for regulatory elements with high activity potential. Our locus-specific MPRA libraries, by contrast, include all fragments across the targeted genomic region, many of which are likely to be inactive in the tested context. Consequently, only a smaller proportion of fragments show measurable RNA expression.

      (e) Fragment sizes

      Please provide a density plot or something similar showcasing the size distribution of the libraries generated. Is there any correlation between sequence activity and the size of fragments?

      We have generated size distribution plots and correlations between fragment size and activity of all libraries and have included them in Supplemental Document A.

      (2) Questions about the statistical validity of results:

      (a) What threshold is utilized for calling a sequence as active? This is important as NR3 does not seem to be an element that has significant activity.

      See comments about peak calling in prior responses.

      (b) A Fisher's exact test using cells from single-cell RNA-sequencing as replicate samples is inappropriate as the cells are i) not from replicate experiments and ii) potentially in different cell states. The proportions of cells across replicate scRNA-seq datasets would be more appropriate.

      We thank the reviewer for raising this important point. While we agree that individual cells do not substitute for biological replicates, we believe Fisher’s exact test remains appropriate for testing whether gene expression is associated with Olig2 expression within a single scRNA-seq dataset. The test assesses co-occurrence at the level of individual cells, which is valid under the assumption that each cell represents an independent sampling of transcriptional states, even when it is possible that cells are in different states. We use this method as an exploratory tool to identify candidate genes associated with Olig2 expression in this dataset, and in the future, this could also be further validated by comparing the proportions of cells across replicate datasets, as the reviewer mentions.
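      To make the test under discussion concrete, here is a standard-library sketch of a two-sided Fisher's exact test on a 2x2 table of cells (rows: Olig2+ / Olig2-; columns: candidate gene detected / not detected). The function name and the toy counts are hypothetical, and treating cells as independent observations is exactly the assumption the reviewer questions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )

# Toy counts (hypothetical): 3 Olig2+ cells express the gene, 1 does not;
# 1 Olig2- cell expresses it, 3 do not.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

      Comparing the proportion of co-expressing cells across replicate datasets, as the reviewer suggests, would complement this per-cell test.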

      (3) Discussion of the reporter/Olig2/Ngn2 RNA/protein disconnect needs to be expanded. Some simpler explanations for the presence of GFP in Olig2- and Ngn2- cells, as well as the presence of Olig2 or Ngn2 in GFP- cells, are that (i) these putative CRMs are being introduced to cells in plasmids, taking them out of their native genomic context where they may be inaccessible or repressed and allowing them to drive reporter expression even if their candidate target gene is not endogenously expressed, (ii) these putative CRMs may regulate genes besides just Olig2 or Ngn2, and (iii) Olig2 and Ngn2 are regulated by far more regulatory elements than the 3 or 4 being tested in each reporter assay, so their expression likely does not rely solely on the activity of the few putative CRMs tested.

      We have added these points in an expanded discussion in the text.

      (4) Problems with figures: Low resolution of many IGV genome tracks, pink 'co-expression' dots are completely indiscernible. Numbers should be listed with the pie charts. BFP expression should be shown since this is being quantified, especially since electroporation efficiency can change across age and/or tissue samples.

      We have reconfigured the IGV tracks so that they are higher resolution and have included supplemental tables for the numbers pertaining to the pie charts. For electroporation controls (BFP and RFP), BFP expression is shown in Figs 5S, 6, and 10S and the RFP electroporation control is shown in Fig. 11. Though BFP is sometimes used as a qualifier in the denominator of some of the quantification, displaying its expression, particularly in combination with three other signals that are already included in most images, provides limited utility.

      (5) More information is required to understand the utility of the d-MPRA. Detailed quantification of the number of mutations/fragments needs to be ascertained. When multiple mutations are present, how are the authors controlling for which mutation is affecting activity? What is the coverage of the loci of interest for mutational burden (ie, is every base pair mutated in at least one fragment?). For mutations that increase the activity of the element, are there specific sequence features that increase activity (new motifs generated)?

      The d-MPRA platform is a high-throughput assay that seeks to identify putative sub-regions within CRMs nominated by the LS-MPRA, or any other assay. It relies on deep mutational coverage to determine positive and negative regulatory sub-regions of the CRMs. While many reads have multiple mutations, they are broadly co-occurring across the entire fragment (see Supplemental Document A) so as not to create a false linkage between the sites. Every individual site is mutated many times with roughly even coverage across each fragment (see Supplemental Document A), thus allowing us to assess the requirement of each base in contributing to a putative CRM’s activity. Comparing d-MPRA plots using bulk fragments or fragments with singleton mutations (Supplemental Document A) yielded almost identical plots for two libraries, and a similar analysis of the third library. Any differences between analyses of fragments with one or more mutations are likely a result of either sequencing depth or the requirement of multiple bases for binding or CRM activation. Follow-up experiments investigating intra-CRM interactions would elucidate such variability. Whether new motifs are generated for any specific substitution is an interesting question, which could be followed up for a CRM of interest. The d-MPRA data that we provide would be the starting point for such follow-up experiments.
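      One simple way to turn such data into a per-base profile is sketched below. It is an assumption for illustration, not the published analysis: the function name `per_position_effect` and the input format (a set of mutated positions plus an activity value per construct) are hypothetical. For each position, the mean activity of constructs mutated there is compared with the mean across all constructs.

```python
def per_position_effect(constructs, length):
    """Mean activity of constructs mutated at each position, minus the overall mean.

    constructs: list of (mutated_positions, activity) tuples, positions 0-based.
    Returns a list of length `length`; None marks positions never mutated.
    """
    overall = sum(activity for _, activity in constructs) / len(constructs)
    effects = []
    for position in range(length):
        hits = [activity for mutated, activity in constructs if position in mutated]
        effects.append(sum(hits) / len(hits) - overall if hits else None)
    return effects

# Toy data (hypothetical): three mutated constructs of a 4-bp element.
constructs = [({0}, 1.0), ({1}, 3.0), ({0, 2}, 2.0)]
effects = per_position_effect(constructs, length=4)
```

      Negative values would point to bases needed for activity and positive values to bases whose mutation increases it. Restricting `constructs` to those with a single mutation gives the singleton analysis mentioned above.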

      (6) Transcription factors as regulators of CRM-activity.

      It is appreciated that the authors validated the binding of transcription factors to NR2. However, this correlative analysis should be further tested in follow-up experiments to highlight novel biology using systems already in place. Potential experiments that could be performed include the following (reagents in hand, or performed in a manner similar to experiments performed by the lab in previous publications):

      (a) over-expression of TF using LS-MPRA library.

      (b) over-expression of TF using d-MPRA library, showing that mutations in the putative TF binding site disrupt activity compared to non-mutated sequences.

      (c) performing TF over-expression using target CRMs, including sequences where the TF binding site is mutated (similar to a small MPRA).

      (d) the quantification of target gene expression when i) TF is over-expressed, ii) CRM is activated using CRISPRa, or iii) CRM is inhibited using CRISPRi.

      These are all valid follow-up experiments. Please see prior responses we have provided regarding further validation.

      Minor points

      (1) Please acknowledge that some distal regulatory sequences may be contained outside of the BAC regions. Also, the authors should emphasize the point that the assay is NOT cell-type-specific or specific to regulatory sequences for the gene of interest, but ALL regulatory sequences contained within the locus. The discussion of this with respect to Ift122 and Rpl32 is somewhat confusing.

      We have added a sentence in the Discussion addressing possible CRMs outside the BAC coverage. We believe it is implicitly understood that the assay only screens regulatory activity in the BAC, and believe we have addressed this in the manuscript.

      If one wishes to use a candidate CRM to drive gene expression in a targeted cell type, one needs to establish specificity. In particular, specificity needs to be established in the context of the vector that is being used. Non-integrated vs integrated vectors, different types of viral vectors with their own confounding regulatory sequences, different types of plasmids and methods of delivery, and copy number can all affect specificity. We provided a double in situ hybridization method for the examination of specificity for some of the novel candidate CRMs. It was quite difficult in the case of Olig2 and Ngn2 as their RNAs and proteins are unstable. We would need to provide further evidence should we wish to use these candidate CRMs for directing expression specifically in Olig2- or Ngn2-expressing cells. We suggest that an investigator can choose the vector and method for establishing specificity depending upon the goals of the application.

      (2) I am curious as to why low-resolution, pseudo-bulked single-nucleus ATAC was utilized instead of more comprehensive retina ATAC samples at similar time-points (for example, the E14, E17, P0, P3, P7, and P10 samples available in Al Diri et al., 2017).

      The use of pseudo-bulked single-nucleus ATAC-seq data provided a convenient and consistent comparison to our LS-MPRA results. We agree that incorporating higher-resolution datasets such as those from Al Diri et al. would be valuable for future analyses aimed at linking CRM activity with broader chromatin accessibility dynamics.

    1. eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodelling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

    2. Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

    3. Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      (7) The SRCAP depletion is insufficiently validated, i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

    4. Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found that their distributions differ between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

    5. Author response:

      eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodeling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

      We would like to thank the editor and reviewers for their critical assessments of our manuscript. While we do acknowledge the limitations of our work, we believe that our results provide important mechanistic insights into the long-standing question of how H2A.Z is preferentially enriched in hypomethylated genomic DNA regions. First, our structural and biochemical data suggest that DNA methylation increases the openness and physical accessibility of H2A.Z nucleosomes, although the effect is relatively subtle and sequence-dependent. Second, using Xenopus egg extracts and synthetic DNA templates, we provide the first clear and direct evidence that DNA methylation-sensitive H2A.Z deposition is due to the H2A.Z chaperone SRCAP-C, corroborated by our discovery that SRCAP-C binding to DNA is suppressed by DNA methylation. Although the molecular details by which DNA methylation inhibits binding of SRCAP-C are an important area of future study, in our current manuscript we do provide evidence that directly links the presence of SRCAP-C to the establishment of the DNA methylation/H2A.Z antagonism in a physiological system. Thanks to criticisms by the reviewers, we realized that we did not clearly state in our Abstract that the impact of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle, although we did explain these observations and limitations in the main text. In our revised manuscript, we are willing to edit the text to better address the criticisms raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

      Although we acknowledge the limitations of our study and are willing to expand our discussion to more thoroughly discuss these points, we believe our manuscript provides several important mechanistic insights which this reviewer may not have fully appreciated.

      Our first conclusion that H2A.Z nucleosomes on methylated DNA are more open and accessible compared to their unmethylated counterparts is supported by both our cryo-EM study and the restriction enzyme accessibility assay. Although the physical effect of DNA methylation is relatively subtle and is likely sequence dependent, as we clearly noted within the manuscript, the difference does exist and is valuable information for the chromatin field at large to consider.

      The second major conclusion of our manuscript is that SRCAP-C exhibits preferential binding to unmethylated DNA over methylated DNA, and that SRCAP-C represents the major mechanism that can explain the biased deposition of H2A.Z to unmethylated DNA in Xenopus egg extracts. Furthermore, our experiments using Xenopus egg extract clearly demonstrated that H2A.Z is deposited by both DNA methylation-sensitive and DNA methylation-insensitive mechanisms. Depletion of SRCAP-C almost completely eliminated DNA methylation-sensitive H2A.Z deposition and reduced the total level of H2A.Z on chromatin to less than half of that seen in non-depleted extract. This result demonstrated that DNA methylation-sensitive H2A.Z loading is primarily regulated by SRCAP-C, at least in our experimental context where transcription, replication, and other epigenetic modifications are not involved. It is likely that additional mechanisms contribute further, as implicated by our sequencing experiments, particularly at regions with active transcription, and we have noted these possibilities and the rationale for their existence in the Discussion.

      Our study also suggests that a SRCAP-independent, DNA methylation-insensitive mechanism of H2A.Z loading exists, which we suspect to be mediated by Tip60-C. In line with this possibility, our data suggest that Tip60-C binds DNA in a DNA methylation-insensitive manner in Xenopus egg extract. Since antibodies to deplete Tip60-C from Xenopus egg extract are currently unavailable, we were unable to directly test that hypothesis and decided not to include Tip60-C in our final model as we lacked experimental evidence for its role. However, whether or not Tip60-C is the complex responsible for the DNA methylation-insensitive pathway does not influence our final conclusion that SRCAP-C plays a major role in DNA methylation-sensitive H2A.Z loading. We are planning to edit our manuscript to more comprehensively discuss these points.

      Please note that while Berta et al. reported that DNA methylation increases at H2A.Z loci in tumors defective in SRCAP-C, they selected those regions based on where H2A.Z is typically enriched within normal tissues (Berta et al., 2021). They did not show data indicating whether H2A.Z is still retained specifically at those analyzed loci upon mutation of SRCAP-C subunits. Thus, although we greatly admire their work and are pleased that many of our findings align with theirs, their paper did not directly address whether SRCAP-C itself distinguishes DNA methylation status, nor the impact that this has on H2A.Z and DNA methylation colocalization. In contrast, our Xenopus egg extract system, where de novo methylation is undetectable (Nishiyama et al., 2013; Wassing et al., 2024), offers a unique opportunity to examine the direct impact of DNA methylation on H2A.Z deposition using controlled synthetic DNA substrates. Together with our demonstration that DNA binding of SRCAP-C is suppressed by DNA methylation, we believe that our manuscript provides a specific mechanism that can explain the preferential deposition of H2A.Z at hypomethylated genomic regions.

      Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      We appreciate the critical assessment of our manuscript by this reviewer. Although we acknowledge the limitations of our study and will revise the manuscript to better describe them, we would like to respectfully argue against the statement that our "conclusions […] are not supported by the data".

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The fact that the DNA resolution in the methylated structure was not high enough to resolve the positions of methylated CpGs despite a high overall resolution of 2.78 Å implies that 1) the Sat2R-P DNA was not as stably registered as the 601L sequence, requiring us to create two alternative Sat2R-P atomic models to account for the variable positioning in our samples, and 2) that the presence of DNA methylation increases that positional variability. We understand that one may prefer to see highly resolved density around each methylation mark, but we do believe that our inability to accomplish that is actually a feature rather than a weakness and has important biological implications. The decrease in local DNA resolution on the methylated Sat2R-P structure compared to its unmethylated counterpart is meaningful and suggests to us that DNA methylation weakens overall DNA wrapping and positioning on the nucleosome, supported by the increased flexibility seen at the linker DNA ends as well as an increase in the population of highly shifted nucleosomes amongst the methylated particles. Additionally, one major view in the DNA methylation/nucleosome stability field is that the presence of DNA methylation can make DNA stiffer and harder to bend, causing opening and destabilization of nucleosomes (Ngo et al., 2016). The increased opening of linker DNA ends and accessibility of methylated H2A.Z nucleosomes in our hands also aligns with such an idea, again suggesting decreased histone-DNA contact stability on methylated DNA substrates. We plan to revise the writing in our manuscript to better reflect these ideas.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      The reviewer raises a fair question about whether canonical H2A would experience the same DNA methylation-dependent structural effects. We had considered solving the H2A structures but ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes using a DNA sequence highly similar to our Sat2R-P with and without the presence of DNA methylation (PDB: 5CPI and 5CPJ). The authors of this study did not see any physical differences in their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis.

      One of the primary reasons we chose the Sat2R-P sequence was, as noted above, that there already was a published study examining how DNA methylation affects nucleosome structure using a variant of this sequence which we could compare to our results, as the reviewer has suggested. We did have to modify the sequence, namely by making it palindromic, in order to increase the final achievable resolution. We viewed the Sat2R-P sequence as an attractive candidate because it is physiologically relevant; the initial sequence was taken directly from human satellite II. Several modifications were made for technical reasons, including making the sequence palindromic as described above and also ensuring that each CpG is recognizable by a methylation-sensitive restriction enzyme so that we could be certain about the degree of methylation on our substrates. These practical concerns outweighed the necessity of maintaining a strict physiological sequence to us. However, we still believe the final Sat2R-P more closely mimics physiological sequences than Widom 601. Additionally, human satellite II is a highly abundant sequence in the human genome that is known to undergo large methylation changes at the onset of many disorders, like cancer, as well as during aging. Thus, there are interesting biological questions surrounding how the methylation state of this particular sequence affects chromatin structure. Furthermore, it has been reported that satellite II is devoid of H2A.Z (Capurso et al., 2012). Beyond those reasons, the satellite II sequence is generally interesting to our lab because we have been studying genes involved in ICF syndrome, where hypomethylation of satellite II sequences forms one of the hallmarks of this disorder (Funabiki et al., 2023; Jenness et al., 2018; Wassing et al., 2024). We understand that sequence context plays a large role in nucleosome wrapping and stability. This is why we strived to test multiple sequences in each of our assays. We do agree that it would be interesting to use DNA sequences where H2A.Z binding has already been described to be affected in a DNA methylation-dependent manner, forming an exciting future study to pursue.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      We did not argue that only the global methylation level is relevant. We also would appreciate it if the reviewer could provide specific references that "clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent". We are aware of a series of studies conducted by Chongli Yuan's group, including one testing the effect of placing methylated CpGs at different positions along the Widom 601 sequence. In that study (Jimenez-Useche et al., 2013), they did find that positioning of mCpGs has differential impacts on the salt resistance of the nucleosomes, with 5 tandem mCpG copies at the dyad causing the most dramatic nucleosome opening whereas having mCpGs only at the DNA major grooves, but not elsewhere, increased nucleosome stability. However, they also found that methylation of the original Widom 601 sequence caused destabilization, albeit to a lesser degree, and another study by the same group (Jimenez-Useche et al., 2014) found that CpG methylation decreased nucleosome-forming ability for all tested variants of the Widom 601 sequence, regardless of CpG density or positioning.

      Other studies monitored how distribution of methylated CpGs correlates with nucleosome positioning (Collings et al., 2013; Davey et al., 1997; Davey et al., 2004). However, these studies assessed the sequence-dependent effects specifically on nucleosome assembly during in vitro salt dialysis, which is a different physical process than the one our manuscript focuses on, especially when considering the fact that H2A.Z is deposited onto preassembled H2A-nucleosome. Our cryo-EM analysis examines the structural changes induced by DNA methylation on already formed nucleosomes rather than the process of formation. Thus, probing accessibility changes using a restriction enzyme was the more appropriate biochemical assay to verify our structures.

      We do very much agree that DNA context can influence nucleosome stability under different conditions. A study of molecular dynamics simulations concluded that the "combination of overall DNA geometrical and shape properties upon methylation" makes nucleosomes resistant to unwrapping (Li et al., 2022), while another modeling study suggests that DNA methylation impacts nucleosome stability in a manner dependent on DNA sequence, where "[s]trong binding is weakened and weak binding is strengthened" (Minary and Levitt, 2014). While G/C-dinucleotides are preferentially placed at major groove-inward positions in the nucleosomes in vivo (Chodavarapu et al., 2010; Segal et al., 2006) and G/C-rich segments are excluded from major groove-outward positions in Widom 601-like nucleosomes (Chua et al., 2012), methylated CpG dinucleotides are preferably, if not exclusively, located at major groove-outward positions in vivo. Mechanisms behind this biased mCpG positioning on the nucleosome remain speculative, likely caused by a combination of multiple factors, but the fact that we did not observe clear structural impacts using the Widom 601L sequence, where mCpGs are located at the major groove-outward and -inward positions ((Chua et al., 2012) and our structure), deserves a space for discussion. On the other hand, positioning of mCpG on satellite II-derived sequences that we used in this study was based on a physiological sequence, and thus it may not be appropriate to say that those CpGs are placed at multiple "random" positions. Although we decided not to discuss the position of 5mC on our Sat2R nucleosome structure due to ambiguous base assignments, neither of our two atomic models is consistent with an idea that DNA methylation repositions the CpG to the outward major grooves. 
As the potential contribution of how DNA methylation affects the nucleosome structure via modulating DNA stiffness has been extensively studied (Choy et al., 2010; Li et al., 2022; Ngo et al., 2016; Perez et al., 2012), we believe that it is appropriate to consider overall DNA properties along the whole DNA sequence, though we are willing to discuss potential positional effects in the revised manuscript.

      Perhaps one of the most important points that we did not emphasize enough in our original manuscript was that, in contrast to the subtle intrinsic effect of DNA methylation that was DNA sequence dependent, we observed SRCAP-dependent preferential H2A.Z deposition to unmethylated DNA over methylated DNA in both 601 and satellite II DNAs. In the revised manuscript, we will clarify the value of comparative studies on 601 and satellite II in two distinct mechanisms.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing said question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008) but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract the relatively subtle effect of DNA methylation on the intrinsic H2A.Z nucleosome stability. Therefore, we will accordingly revise the Abstract to make this point clearer.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      We believe the inclusion and factual reporting of negative data is important for the scientific community as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. We believe this data rather emphasizes the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      (below is response to (4) and (5) together)

      We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to be dependent on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited to hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates) but an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      As depicted in Figure 6 and described in the Discussion, we clearly indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system.

      As noted in our response to (2), the lack of a clear impact of DNA methylation on our 601L structures is likely due to the extraordinarily strong artificial nucleosome positioning capacity of the 601 sequence and its variants. Since 601 is heavily used in chromatin biology, including within DNA methylation research, such negative data are still useful to include and publish.

      (7) The SRCAP depletion is insufficiently validated, i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      We are willing to address this concern. However, please note that our data showed that methylation-dependent H2A.Z deposition is almost completely erased upon SRCAP depletion, indicating functionally effective depletion. The specificity of the custom antibody against Xenopus SRCAP was verified by mass spectrometry. Additionally, we have obtained the same effect using another commercially available SRCAP antibody, though we did not include this preliminary result in our original manuscript. Due to its relatively low abundance and high molecular weight, SRCAP western blot signals are weak, making it challenging to quantify the degree of depletion. We also believe that the value of quantification in this context, with the points noted above, is rather limited. In the past, our lab has published papers on depleting the H3T3 kinase Haspin from Xenopus egg extracts (Ghenoiu et al., 2013; Kelly et al., 2010) but were never able to detect Haspin via western blot. This protein was only detected by mass spectrometry specifically on nucleosome array beads with H3K9me3 (Jenness et al., 2018). However, depletion of Haspin was readily monitored by erasure of H3T3ph, the enzymatic product of Haspin. In these experiments, it was impossible, and not critical, to quantitatively monitor the depletion of Haspin protein in order to investigate its molecular functions. Similarly, in this current study, the important fact is that depletion of SRCAP suppressed methylation-sensitive H2A.Z deposition and quantifying the degree of SRCAP depletion would not have a major impact on this conclusion.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      We are aware that the Tip60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive. Therefore, the reviewer's statement that we "completely overlooked" Tip60-C’s role does not fairly report on our efforts. We wished to test the potential contribution of Tip60-C, but, unfortunately, the antibodies we currently have available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role Tip60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating Tip60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA and thus remains a subject for future study.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      We understand that the H2A.Z stability field is highly controversial. We have introduced the many conflicting reports that have been published in the field but can further expand on the controversies if desired. We also understand that the term “nucleosome stability” is broad and encompasses many physical aspects. As noted in a prior response, we will better specify our use of the term within the manuscript. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible compared to canonical H2A on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite findings while others have obtained results similar to ours. We reported on our findings of the general H2A.Z stability with the hope of helping to clarify some of the field’s controversies.

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

      We appreciate this reviewer’s advice. However, please note that the first author who led this project has already successfully defended their PhD thesis primarily based on this project, making it impractical and unrealistic to completely change the focus of this manuscript to include an entirely new avenue of research. We believe that our data provide important insights into the mechanisms by which H2A.Z is excluded from methylated DNA, particularly via the DNA methylation-sensitive binding of SRCAP-C, which has never been described before. We agree that many questions are still left unanswered, including the exact molecular mechanism behind how DNA methylation prevents SRCAP-C binding. We have preliminary data that suggest none of the known DNA-binding modules of SRCAP-C, including ZNHIT1, by themselves can explain this sensitivity. This implies that domain dissection in the context of the holo-SRCAP complex is required to fully address this question. We believe this represents a very exciting future avenue of study; however, it does not negate our finding that SRCAP-C itself is important for maintaining the DNA methylation/H2A.Z antagonism. Therefore, we respectfully disagree with this reviewer's summary statement, which misleadingly undermines the impact of our work.

      Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found that their distributions differ between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      We are grateful that this reviewer recognizes the importance of our study.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      The altered H4 tail orientation and extra density seen on the acidic patch were incidental findings that we thought could be interesting for the field to be aware of but decided not to follow up on as there were other structural differences that were more directly related to our central question. We do believe that the above two differences are linked to each other because we used a highly purified and homogenous sample for cryo-EM analysis and the H4 tail/acidic patch interaction is a well characterized contact that mediates inter-nucleosome interactions. Additionally, other groups have reported that the presence of DNA methylation causes condensation of both chromatin and bare DNA (cited within our manuscript), though the mechanics behind this phenomenon remain to be elucidated. We believed that our structure data may also align with those findings. However, the reviewer is fair in pointing out that we do not provide further experimental evidence in verifying the existence of these increased interactions. We can revise our writing to clarify that these points are currently hypotheses rather than validated results.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

      We would like to thank the reviewer for bringing up these issues. Although our current data cannot explicitly clarify these possibilities, we favor the idea that DNA methylation specifically alters histone to DNA contacts and that this effect is felt globally across the entire nucleosome rather than only at specific locations. The intrinsic flexibility of linker DNA ends means that that region tends to exhibit the greatest differences under different physical influences, hence the focus on characterizing that area; flexibility of a thread on a spool is most pronounced at the ends. However, we also found that the DNA backbone of H2A.Z on methylated DNA had a lower local resolution compared to its unmethylated counterpart, despite that structure having a higher global resolution, which suggested to us that DNA positioning along the nucleosome is overall weaker in the presence of DNA methylation. This is corroborated by the increased population of open/shifted structures in our classification analysis. The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.

      References

      Berta, D.G., H. Kuisma, N. Valimaki, M. Raisanen, M. Jantti, A. Pasanen, A. Karhu, J. Kaukomaa, A. Taira, T. Cajuso, S. Nieminen, R.M. Penttinen, S. Ahonen, R. Lehtonen, M. Mehine, P. Vahteristo, J. Jalkanen, B. Sahu, J. Ravantti, N. Makinen, K. Rajamaki, K. Palin, J. Taipale, O. Heikinheimo, R. Butzow, E. Kaasinen, and L.A. Aaltonen. 2021. Deficient H2A.Z deposition is associated with genesis of uterine leiomyoma. Nature. 596:398–403.

      Capurso, D., H. Xiong, and M.R. Segal. 2012. A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics. 13:630.

      Chodavarapu, R.K., S. Feng, Y.V. Bernatavichute, P.Y. Chen, H. Stroud, Y. Yu, J.A. Hetzel, F. Kuo, J. Kim, S.J. Cokus, D. Casero, M. Bernal, P. Huijser, A.T. Clark, U. Kramer, S.S. Merchant, X. Zhang, S.E. Jacobsen, and M. Pellegrini. 2010. Relationship between nucleosome positioning and DNA methylation. Nature. 466:388–392.

      Choy, J.S., S. Wei, J.Y. Lee, S. Tan, S. Chu, and T.H. Lee. 2010. DNA methylation increases nucleosome compaction and rigidity. J Am Chem Soc. 132:1782–1783.

      Chua, E.Y., D. Vasudevan, G.E. Davey, B. Wu, and C.A. Davey. 2012. The mechanics behind DNA sequence-dependent properties of the nucleosome. Nucleic Acids Res. 40:6338–6352.

      Collings, C.K., P.J. Waddell, and J.N. Anderson. 2013. Effects of DNA methylation on nucleosome stability. Nucleic Acids Res. 41:2918–2931.

      Davey, C., S. Pennings, and J. Allan. 1997. CpG methylation remodels chromatin structure in vitro. J Mol Biol. 267:276–288.

      Davey, C.S., S. Pennings, C. Reilly, R.R. Meehan, and J. Allan. 2004. A determining influence for CpG dinucleotides on nucleosome positioning in vitro. Nucleic Acids Res. 32:4322–4331.

      Funabiki, H., I.E. Wassing, Q. Jia, J.D. Luo, and T. Carroll. 2023. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. Elife. 12.

      Ghenoiu, C., M.S. Wheelock, and H. Funabiki. 2013. Autoinhibition and polo-dependent multisite phosphorylation restrict activity of the histone h3 kinase haspin to mitosis. Mol Cell. 52:734–745.

      Jenness, C., S. Giunta, M.M. Muller, H. Kimura, T.W. Muir, and H. Funabiki. 2018. HELLS and CDCA7 comprise a bipartite nucleosome remodeling complex defective in ICF syndrome. Proc Natl Acad Sci U S A. 115:E876–E885.

      Jimenez-Useche, I., J. Ke, Y. Tian, D. Shim, S.C. Howell, X. Qiu, and C. Yuan. 2013. DNA methylation regulated nucleosome dynamics. Sci Rep. 3:2121.

      Jimenez-Useche, I., D. Shim, J. Yu, and C. Yuan. 2014. Unmethylated and methylated CpG dinucleotides distinctively regulate the physical properties of DNA. Biopolymers. 101:517–524.

      Kelly, A.E., C. Ghenoiu, J.Z. Xue, C. Zierhut, H. Kimura, and H. Funabiki. 2010. Survivin reads phosphorylated histone H3 threonine 3 to activate the mitotic kinase Aurora B. Science. 330:235–239.

      Li, S., Y. Peng, D. Landsman, and A.R. Panchenko. 2022. DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res. 50:1864–1874.

      Minary, P., and M. Levitt. 2014. Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A. 111:6293–6298.

      Ngo, T.T., J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha. 2016. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun. 7:10813.

      Nishiyama, A., L. Yamaguchi, J. Sharif, Y. Johmura, T. Kawamura, K. Nakanishi, S. Shimamura, K. Arita, T. Kodama, F. Ishikawa, H. Koseki, and M. Nakanishi. 2013. Uhrf1-dependent H3K23 ubiquitylation couples maintenance DNA methylation and replication. Nature. 502:249–253.

      Osakabe, A., F. Adachi, Y. Arimura, K. Maehara, Y. Ohkawa, and H. Kurumizaka. 2015. Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA. Open Biol. 5.

      Perez, A., C.L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M.L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez, and M. Orozco. 2012. Impact of methylation on the physical properties of DNA. Biophys J. 102:2140–2148.

      Segal, E., Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I.K. Moore, J.P. Wang, and J. Widom. 2006. A genomic code for nucleosome positioning. Nature. 442:772–778.

      Wassing, I.E., A. Nishiyama, R. Shikimachi, Q. Jia, A. Kikuchi, M. Hiruta, K. Sugimura, X. Hong, Y. Chiba, J. Peng, C. Jenness, M. Nakanishi, L. Zhao, K. Arita, and H. Funabiki. 2024. CDCA7 is an evolutionarily conserved hemimethylated DNA sensor in eukaryotes. Sci Adv. 10:eadp5753.

      Zilberman, D., D. Coleman-Derr, T. Ballinger, and S. Henikoff. 2008. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 456:125–129.

    1. eLife Assessment

      This study uses a valuable combination of functional magnetic resonance imaging and electroencephalography (EEG) to study brain activity related to prediction errors in relation to both sensorimotor and more complex cognitive functions. It provides incomplete evidence to suggest that prediction error minimisation drives brain activity across both types of processing and that elevated inter-regional functional coupling along a superior-inferior axis is associated with high prediction error, whereas coupling along a posterior-anterior axis is associated with low prediction error. The manuscript will be of interest to neuroscientists working on predictive coding and decision-making, but would benefit from more precise localisation of EEG sources and more rigorous statistical controls.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates whether prediction error extends beyond lower-order sensory-motor processes to include higher-order cognitive functions. Evidence is drawn from both task-based and resting-state fMRI, with addition of resting-state EEG-fMRI to examine power spectral correlates. The results partially support the existence of dissociable connectivity patterns: stronger ventral-dorsal connectivity is associated with high prediction error, while posterior-anterior connectivity is linked to low prediction error. Furthermore, spontaneous switching between these connectivity patterns was observed at rest and correlated with subtle intersubject behavioral variability.

      Strengths:

      Studying prediction error from the lens of network connectivity provides new insights into predictive coding frameworks. The combination of various independent datasets to tackle the question adds strength, including two well-powered fMRI task datasets, resting-state fMRI interpreted in relation to behavioral measures, as well as EEG-fMRI.

      Minor Weakness:

      The lack of spatial specificity of sensor-level EEG somewhat limits the inferences that can be obtained in terms of how the fMRI network processes and the EEG power fluctuations relate to each other. While the language no longer suggests a strong overlap of the source of the two signals, several scenarios remain open (e.g., the higher-order fMRI networks being the source of the EEG oscillations, or the networks controlling the EEG oscillations expressed in lower-order cortices, or a third process driving both the observations in fMRI networks and EEG oscillations...) and somewhat weaken interpretability of this section.

      Comments on revisions:

      My prior recommendations have been mostly addressed.

      Questions remaining about the NBS results:

      The authors write about the NBS cluster: "Visual examination of the cluster roughly points to the same four posterior-anterior and ventral-dorsal modules identified formally in main-text". I think it might be good to add quantification, not just visual inspection. The size of the significant NBS cluster should be reported. What proportion of the edges that passed the uncorrected threshold and entered NBS were part of the NBS cluster? Put simply, I don't think any edges beyond those passing NBS-based correction should be interpreted or used downstream in the manuscript.

      Also, NBS is not typically used by collapsing over effects in two effect directions, but the authors use NBS on the absolute value of Z. I understand the logic of the general manuscript focusing on strength rather than direction, but here I am wondering about the methodological validity. I believe that the editor who is an expert on the methodology may be able to comment on the validity of this approach (as opposed to running two separate NBS analyses for the two directions of effect).
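      As an illustration of the quantification asked for above, the following is a minimal sketch and not the authors' pipeline. It assumes a symmetric matrix of edge-wise Z values and a primary threshold, both hypothetical here (`z_toy` and `threshold=3.0` are made up for the example). It pools both effect directions by thresholding |Z|, as the authors do, takes the connected component with the most suprathreshold edges, and reports its extent and the share of suprathreshold edges it contains. The permutation testing that NBS uses to assign significance to a component is omitted.

```python
from collections import defaultdict


def suprathreshold_edges(z, threshold):
    """Return (i, j) pairs, i < j, whose |z| exceeds the primary threshold."""
    n = len(z)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(z[i][j]) > threshold]


def largest_component_edges(edges):
    """Return the edges of the connected component containing the most edges."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two endpoints of every suprathreshold edge.
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Group edges by the root of their component.
    by_root = defaultdict(list)
    for i, j in edges:
        by_root[find(i)].append((i, j))
    return max(by_root.values(), key=len) if by_root else []


def cluster_summary(z, threshold):
    """Extent of the largest component and its share of suprathreshold edges."""
    edges = suprathreshold_edges(z, threshold)
    cluster = largest_component_edges(edges)
    proportion = len(cluster) / len(edges) if edges else 0.0
    return {
        "n_suprathreshold": len(edges),
        "n_cluster": len(cluster),
        "proportion": proportion,
    }


# Hypothetical 6-node example: nodes 0-2 form one component, nodes 3-4 another.
z_toy = [
    [0.0, 3.5, -3.2, 0.1, 0.0, 0.2],
    [3.5, 0.0, 3.1, 0.3, 0.1, 0.0],
    [-3.2, 3.1, 0.0, 0.2, 0.0, 0.1],
    [0.1, 0.3, 0.2, 0.0, 3.4, 0.0],
    [0.0, 0.1, 0.0, 3.4, 0.0, 0.5],
    [0.2, 0.0, 0.1, 0.0, 0.5, 0.0],
]
summary = cluster_summary(z_toy, threshold=3.0)
print(summary)
```

      Running the same summary separately on the positive and the negative Z values, rather than on |Z|, would give the two direction-specific components to compare against the pooled result.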

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates putative networks associated with prediction errors in task-based and resting state fMRI. It attempts to test the idea that prediction error minimisation includes abstract cognitive functions, referred to as the global prediction error hypothesis, by establishing a parallel between networks found in task-based fMRI where prediction errors are elicited in a controlled manner and those networks that emerge during "resting state".

      Strengths:

      Clearly a lot of work and data went into this paper, including 2 task-based fMRI experiments and the resting state data for the same participants, as well as a third EEG-fMRI dataset. Overall well written with a couple of exceptions on clarity as per below and the methodology appears overall sound, with a couple of exceptions listed below that require further justification. It does a good job of acknowledging its own weakness.

      Weaknesses:

      The paper does a good job of acknowledging its greatest weakness, the fact that it relies heavily on reverse inference, but cannot quite resolve it. As the authors put it, "finding the same networks during a prediction error task and during rest does not mean that the networks' engagement during rest reflects prediction error processing". Again, the authors acknowledge the speculative nature of their claims in the discussion, but given that this is the key claim and essence of the paper, it is hard to see how the evidence is compelling to support that claim.

      Given how uncontrolled cognition is during "resting-state" experiments, the parallel made with prediction errors elicited during a task designed to that effect is a little difficult to make. How often are people really surprised when their brains are "at rest", likely replaying a previously experienced event or planning future actions under their control? It seems to be more likely a very low prediction error scenario, if at all surprising.

      The quantitative comparison between networks under task and rest was done on a small subset of the ROIs rather than on the full network - why? Noting how small the correlation between task and rest is (r=0.021) and that's only for part of the networks, the evidence is a little tenuous. Running the analysis for the full networks could strengthen the argument.

      Looking at the results in Figure 2C, the four-quadrant description of the networks labelled for low and high PE appears a little simplistic. The authors state that this four-quadrant description omits some ROIs as motivated by prior knowledge. This would benefit from a more comprehensive justification. Which ROIs are excluded and what is the evidence for exclusion?

      The EEG-fMRI analysis claiming 3-6Hz fluctuations for PE is hard to reconcile with the fact that fMRI captures activity that is a lot slower while some PEs are as fast as 150 ms. The discussion acknowledges this but doesn't seem to resolve it - would benefit from a more comprehensive argument.

      Comments on revisions:

      The authors have done a good job of addressing the issues raised during the review process. There is one issue remaining that still requires attention. In R2.4, when referring to "existing knowledge of prominent structural pathways among these quadrants", please cite the relevant literature.

    4. Reviewer #3 (Public review):

      Summary:

      Bogdan et al. present an intriguing investigation into the spontaneous dynamics of prediction error (PE)-related brain states. Using two independent fMRI tasks designed to elicit prediction and prediction error in separate participant samples, alongside both fMRI and EEG data, the authors identify convergent brain network patterns associated with high versus low PE. Notably, they further show that similar patterns can be detected during resting-state fMRI, suggesting that PE-related neural states may recur outside of explicit task demands.

      Strengths:

      The authors use a well-integrated analytic framework that combines multiple prediction tasks and brain imaging modalities. The inclusion of several datasets probing PE under different contexts strengthens the claim of generalizability across tasks and samples. The open sharing of code and data is commendable and will be valuable for future work seeking to build on this framework.

      Weaknesses:

      A central challenge of the manuscript lies in interpreting the functional significance of PE-related brain network states during rest. Demonstrating that a task-defined cognitive state recurs spontaneously is intriguing, but without clear links to behavior, individual traits, or experiential content during rest, it remains difficult to interpret what such spontaneous brain states tell us about the mind and brain. For example, it is unclear whether these states support future inference or learning, reflect offline predictive processing, or instead suggest state reinstatement due to a more general form of neural plasticity and circuit dynamics in the brain. Demonstrating any one of these downstream relationships would be valuable since it has the potential to inform our understanding of cognitive function or more general principles of neural organization.

      I appreciate the authors' position that establishing the existence of such states is a necessary first step, and that future work may clarify their behavioral relevance. However, the current form makes it challenging to assess the conceptual advance of the present work in isolation.

      Relatedly, in my previous review I raised questions about both across- and within-individual variability: for example, whether individuals who exhibit stronger or more distinct PE-related fluctuations at rest also show superior performance on prediction-related tasks (across-individual), or whether momentary increases in PE-network expression during tasks relate to faster or more accurate prediction (within-individual). The authors thoughtfully addressed this suggestion by conducting an individual-differences analysis correlating each participant's fluctuation amplitude with approximately 200 behavioral and trait measures from the HCP dataset.

      The reported findings (a negative association with age and card-sorting performance, alongside a positive association with age-adjusted picture sequence memory) are interesting but difficult to interpret within a coherent functional framework. As presented, these results do not clearly support the idea that spontaneous PE-state fluctuations are related to enhancement in prediction, inference, or broader cognitive function. Instead, they raise the possibility that fluctuation amplitude may reflect more general factors (e.g., age) rather than a functionally meaningful PE-related process.

      Overall, while the methodological contribution is strong, the manuscript would benefit from a clearer articulation of what functional conclusions can or cannot be drawn from the presence of spontaneous PE-related states, as well as a more cautious framing of their potential cognitive significance.

      Further comments:

      I appreciate that the authors took my earlier suggestions seriously and incorporated additional analyses examining behavioral relevance and permutation tests in the revision.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The Reviewer structured their review such that their first two recommendations specifically concerned the two major weaknesses they viewed in the initial submission. For clarity and concision, we have copied their recommendations to be placed immediately following their corresponding points on weaknesses.

      Strengths:

      Studying prediction error from the lens of network connectivity provides new insights into predictive coding frameworks. The combination of various independent datasets to tackle the question adds strength, including two well-powered fMRI task datasets, resting-state fMRI interpreted in relation to behavioral measures, as well as EEG-fMRI.

      Weaknesses:

      Major:

      (R1.1) Lack of multiple comparisons correction for edge-wise contrast:

      The analysis of connectivity differences across three levels of prediction error was conducted separately for approximately 22,000 edges (derived from 210 regions), yet no correction for multiple comparisons appears to have been applied. Then, modularity was applied to the top 5% of these edges. I do not believe that this approach is viable without correction. It does not help that a completely separate approach using SVMs was FDR-corrected for 210 regions.

      [Later recommendation] Regarding the first major point: To address the issue of multiple comparisons in the edge-wise connectivity analysis, I recommend using the Network-Based Statistic (NBS; Zalesky et al., 2010). NBS is well-suited for identifying clusters (analogous to modules) of edges that show statistically significant differences across the three prediction error levels, while appropriately correcting for multiple comparisons.

      Thank you for bringing this up. We acknowledge that our modularity analysis does not evaluate statistical significance. Originally, the modularity analysis was meant to provide a connectome-wide summary of the connectivity effects, whereas the classification-based analysis was meant to address the need for statistical significance testing. However, as the reviewer points out, it would be better if significance were tested in a manner more analogous to the reported modules. As they suggest, we updated the Supplemental Materials (SM) to include the results of Network-Based Statistic analysis (SM p. 1-2):

      “(2.1) Network-Based Statistic

      Here, we evaluate whether PE significantly impacts connectivity at the network level using the Network-Based Statistic (NBS) approach.[1] NBS relied on the same regression data generated for the main-text analysis, whereby a regression is performed examining the effect of PE (Low = –1, Medium = 0, High = +1) on connectivity for each edge. This was done across the connectome, and for each edge, a z-score was computed. For NBS, we thresholded edges to |Z| > 3.0, which yielded one large network cluster, shown in Figure S3. The size of the cluster – i.e., number of edges – was significant (p < .05) per a permutation test using 1,000 random shuffles of the condition data for each participant, as is standard.[1] These results demonstrate that the network-level effects of PE on connectivity are significant. The main-text modularity analysis converts this large cluster into four modules, which are more interpretable and open the door to further analyses”.
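
      The cluster-size logic described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the union-find implementation, and the "+1" form of the permutation p-value are assumptions made for illustration. It takes a symmetric matrix of edge-wise z-scores, keeps edges with |Z| above a threshold, and counts the edges in the largest connected cluster. The null distribution would come from recomputing that size after shuffling condition labels within each participant, which is not shown here.

```python
import numpy as np


def largest_component_edges(z, threshold=3.0):
    """Edge count of the largest connected cluster formed by
    supra-threshold edges (|z| > threshold) of a symmetric matrix."""
    n = z.shape[0]
    parent = list(range(n))  # union-find forest over regions

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Use the upper triangle only so each undirected edge is counted once.
    rows, cols = np.where(np.triu(np.abs(z) > threshold, k=1))
    edges = list(zip(rows.tolist(), cols.tolist()))

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    # Tally edges per cluster, keyed by the cluster's root node.
    counts = {}
    for i, _ in edges:
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values(), default=0)


def permutation_p_value(observed, null_sizes):
    """One-sided p-value: share of permuted cluster sizes that are at
    least as large as the observed size (with a +1 correction)."""
    null_sizes = np.asarray(null_sizes)
    return (1 + np.sum(null_sizes >= observed)) / (1 + null_sizes.size)
```

      Because the threshold is applied to |Z|, positive and negative effects are pooled into one cluster, which is the design choice Reviewer #1 queries in the comments on revisions.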

      We updated the Results to mention these findings before describing the modularity analysis (p. 8-9):

      “After demonstrating that PE significantly influences brain-wide connectivity using Network-Based Statistic analysis (Supplemental Materials 2.1), we conducted a modularity analysis to study how specific groups of edges are all sensitive to high/low-PE information.”

      (R1.2) Lack of spatial information in EEG:

      The EEG data were not source-localized, and no connectivity analysis was performed. Instead, power fluctuations were averaged across a predefined set of electrodes based on a single prior study (reference 27), as well as across a broader set of electrodes. While the study correlates these EEG power fluctuations with fMRI network connectivity over time, such temporal correlations do not establish that the EEG oscillations originate from the corresponding network regions. For instance, the observed fronto-central theta power increases could plausibly originate from the dorsal anterior cingulate cortex (dACC), as consistently reported in the literature, rather than from a distributed network. The spatially agnostic nature of the EEG-fMRI correlation approach used here does not support interpretations tied to specific dorsal-ventral or anterior-posterior networks. Nonetheless, such interpretations are made throughout the manuscript, which overextends the conclusions that can be drawn from the data.

      [Later recommendation] Regarding the second major point: I suggest either adopting a source-localized EEG approach to assess electrophysiological connectivity or revising all related sections to avoid implying spatial specificity or direct correspondence with fMRI-derived networks. The current approach, which relies on electrode-level power fluctuations, does not support claims about the spatial origin of EEG signals or their alignment with specific connectivity networks.

      We thank the reviewer for this important point, which allows us to clarify the specific and distinct contributions of each imaging modality in our study. Our primary goal for Study 3 was to leverage the high temporal resolution of EEG to identify the characteristic frequency at which the fMRI-defined global connectivity states fluctuate. The study was not designed to infer the spatial origin of these EEG signals, a task for which fMRI is better suited and which we addressed in Studies 1 and 2.

      As the reviewer points out, fronto-central theta is generally associated with the dACC. We agree with this point entirely. We suspect that there is some process linking dACC activation to the identified network fluctuations – some type of relationship that does not manifest in our dynamic functional connectivity analyses – although this is only a hypothesis and one that is beyond the present scope.

      We updated the Discussion to mention these points and acknowledge the ambiguity regarding the correlation between network fluctuation amplitude (fMRI) and Delta/Theta power (EEG) (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism. Additionally, Theta is widely seen as a sign of dorsal anterior cingulate cortex activity,[3] and it is unclear how to reconcile this with our claims about network fluctuations. Nonetheless, as we show with simulations (Supplemental Materials 5), a correlation between slow fMRI network fluctuations and fast EEG Delta/Theta oscillations is also consistent with a common global neural process oscillating rapidly and eliciting both measures.”

      Regarding source-localization, several papers have described known limitations of this strategy for drawing precise anatomical inferences,[4–6] and this seems unnecessary given that our fMRI analyses already provide more robust anatomical precision. We intentionally used EEG in our study for what it measures most robustly: millisecond-level temporal dynamics.

      (R1.2a) Examples of problematic language include:

      Line 134: "detection of network oscillations at fast speeds" - the current EEG approach does not measure networks.

      This is an important issue. We acknowledge that our EEG approach does not directly measure fMRI-defined networks. Our claim is inferential, designed to estimate the temporal dynamics of the large-scale fMRI patterns we identified. The correlation between our fMRI-derived fluctuation amplitude (|PA – VD|) and 3-6 Hz EEG power provides suggestive evidence that the transitions between these network states occur at this frequency, rather than being a direct measurement of network oscillations.

      To support the validity of this inference, we performed two key analyses (now in Supplemental Materials). First, a simulation study provides a proof-of-concept, confirming our method can recover the frequency of a fast underlying oscillator from slow fMRI and fast EEG data. Second, a specificity analysis shows the EEG correlation is unique to our measure of fluctuation amplitude and not to simpler measures like overall connectivity strength. These analyses demonstrate that our interpretation is more plausible than alternative explanations.

      Overall, we have revised the manuscript to be more conservative in the language employed, such as presenting alternative explanations to the interpretations put forth based on correlative/observational evidence (e.g., our modifications above described in our response to comment R1.2). In addition, we have made changes throughout the report to state the issues related to reverse inference more explicitly and to better communicate that the evidence is suggestive – please see our numerous changes described in our response to comment R3.1. For the statement that the reviewer specifically mentioned here, we revised it to be more cautious (p. 7):

      “Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.”

      (R1.2b) Line 148: "whether fluctuations between high- and low-PE networks occur sufficiently fast" - this implies spatial localization to networks that is not supported by the EEG analysis.

      Building on our changes described in our immediately prior response, we adjusted our text here to say our analyses searched for evidence consistent with the idea that the network fluctuations occur quickly rather than searching for decisive evidence favoring this idea (p. 7-8):

      “Finally, we examined rs-fMRI-EEG data to assess whether we find parallels consistent with the high/low-PE network fluctuations occurring at fast timescales suitable for the type of cognitive operations typically targeted by PE theories.”

      (R1.2c) Line 480: "how underlying neural oscillators can produce BOLD and EEG measurements" - no evidence is provided that the same neural sources underlie both modalities.

      As described above, these claims are based on the simulation study demonstrating that this is a possibility, and we have revised the manuscript overall to be clearer that this is our interpretation while providing alternative explanations.

      Reviewer #2 (Public review):

      Strengths:

      Clearly, a lot of work and data went into this paper, including 2 task-based fMRI experiments and the resting state data for the same participants, as well as a third EEG-fMRI dataset. Overall, well written with a couple of exceptions on clarity, as per below, and the methodology appears overall sound, with a couple of exceptions listed below that require further justification. It does a good job of acknowledging its own weakness.

      Weaknesses:

      (R2.1) The paper does a good job of acknowledging its greatest weakness, the fact that it relies heavily on reverse inference, but cannot quite resolve it. As the authors put it, "finding the same networks during a prediction error task and during rest does not mean that the networks' engagement during rest reflects prediction error processing". Again, the authors acknowledge the speculative nature of their claims in the discussion, but given that this is the key claim and essence of the paper, it is hard to see how the evidence is compelling to support that claim.

      We thank the reviewer for this comment. We agree that reverse inference is a fundamental challenge and that our central claim requires a particularly high bar of evidence. While no single analysis resolves this issue, our goal was to build a cumulative case that is compelling by converging on the same conclusion from multiple, independent lines of evidence.

      For our investigation, we initially established a task-general signature of prediction error (PE). By showing the same neural pattern represents PE in different contexts, we constrain the reverse inference, making it less likely that our findings are a task-specific artifact and more likely that they reflect the core, underlying process of PE. Building on this, our most compelling evidence comes from linking task and rest at the individual level. We didn't just find the same general network at rest; we showed that an individual’s unique anatomical pattern of PE-related connectivity during the task specifically predicts their own brain's fluctuation patterns at rest. This highly specific, person-by-person correspondence provides a direct bridge between an individual's task-evoked PE processing and their intrinsic, resting-state dynamics. Furthermore, these resting-state fluctuations correlate specifically with the 3-6 Hz theta rhythm—a well-established neural marker for PE.

      While reverse inference remains a fundamental limitation for many studies on resting-state cognition, the aspects mentioned above, we believe, provide suggestive evidence, favoring our PE interpretation. Nonetheless, we have made changes throughout the manuscript to be more conservative in the language we use to describe our results, to make it clear what claims are based on correlative/observational evidence, and to put forth alternative explanations for the identified effects. Please find our numerous changes detailed in our response to comment R3.1.

      (R2.2) Given how uncontrolled cognition is during "resting-state" experiments, the parallel made with prediction errors elicited during a task designed for that effect is a little difficult to make. How often are people really surprised when their brains are "at rest", likely replaying a previously experienced event or planning future actions under their control? It seems to be more likely a very low prediction error scenario, if at all surprising.

      We (and some others) take a broad interpretation of PE and believe it is often more intuitive to think about PE minimization in terms of uncertainty rather than “surprise”; the word “surprise” usually implies a sudden emotive reaction from the violation of expectations, which is not useful here.

      When planning future actions, each step of the plan is spurred by the uncertainty of what is the appropriate action given the scenario set up by prior steps. Each planned step erases some of that uncertainty. For example, you may be mentally simulating a conversation, what you will say, and what another person will say. Each step of this creates uncertainty of “what is the appropriate response?” Each reasoning step addresses contingencies. While planning, you may also uncover more obvious forms of uncertainty, sparking memory retrieval to finish it. A resting-state participant may think to cook a frozen pizza when they arrive home, but be uncertain about whether they have any frozen pizzas left, prompting episodic memory retrieval to address this uncertainty. We argue that every planning step or memory retrieval can be productively understood as being sparked by uncertainty/surprise (PE), and the subsequent cognitive response minimizes this uncertainty.

      We updated the Introduction to include a paragraph near the start providing this explanation (p. 3-4):

      “PE minimization may broadly coordinate brain functions of all sorts, including abstract cognitive functions. This includes the types of cognitive processes at play even in the absence of stimuli (e.g., while daydreaming). While it may seem counterintuitive to associate this type of cognition with PE – a concept often tied to external surprises – it has been proposed that the brain's internal generative model is continuously active.[12–14] Spontaneous thought, such as planning a future event or replaying a memory, is not a passive, low-PE process. Rather, it can be seen as a dynamic cycle of generating and resolving internal uncertainty. While daydreaming, you may be reminded of a past conversation, where you wish you had said something different. This situation contains uncertainty about what would have been the best thing to say. Wondering about what you wish you said can be viewed as resolving this uncertainty, in principle, forming a plan if the same situation ever arises again in the future. Each iteration of the simulated conversation repeatedly sparks and then resolves this type of uncertainty.”

      (R2.3) The quantitative comparison between networks under task and rest was done on a small subset of the ROIs rather than on the full network - why? Noting how small the correlation between task and rest is (r=0.021) and that's only for part of the networks, the evidence is a little tenuous. Running the analysis for the full networks could strengthen the argument.

      We thank the reviewer for this opportunity to clarify our method. A single correlation between the full, aggregated networks would be conceptually misaligned with what we aimed to assess. To test for a person-specific anatomical correspondence, it is necessary to examine the link between task and rest at a granular level. We therefore asked whether the specific parts of an individual's network most responsive to PE during the task are the same parts that show the strongest fluctuations at rest. Our analysis, performed iteratively across all 3,432 possible ROI subsets, was designed specifically to answer this question, which would be obscured by an aggregated network measure.

      We appreciate the reviewer's concern about the modest effect size (r = .021). However, this must be contextualized, as the short task scan has very low reliability (.08), which imposes a severe statistical ceiling on any possible task-rest correlation. Finding a highly significant effect (p < .001) in the face of such noisy data, therefore, provides robust evidence for a genuine task-rest correspondence.
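
      The ceiling argument can be made concrete with the classical attenuation relation, in which the observable correlation equals the true correlation scaled by the square root of the product of the two measures' reliabilities. The sketch below is illustrative only: the function name is hypothetical, and the resting-state reliability of 1.0 is a placeholder assumption, since only the task reliability (.08) is reported in this response.

```python
import math


def max_observable_r(reliability_x, reliability_y=1.0, true_r=1.0):
    """Classical attenuation: r_observed = r_true * sqrt(rel_x * rel_y).

    With true_r = 1.0 this gives the upper bound that measurement
    noise alone places on an observed correlation.
    """
    return true_r * math.sqrt(reliability_x * reliability_y)


# Task-contrast reliability of .08 as reported; a perfectly reliable
# resting-state measure is assumed here, which is the most lenient case.
ceiling = max_observable_r(0.08)
print(f"ceiling on observable r: {ceiling:.3f}")
```

      Under these assumptions the ceiling is about 0.28, so the reported r = .021 can be read against that bound rather than against 1.0. A lower resting-state reliability would lower the bound further.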

      We updated the Discussion to discuss this point (p. 22-23):

      “A key finding supporting our interpretation is the significant link between individual differences in task-evoked PE responses and resting-state fluctuations. One might initially view the effect size of this correspondence (r = .021) as modest. However, this interpretation must be contextualized by the considerable measurement noise inherent in short task-fMRI scans; the split-half reliability of the task contrast was only .08. This low reliability imposes a severe statistical ceiling on any possible task-rest correlation. Therefore, detecting a highly significant (p < .001) relationship despite this constraint provides robust evidence for a genuine link. Furthermore, our analytical approach, which iteratively examined thousands of ROI subsets rather than one aggregated network, was intentionally granular. The goal was not simply to correlate two global measures, but to test for a person-specific anatomical correspondence – that is, whether the specific parts of an individual's network most sensitive to PE during the task are the same parts that fluctuate most strongly at rest. An aggregate analysis would obscure this critical spatial specificity. Taken together, this granular analysis provides compelling evidence for an anatomically consistent fingerprint of PE processing that bridges task-evoked activity and spontaneous resting-state dynamics, strengthening our central claim.”

      (R2.4) Looking at the results in Figure 2C, the four-quadrant description of the networks labelled for low and high PE appears a little simplistic. The authors state that this four-quadrant description omits some ROIs as motivated by prior knowledge. This would benefit from a more comprehensive justification. Which ROIs are excluded, and what is the evidence for exclusion?

      Our four-quadrant model is a principled simplification designed to distill the dominant, large-scale connectivity patterns from the complex modularity results. This approach focuses on coherent, well-documented anatomical streams while setting aside a few anatomically distant and disjoint ROIs that were less central to the main modules. This heuristic additionally unlocks more robust and novel analyses.

      The two low-PE posterior-anterior (PA) pathways are grounded in canonical processing streams. (i) The OC-ATL connection mirrors the ventral visual stream (the “what” pathway), which is fundamental for object recognition and is upregulated during the smooth processing of expected stimuli. (ii) The IPL-LPFC connection represents a core axis of the dorsal attention stream and the Fronto-Parietal Control Network (FPCN), reflecting the maintenance of top-down cognitive control when information is predictable; the IPL-LPFC module excludes ROIs in the middle temporal gyrus, which are often associated with the FPCN but are not covered here.

      In contrast, the two high-PE ventral-dorsal (VD) pathways reflect processes for resolving surprise and conflict. (i) The OC-IPL connection is a classic signature of attentional reorienting, where unexpected sensory input (high PE) triggers a necessary shift in attention; the OC-IPL module excludes some ROIs that are anterior to the occipital lobe and enter the fusiform gyrus and inferior temporal lobe. (ii) The ATL-LPFC connection aligns with mechanisms for semantic re-evaluation, engaging prefrontal control regions to update a mental model in the face of incongruent information.

      Beyond its functional/anatomical grounding, this simplification provides powerful methodological and statistical advantages. It establishes a symmetrical framework that makes our dynamic connectivity analyses tractable, such as our “cube” analysis of state transitions, which required overlapping modules. Critically, this model also offers a statistical safeguard. By ensuring each quadrant contributes to both low- and high-PE connectivity patterns, we eliminate confounds like region-specific signal variance or global connectivity. This design choice isolates the phenomenon to the pattern of connectivity itself (posterior-anterior vs. ventral-dorsal), making our interpretation more robust.

      We updated the end of the Study 1A results (p. 10-11):

      “Some ROIs appear in Figure 2C but are excluded from the four targeted quadrants (Figures 2C & 2D) – e.g., posterior inferior temporal lobe and fusiform ROIs are excluded from the OC-IPL module, and middle temporal gyrus ROIs are excluded from the IPL-LPFC modules. These exclusions, in favor of a four-quadrant interpretation, are motivated by existing knowledge of prominent structural pathways among these quadrants. This interpretation is also supported by classifier-based analyses showing connectivity within each quadrant is significantly influenced by PE (Supplemental Materials 2.2), along with analyses of single-region activity showing that these areas also respond to PE independently (Supplemental Materials 3). Hence, we proceeded with further analyses of these quadrants’ connections, which summarize PE’s global brain effects.

      “This four-quadrant setup also imparts analytical benefits. First, this simplified structure may better generalize across PE tasks, and Study 1B would aim to replicate these results with a different design. Second, the four quadrants mean that each ROI contributes to both the posterior-anterior and ventral-dorsal modules, which would benefit later analyses and rules out confounds such as PE eliciting increased/decreased connectivity between an ROI and the rest of the brain. An additional, less key benefit is that this setup allows more easily evaluating whether the same phenomena arise using a different atlas (Supplemental Materials Y).”

      (R2.5) The EEG-fMRI analysis claiming 3-6Hz fluctuations for PE is hard to reconcile with the fact that fMRI captures activity that is a lot slower, while some PEs are as fast as 150 ms. The discussion acknowledges this but doesn't seem to resolve it - would benefit from a more comprehensive argument.

      We thank the reviewer for raising this important point, which allows us to clarify the logic of our multimodal analysis. Our analysis does not claim that the fMRI BOLD signal itself oscillates at 3-6 Hz. Instead, it is based on the principle that the intensity of a fast neural process can be reflected in the magnitude of the slow BOLD response. It’s akin to using a long-exposure photograph to capture a fast-moving object; while the individual movements are blurred, the intensity of the blur in the photo serves as a proxy for the intensity of the underlying motion. In our case, the magnitude of the fMRI network difference (|PA – VD|) acts as the "blur," reflecting the intensity of the rapid fluctuations between states within that time window.

      Following this logic, we correlated this slow-moving fMRI metric with the power of the fast EEG rhythms, which reflects their amplitude. To bridge the different timescales, we averaged the EEG power over each fMRI time window and convolved it with the standard hemodynamic response function (HRF) – a crucial step to align the timing of the neural and metabolic signals. The resulting significant correlation specifically in the 3-6 Hz band demonstrates that when this rhythm is stronger, the fMRI data shows a greater divergence between network states. This allows us to infer the characteristic frequency of the underlying neural fluctuations without directly measuring them at that speed with fMRI, thus reconciling the two timescales.
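
      The averaging-and-convolution step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the double-gamma shape parameters (6 and 16, with a 1/6 undershoot ratio), the 32 s kernel length, and the function names `canonical_hrf` and `eeg_power_regressor` are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import gamma


def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds (shape parameters are assumed)."""
    t = np.arange(0.0, duration, tr)
    peak = gamma.pdf(t, 6.0)           # positive lobe, mode at 5 s
    undershoot = gamma.pdf(t, 16.0)    # late undershoot, mode at 15 s
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()             # normalize to unit sum


def eeg_power_regressor(power, samples_per_tr, tr):
    """Average band power within each fMRI window, then convolve with the HRF."""
    power = np.asarray(power, dtype=float)
    n_tr = power.size // samples_per_tr
    per_tr = power[: n_tr * samples_per_tr].reshape(n_tr, samples_per_tr).mean(axis=1)
    # Truncate so the regressor has one value per fMRI window.
    return np.convolve(per_tr, canonical_hrf(tr))[:n_tr]
```

      The output has one value per fMRI window, so it can be correlated directly with a TR-by-TR fMRI measure such as |PA – VD|.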

      Reviewer #3 (Public review):

      Bogdan et al. present an intriguing and timely investigation into the intrinsic dynamics of prediction error (PE)-related brain states. The manuscript is grounded in an intuitive and compelling theoretical idea: that the brain alternates between high and low PE states even at rest, potentially reflecting an intrinsic drive toward predictive minimization. The authors employ a creative analytic framework combining different prediction tasks and imaging modalities. They shared open code, which will be valuable for future work.

      (R3.1) Consistency in Theoretical Framing

      The title, abstract, and introduction suggest inconsistent theoretical goals of the study.

      The title suggests that the goal is to test whether there are intrinsic fluctuations in high and low PE states at rest. The abstract and introduction suggest that the goal is to test whether the brain intrinsically minimizes PE and whether this minimization recruits global brain networks. My comments here are that a) these are fundamentally different claims, and b) both are challenging to falsify. For one, task-like recurrence of PE states during resting might reflect the wiring and geometry of the functional organization of the brain emerging from neurobiological constraints or developmental processes (e.g., experience), but showing that mirroring exists because of the need to minimize PE requires establishing a robust relationship with behavior or showing a causal effect (e.g., that interrupting intrinsic PE state fluctuations affects prediction).

      The global PE hypothesis-"PE minimization is a principle that broadly coordinates brain functions of all sorts, including abstract cognitive functions"-is more suitable for discussion rather than the main claim in the abstract, introduction, and all throughout the paper.

      Given the above, I recommend that the authors clarify and align their core theoretical goals across the title, abstract, introduction, and results. If the focus is on identifying fluctuations that resemble task-defined PE states at rest, the language should reflect that more narrowly, and save broader claims about global PE minimization for the discussion. This hypothesis also needs to be contextualized within prior work. I'd like to see if there is similar evidence in the literature using animal models.

      Thank you for bringing up this issue. We have made changes throughout the paper to address these points. First, we have omitted reference to a “global PE hypothesis” from the Abstract and Introduction, in favor of structuring the Introduction in terms of a falsifiable question (p. 4):

      “We pursued this goal using three studies (Figure 1) that collectively targeted a specific question: Do the task-defined connectivity signatures of high vs. low PE also recur during rest, and if so, how does the brain transition between exhibiting high/low signatures?”

      We made changes later in the Introduction to clarify that the investigation is based on correlative evidence and requires interpretations that may be debated (p. 5-7):

      “Although this does not entirely address the reverse inference dilemma and can only produce correlative evidence, the present research nonetheless investigates these widely speculated upon PE ideas more directly than any prior work.

      Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.

      Second, we examined the recruitment of these networks during rs-fMRI, and although the problems related to reverse inference are impossible to overcome fully, we engage with this issue by linking rs-fMRI data directly to task-fMRI data of the same participants, which can provide suggestive evidence that the same neural mechanisms are at play in both.”

      We made changes throughout the Results now better describing the results as consistent with a hypothesis rather than demonstrating it (p. 12-19):

      “In other words, we essentially asked whether resting-state participants are sometimes in low PE states and sometimes in high PE states, which would be consistent with spontaneous PE processing in the absence of stimuli.

      These emerging states overlap strikingly with the previous task effects of PE, suggesting that rs-fMRI scans exhibit fluctuations that resemble the signatures of low- and high-PE states. 

      To be clear, this does not entirely dissuade concerns about reverse inference, which would require a type of causal manipulation that is difficult (if not impossible) to perform in a resting state scan. Nonetheless, these results provide further evidence consistent with our interpretation that the resting brain spontaneously fluctuates between high/low PE network states.

      These patterns are most consistent with a characteristic timescale near 3–6 Hz for the amplitude of the putative high/low-PE fluctuations. This is notably consistent with established links between PE and Delta/Theta and is further consistent with an interpretation in which these fluctuations relate to PE-related processing during rest.”

      We have also made targeted edits to the Discussion to present the findings in a more cautious way, more clearly state what is our interpretation, and provide alternative explanations (p. 19-26):

      “The present research conducted task-fMRI, rs-fMRI, and rs-fMRI-EEG studies to clarify whether PE elicits global connectivity effects and whether the signatures of PE processing arise spontaneously during rest. This investigation carries implications for how PE minimization may characterize abstract task-general cognitive processes. […] Although there are different ways to interpret this correlation, it is consistent with high/low PE states generally fluctuating at 3-6 Hz during rest. Below, we discuss these three studies’ findings.

      Our rs-fMRI investigation examined whether resting dynamics resemble the task-defined connectivity signatures of high vs. low PE, independent of the type of stimulus encountered. The resting-state analyses indeed found that, even at rest, participants’ brains fluctuated between strong ventral-dorsal connectivity and strong posterior-anterior connectivity, consistent with shifts between states of high and low PE. This conclusion is based on correlative/observational evidence and so may be controversial as it relies on reverse inference.

      These patterns resemble global connectivity signatures seen in resting-state participants, and correlations between fMRI and EEG data yield associations, consistent with participants fluctuating between high-PE (ventral-dorsal) and low-PE (posterior-anterior) states at 3-6 Hz. Although definitively testing these ideas is challenging, given that rs-fMRI is defined by the absence of any causal manipulations, our results provide evidence consistent with PE minimization playing a role beyond stimulus processing.”

      (R3.2) Interpretation of PE-Related Fluctuations at Rest and Its Functional Relevance. It would strengthen the paper to clarify what is meant by "intrinsic" state fluctuations. Intrinsic might mean task-independent, trait-like, or spontaneously generated. Which do the authors mean here? Is the key prediction that these fluctuations will persist in the absence of a prediction task?

      Of the three terms the reviewer mentioned, “spontaneous” and “task-independent” are the most accurate descriptors. We conceptualize these fluctuations as a continuous background process that persists across all facets of cognition, without requiring a task explicitly designed to elicit prediction error – although we, along with other predictive coding papers, would argue that all cognitive tasks are fundamentally rooted in PE mechanisms and thus anything can be seen as a “prediction task” (see our response to comment R2.2 for our changes to the Introduction that provide more intuition for this point). The proposed interactions can be seen as analogous to cortico-basal-thalamic loops, which are engaged across a vast and diverse array of cognitive processes.

      The prior submission only used the word “intrinsic” in the title. We have since revised it to “spontaneous,” which is more specific than “intrinsic,” and we believe clearer for a title than “task-independent” (p. 1): “Spontaneous fluctuations in global connectivity reflect transitions between states of high and low prediction error”

      We have also made tweaks throughout the manuscript to now use “spontaneously” throughout (it now appears 8 times in the paper).

      Regardless of the intrinsic argument, I find it challenging to interpret the results as evidence of PE fluctuations at rest. What the authors show directly is that the degree to which a subset of regions within a PE network discriminates high vs. low PE during task correlates with the magnitude of separation between high and low PE states during rest. While this is an interesting relationship, it does not establish that the resting-state brain spontaneously alternates between high and low PE states, nor that it does so in a functionally meaningful way that is related to behavior. How can we rule out brain dynamics of other processes, such as arousal, that also rise and fall with PE? I understand the authors' intention to address the reverse inference concern by testing whether "a participant's unique connectivity response to PE in the reward-processing task should match their specific patterns of resting-state fluctuation". However, I'm not fully convinced that this analysis establishes the functional role of the identified modules to PE because of the following:

      Theoretically, relating the activities of the identified modules directly to behavior would demonstrate a stronger functional role.

      (R3.2a) Across participants: Do individuals who exhibit stronger or more distinct PE-related fluctuations at rest also perform better on tasks that require prediction or inference? This could be assessed using the HCP prediction task, though if individual variability is limited (e.g., due to ceiling effects), I would suggest exploring a dataset with a prediction task that has greater behavioral variance.

      This is a good idea, but unfortunately difficult to test with our present data. The HCP gambling task used in our study was not designed to measure individual differences in prediction or inference and likely suffers from ceiling effects. Because the task outcomes are predetermined and not linked to participants' choices, there is very little meaningful behavioral variance in performance to correlate with our resting-state fluctuation measure.

      While we agree that exploring a different dataset with a more suitable task would be ideal, given the scope of the existing manuscript, this seems like it would be too much. Although these results would be informative, they would ultimately still not be a panacea for the reverse inference issues.

      Or even more broadly, does this variability in resting state PE state fluctuations predict general cognitive abilities like WM and attention (which the HCP dataset also provides)? I appreciate the inclusion of the win-loss control, and I can see the intention to address specificity. This would test whether PE state fluctuations reflect something about general cognition, but also above and beyond these attentional or WM processes that we know are fluctuating.

      This is a helpful suggestion, motivating new analyses: We measured the degree of resting-state fluctuation amplitude across participants and correlated it with the different individual differences measures provided with the HCP data (e.g., measures of WM performance). We computed each participant’s fluctuation amplitude measure as the average absolute difference between posterior-anterior and ventral-dorsal connectivity; this is the average of the TR-by-TR fMRI amplitude measure from Study 3. We correlated this individual difference score with all of the ~200 individual difference measures provided with the HCP dataset (e.g., measures of intelligence or personality). We measured the Spearman correlation between mean fluctuation amplitude with each of those ~200 measures, while correcting for multiple hypotheses using the False Discovery Rate approach.[18]

      We found a robust negative association with age, where older participants tend to display weaker fluctuations (r = -.16, p < .001). We additionally find a positive association with the age-adjusted score on the picture sequence task (r = .12, p<sub>corrected</sub> = .03) and a negative association with performance in the card sort task (r = -.12, p<sub>corrected</sub> = .046). It is unclear how to interpret these associations without being speculative, given that fluctuation amplitude shows one positive association with performance and one negative association, albeit across entirely different tasks. We have added these correlation results as Supplemental Materials 8 (SM p. 11):

      “(8) Behavioral differences related to fluctuation amplitude 

      To investigate whether individual differences in the magnitude of resting-state PE-state fluctuations predict general cognitive abilities, we correlated our resting-state fluctuation measure with the cognitive and demographic variables provided in the HCP dataset.

      (8.1) Methods

      For each of the 1,000 participants, we calculated a single fluctuation amplitude score. This score was defined as the average absolute difference between the time-varying posterior-anterior (PA) and ventral-dorsal (VD) connectivity during the resting-state fMRI scan (the average of the TR-by-TR measure used for Study 3). We then computed the Spearman correlation between this score and each of the approximately 200 individual difference measures provided in the HCP dataset. We corrected for multiple comparisons using the False Discovery Rate (FDR) approach.

      (8.2) Results

      The correlations revealed a robust negative association between fluctuation amplitude and age, indicating that older participants tended to display weaker fluctuations (r = -.16, p<sub>corrected</sub> < .001). After correction, two significant correlations with cognitive performance emerged: (i) a positive association with the age-adjusted score on the Picture Sequence Memory Test (r = .12, p<sub>corrected</sub> = .03), (ii) a negative association with performance on the Card Sort Task (r = -.12, p<sub>corrected</sub> = .046). As greater fluctuation amplitude is linked to better performance on one task but worse performance on another, it is unclear how to interpret these findings.”
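
      The procedure in Section 8.1 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the text specifies only "the False Discovery Rate approach", so the Benjamini-Hochberg step-up adjustment used here is an assumption, as are the function names and the array layout (participants in rows, individual-difference measures in columns).

```python
import numpy as np
from scipy.stats import spearmanr


def fluctuation_amplitude(pa, vd):
    """Mean absolute PA - VD connectivity difference across TRs for one participant."""
    return float(np.mean(np.abs(np.asarray(pa, float) - np.asarray(vd, float))))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR variant)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def correlate_with_measures(amplitude, measures):
    """Spearman r and FDR-adjusted p for each column of `measures`."""
    rs, ps = [], []
    for j in range(measures.shape[1]):
        rho, pval = spearmanr(amplitude, measures[:, j])
        rs.append(rho)
        ps.append(pval)
    return np.array(rs), bh_adjust(ps)
```

      Here `amplitude` would hold one score per participant and `measures` one column per HCP variable; handling of missing values is omitted for brevity.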

      We updated the main text Methods to direct readers to this content (p. 39-40):

      “(4.4.3) Links between network fluctuations and behavior

      We considered whether the extent of PE-related network expression states during resting-state is behaviorally relevant. We specifically investigated whether individual differences in the overall magnitude of resting-state fluctuations could predict individual difference measures, provided with the HCP dataset. This yielded a significant association with age, whereby older participants tended to display weaker fluctuations. However, associations with cognitive measures were limited. A full description of these analyses is provided in Supplemental Materials 8.”

      (R3.2b) Within participants: Do momentary increases in PE-network expression during tasks relate to better or faster prediction? In other words, is there evidence that stronger expression of PE-related states is associated with better behavioral outcomes?

      This is a good question that probes the direct behavioral relevance of these network states on a trial-by-trial basis. We agree with the reviewer's intuition; in principle, one would expect a stronger expression of the low-PE network state on trials where a participant correctly and quickly gives a high likelihood rating to a predictable stimulus.

      Following this suggestion, we performed a new analysis in Study 1A to test this. We found that network expression was indeed linked to participants’ likelihood ratings: higher likelihood ratings correspond to stronger posterior-anterior connectivity, whereas lower ratings correspond to stronger ventral-dorsal connectivity (Connectivity-Direction × likelihood, β [standardized] = .28, p = .02). However, this is not a strong test of the reviewer’s hypothesis, and different exploratory analyses of response time yield null results (p > .05). We suspect that the effect is too subtle for our statistical power to detect. A comparable analysis was not feasible for Study 1B, as its design does not provide an analogous behavioral measure of trial-by-trial prediction success.

      (R3.3) A priori Hypothesis for EEG Frequency Analysis.

      It's unclear how to interpret the finding that fMRI fluctuations in the defined modules correlate with frontal Delta/Theta power, specifically in the 3-6 Hz range. However, in the EEG literature, this frequency band is most commonly associated with low arousal, drowsiness, and mind wandering in resting, awake adults, not uniquely with prediction error processing. An a priori hypothesis is lacking here: what specific frequency band would we expect to track spontaneous PE signals at rest, and why? Without this, it is difficult to separate a PE-based interpretation from more general arousal or vigilance fluctuations.

      This point gets to the heart of the challenge with reverse inference in resting-state fMRI. We agree that an interpretation based on general arousal or drowsiness is a potential alternative that must be considered. However, what makes a simple arousal interpretation challenging is the highly specific nature of our fMRI-EEG association. As shown in our confirmatory analyses (Supplemental Materials 6), the correlation with 3-6 Hz power was found exclusively with the absolute difference between our two PE-related network states (|PA – VD|)—a measure of fluctuation amplitude. We found no significant relationship with the signed difference (a bias toward one state) or the sum (the overall level of connectivity). This specificity presents a puzzle for a simple drowsiness account; it seems less plausible that drowsiness would manifest specifically as the intensity of fluctuation between two complex cognitive networks, rather than as a more straightforward change in overall connectivity. While we cannot definitively rule out contributions from arousal, the specificity of our finding provides stronger evidence for a structured cognitive process, like PE, than for a general, undifferentiated state. 

      We updated the Discussion to make the argument above and also to remind readers that alternative explanations, such as ones based on drowsiness, are possible (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism.”

      (R3.4) Significance Assessment

      The significance of the correlation above and all other correlation analyses should be assessed through a permutation test rather than a single parametric t-test against zero. There are a few reasons: a) EEG and fMRI time series are autocorrelated, violating the independence assumption of parametric tests;

      Standard t-tests can underestimate the true null distribution's variance, because EEG-fMRI correlations often involve shared slow drifts or noise sources, which can yield spurious correlations and inflate false positives unless tested against an appropriate null.

      Building a null distribution that preserves the slow drifts, for example, would help us understand how likely it is for the two time series to be correlated when the slow drifts are still present, and how much better the current correlation is, compared to this more conservative null. You can perform this by phase randomizing one of the two time courses N times (e.g., N=1000), which maintains the autocorrelation structure while breaking any true co-occurrence in patterns between the two time series, and compute a non-parametric p-value. I suggest using this approach in all correlation analyses between two time series.

      This is an important statistical point to clarify, and the suggested analysis is valuable. The reviewer is correct that the raw fMRI and EEG time series are autocorrelated. However, because our statistical approach is a two-level analysis, we reasoned that non-independence at the correlation-level would not invalidate the higher-level t-test. The t-test’s assumption of independence applies to the individual participants' coefficients, which are independent across participants. Thus, we believe that our initial approach is broadly appropriate, and its simplicity allows it to be easily communicated.

      Nonetheless, the permutation-testing procedure that the Reviewer describes seems like an important analysis to test, given that permutation-testing is the gold standard for evaluating statistical significance, and it could guarantee that our above logic is correct. We thus computed the analysis as the reviewer described. For each participant, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This was done for each participant once per permutation. Each participant’s phase-randomized data was submitted to the analysis of each oscillatory power band as originally, generating one mean correlation for each band. This was done 1,000 times.

      Across the five bands, the grand mean correlation of the null distribution is near zero (M<sub>r</sub> = .0006), and its 97.5<sup>th</sup> percentile critical value is r ≈ .025. This 97.5<sup>th</sup> percentile corresponds to the upper end of a 95% confidence interval for a band’s correlation, and the threshold differs minimally across bands (.024 < rs < .026). Our original correlation coefficients for Delta (M<sub>r</sub> = .042) and Theta (M<sub>r</sub> = .041), which our conclusions focused on, remained significant (p ≤ .002). We can also perform family-wise error-rate correction, as previously requested in Reviewer comment R1.4c, by taking the highest correlation across any band for a given permutation; the Delta and Theta effects remain significant under this correction (p<sub>FWE-corrected</sub> ≤ .003).

      These correlations were previously reported in Table 1, and we updated the caption to note what effects remain significant when evaluated using permutation-testing and with family-wise error correction (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”

      We updated the Methods to describe the permutation-testing analysis (p. 43):

      “To confirm the significance of our fMRI-EEG correlations with a non-parametric approach, we performed a group-level permutation-test. For each of 1,000 permutations, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This procedure breaks the true temporal relationship between the fMRI and EEG data while preserving its structure. We then re-computed the mean Spearman correlation for each frequency band using this phase-randomized data. We evaluated significance using a family-wise error correction approach that accounts for us analyzing five oscillatory power bands. We thus create a null distribution composed of the maximum correlation value observed across all frequency bands from each permutation. Our observed correlations were then tested for significance against this distribution of maximums.”
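
      The phase-randomization and maximum-statistic steps quoted above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' released code: the function names, the data layout (one 1-D fMRI amplitude series and one bands-by-TRs EEG array per participant), the one-sided comparison, and the "+1" small-sample correction in the p-value are all choices made for the example.

```python
import numpy as np
from scipy.stats import spearmanr


def phase_randomize(x, rng):
    """Surrogate with the amplitude spectrum of `x` and randomized Fourier phases."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    phases[0] = 0.0                  # leave the DC term (mean) untouched
    if x.size % 2 == 0:
        phases[-1] = 0.0             # Nyquist term must remain real
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=x.size)


def permutation_null(fmri, eeg, n_perm, rng):
    """Group-mean Spearman r per band for each permutation.

    fmri: list of 1-D arrays (one per participant).
    eeg:  list of (n_bands, n_tr) arrays, aligned with `fmri`.
    """
    n_bands = eeg[0].shape[0]
    null = np.zeros((n_perm, n_bands))
    for i in range(n_perm):
        for amp, bands in zip(fmri, eeg):
            surrogate = phase_randomize(amp, rng)
            null[i] += [spearmanr(surrogate, band)[0] for band in bands]
        null[i] /= len(fmri)
    return null


def max_stat_pvalues(observed, null):
    """FWE-corrected p-values against the per-permutation maximum across bands."""
    null_max = np.max(null, axis=1)
    return np.array(
        [(1 + np.sum(null_max >= obs)) / (1 + null_max.size) for obs in observed]
    )
```

      Because each permutation contributes only its largest band-wise correlation, an observed correlation must exceed what any of the five bands would produce by chance, which is what controls the family-wise error rate.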

      (R3.5) Analysis choices

      If I'm understanding correctly, the algorithm used to identify modules does so by assigning nodes to communities, but it does not itself restrict what edges can be formed from these modules. This makes me wonder whether the decision to focus only on connections between adjacent modules, rather than considering the full connectivity, was an analytic choice by the authors. If so, could you clarify the rationale? In particular, what justifies assuming that the gradient of PE states should be captured by edges formed only between nearby modules (as shown in Figure 2E and Figure 4), rather than by the full connectivity matrix? If this restriction is instead a by-product of the algorithm, please explain why this outcome is appropriate for detecting a global signature of PE states in both task and rest.

      We discuss this matter in our response to comment R2.4.

      When assessing the correspondence across task-fMRI and rs-fMRI in section 2.2.2, why was the pattern during task calculated from selecting a pair of bilateral ROIs (resulting in a group of eight ROIs), and the resting state pattern calculated from posterior-anterior/ventral-dorsal fluctuation modules? Doesn't it make more sense to align the two measures? For example, calculating task effects on these same modules during task and rest?

      We thank the reviewer for this question, as it highlights a point in our methods that we could have explained more clearly. The reviewer is correct that the two measures must be aligned, and we can confirm that they were indeed perfectly matched.

      For the analysis in Section 2.2.2, both the task and resting-state measures were calculated on the exact same anatomical substrate for each comparison. The analysis iteratively selected a symmetrical subset of eight ROIs from our larger four quadrants. For each of these 3,432 iterations, we computed the task-fMRI PE effect (the Connectivity Direction × PE interaction) and the resting-state fluctuation amplitude (E[|PA – VD|]) using the identical set of eight ROIs. The goal of this analysis was precisely to test if the fine-grained anatomical pattern of these effects correlated within an individual across the task and rest states. We will revise the text in Section 2.2.2 to make this direct alignment of the two measures more explicit.

      Recommendations for authors:

      Reviewer #1 (Recommendations for authors):

      (R1.3) Several prior studies have described co-activation or connectivity "templates" that spontaneously alternate during rest and task states, and are linked to behavioral variability. While they are interpreted differently in terms of cognitive function (e.g., in terms of sustained attention: Monica Rosenberg; alertness: Catie Chang), the relationship between these previously reported templates and those identified in the current study warrants discussion. Are the current templates spatially compatible with prior findings while offering new functional interpretations beyond those already proposed in the literature? Or do they represent spatially novel patterns?

      Thank you for this suggestion. Broadly, we do not mean to propose spatially novel patterns but rather focus on how these are repurposed for PE processing. In the Discussion, we link our identified connectivity states to established networks (e.g., the FPCN). We updated this paragraph to mention that these patterns are largely not spatially novel (p. 20):

      “The connectivity patterns put forth are, for the most part, not spatially novel and instead overlap heavily with prior functional and anatomical findings.”

      Regarding the specific networks covered in the prior work by Rosenberg and Chang that the reviewer seems to be referring to,[7,8] this research has emphasized networks anchored heavily in sensorimotor, subcortical–cerebellar, and medial frontal circuits, which mostly do not overlap with the connectivity effects we put forth.

      (R1.4) Additional points:

      (R1.4a) I do not think that the logic for taking the absolute difference of fMRI connectivity is convincing. What happens if the sign of the difference is maintained?

      Thank you for pointing out this area that requires clarification. Our analysis targets the amplitude of the fluctuation between brain states, not the direction. We define high fluctuation amplitude as moments when the brain is strongly in either the PA state (PA > VD) or the VD state (VD > PA). The absolute difference |PA – VD| correctly quantifies this intensity, whereas a signed difference would conflate these two distinct high-amplitude moments. Our simulation study (Supplemental Materials, Section 5) provides the theoretical validation for this logic, showing how this absolute difference measure in slow fMRI data can track the amplitude of a fast underlying neural oscillator.

      When the analysis is tested in terms of the signed difference, as suggested by the Reviewer, the association between the fMRI data and EEG power is nonsignificant for each power band (ps<sub>uncorrected</sub> ≥ .47). We updated Supplemental Materials 6 to include these results. Previously, this section included the fluctuation amplitude (fMRI) × EEG power results while controlling for: (i) the signed difference between posterior-anterior and ventral-dorsal connectivity, (ii) the sum of posterior-anterior and ventral-dorsal connectivity, and (iii) the absolute value of the sum of posterior-anterior and ventral-dorsal connectivity. For completeness, we also now report the correlation between each EEG power band and each of those other three measures (SM, p. 9):

      “We additionally tested the relationship between each of those three measures and the five EEG oscillation bands. Across the 15 tests, there were no associations (ps<sub>uncorrected</sub>  ≥ .04); one uncorrected p-value was at p = .044, although this was expected given that there were 15 tests. Thus, the association between EEG oscillations and the fMRI measure is specific to the absolute difference (i.e., amplitude) measure.”
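
      The reasoning for preferring the absolute over the signed difference can be illustrated with a toy example. This is not the simulation reported in Supplemental Materials 5; it is a deliberately simplified sketch in which PA – VD is assumed to be a sinusoid that completes a whole number of cycles within each measurement window, and the function name and parameters are hypothetical.

```python
import numpy as np


def window_summaries(amplitudes, cycles=5, samples=1000):
    """Signed and absolute window means of a fast oscillation at several amplitudes.

    Each window holds `cycles` full periods of a sinusoid standing in for PA - VD.
    """
    t = np.arange(samples) / samples
    carrier = np.sin(2.0 * np.pi * cycles * t)   # alternates between PA > VD and VD > PA
    signed = np.array([np.mean(a * carrier) for a in amplitudes])
    absolute = np.array([np.mean(np.abs(a * carrier)) for a in amplitudes])
    return signed, absolute
```

      Under these assumptions the signed mean cancels to zero in every window regardless of amplitude, whereas the absolute mean scales linearly with amplitude (by a factor of 2/π for a sinusoid), so only the latter can covary with the strength of a fast underlying rhythm.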

      (R1.4b) Reasoning of focus on frontal and theta band is weak, and described as "typical" (line 359) based on a single study.

      Sorry about this. There is a rich literature on the link between frontal theta and prediction error,[3,9–11] and we updated the Introduction to include more references to this work (p. 18): “The analysis was first done using power averaged across frontal electrodes, as these are the typical focus of PE research on oscillations.[3,9–11]”

      We have also updated the Methods to cite more studies that motivate our electrode choice (p. 41): “The analyses first targeted five midline frontal electrodes (F3, F1, Fz, F2, F4; BioSemi64 layout), given that this frontal row is typically the focus of executive-function PE research on oscillations.[9–11]”

      (R1.4c) No correction appears to have been applied for the association between EEG power and fMRI connectivity. Given that 100 frequency bins were collapsed into 5 canonical bands, a correction for 5 comparisons seems appropriate. Notably, the strongest effects in the delta and theta bands (particularly at fronto-central electrodes) may still survive correction, but this should be explicitly tested and reported.

      Thanks for this suggestion. We updated the Table 1 caption to state which results survive family-wise error rate correction. As the reviewer suggests, the Delta/Theta effects would survive Bonferroni correction for five tests. However, per a later comment suggesting that we evaluate statistical significance with a permutation-testing approach (comment R3.4), we instead report family-wise error correction based on that approach. The revised caption is as follows (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”
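
      For readers unfamiliar with the Bonferroni correction mentioned above, a minimal sketch follows. The p-values are invented for illustration and are not the values reported in Table 1. The permutation-based correction that the authors actually report is a different procedure and is not shown here.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical uncorrected p-values for the five canonical bands.
hypothetical_p = {"delta": 0.002, "theta": 0.004, "alpha": 0.30,
                  "beta": 0.03, "gamma": 0.04}

adjusted = dict(zip(hypothetical_p, bonferroni(list(hypothetical_p.values()))))
survivors = [band for band, p in adjusted.items() if p < 0.05]
print(adjusted)
print(survivors)
```

      With five tests, an uncorrected p-value must fall below .01 to survive, which is why only the two smallest hypothetical values remain significant in this example.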

      (R1.4d) Line 135. Not sure I understand what you mean by "moods". What is the overall point here?

      The overall argument is that the fluctuations occur rapidly rather than slowly. By slow “moods” we refer to how a participant could enter a high anxiety state of >10 seconds, linked to high PE fluctuations, and then shift into a low anxiety state, linked to low PE fluctuations. We argue that this is not occurring. Regardless, we recognize that referring to lengths of time as short as 10 seconds or so is not a typical use of the word “mood” and is potentially ambiguous, so we have omitted this statement, which was originally on page 6: “Identifying subsecond fluctuations would broaden the relevance of the present results, as they rule out that the PE states derive from various moods.”

      (R1.4e) Line 100. "Few prior PE studies have targeted PE, contrasting the hundreds that have targeted BOLD". I don't understand this sentence. It's presumably about connectivity vs activity?

      Yes, sorry about this typo. The reviewer is correct, and that sentence was meant to mention connectivity. We corrected (p. 5): “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R1.4f) Line 373: "0-0.5Hz" in the caption is probably "0-50Hz".

      Yes, this was another typo, thank you. We have corrected it (p. 19): “… every 0.5 Hz interval from 0-50 Hz.”

      Reviewer #2 (Recommendations for authors):

      (R2.6) (Page 3) When referring to the "limited" hypothesis of local PE, please clarify in what sense is it limited. That statement is unclear.

      Thank you for pointing out this text, which we now see is ambiguous. We originally used "limited" to refer to the hypothesis's constrained scope: namely, that PE is relevant to various low-level operations (e.g., sensory processing or rewards) but the minimization of PE does not guide more abstract cognitive processes. We edited this part of the Introduction to be clearer (p. 3):

      “It is generally agreed that the brain uses PE mechanisms at neuronal or regional levels,[15,16] and this idea has been useful in various low-level functional domains, including early vision [15] and dopaminergic reward processing.[17] Some theorists have further argued that PE propagates through perceptual pathways and can elicit downstream cognitive processes to minimize PE.”

      (R2.7) (Page 5) "Few prior PE have targeted PE"... this statement appears contradictory. Please clarify.

      Sorry about this typo, which we have corrected (p. 5):

      “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R2.8) What happened to the data of the medium PE condition in Study 1A?

      The medium PE condition data were not excluded. We modeled the effect of prediction error on connectivity using a linear regression across the three conditions, coding them as a continuous variable (Low = -1, Medium = 0, High = +1). This approach allowed us to identify brain connections that showed a linear increase or decrease in strength as a function of increasing PE. This linear contrast is a more specific and powerful way to isolate PE-related effects than a High vs. Low contrast. We updated the Results slightly to make this clearer (p. 8-9):

      “In the fMRI data, we compared the three PE conditions’ beta-series functional connectivity, aiming to identify network-level signatures of PE processing, from low to high. […] For the modularity analysis, we first defined a connectome matrix of beta values, wherein each edge’s value was the slope of a regression predicting that edge’s strength from PE (coded as Low = -1, Medium = 0, High = +1; Figure 2A).”
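
      The per-edge slope described here can be sketched as an ordinary least-squares fit on the three coded conditions. The function name and the connectivity values below are hypothetical, and the authors' actual pipeline may differ (for example, by fitting across trials or participants rather than three condition means).

```python
def pe_slope(low, medium, high):
    """Least-squares slope of edge strength on PE coded as -1, 0, +1."""
    x = [-1.0, 0.0, 1.0]
    y = [low, medium, high]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    var = sum((xi - x_mean) ** 2 for xi in x)
    return cov / var

# Hypothetical connectivity strengths of one edge in each condition.
slope = pe_slope(low=0.1, medium=0.2, high=0.5)
print(slope)
```

      With this coding the slope reduces to (High - Low) / 2, so the Medium condition does not change the slope for three equally weighted condition means. When the regression is fit over individual trials, the Medium trials do enter the fit and affect its residual variance.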

      (R2.9) (Page 15) The point about how the dots in 6H follow those in 6J better than those in 6I is a little subjective - can the authors provide an objective measure?

      Thank you for pointing out this issue. The visual comparison using Figure 6 was not meant as a formal analysis but rather to provide intuition. However, as the reviewer describes, this is difficult to convey. Our formal analysis is provided in Supplemental Materials 5, where we report correlation coefficients between a very large number of simulated fMRI data points and EEG data points corresponding to different frequencies. We updated this part of the Results to convey this (p. 16-17):

      “Notice how the dots in Figure 6H follow the dots in Figure 6J (3 Hz) better than the dots in Figure 6I (0.5 Hz) or Figure 6K (10 Hz); this visual comparison is intended for illustrative purposes only, and quantitative analyses are provided in Supplemental Materials 5.”

      References

      (1) Zalesky, A., Fornito, A. & Bullmore, E. T. Network-based statistic: identifying differences in brain networks. Neuroimage 53, 1197–1207 (2010).

      (2) Strijkstra, A. M., Beersma, D. G., Drayer, B., Halbesma, N. & Daan, S. Subjective sleepiness correlates negatively with global alpha (8–12 Hz) and positively with central frontal theta (4–8 Hz) frequencies in the human resting awake electroencephalogram. Neuroscience letters 340, 17–20 (2003).

      (3) Cavanagh, J. F. & Frank, M. J. Frontal theta as a mechanism for cognitive control. Trends in cognitive sciences 18, 414–421 (2014).

      (4) Grech, R. et al. Review on solving the inverse problem in EEG source analysis. Journal of neuroengineering and rehabilitation 5, 25 (2008).

      (5) Palva, J. M. et al. Ghost interactions in MEG/EEG source space: A note of caution on inter-areal coupling measures. Neuroimage 173, 632–643 (2018).

      (6) Koles, Z. J. Trends in EEG source localization. Electroencephalography and clinical Neurophysiology 106, 127–137 (1998).

      (7) Rosenberg, M. D. et al. A neuromarker of sustained attention from whole-brain functional connectivity. Nature neuroscience 19, 165–171 (2016).

      (8) Goodale, S. E. et al. fMRI-based detection of alertness predicts behavioral response variability. elife 10, e62376 (2021).

      (9) Cavanagh, J. F. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216 (2015).

      (10) Hoy, C. W., Steiner, S. C. & Knight, R. T. Single-trial modeling separates multiple overlapping prediction errors during reward processing in human EEG. Communications Biology 4, 910 (2021).

      (11) Neo, P. S.-H., Shadli, S. M., McNaughton, N. & Sellbom, M. Midfrontal theta reactivity to conflict and error are linked to externalizing and internalizing respectively. Personality neuroscience 7, e8 (2024).

      (12) Friston, K. J. The free-energy principle: a unified brain theory? Nature reviews neuroscience 11, 127–138 (2010).

      (13) Feldman, H. & Friston, K. J. Attention, uncertainty, and free-energy. Frontiers in human neuroscience 4, 215 (2010).

      (14) Friston, K. J. et al. Active inference and epistemic value. Cognitive neuroscience 6, 187–214 (2015).

      (15) Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extraclassical receptive-field effects. Nature neuroscience 2, 79–87 (1999).

      (16) Walsh, K. S., McGovern, D. P., Clark, A. & O’Connell, R. G. Evaluating the neurophysiological evidence for predictive processing as a model of perception. Annals of the New York Academy of Sciences 1464, 242–268 (2020).

      (17) Niv, Y. & Schoenbaum, G. Dialogues on prediction errors. Trends in cognitive sciences 12, 265–272 (2008).

      (18) Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological) 57, 289–300 (1995).

    1. eLife Assessment

      This manuscript provides evidence that mouse germline cysts develop an asymmetric Golgi, ER, and microtubule-associated structure that resembles the fusome in Drosophila germline cysts. This fundamental study provides new evidence that fusome-like structures exist in germ cell cysts across species. Overall, the data are convincing and represent a significant advance in our understanding of germ cell biology.

    2. Reviewer #2 (Public review):

      This study identifies Visham, an asymmetric structure in developing mouse cysts resembling the Drosophila fusome, an organelle crucial for oocyte determination. Using immunofluorescence, electron microscopy, 3D reconstruction, and lineage labeling, the authors show that primordial germ cells (PGCs) and cysts, but not somatic cells, contain an EMA-rich, branching structure that they named Visham, which remains unbranched in male cysts. Visham accumulates in regions enriched in intercellular bridges, forming clusters reminiscent of fusome "rosettes." It is enriched in Golgi and endosomal vesicles and partially overlaps with the ER. During cell division, Visham localizes near centrosomes in interphase and early metaphase, disperses during metaphase, and reassembles at spindle poles during telophase before becoming asymmetric. Microtubule depolymerization disrupts its formation.

      Cyst fragmentation is shown to be non-random, correlating with microtubule gaps. The authors propose that 8-cell (or larger) cysts fragment into 6-cell and 2-cell cysts. Analysis of Pard3 (the mouse ortholog of Par3/Baz) reveals its colocalization with Visham during cyst asymmetry, suggesting that mammalian oocyte polarization depends on a conserved system involving Par genes, cyst formation, and a fusome-like structure.

      Transcriptomic profiling identifies genes linked to pluripotency and the unfolded protein response (UPR) during cyst formation and meiosis, supported by protein-level reporters monitoring Xbp1 splicing and 20S proteasome activity. Visham persists in meiotic germ cells at stage E17.5 and is later transferred to the oocyte at E18.5 along with mitochondria and Golgi vesicles, implicating it in organelle rejuvenation. In Dazl mutants, cysts form, but Visham dynamics, polarity, rejuvenation, and oocyte production are disrupted, highlighting its potential role in germ cell development.

      Overall, this is an interesting and comprehensive study of a conserved structure in the germline cells of both invertebrate and vertebrate species. Investigating these early stages of germ cell development in mice is particularly challenging. Although primarily descriptive, the study represents a remarkable technical achievement. The images are generally convincing, with only a few exceptions.

      Major comments:

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title "Mouse germline cysts contain a fusome-like structure that mediates oocyte development":

      The term "mediates" could be misleading, as the functional data on Visham (based on comparing its absence to wild-type) actually reflects either a microtubule defect or a Dazl mutant context. There is no specific loss-of-function of visham only.

      (1b) Result title, "Visham overlaps centrosomes and moves on microtubules":

      The term "moves" implies dynamic behavior, which would require live imaging data that are not described in the article.

      (1c) Result title, "Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation":

      The presented data show that the presence of Visham in the cyst coincides temporally with the expression and activity of the UPR response; the term "associates" is unclear in this context.

      (1d) Result title, "Visham participates in organelle rejuvenation during meiosis":

      The term "participates" suggests that Visham is required for this process, whereas the conclusion is actually drawn from the Dazl mutant context, not a specific loss-of-function of visham only.

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

      Comments on revisions:

      The revised manuscript has been clearly improved, and the authors have addressed all of our comments. I would like to point out two minor issues:

      (1) As suggested by the reviewers, the authors now use the term fusome instead of visham. However, they also acknowledge that this structure lacks many components of the Drosophila fusome. It may therefore be more appropriate to refer to it as a "mouse fusome" or as a "fusome-like structure (FLS)," as used in Xenopus.

      (2) I agree with Reviewer 3 that co-localization between EMA and acTubulin on still images does not convincingly demonstrate that fusome vesicles move along microtubules (Figure S2E).

    3. Reviewer #3 (Public review):

      The manuscript provides evidence that mice have a fusome, a conserved structure most well studied in Drosophila that is important for oocyte specification. Overall, a myriad of evidence is presented demonstrating the existence of a mouse fusome. This work is important as it addresses a long-standing question in the field of whether mice have fusomes and sheds light on how oocytes are specified in mammals.

      Comments on revisions:

      Overall, the authors did a good job of responding to reviewer comments and have improved the manuscript by including higher quality microscope images, revising text for clarity, and using the term mouse fusome instead of a new term. However, two of the headings in the Results section that did not correspond to the data presented in those sections still have not been revised, even though the authors stated in their response to reviewer comments that they were. The heading of the first section of the results is: "PGCs contain a Golgi-rich structure known as the EMA granule" even though no evidence in that section shows it is Golgi rich. The heading of the fifth section of the results is: "The mouse fusome associates with polarity and microtubule genes including pard3"; however, only evidence for pard3 is presented.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Summary

      We thank the reviewer for the constructive and thoughtful evaluation of our work. We appreciate the recognition of the novelty and potential implications of our findings regarding UPR activation and proteasome activity in germ cells.

      (1) The microscopy images look saturated, for example, Figure 1a, b, etc. Is this a normal way to present fluorescent microscopy?

      The apparent saturation was not present in the original images, but likely arose from image compression during PDF generation. While the EMA granule was still apparent, in the revised submission, we will provide high-resolution TIFF files to ensure accurate representation of fluorescence intensity and will carefully optimize image display settings to avoid any saturation artifacts.

      (2) The authors should ensure that all claims regarding enrichment/higher vs. lower values have indicated statistical tests.

      We fully agree. In the revised version, we will correct any quantitative comparisons where statistical tests were not already indicated, with a clear statement of the statistical tests used, including p-values in figure legends and text.

      (a) In Figure 2f, the authors should indicate which comparison is made for this test. Is it comparing 2 vs. 6 cyst numbers?

      We acknowledge that the description was not sufficiently detailed. The test did not compare 2-cell vs. 6-cell cyst numbers. Instead, it compared the observed outcome (6-cell cysts in 13 of 15 examples) with the outcome expected by chance if 8-cell cysts, or the larger cysts studied, fragmented randomly into two pieces in any of the possible ways. We will expand the legend and main text to clarify that a binomial test was used to determine that the proportion of cysts producing 6-cell fragments differed very significantly from chance.

      Revised text:

      “A binomial test was used to assess whether the observed frequency of 6-cell cyst products differed from random cyst breakage. Production of 6-cell cysts was strongly preferred (13/15 cysts; ****p < 0.0001).”
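
      The form of this binomial test can be sketched as follows. The response does not state the null probability of obtaining a 6-cell product under random breakage, so the value used here (2/7, the chance that cutting one of the seven bridges of a linear 8-cell chain yields a 6 + 2 split) is an illustrative assumption only; real cysts are branched, and the authors' null may differ. The test shown is one-sided.

```python
from math import comb

def binomial_upper_tail(successes, trials, p_null):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
               for k in range(successes, trials + 1))

# Assumed null probability (hypothetical; see the lead-in above).
P_NULL = 2 / 7

p_value = binomial_upper_tail(successes=13, trials=15, p_null=P_NULL)
print(p_value)
```

      Under this assumed null, the upper-tail probability for 13 of 15 falls below 0.0001, the threshold quoted in the revised text.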

      (b) Figures 4d and 4e do not have a statistical test indicated.

      We will include the specific statistical test used and report the corresponding p-values directly in the figure legends.

      (3) Because the system is developmentally dynamic, the major conclusions of the work are somewhat unclear. Could the authors be more explicit about these and enumerate them more clearly in the abstract?

      We will revise the abstract to better clarify the findings of this study. We will also replace the term Visham with mouse fusome to reflect its functional and structural analogy to the Drosophila and Xenopus fusomes, making the narrative more coherent and conclusive.

      (4) The references for specific prior literature are mostly missing (lines 184-195, for example).

      We appreciate this observation of a problem that occurred inadvertently when shortening an earlier version.  We will add 3–4 relevant references to appropriately support this section.

      (5) The authors should define all acronyms when they are first used in the text (UPR, EGAD, etc).

      We will ensure that all acronyms are spelled out at first mention (e.g., Unfolded Protein Response (UPR), Endosome and Golgi-Associated Degradation (EGAD)).

      (6) The jumping between topics (EMA, into microtubule fragmentation, polarization proteins, UPR/ERAD/EGAD, GCNA, ER, balbiani body, etc) makes the narrative of the paper very difficult to follow.

      We are not jumping between topics, but following a narrative relevant to the central question of whether female mouse germ cells develop using a fusome. EMA, microtubule fragmentation, polarization proteins, ER, and the Balbiani body are all topics with a known connection to fusomes. This is explained in the general introduction and in relevant subsections. We appreciate this feedback that further explanations of these connections would be helpful. In the revised manuscript, use of the unified term mouse fusome will also help connect the narrative across sections. UPR/ERAD/EGAD are processes that have been studied in repair and maintenance of somatic cells and in yeast meiosis. We show that the major regulator Xbp1 is found in the fusome, and that the fusome and these rejuvenation pathway genes are expressed and maintained throughout oogenesis, rather than only during limited late stages as suggested in previous literature.

      (7) The heading title "Visham participates in organelle rejuvenation during meiosis" in line 241 is speculative and/or not supported. Drawing upon the extensive, highly rigorous Drosophila literature, it is safe to extrapolate, but the claim about regeneration is not adequately supported.

      We believe this statement is accurate given the broad scope of the term "participates." It is supported by localization of the UPR regulator Xbp1 to the fusome. Xbp1 is the ortholog of Hac1, a key gene mediating UPR-mediated rejuvenation during yeast meiosis. We also showed that rejuvenation pathway genes are expressed throughout most of meiosis (not previously known) and expanded cytological evidence of stage-specific organelle rejuvenation later in meiosis, such as mitochondrial-ER docking, in regions enriched in fusome antigens. However, we recognize the current limitations of this evidence in the mouse, and want to appropriately convey this, without going to what we believe would be an unjustified extreme of saying there is no evidence.

      Reviewer #2 (Public review):

      We thank the reviewer for the comprehensive summary and for highlighting both the technical achievement and biological relevance of our study. We greatly appreciate the thoughtful suggestions that have helped us refine our presentation and terminology.

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title “Mouse germline cysts contain a fusome-like structure that mediates oocyte development”

      We will change the statement to: “Mouse germline cysts contain a fusome that supports germline cyst polarity and rejuvenation.”

      (1b) Result title “Visham overlaps centrosomes and moves on microtubules”

      We acknowledge that “moves” implies dynamics. We will include additional supplementary images showing small vesicular components of the mouse fusome on spindle-derived microtubule tracks.

      (1c) Result title “Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation”

      We will revise this title to: “The mouse fusome associates with the UPR regulatory protein Xbp1 beginning at the onset of cyst formation” to reflect the specific UPR protein that was immunolocalized.

      (1d) Result title “Visham participates in organelle rejuvenation during meiosis”

      We will revise this to: “The mouse fusome persists during organelle rejuvenation in meiosis.”

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

      We appreciate the reviewer’s insightful comment. To maintain conceptual clarity and align with existing literature, we will refer to the structure as the mouse fusome throughout the manuscript, avoiding introduction of a new term.

      Reviewer #3 (Public review):

      We thank the reviewer for emphasizing the importance of our study and for providing constructive feedback that will help us clarify and strengthen our conclusions.

      (1) Line 86 - the heading for this section is "PGCs contain a Golgi-rich structure known as the EMA granule"

      We agree that the enrichment of Golgi within the EMA PGCs was not shown until the next section. We will revise this heading to:

      “PGCs contain an asymmetric EMA granule.” 

      (2) Line 105-106, how do we know if what's seen by EM corresponds to the EMA1 granule?

      We will clarify that this identification is based on co-localization with Golgi markers (GM130 and GS28) and response to Brefeldin A treatment, which will be included as supplementary data. These findings support that the mouse fusome is Golgi-derived and can therefore be visualized by EM. The Golgi regions in E13.5 cyst cells move close together and associate with ring canals as visualized by EM (Figure 1E), the same as the mouse fusomes identified by EMA.

      (3) Line 106-107 - states "Visham co-stained with the Golgi protein Gm130 and the recycling endosomal protein Rab11a1". This is not convincing as there is only one example of each image, and both appear to be distorted.

      Space is at a premium in these figures, but we have no limitation on data documenting this absolutely clear co-localization. We will replace the existing images with high-resolution, noncompressed versions for the final figures to clearly illustrate the co-staining patterns for GM130 and Rab11a1.

      (4) Line 132-133 - while visham formation is disrupted when microtubules are disrupted, I am not convinced that visham moves on microtubules as stated in the heading of this section.

      We will include additional supplementary data showing small mouse fusome vesicles aligned along microtubules.

      (5) Line 156 - the heading for this section states that Visham associates with polarity and microtubule genes, including pard3, but only evidence for pard3 is presented.

      We agree and will revise the heading to: “Mouse fusome associates with the polarity protein Pard3.” We are adding data showing association of small fusome vesicles on microtubules.

      (6) Lines 196-210 - it's strange to say that UPR genes depend on DAZ, as they are upregulated in the mutants. I think there are important observations here, but it's unclear what is being concluded.

      UPR genes are not upregulated in Dazl mutants, in the sense that we have never documented their expression increasing. We show that UPR genes during this time behave like pluripotency genes and normally decline, but in Dazl mutants their decline is slowed. We will rephrase the paragraph to clarify that Dazl mutation partially decouples developmental processes that are normally linked, which alters UPR gene expression relative to cyst development.

      (7) Line 257-259 - wave 1 and 2 follicles need to be explained in the introduction, and how these fit with the observations here needs to be clarified.

      Follicle waves are too small a focus of the current study to explain in the introduction, but we will refer readers to the cited relevant literature (Yin and Spradling, 2025) for further details.

      We sincerely thank all reviewers for their insightful and constructive feedback. We believe that the planned revisions—particularly the refined terminology, improved image quality, clarified statistics, and restructured abstract—will substantially strengthen the manuscript and enhance clarity for readers.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1E: need to use some immuno-gold staining to identify the Visham. Just circling an area of cytoplasm that contains ER between germ cell pairs is not enough.

      We appreciate the reviewer’s insistence that the association between the mouse fusome and Golgi be clearly demonstrated. However, the EMA granule is a large structure discovered and defined by light microscopy, and presents no inherent challenge to documenting its Golgi association by immunofluorescence experiments, which we presented and have now further strengthened as described in the next paragraph. We believe that the suggested EM experiment would add little to the EM we already presented (Figure 1E, E'). Moreover, due to facility limitations, we are currently unable to perform immunogold staining.

      To strengthen the previous immunolocalization experiments, we have now included additional immunostaining data showing the clear colocalization of the fusome region with the Golgi markers GM130 and GS28 (Figure S1H). We have also incorporated a new experiment using the Golgi-specific inhibitor Brefeldin A (BFA; see Figure S1I). Treatment of in vitro-cultured gonads with BFA disrupted EMA granule formation, demonstrating that EMA granules not only associate with Golgi but also require Golgi function to be maintained.

      Additionally, in Figure 2, we showed that the fusome overlaps with the peri-centriolar region, a characteristic locus for Golgi due to its movement on microtubules. We showed that the dynamic behavior of the fusome during the cell cycle parallels Golgi dispersal and reassembly. Together, these facts provide further strong support for the Golgi association of the EMA granule and fusome.

      (2) Figure 1F: is this image compressed?

      We have now substituted the image in Figure 1F with a better image and have avoided the compression of the image. 

      (3) In the figure legends, are the sample sizes individual animals or individual sections? Please ensure that all figure legends for each figure panel consistently contain the sample size.

      We have now included the number of measurements (N) in every figure legend. Each experiment was performed using samples from at least three different animals, and in most cases from more than three. This information has also been added to the Methods section under Statistics. In addition, N values are now consistently provided for each graph throughout the figures.

      (4) Figure 2b/c: it seems likely, based on the snapshots of different stages of cytokinesis, that the "newly formed" visham is accurate, but without live imaging, this claim of "newly formed" is putative/speculative. It is OK if it is labeled as "putative" in the figure panel.

      The behavior of the Drosophila fusome during mitosis was deduced without live imaging (deCuevas et al. 1998). We clarified that the conversion of a single mouse germ cell with one round fusome to an interconnected pair of cells with two round fusomes of greater total volume following mitosis is the basis for deducing that new fusome formation occurs each cell cycle. However, we agree with the reviewer that the phrase "newly formed" in the original label on Figure 2c suggested a specific mechanism of fusome increase that was not intended and this phrase has been removed entirely.  

      (5) Figure 2e/e is extremely difficult to follow. In order to improve the readability of these figure panels, can individual panels with a single stain be shown? The 'gap' between YFP+ sister cells is not immediately obvious in panel e or e" with the current layout. Since this is a key aspect of the author's claim about cleavage of the cyst, it would be best to make this claim more robust by showing more convincing images. In Figure 2E, the staining pattern of EMA needs to be clarified and described more fully in the text.

      We mapped discontinuities in the microtubule connections, not the fusome or YFP. YFP is the lineage marker indicating that the cells of a single cyst are being studied. Consequently, no gap in YFP cytoplasmic expression is expected, because fragmentation has already occurred only in the last example (Figure 2E''), where a YFP gap is present. The acetylated tubulin gap precedes fragmentation. The mitotic spindle remnants labeled by AcTub link the cells into two groups separated by a gap, which is clearly shown in the data images and in the third column, where only the relevant AcTub from the cyst itself is shown. In response to the reviewer's question about the fusome, which is not directly relevant to fragmentation, we have now provided images of the separate fusome channel and corresponding measurements for all three Figure 2E-E'' cysts in the supplementary Figure S4H. We have improved the text regarding this important figure to try to make it easier to follow, and also added a new example of a 10-cell cyst in S2H (lower panels). We also added movies allowing full 3D study of one of the 8-cell cysts and the new 10-cell cyst. We also suggest that the reviewer examine how the deduced mechanism of fragmentation explains previously published but not fully understood data on cyst fragmentation going back to 1998, as described in the expanded Discussion on this topic.

      (6) It would be best to support the proposed model in Figure 2G (4+4+4) with microscopy images of a 12-cell or 16-cell cyst? Would these 12-cell or 16-cell cysts be too large to technically recover in a section?

      Unfortunately, the reviewer's suggestion that 12- or 16-cell cysts are too large to recover and present convincingly is correct. Because our analysis depends on capturing lineage-labeled cysts specifically at telophase with acetylated-tubulin connections, the likelihood of obtaining the correct stage is very low. In addition, the dense packing of germ cells in the mouse gonad further limits our ability to fully reconstruct all the cells in large cysts, with difficulty increasing as cyst size grows.

      However, as noted, we added a well-resolved 10-cell cyst—the largest size we could confidently analyze—in a 3D video in Supplementary Figure S2H (lower panel), which shows a 6 + 4 breakage pattern.

      (7) We did not find a reference in the text for Figure 2G.

      We have now provided reference for 2G in the text and in the discussion section. 

      (8) Line 189: ERAD is used as an acronym, but is not defined until the discussion.

      We have now provided the full form of the acronym at its first usage in the text.

      (9) Fig 3i/i': the increase of UPR pathway components, increasing expression during zygotene, is interesting to note, but is not commented enough in the text of the paper.

      We have discussed this issue in the discussion section with specific reference to figure 3I. Please find the detailed discussion under the heading “Germ cell rejuvenation is highly active during cyst formation.”

      (10) Please quantify DNMT3A expression levels in WT control vs Dazl KO germ cells in Figure 4a.

      We have now quantified DNMT3A expression levels in WT control vs Dazl KO germ cells and have added the data in the Figure 4A.

      (11) Please introduce the rationale behind selecting DazL KO for studying cyst formation (text in line 197). This comes out of nowhere.

      True.  We significantly expanded our discussion of Dazl and citations of previous work, including evidence that it can affect cyst structures like ring canals, in the Introduction.  

      (12) It would be best to stain WT control vs DazL KO oogonia in Figure 4a with 5mC antibodies to support their claim that DNA methylation might be affected in the mutants.

      We respectfully disagree that this additional experiment is necessary within the scope of the current study. At the developmental stage examined (E12.5), germ cells in the Dazl mutant are clearly in an arrested and hypomethylated state, as supported by previous evidence (Haston et al. 2009). This initial experiment was designed to show that in our hands Dazl mutants show this known pluripotency delay. However, the effects of Dazl mutation on female germline cyst development as it relates to polarity or the fusome were not studied before, and that is what the paper addresses, building on previous work.

      Because our study does not focus on germ-cell epigenetic modifications but rather on the consequences of Dazl loss on germ cell cyst development, adding 5mC immunostaining would not substantially advance the main conclusions. The existing data and previous published work already provide sufficient background.

      (13) Figure 4c: a very interesting figure, it would be best to quantify developmental pseudotime (perhaps using monocle3 analysis) and compare more rigorously the developmental stage of WT control vs DazL KO.

      Developmental pseudotime, such as through Monocle3 analysis, might sometimes be valuable but involves assumptions that, when possible, are better addressed by direct experimental examination. Our conclusions regarding cyst developmental stage are supported by straightforward evidence to which computational trajectory inference would add little. Specifically, we have performed analysis of germ-cell methylation state, ring canal formation, pluripotency markers, UPR pathway activity (Xbp1 and proteomic assays), Golgi stress and Pard3, which collectively document the developmental status of the WT and Dazl KO germ cells. These empirical data demonstrate the same developmental pattern reflected in Figure 4c, making the less reliable pseudotime-based computational method superfluous.

      (14) Figure 4d has two panels labeled as "d".

      We have now corrected the labelling of the figure

      (15) Color coding in 4d, d', d" is confusing; please harmonize some visual presentation here.

      We have now harmonized the visual representation of all the graphs in Figure 4.

      (16) Fig 4e' is labeled as DazL +/- but is this really a typo?

      Thank you for pointing it out. We have now corrected the typo

      (17) Figure F': typo labeled as E3.5, which is E13.5?

      Thank you for pointing it out. We have now corrected the typo

      (18) Figure F': was DazL KO mutant but no WT control.

      The WT control was not provided to avoid the redundancy. Please refer to earlier figure 3A-B, Fig S3C and D and videos S3A and S3b to refer to WT control at every stage.

      (19) Figure G: unusual choice in punctuation marks for cartoon schematic. No key to guide the reader for color-coded structures would be helpful to have something similar to 4h.

      We have now provided the key to guide the readers in the mentioned figure 4G.

      (20) The authors use WGA and EMA as interchangeable markers (Figure 5a) without fully explaining why they have switched markers.

      Because it is germ cell specific, we used EMA as a fusome marker during the time when it is found up through E13.5.  After that point we used WGA which is still usable, but also labels somatic cells.  This rationale is explicitly described at the end of the section “Fusome is highly enriched in Golgi and vesicles”, where we state:

      “EMA staining disappears from germ cells at E14.5 (Figure 1I). However, very similar (but non–germ-cell-specific) staining continued with wheat germ agglutinin (WGA) at later stages (Figure 1G, G’; Figure S1G).”

      To ensure this is fully clear to readers, we have now added an additional statement in the start of the text section discussing the figure 5:

      “For the reasons explained previously (see text for Figure 1G), WGA was used as a fusome marker beyond stage E14.5.”

      (21) Figure 5b' is compressed.

      We have now decompressed the image

      (22) Line 267, Balbiani body is misspelled.  

      We have now corrected the spelling.

      (23) The explanation of why the authors switch focus from DazL KO to DazL +/- is not adequately described. The authors should also explain the phenotype of the DazL +/- animals or reference a paper citing the hets are sterile or subfertile.

      We have now added the explanation of why Dazl KO is used in our introduction section where we have mentioned the phenotype of Dazl homozygous and heterozygous mouse.

      (24) Is Figure 5i actually DazL +/-? It is not labeled clearly in the text, the figure legend, or the figure panel. 

      We have now labelled the figure correctly in figure and in the legend.

      (25) The paper ends abruptly at line 275 with no context or summary.

      The manuscript does not end at line 275; the apparent interruption is due to a page break occurring immediately before the beginning of the Discussion section. We hope that continuation is fully visible in the reviewer 1 (your) version of the PDF.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 93: Fig. 1B: DDX4 marks germ cells; do all the red and yellow cells in the NE inset originate from the same PGC? There are only 2 cells marked in yellow among the group of red cells. Is it a z-projection issue? Or do they come from different PGCs?

      This experiment used vasa staining to identify all germ cells, which are produced by multiple PGCs. Green labeling is a lineage marker derived from a single PGC (due to the low frequency of tamoxifen-activated labeling). Consequently, the two yellow cells observed in the NE inset of Fig. 1B represent YFP-labeled germ cells (YFP + DDX4 double-positive) that have arisen from a single, lineage-traced PGC. This approach, introduced in 2013, is described in the Methods, and represents the field's single largest technical advance that has made it possible to analyze mouse germ cell development at single cell resolution.

      To ensure clarity, we have added a brief explanatory note to the figure legend indicating that yellow cells represent the lineage-traced progeny of a single PGC, while the red staining marks all germ cells.

      (2) Line 96: Figure 1C vs 1C'. The difference between female and male Visham is not obvious, although quantification shows a clear difference. How was the quantification made? Manual or automatic thresholding? Would it be possible to show only the Visham channel?

      We thank the reviewer for pointing out this problem. We have now more clearly described in the text that the female fusome increases in some cells with close attachments to other cells (future oocytes) and decreases in distant nurse cells. It branches due to rosette formation. In males, the fusome remains much like the initial EMA granules present in early germ cells, with only fine, difficult-to-see connections. The quantification shown in Figures 1C and 1C′ was performed manually, based on the presence of either (i) fused, branched EMA-positive fusome structures or (ii) dispersed, punctate EMA granules. This assessment was carried out across multiple E13.5 male and female gonad samples to ensure robustness. To facilitate independent evaluation, we have already provided supplementary videos S3B1 and S3B2, which display the EMA-stained E13.5 male and female gonads in three dimensions. These videos allow the structural differences to be examined more clearly than in static images.

      In response to the reviewer’s request, we now additionally include the single-channel fusome image in Supplementary Figure S1E′. This presentation highlights the fusome signal alone and further clarifies the morphological differences underlying the quantification.

      (3) L118: Figure 2A, third row = 2-cell cyst? Please specify PCNT in the legend.

      We appreciate the reviewer’s observation. In Figure 2A (third row), the cells were not specifically labeled as a 2-cell cyst; rather, the intention was to illustrate the presence of two distinct centrosomes positioned on a fused fusome structure, a configuration we frequently observe.

      We have now updated the figure legend to explicitly define PCNT.

      (4) L169: Missing reference to S3B and video S3B1?

      We have now included the reference to S3B1 and S3B2 in the text and in the legend

      (5) L170: Please describe the graph in the Figure 3D legend.

      We have now described the Graph in the legend

      (6) L171: Would it be possible to have a close-up showing both Pard3 and Visham in a ringlike pattern related to RACGAP (RC) staining? The images are too small.

      It is difficult to capture this relationship perfectly in a two dimensional picture. The images represent the maximum close-up possible that still includes enough relevant area for the necessary conclusions. We have now provided additional three close-up images exclusively for ring-canal and Pard3 association in the supplementary Figure S3C for further clarity. However, we also note that the quality of the image permits the reader of a pdf to zoom and to visualize the images in great detail.

      (7) L181: Wrong reference, should be 3 then 3I.

      Thank you for pointing it out, we have now corrected the reference.

      (8) L199: In Figure S4B, was DNMT3 staining quantified? Red intensity differs globally between images; use the somatic red level as a reference? Note: EMA seems higher in Dazl- vs. WT?

      We have now performed quantification of DNMT3 staining, which is presented in Figure 4A. While the red intensity (DNMT3 or EMA) can appear to differ between images, this variation can result from biological differences between tissues or minor technical variability despite using consistent microscope settings. To account for this, we normalized the staining intensity using the somatic cell signal as an internal reference, ensuring that the quantification reflects genuine differences between WT and Dazl-/- samples rather than global intensity variation.
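
      The normalization described in this response can be sketched as a simple ratio to an internal reference. This is a minimal illustration only, not the authors' analysis pipeline: the function name and all intensity values are hypothetical, and the sketch assumes one mean intensity per cell has already been measured.

```python
from statistics import mean

def normalize_to_somatic(germ_cell_values, somatic_values):
    """Divide each germ-cell intensity by the mean somatic intensity of the same image.

    Hypothetical helper: a global brightness difference between images scales
    both cell types equally, so it cancels in the ratio.
    """
    reference = mean(somatic_values)
    if reference <= 0:
        raise ValueError("somatic reference intensity must be positive")
    return [value / reference for value in germ_cell_values]

# Made-up numbers: the same staining pattern imaged at two global brightness levels.
bright_image = normalize_to_somatic([120.0, 130.0], [100.0, 100.0])
dim_image = normalize_to_somatic([60.0, 65.0], [50.0, 50.0])
print(bright_image, dim_image)
```

      With this kind of ratio, two images that differ only in overall brightness yield the same normalized values, so remaining differences are more plausibly attributable to the samples themselves.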

      (9) L229: Should be "proteasome."

      We have now corrected the spelling error.

      (10) L233: Quantify fragmentation of Gs28? EMA doesn't seem affected. Could you quantify both Gs28 and EMA? Images are too small.

      We thank the reviewer for this suggestion. While the current images are small, they can be examined in detail using zoom to visualize the structures clearly. As noted, and we agree, EMA staining is not affected, as the cells are in an arrested state. This arrested state creates stress on the Golgi. The fragmentation of Gs28-labeled Golgi membranes is a classical indicator of Golgi stress, even though the fragmented membranes may remain functionally active. Our results show that Dazl deletion specifically affects the Golgi in germ cells, while the Golgi in neighboring somatic cells appears healthy. To quantify this effect, we have now included manual quantification of Golgi fragmentation in Figure 4F, assessing tissues for the presence of fragmented versus intact Golgi structures. This confirms that Golgi fragmentation is a germ cell–specific phenotype in Dazl– samples, while pre-formed EMA-positive fusomes remain unaffected but are probably in an arrested state.

      (11) L237: Figure 4F graph shows E3.5, not E13.5.

      We have now corrected the typo in the figure 

      (12) L257: Figure 5D: quantify as in 5A? overlap?

      Yes, it is an overlap, shown as two separate images with the ring canal for better clarity. We have now quantified the image and have produced a combined graph for fusome and Pard3 in the Figure 5A graph.

      (13) L261: Figure 5E-E': black arrowhead not mentioned in legend.

      We have now mentioned the black arrowhead in the legend

      (14) L262: Figure 5C: arrowhead not mentioned in legend. Figure 5F: oocyte appears separated from nurse cells compared to 5C.

      Yes, that may happen as cysts undergo fragmentation; what matters is all cells are lineage labelled and hence are members of a single cyst derived from one PGC.

      (15) L263: Figure 5G has no legend reference; nurse cells are not outlined as in 5C.

      We have now outlined the nurse cells and have added the reference to the graph in the legend.

      (16) L279: "The fusome and Visham and both..." should be replaced with "Both fusome and Visham...".

      We have now replaced the term Visham with fusome as suggested by reviewers and editor.  We updated the statement to correct the grammatical error.

      (17) L1127: Video S3B1: It is unclear what to focus on.

      We have now added the Rectangle area and arrow to highlight what to focus on

      (18) L1128: Video "S3B1" should be "S3B2."

      We have now corrected the legend

      (19) Finally: curiosity question: have the authors tried to use known markers of the Drosophila fusome in mice, such as Spectrin or other markers described in Lighthouse, Buszczak and Spradling, Dev Bio, 2008? And conversely, do EMA and WGA label the fusome in Drosophila?

      Yes, we and others used the most specific markers of the Drosophila fusome, such as alpha-spectrin, adducin-like Hts, tropomodulin, etc., to search for fusomes in vertebrate species. It was unsuccessful in clarifying the situation, because Hts and alpha-spectrin in Drosophila and other insects generate a protein skeleton that stabilizes the fusome and is easily stained. But this structure is simply not conserved in vertebrates. The polarity behavior of the fusome, its core developmental property, is conserved, however. The mammalian fusome still acquires and maintains cyst polarity, and goes even farther and reflects both initial cyst formation and cyst cleavage, before marking oocyte vs nurse cell development in the smaller cysts. Expression of the inner microtubule-rich portion of the fusome, its Par proteins, and many ER-related and lysosomal fusome proteins are mostly conserved, but their ability to mark the fusome alone varies with time and context (only some of the examples are shown in Figure 3I'). Nearly all of the proteins identified in Lighthouse et al. 2008 are expressed. These proteins may be involved in rejuvenation as studied here. We modified the first section of the Discussion to explicitly compare mouse, Xenopus and Drosophila fusomes, which was not possible before this work.

      Reviewer #3 (Recommendations for the authors):

      The authors should either revise the conclusions or add additional evidence to support their claims. In addition, minor corrections are listed below.

      We have added additional evidence as noted in responses above, and revised some claims that were stated inaccurately.  In addition, we have attempted to clarify the evidence we do present, so that its full significance is more easily grasped by readers.    

      (1) Lines 20-21 are unclear - the cyst doesn't get sent into meiosis, each oocyte does.

      Research is showing that it's more complicated than that. All cyst cells enter "pre-meiotic S phase", and most cell cycles are conventionally considered to start after the previous M phase, i.e. in G1 or S, not in the next prophase, an ancient view limited just to meiosis. Absent this old tradition from meiosis cytology, pre-meiotic S would just be called meiotic S, as some workers on meiosis do. In addition, in different species, nurse cells diverge from meiosis on different schedules, including many much later in the meiotic cycle. Two cyst cells in Drosophila fully enter meiosis by all criteria, the oocyte and one nurse cell that only exits in late zygotene. In Xenopus and mouse, scRNAseq shows that many cyst cells enter meiosis up to leptotene and zygotene, including nurse cells that specifically downregulate meiotic genes during this time, possibly to assist their nurse cell functions, while others remain in meiosis even longer (Davidian and Spradling, 2025; Niu and Spradling, 2022). Eventually, only the oocytes within each fragmented mouse cyst complete meiosis.

      (2) Many places in the manuscript abbreviations are never defined or not defined the first time they are used (but the second or third time): Line 23-ER, Line 29-UPR, Line 33-PGC (not defined until line 45), Line 79-EGAD.

      We have defined full acronyms now upon their first occurrence.

      (3) Line 5 should be the pachytene substage of meiosis I.

      We have now updated the statement to “In pachytene stage of meiosis I…”

      (4) Line 59-61 - this statement needs a reference(s).

      These statements are a continuation from the references cited in the previous statements. However, for further clarity we have again cited the relevant reference here (Niu and Spradling, 2022).

      (5) Line 80 - should it be oocyte proteome quality control?

      We have now updated the statement to “Oocyte proteome quality control begins early”.

      (6) Line 87 - in this case, EMA does not stand for epithelial membrane antigen (AI will call it that, but it is not correct). I believe it originally was the abbrev for (Em)bryonic (a)ntigen, though some papers call it (e)mbryonic (m)ouse (a)ntigen. And the reference here is Hahnel and Eddy, 1986, but in the reference list is a different paper, 1987 (both refer to EMA-1).

      We have now updated the acronym EMA-1 in corrected form and have corrected the citation.

      (7) Line 176 - RNA seq.

      We have now updated the statement to “We performed single cell RNA sequencing (scRNA seq) of mouse gonad”.

      (8) Line 181 - Figure 4E and 4I should be 3E and 3I.

      We have now updated the figure reference in the text to correct one.

      (9) Line 183 - missing period.

      Added.

    1. eLife Assessment

      This paper develops a fundamental theory that explains how the brain can hold in working memory not only the identity but also the order of presented stimuli. Previous theories did not explain the ability of people to immediately recall the correct order of the stimulus presentation. The authors present compelling evidence that this can be achieved through synaptic augmentation, an experimentally observed phenomenon with a time scale of tens of seconds.

    2. Reviewer #1 (Public review):

      Summary:

      The issue of how the brain can maintain serial order of presented items in working memory is a major unsolved question in cognitive neuroscience. It has been proposed that this serial order maintenance could be achieved thanks to periodic reactivations of different presented items at different phases of an oscillation, but the mechanisms by which this could be achieved by brain networks, as well as the mechanisms of read-out, are still unclear. In an influential 2008 paper, the authors have proposed a mechanism by which a recurrent network of neurons could maintain multiple items in working memory, thanks to `population spikes' of populations of neurons encoding for the different items, occurring at alternating times. These population spikes occur in a specific regime of the network and are a result of synaptic facilitation, an experimentally observed type of synaptic short-term dynamics with time scales of order hundreds of ms.

      In the present manuscript, the authors extend their model to include another type of experimentally observed short-term synaptic plasticity termed synaptic augmentation, that operates on longer time scales on the order of 10s. They show that while a network without augmentation loses information about serial order, augmentation provides a mechanism by which this order can be maintained in memory thanks to a temporal gradient of synaptic efficacies. The order can then be read out using a read-out network whose synapses are also endowed with synaptic augmentation. Interestingly, the read-out speed can be regulated using background inputs.

      Strengths:

      This is an elegant solution to the problem of serial order maintenance, that only relies on experimentally observed features of synapses. The model is consistent with a number of experimental observations in humans and monkeys. The paper will be of interest to the broad readership of eLife and I believe it will have a strong impact on the field.

      Comments on revisions:

      I am happy with how the authors have addressed my comments, and believe the paper can be published in its present form.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors present a model to explain how working memory (WM) encodes both existence and timing simultaneously using transient synaptic augmentation. A simple yet intriguing idea.

      The model presented here has the potential to explain what previous theories like 'active maintenance via attractors' and 'liquid state machine' do not, and describe how novel sequences are immediately stored in WM. Altogether, the topic is of great interest to those studying higher cognitive processes, and the conclusions the authors draw are certainly thought-provoking from an experimental perspective.

      Comments on revisions:

      The authors have done an excellent job of addressing the questions that I raised, and the manuscript is greatly improved - both in content and clarity. It is an insightful advance and I recommend publication.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The network they propose is extremely simple. This simplicity has pros and cons: on the one hand, it is nice to see the basic phenomenon exposed in the simplest possible setting. On the other hand, it would also be reassuring to check that the mechanism is robust when implemented in a more realistic setting, using, for instance, a network of spiking neurons similar to the one they used in the 2008 paper. The more noisy and heterogeneous the setting, the better.

      The choice of a minimal model to illustrate our hypothesis is deliberate. Our main goal was to suggest a physiologically-grounded mechanism to rapidly encode temporally-structured information (i.e., sequences of stimuli) in Working Memory, where none was available before. Indeed, as discussed in the manuscript, previous proposals were unsatisfactory in several respects. In view of our main goal, we believe that a spiking implementation is beyond the scope of the present work.

      We would like to note that the mechanism originally proposed in Mongillo et al. (2008) has been repeatedly implemented, by many different groups, in various spiking network models with different levels of biological realism (see, e.g., Lundqvist et al. (2016), for an especially ‘detailed’ implementation) and, in all cases, the relevant dynamics has been observed. We take this as an indication of ‘robustness’; the relevant network dynamics doesn’t critically depend on many implementation details and, importantly, this dynamics is qualitatively captured by a simple rate model (see, e.g., Mi et al. (2017)).

      In the present work, we make a relatively ‘minor’ (from a dynamical point of view) extension of the original model, i.e., we just add augmentation. Accordingly, we are fairly confident that a set of parameters for the augmentation dynamics can be found such that the spiking network behaves, qualitatively, as the rate model. A meaningful study, in our opinion, then would require extensively testing the (large) parameters’ space (different models of augmentation?) to see how the network behavior compares with the relevant experimental observations (which ones? Behavioral? Physiological?). As said above, we believe that this is beyond the scope of the present work.

      This being said, we definitely agree with the reviewer that not presenting a spiking implementation is a limitation of the present work. We have clearly acknowledged this limitation here, by adding the following paragraph to the Discussion.

      “To illustrate our theory in a simple setting, we used a minimal model network that neglects many physiological details. This, however, constitutes a limitation of the present study. It would be reassuring to see that the mechanism we propose here is robust enough to reliably operate also in spiking networks, in the presence of heterogeneity in both single-cell and synaptic properties. While we are fairly confident that this is the case, a spiking implementation of our model is beyond the scope of the present study and will be addressed in the future. Also, because of the simplicity of the model network, a comparison between the model behavior and the electrophysiological observations cannot be completely direct. Nevertheless the model qualitatively accounts for a diverse set of experimental data”.

      (2) One major issue with the population spike scenario is that (to my knowledge) there is no evidence that these highly synchronized events occur in delay periods of working memory experiments. It seems that highly synchronized population spikes would imply (a) a strong regularity of spike trains of neurons, at odds with what is typically observed in vivo (b) high synchronization of neurons encoding for the same item (and also of different items in situations where multiple items have to be held in working memory), also at odds with in vivo recordings that typically indicate weak synchronization at best. It would be nice if the authors at least mention this issue, and speculate on what could possibly bridge the gap between their highly regular and synchronized network, and brain networks that seem to lie at the opposite extreme (highly irregular and weakly synchronized). Of course, if they can demonstrate using a spiking network simulation that they can bridge the gap, even better.

      Direct experimental evidence (in monkeys) in support of the existence of highly synchronized events -- to be identified with the ‘population spikes’ of our model -- during the delay period of a memory task is available in the literature, i.e., Panichello et al. (2024). We provide a short discussion of the results of Panichello et al. (2024) and how these results directly relate to our model. We also provide a short discussion of the results of Liebe et al. (2025), which, again, are fully consistent with our model.

      We note that there is no fundamental contradiction between highly synchronized events in ‘small’ neural populations (e.g., a cell assembly) on one hand, and temporally irregular (i.e., Poisson-like) spiking at the single-neuron level and weakly synchronized activity at the network level, on the other hand. This was already illustrated in our original publication, i.e., Mongillo et al. (2008) (see, in particular, Fig. S2). We further note that the mechanism we propose to encode temporal order -- a temporal gradient in the synaptic efficacies brought about by synaptic augmentation -- would also work if the memory of the items is maintained by ‘tonic’ persistent activity (i.e., without highly synchronized events), provided this activity occurs at suitably low rates such as to prevent the saturation of the synaptic augmentation.

      We have added the following two paragraphs to the Discussion.

      “More direct support to this interpretation comes from recent electrophysiological studies [Panichello et al., 2024, Liebe et al., 2025]. By recording large neuronal populations (∼ 300) simultaneously in the prefrontal cortex of monkeys performing a WM task, [Panichello et al., 2024] found that, during the maintenance period, the decoding of the actively held item from neural activity was ‘intermittent’; that is, decoding was only possible during short epochs (∼ 100ms) interleaved with epochs (also ∼ 100ms) where decoding was at chance level. The inability to decode resulted from a loss of selectivity at the population level, with a return of the single-neuron firing rates to their spontaneous (pre-stimulus) activity levels. The transitions between these two activity states (decodable/not-decodable) were coordinated across large populations of neurons in PFC. By recording single-neuron activity in the medial temporal lobe of humans performing a sequential multi-item WM task, [Liebe et al., 2025] found that during maintenance, neurons coding for a given item tended to fire at a specific phase of the underlying theta rhythm, again suggesting that the corresponding neuronal populations reactivate briefly and sequentially. In summary, these experimental results suggest that active memory maintenance relies on brief reactivations of the neural representations of the items, which we identify with the population spikes in our model, and that these reactivations occur sequentially in time, as predicted by our theory”.

      “We note that the proposed mechanism would still work if the items were maintained by tonically-enhanced firing rates, instead of population spikes, provided that those firing rates were suitably low. However, obtaining low firing rates in model networks of persistent activity is quite difficult”.

      Reviewer #2 (Public review):

      The study relates to the well-known computational theory for working memory, which suggests short-term synaptic facilitation is required to maintain working memory, but doesn't rely on persistent spiking. This previous theory appears similar to the proposed theory, except for the change from facilitation to augmentation. A more detailed explanation of why the authors use augmentation instead of facilitation in this paper is warranted: is the facilitation too short to explain the whole process of WM? Can the theory with synaptic facilitation also explain the immediate storage of novel sequences in WM?

      In the model, synaptic dynamics display both short-term facilitation and augmentation (and short-term depression). Indeed, synaptic facilitation alone would be too short-lived to encode novel sequences. This is illustrated in Fig. 1B.

      We provide a discussion of this important point, by adding the following paragraph to the Results section.

      “If augmentation was the only form of synaptic plasticity present in the network, the encoding of an item in WM would require long presentation times, or alternatively high firing rates upon presentation, precisely because K_A is small. Instead, rapid encoding is made possible by the presence of the short-term facilitation, which builds up significantly faster than augmentation, as U >> K_A . For the same reason, however, the level of facilitation rapidly reaches the steady state; therefore, short-term facilitation alone is unable to encode temporal order (see Fig. 1B). Thus, our model requires the existence of transitory synaptic enhancement on at least two time scales, such that longer decays are accompanied by slower build-ups. Intriguingly, this pattern is experimentally observed [Fisher et al., 1997]”.

      In Figure 1, the authors mention that synaptic augmentation leads to an increased firing rate even after stimulus presentation. It would be good to determine, perhaps, what the lowest threshold is to see the encoding of a WM task, and whether that is biologically plausible.

      We believe that this comment is related to the above point. The reviewer is correct; augmentation alone would require fairly long stimulus presentations to encode an item in WM. ‘Fast’ encoding, indeed, is guaranteed by the presence of short-term facilitation. This important point is emphasized; see above.

      In the middle panel of Figure 4, after 15-16 sec, when the neuronal population prioritizes with the second retro-cue, although the second retro-cue item's synaptic spike dominates, why is the augmentation for the first retro-cue item higher than the second-cue augmentation until the 20 sec?

      This is because of the slow build-up and decay of the augmentation. When the second item is prioritized, and the corresponding neuronal population re-activates, its augmentation level starts to increase. At the same time, as the first item is now de-prioritized and the corresponding neuronal population is now silent, its augmentation level starts to decrease. Because of the ‘slowness’ of both processes (i.e., augmentation build-up and decay), it takes about 5 seconds for the augmentation level of the second item to overcome the augmentation level of the first item.

      We note that the slow time scales of the augmentation dynamics, consistent with experimental observations, are necessary for our mechanism to work; see above.
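      The timing argument above can be illustrated with a minimal sketch. It assumes a simple first-order form for the augmentation variable, dA/dt = (A_bar - A)/tau_A + K_A (1 - A) r, which is an assumption made here for illustration and not necessarily the manuscript's Eq. (6); all parameter values (tau_A, K_A, the rate, the initial levels) are hypothetical. The sketch shows only the qualitative point: when both build-up and decay are slow, the newly prioritized item overtakes the de-prioritized one after a delay of seconds rather than immediately.

```python
def simulate_augmentation(a0, rate_hz, t_end, dt=0.001,
                          tau_a=10.0, k_a=0.01, a_bar=0.0):
    """Euler integration of the ASSUMED dynamics
    dA/dt = (a_bar - A) / tau_a + k_a * (1 - A) * rate_hz.
    All parameter values are hypothetical, chosen for illustration only."""
    n_steps = int(round(t_end / dt))
    a = a0
    trace = [a]
    for _ in range(n_steps):
        da = (a_bar - a) / tau_a + k_a * (1.0 - a) * rate_hz
        a += dt * da
        trace.append(a)
    return trace


def crossover_time(trace_old, trace_new, dt=0.001):
    """First time (in seconds) at which the new item's level exceeds the old one's."""
    for i, (old, new) in enumerate(zip(trace_old, trace_new)):
        if new > old:
            return i * dt
    return None


# De-prioritized item: starts at a high augmentation level, population now silent.
old_item = simulate_augmentation(a0=0.3, rate_hz=0.0, t_end=20.0)
# Newly prioritized item: starts low, population now reactivating (mean rate 5 Hz).
new_item = simulate_augmentation(a0=0.1, rate_hz=5.0, t_end=20.0)
t_cross = crossover_time(old_item, new_item)
print(f"augmentation levels cross about {t_cross:.1f} s after the switch")
```

      With faster time constants the crossing would occur almost immediately, which is why the slow scales matter for the delay seen in the figure.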

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 46 identify -> identity.

      (2) Line 207 scale -> scales.

      Fixed. Thank you.

      (3) Lines 222-224 what about behavioral time-scale plasticity? This type of plasticity can apparently be induced very quickly.

      We have removed the corresponding paragraph.

      (4) Line 231 identification of `gamma bursts' with population spikes: These two phenomena seem to be very different - one can be weakly synchronized and can be consistent with highly irregular activity, while it is not clear whether the other can (see major issue 2). Also, it seems that population spikes occur at frequencies that are an order of magnitude lower than gamma.

      We have rewritten the corresponding paragraph and we rely now on more direct electrophysiological evidence (i.e., on the simultaneous recording of large neuronal populations) to identify putative population spikes; see above.

      Reviewer #2 (Recommendations for the authors):

      (1) On page 7, the behavioral study of Rose et al. (2016) is quite important for readers to understand the 'low-activity regime', and to fully appreciate Figure 4, it would be beneficial to explain that study in greater detail.

      We have added a panel to Fig. 4, and accompanying text in the caption, to better illustrate the main task events in the experiment of Rose et al. (2016).

      (2) Line 17: "wrong order", but wrong timing matters too

      Definitely, depending on the task. Specifically, in our example, timing is immaterial.

      (3) Line 33-34: "special training", what is considered special? One could argue that the number of trials needed to learn, depending on the TI timing, is special, depending on the task.

      We have removed the sentence as apparently it was confusing. We simply meant that ‘naive’ human subjects can perform the task (e.g., serial recall); that is, they didn’t undergo any kind of practice that can be construed as ‘training’.

      (4) Line 40-41: but timing is also part of working memory processing. Perhaps it can be merged with the next sentence.

      We have merged the two sentences.

      (5) Line 53: Is the implication here that what happens in the synapses is what drives WM, and not just that the neurons stay persistently on?

      Yes. The idea is that information can be maintained in the synaptic facilitation level, without enhanced spiking activity. Reading-out and refreshing the memory contents, however, requires neuronal activity. We explain this in some detail in the next paragraph (i.e., lines 60-65 in the revised submission).

      (6) Line 102: could a lack of excitatory activity be explained by inhibitory signaling? It appears the inhibitory component is quite understated here.

      Here we are just defining A-bar; according to Eq. (6), if r_a is 0 (i.e., no synaptic activity, for whatever reason), then A_a will converge to A-bar after a time much longer than \tau_A (i.e., a long period). We have rephrased the sentence to improve clarity.

      (7) Line 158-172: please consider revising this paragraph for a more general audience.

      We have rewritten this paragraph to improve clarity. For the same purpose, we have also slightly modified Fig. 3.

      (8) Line 227: it would seem this is due to a singular inhibitory group making the model highly dependent on the excitatory groups.

      We are not sure that we understand this comment. Here, we are just saying that if the item-coding populations don’t reactivate during the maintenance period (i.e., activity-silent regime) then the augmentation gradient cannot build up. If, on the other hand, the item-coding populations are constantly active at high rates during the maintenance period (i.e., persistent-activity regime) then the augmentation levels will rapidly saturate and, again, there will be no augmentation gradient. This is independent of how ‘silence’ or ‘activity’ of the item-coding populations is determined by the interplay of excitation and inhibition.

      (9) Line 284: this would certainly be an interesting take, but it isn't clear that the model proved this type of decoupling of the temporal aspect of the recall.

      This is an ‘educated’ speculation, based on the model and on a specific interpretation of some experimental results, as discussed in the paper and, in particular, in the last paragraph of the Discussion. We believe that the phrasing of the paragraph makes clear that this is, indeed, a speculation.

    1. eLife Assessment

      This valuable study uses a computational language model, i.e., HM-LSTM, to quantify the neural encoding of hierarchical linguistic information in speech, and addresses how hearing impairment affects neural encoding of speech. Overall the evidence for the findings is solid, although the evidence for different speech processing stages could be strengthened by a more rigorous temporal response function (TRF) analysis. The study is of potential interest to audiologists and researchers who are interested in the neural encoding of speech.

    2. Reviewer #1 (Public review):

      The authors relate a language model developed to predict whether a given sentence correctly followed another given sentence to EEG recordings in a novel way, showing receptive fields related to widely used TRFs. In these responses (or "regression results"), differences between representational levels are found, as well as differences between attended and unattended speech stimuli, and whether there is hearing loss. These differences are found per EEG channel.

      In addition to these novel regression results, which are apparently captured from the EEG specifically around the sentence stimulus offsets, the authors also perform a more standard mTRF analysis using a software package (Eelbrain) and TRF regressors that will be more familiar to researchers adjacent to these topics, which was highly appreciated for its comparative value. Comparing these TRFs with the authors' original regression results, several similarities can be seen. Specifically, response contrasts for attended versus unattended speaker during mixed speech, for the phoneme, syllable, and sentence regressors, are greater for normal-hearing participants than hearing-impaired participants for both analyses, and the temporal and spatial extents of the significant differences are roughly comparable (left-front and 0 - 200 ms for phoneme and syllable, and left and 200 - 300 ms for sentence).

      The inclusion of the mTRF analysis is helpful also because some aspects of the authors' original regression results, between the EEG data and the HM-LSTM linguistic model, are less than clear. The authors state specifically that their regression analysis is only calculated in the -100 - 300 ms window around stimulus/sentence offsets. They clarify that this means that most of the EEG data acquired while the participants are listening to the sentences is not analyzed, because their HM-LSTM model implementation represents all acoustic and linguistic features in a condensed way, around the end of the sentence. Thus the regression between data and model only occurs where the model predictions exist, which is the end of the sentences. This is in contrast to the mTRF analysis, which seems to have been done in a typical way, regressing over the entire stimulus time, because those regressors (phoneme onset, word onset, etc.) exist over the entire sentence time. If my reading of their description of the HM-LSTM regression is correct, it is surprising that the regression weights are similar between the HM-LSTM model and the mTRF model.

      However, the code that the authors uploaded to OSF seems to clarify this issue. In the file ridge_lstm.py, the authors construct the main regressor matrices called X1 and X2 which are passed to sklearn to do the ridge regression. This ridge regression step is calculated on the continuous 10-minute bouts of EEG and stimuli, and it is calculated in a loop over lag times, from -100 ms to 300 ms lag. These regressor matrices are initialized as zeros, and are then filled in two steps: the HM_LSTM model unit weights are read from numpy files and written to the matrices at one timepoint per sentence (as the authors describe in the text), and the traditional phoneme, syllable, etc. annotations are ALSO read in (from csv files) and written to the matrices, putting 1s at every timepoint of those corresponding onsets/offsets. Thus the actual model regressor matrix for the authors' main EEG results includes BOTH the HM_LSTM model weights for each sentence AND the feature/annotation times, for whichever of the 5 features is being analyzed (phonemes, syllables, words, phrases, or sentences).

      So for instance, for the syllable HM_LSTM regression results, the regressor matrix contains: 1) the HM_LSTM model weights corresponding to syllables (a static representation, placed once per sentence offset time), AND 2) the syllable onsets themselves, placed as a row of 1s at every syllable onset time. And as another example, for the word HM_LSTM regression results, the regressor matrix contains: 1) the HM_LSTM model weights corresponding to words (a static representation, placed once per sentence offset time), AND 2) the word onsets themselves, placed as a row of 1s at every word onset time.
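      The regressor layout described in the two examples above can be sketched as follows. This is a minimal illustration of the reviewer's reading, not the authors' ridge_lstm.py; the function name, argument names, and toy values are all hypothetical, and the unit onsets are stored here as a single indicator column.

```python
def build_regressors(n_samples, sentence_offsets, sentence_features, unit_onsets):
    """Build a design matrix (list of rows) initialised to zeros, then filled in two steps:
    1) model-derived feature vectors, written once at each sentence-offset sample;
    2) an indicator column with a 1 at every onset of the linguistic unit of interest.
    All names are hypothetical; this only mirrors the layout described in the review."""
    n_features = len(sentence_features[0])
    X = [[0.0] * (n_features + 1) for _ in range(n_samples)]
    for t, feats in zip(sentence_offsets, sentence_features):
        X[t][:n_features] = list(feats)   # static representation, one row per sentence
    for t in unit_onsets:
        X[t][n_features] = 1.0            # binary annotation regressor
    return X


# Toy example: 10 samples, two sentences ending at samples 4 and 9,
# two model features per sentence, and four unit onsets.
X = build_regressors(
    n_samples=10,
    sentence_offsets=[4, 9],
    sentence_features=[[0.5, -1.0], [2.0, 0.25]],
    unit_onsets=[0, 2, 5, 7],
)
```

      In such a matrix the fitted weights reflect both sets of regressors jointly, which is why separating them would be needed to attribute an effect to the model features alone.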

      If my reading of the code is correct, there are two main points of clarification for interpreting these methods:

      First, the authors' window of analysis of the EEG is not "limited" to 400 ms as they say; rather the time dimension of both their ridge regression results and their traditional mTRF analysis is simply lags (400 ms-worth), and the responses/receptive fields are calculated over the entire 10-minute trials. This is the normal way of calculating receptive fields in a continuous paradigm. The authors seem to be focusing on the peri-sentence offset time points because that is where the HM_LSTM model weights are placed in the regressor matrix. Also because of this issue, it is not really correct when the authors say that some significant effect occurred at some latency "after sentence offset". The lag times of the regression results should have the traditional interpretation of lag/latency in receptive field analyses.

      Second, as both the traditional linguistic feature annotations and the HM_LSTM model weights are part of the regression for the main ridge regression results here, it is not known what the contribution specifically of the HM_LSTM portion of the regression was. Because the more traditional mTRF analysis showed many similar results to the main ridge regression results here, it seems probable that the simple feature annotations themselves, rather than the HM_LSTM model weights, are responsible for the main EEG results. A further analysis separating these two sets of regressors would shed light on this question.

    3. Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments.

      The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain.

      The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The study tests only a single deep neural network model for extracting linguistic features, which limits the robustness of the conclusions. A lower model fit does not necessarily indicate that a given type of information is absent from the neural signal; it may simply reflect that the model's representation was not optimal for capturing it. That said, this limitation is a common concern for data-driven, correlation-based approaches, and should be viewed as an inherent caveat rather than a critical flaw of the present work.

    4. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This valuable study combines a computational language model, i.e., HM-LSTM, and temporal response function (TRF) modeling to quantify the neural encoding of hierarchical linguistic information in speech, and addresses how hearing impairment affects neural encoding of speech. The analysis has been significantly improved during the revision but remains somewhat incomplete: the TRF analysis should be more clearly described and controlled. The study is of potential interest to audiologists and researchers who are interested in the neural encoding of speech.

      We thank the editors for the updated assessment. In the revised manuscript, we have added a more detailed description of the TRF analysis on p. of the revised manuscript. We have also updated Figure 1 to better visualize the analysis pipeline. Additionally, we have included a supplementary video to illustrate the architecture of the HM-LSTM model, the ridge regression methods using the model-derived features, and the mTRF analysis using the acoustic envelope and the binary rate models.

      Public Reviews:

      Reviewer #1 (Public review):

      About R squared in the plots:

      The authors have used a z-scored R squared in the main ridge regression plots. While this may be interpretable, it seems non-standard and overly complicated. The authors could use a simple Pearson r to be most direct and informative (and in line with similar work, including Goldstein et al. 2022 which they mentioned). This way the sign of the relationships is preserved.

      We did not use Pearson’s r as in Goldstein et al. (2022) because our analysis did not involve a train-test split, which was a key aspect of their approach. Specifically, Goldstein et al. (2022) divided their data into training and testing sets, trained a ridge regression model on the training set, and then used the trained model to predict neural responses on the test set. They calculated Pearson’s r to assess the correlation between the predicted and observed neural responses, making the correlation coefficient (r) their primary measure of model performance. In contrast, our analysis focused on computing the model fitting performance (R²) of the ridge regression model for each sensor and time point for each subject. At the group level, we conducted one-sample t-tests with spatiotemporal cluster-based correction on the R² values to identify sensors and time windows where R² values were significantly greater than baseline. We established the baseline by normalizing the R² values using Fisher z-transformation across sensors within each subject. We have added this explanation on p.13 of the revised manuscript.

      About the new TRF analysis:

      The new TRF analysis is a necessary addition and much appreciated. However, it is missing the results for the acoustic regressors, which should be there analogous to the HM-LSTM ridge analysis. The authors should also specify which software they have utilized to conduct the new TRF analysis. It also seems that the linguistic predictors/regressors have been newly constructed in a way more consistent with previous literature (instead of using the HM-LSTM features); these specifics should also be included in the manuscript (did it come from Montreal Forced Aligner, etc.?). Now that the original HM-LSTM can be compared to a more standard TRF analysis, it is apparent that the results are similar.

      We used the Python package Eelbrain (https://eelbrain.readthedocs.io/en/r0.39/auto_examples/temporal-response-functions/trf_intro.html) to conduct the multivariate temporal response function (mTRF) analyses. As we previously explained in our response to R3, we did not apply mTRF to the acoustic features due to the high dimensionality of the input. Specifically, our acoustic representation consists of a 130-dimensional vector sampled every 10 ms throughout the speech stimuli (comprising a 129-dimensional spectrogram and a 1-dimensional amplitude envelope). This made the resulting 130-dimensional TRF estimate difficult to interpret. A similar constraint applied to the hidden-layer activations from our HM-LSTM model for the five linguistic features. After dimensionality reduction via PCA, each still resulted in 150-dimensional vectors. To address this, we instead used binary predictors marking the offset of each linguistic unit (phoneme, syllable, word, phrase, sentence). Since our speech stimuli were computer-synthesized, the phoneme and syllable boundaries were automatically generated. The word boundaries were manually annotated by a native Mandarin speaker as in Li et al. (2022). The phrase boundaries were automatically annotated by the Stanford parser and manually checked by a native Mandarin speaker. These rate models are represented as five distinct binary time series, each aligned with the timing of the corresponding linguistic unit, making them well-suited for mTRF analysis. Although the TRF results from the 1-dimensional rate predictors and the ridge regression results from the high-dimensional HM-LSTM-derived features are similar, they encode different things: the rate regressors only encode the timing of linguistic unit boundaries, while the model-derived features encode the representational content of the linguistic input. Therefore, we do not consider the mTRF analyses to be analogous to the ridge regression analyses. Rather, these results complement each other, and both provide insight into the neural tracking of linguistic structures at different levels for the attended and unattended speech.

      Since the TRF result for the continuous acoustic features also concerns R2, we have added an mTRF analysis where we fitted the one-dimensional speech envelope to the EEG. We extracted the envelope at 10 ms intervals for both attended and unattended speech and computed mTRFs independently for each subject and sensor using a basis of 50 ms Hamming windows spanning –100 ms to 300 ms relative to envelope onset. The results showed that in hearing-impaired participants, attended speech elicited a significant cluster in the bilateral temporal regions from 270 to 300 ms post-onset (t = 2.40, p = 0.01, Cohen’s d = 0.63). Unattended speech elicited an early cluster in right temporal and occipital regions from –100 ms to –80 ms (t = 3.07, p = 0.001, d = 0.83). Normal-hearing participants showed significant envelope tracking in the left temporal region at 280–300 ms after envelope onset (t = 2.37, p = 0.037, d = 0.48), with no significant cluster for unattended speech. These results further suggest that hearing-impaired listeners may have difficulty suppressing unattended streams. We have added the new TRF results for envelope to Figure S3 and the “mTRF results for attended and unattended speech” on p.7 and the “mTRF analysis” in Material and Methods of the revised manuscript.

      The authors' wording about this suggests that these new regressors have a nonzero sample at each linguistic event's offset, not onset. This should also be clarified. As the authors know, the onset would be more standard, and using the offset has implications for understanding the timing of the TRFs, as a phoneme has a different duration than a word, which has a different duration from a sentence, etc.

      In our rate‐model mTRF analyses, we initially labelled linguistic boundaries as “offsets” because our ridge‐regression with HM-LSTM features was aligned to sentence offsets rather than onsets. However, since each offset coincides with the next unit’s onset—and our regressors simply mark these transition points as 1—the “offset” and “onset” models yield identical mTRFs. To avoid confusion, we have relabeled “offset” as “boundary” in Figure S2.
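      The relation between “offset” and “onset” regressors can be checked with a small sketch. It assumes strictly contiguous units (each unit's offset sample equals the next unit's onset sample), which may not hold wherever the stimuli contain pauses; the unit boundaries and helper name below are hypothetical.

```python
def mark(n_samples, times):
    """Binary time series with a 1 at each listed sample index."""
    series = [0] * n_samples
    for t in times:
        series[t] = 1
    return series


# Hypothetical contiguous units as (onset, offset) sample indices.
units = [(0, 3), (3, 7), (7, 12)]
n_samples = 13

onset_series = mark(n_samples, [onset for onset, _ in units])
offset_series = mark(n_samples, [offset for _, offset in units])

# Samples at which the two regressors disagree.
differing = [i for i, (a, b) in enumerate(zip(onset_series, offset_series)) if a != b]
print(differing)
```

      Under this assumption the two series disagree only at the very first onset and the very last offset, so over a long recording the fitted TRFs would be expected to be nearly the same. Any silent gap between units would add further differences.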

      As discussed in our prior responses, this design was based on the structure of our input to the HM-LSTM model, where each input consists of a pair of sentences encoded in phonemes, such as “t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1” (“It can fly <sep> This is an airplane”). The two sentences are separated by a special <sep> token, and the model’s objective is to determine whether the second sentence follows the first, similar to a next-sentence prediction task. Since the model processes both sentences in full before making a prediction, the neural activations of interest should correspond to the point at which the entire sentence has been processed by humans. To enable a fair comparison between the model’s internal representations and brain responses, we aligned our neural analyses with the sentence offsets, capturing the time window after the sentence has been fully perceived by the participant. Thus, we extracted epochs from -100 to +300 ms relative to each sentence offset, consistent with our model-informed design.

      We understand that phonemes, syllables, words, phrases, and sentences differ in their durations. However, the five hidden activity vectors extracted from the model are designed to capture the representations of these five linguistic levels across the entire sentence. Specifically, for a sentence pair such as “It can fly <sep> This is an airplane,” the first 2048-dimensional vector represents all the phonemes in the two sentences (“t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1”), the second vector captures all the syllables (“ta_1 nəŋ_2 fei_1 <sep> zhə_4 shiii_4 fei_1jii_1”), the third vector represents all the words, the fourth vector captures the phrases, and the fifth vector represents the sentence-level meaning. In our dataset, input pairs consist of adjacent sentences from the stimuli (e.g., Sentence 1 and Sentence 2, Sentence 2 and Sentence 3, and so on), and for each pair, the model generates five 2048-dimensional vectors, each corresponding to a specific linguistic level. To identify the neural correlates of these model-derived features—each intended to represent the full linguistic level across a complete sentence—we focused on the EEG signal surrounding the completion of the second sentence rather than on incremental processing. Accordingly, we extracted epochs from -100 ms to +300 ms relative to the offset of the second sentence and performed ridge regression analyses using the five model features (reduced to 150 dimensions via PCA) at every 50 ms across the epoch. We have added this clarification on p.12 of the revised manuscript.
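      The epoching scheme described above can be sketched as follows. The window (-100 ms to +300 ms) and the 50 ms step come from the text; the sampling rate, the offset sample, and the function names are hypothetical and chosen only to make the example concrete.

```python
def epoch_times_ms(start_ms=-100, stop_ms=300, step_ms=50):
    """Time points (ms, relative to sentence offset) at which a regression is fitted."""
    return list(range(start_ms, stop_ms + 1, step_ms))


def epoch_sample_indices(offset_sample, sfreq_hz, times_ms):
    """Convert relative times to absolute EEG sample indices for one sentence offset."""
    return [offset_sample + round(t * sfreq_hz / 1000) for t in times_ms]


times = epoch_times_ms()
# Hypothetical recording: 100 Hz sampling, sentence ending at sample 1000.
indices = epoch_sample_indices(offset_sample=1000, sfreq_hz=100, times_ms=times)
print(len(times), times)
print(indices)
```

      A 50 ms step across this window yields nine time points per sentence offset, so one ridge regression per time point, sensor, and subject would be fitted under this reading.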

      About offsets:

      TRFs can still be interpretable using the offset timings though; however, the main original analysis seems to be utilizing the offset times in a different, more confusing way. The authors still seem to be saying that only the peri-offset time of the EEG was analyzed at all, meaning the vast majority of the EEG trial durations do not factor into the main HM-LSTM response results whatsoever. The way the authors describe this does not seem to be present in any other literature, including the papers that they cite. Therefore, much more clarification on this issue is needed. If the authors mean that the regressors are simply time-locked to the EEG by aligning their offsets (rather than their onsets, because they have varying onsets or some such experimental design complexity), then this would be fine. But it does not seem to be what the authors want to say. This may be a miscommunication about the methods, or the authors may have actually only analyzed a small portion of the data. Either way, this should be clarified to be able to be interpretable.

      We hope that our response in RE4, along with the supplementary video, has helped clarify this issue. We acknowledge that prior studies have not used EEG data surrounding sentence offsets to examine neural responses at the phoneme or syllable levels. However, this is largely due to a lack of models that represent all linguistic levels across an entire sentence. There is abundant work comparing model predictors with neural data time-locked to offsets because they mark the point at which participants have already processed the relevant information (Brennan, 2016; Brennan et al., 2016; Gwilliams et al., 2024, 2025). Similarly, in our model-brain alignment study, our goal is to identify neural correlates for each model-derived feature. If we correlated model activity with EEG data aligned to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet heard the sentence. Although this limits our analysis to a subset of the data (143 sentences × 400 ms windows × 4 conditions), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We have added this clarification on p.12 of the revised manuscript.

      Reviewer #2 (Public review):

      This study presents a valuable finding on the neural encoding of speech in listeners with normal hearing and hearing impairment, uncovering marked differences in how attention to different levels of speech information is allocated, especially when having to selectively attend to one speaker while ignoring an irrelevant speaker. The results overall support the claims of the authors, although a more explicit behavioural task to demonstrate successful attention allocation would have strengthened the study. Importantly, the use of more "temporally continuous" analysis frameworks could have provided a better methodology to assess the entire time course of neural activity during speech listening. Despite these limitations, this interesting work will be useful to the hearing impairment and speech processing research community. The study compares speech-in-quiet vs. multi-talker scenarios, allowing to assess within-participant the impact that the addition of a competing talker has on the neural tracking of speech. Moreover, the inclusion of a population with hearing loss is useful to disentangle the effects of attention orienting and hearing ability. The diagnosis of high-frequency hearing loss was done as part of the experimental procedure by professional audiologists, leading to a high control of the main contrast of interest for the experiment. Sample size was big, allowing to draw meaningful comparisons between the two populations.

      We thank you very much for your appreciation of our research, and we have now added a more detailed description of the mTRF analyses on p.13-14 of the revised manuscript.

      An HM-LSTM model was employed to jointly extract speech features spanning from the stimulus acoustics to word-level and phrase-level information, represented by embeddings extracted at successive layers of the model. The model was specifically expanded to include lower level acoustic and phonetic information, reaching a good representation of all intermediate levels of speech. Despite conveniently extracting all features jointly, the HM-LSTM model processes linguistic input sentence-by-sentence, and therefore only allows to assess the corresponding EEG data at sentence offset. If I understood correctly, while the sentence information extracted with the HM-LSTM reflects the entire sentence - in terms of its acoustic, phonetic and more abstract linguistic features - it only gives a condensed final representation of the sentence. As such, feature extraction with the HM-LSTM is not compatible with a continuous temporal mapping on the EEG signal, and this is the main reason behind the authors' decision to fit a regression at nine separate time points surrounding sentence offsets.

      Yes, you are correct. As explained in RE4, the model generates five hidden-layer activity vectors, each intended to represent all the phonemes, syllables, words, phrases within the entire sentence (“a condensed final representation”). This is the primary reason we extract EEG data surrounding the sentence offsets—this time point reflects when the full sentence has been processed by the human brain. We assume that even at this stage, residual neural responses corresponding to each linguistic level are still present and can be meaningfully analyzed.

      While valid and previously used in the literature, this methodology, in the particular context of this experiment, might be obscuring important attentional effects impacted by hearing-loss. By fitting a regression only around sentence-final speech representations, the method might be overlooking the more "online" speech processing dynamics, and only assessing the permanence of information at different speech levels at sentence offset. In other words, the acoustic attentional bias between Attended and Unattended speech might exist even in hearing-impaired participants but, due to a lower encoding or permanence of acoustic information in this population, it might only emerge when using methodologies with a higher temporal resolution, such as Temporal Response Functions (TRFs). If a univariate TRF fit simply on the continuous speech envelope did not show any attentional bias (different trial lengths should not be a problem for fitting TRFs), I would be entirely convinced of the result. For now, I am unsure on how to interpret this finding.

      We agree, and we added the mTRF results using the rate models for the 5 linguistic levels in the prior revision. The rate model aligns with the boundaries of each linguistic unit at each level. As explained in RE3, the rate regressors encode the timing of linguistic unit boundaries, while the model-derived features encode the representational content of the linguistic input. The mTRF results showed similar patterns to those observed using features from our HM-LSTM model with ridge regression (see Figure S2). These results complement each other, and both provide insight into the neural tracking of linguistic structures at different levels for the attended and unattended speech.

      We have also added TRF results fitting the envelope of attended and unattended speech, sampled every 10 ms, to the whole 10-minute EEG data. Our results showed that in hearing-impaired participants, attended speech elicited a significant cluster in the bilateral temporal regions from 270 to 300 ms post-onset (t = 2.40, p = 0.01, Cohen’s d = 0.63). Unattended speech elicited an early cluster in right temporal and occipital regions from –100 ms to –80 ms (t = 3.07, p = 0.001, d = 0.83). Normal-hearing participants showed significant envelope tracking in the left temporal region at 280–300 ms after envelope onset (t = 2.37, p = 0.037, d = 0.48), with no significant cluster for unattended speech. These results further suggest that hearing-impaired listeners may have difficulty suppressing unattended streams. We have added the new TRF results for the envelope to Figure S3, to the “mTRF results for attended and unattended speech” section on p.7, and to the “mTRF analysis” section in Material and Methods of the revised manuscript.
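To make the idea of an envelope TRF concrete, the following is a minimal sketch of a lagged ridge regression on simulated data. It is not the authors' pipeline: the response reports using Eelbrain, whose estimator may differ from plain ridge, and every name here (`lagged_design`, `envelope`, `true_lag`, `alpha`) is a hypothetical choice for illustration. The lag range of -100 to 300 ms at 10 ms steps mirrors the numbers quoted above.

```python
import numpy as np

def lagged_design(stimulus, lags):
    """Build a design matrix whose columns are the stimulus shifted by each lag.

    Lags are in samples. Column j holds stimulus[t - lag] at row t, with
    zero padding where the shifted stimulus runs past the data edges.
    """
    n = len(stimulus)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stimulus[:n - lag]
        else:
            X[:lag, j] = stimulus[-lag:]
    return X

# Simulated data (assumption): white-noise "envelope" sampled every 10 ms.
rng = np.random.default_rng(1)
envelope = rng.standard_normal(5000)
lags = list(range(-10, 31))          # -100 ms to 300 ms in 10 ms steps
true_lag = 28                        # simulated response at 280 ms

# Simulated single-channel EEG: delayed copy of the envelope plus noise.
eeg = lagged_design(envelope, [true_lag])[:, 0] + 0.5 * rng.standard_normal(5000)

# Closed-form ridge estimate of the TRF weights, one weight per lag.
X = lagged_design(envelope, lags)
alpha = 1.0
trf = np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ eeg)

peak_lag_ms = 10 * lags[int(np.argmax(trf))]
```

In this toy setup the largest TRF weight falls at the simulated delay, which is the sense in which a TRF localizes a response in time relative to the stimulus.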

      Despite my doubts on the appropriateness of condensed speech representations and single-point regression for acoustic features in particular, the current methodology allows the authors to explore their research questions, and the results support their conclusions. This work presents an interesting finding on the limits of attentional bias in a cocktail-party scenario, suggesting that fundamentally different neural attentional filters are employed by listeners with high-frequency hearing loss, even in terms of the tracking of speech acoustics. Moreover, the rich dataset collected by the authors is a great contribution to open science and will offer opportunities for re-analysis.

      We sincerely thank you again for your encouraging comments regarding the impact of our study.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments. The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain. The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. It is also not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. More quantitative metrics on acoustic/linguistic-related downstream tasks, such as speaker identification and phoneme/syllable/word recognition based on these intermediate layers, can better characterize the capacity of the DNN model.

      We agree that, before aligning model representations with neural data, it is essential to confirm that the model encodes linguistic information at multiple hierarchical levels. This is the purpose of our validation analysis: We evaluated the model’s representations across five layers using a test set of 20 four-syllable sentences in which every syllable shares the same vowel, e.g., “mā ma mà mǎ” (mother scolds horse), “shū shu shǔ shù” (uncle counts numbers; see Table S1). We hypothesized that the activity in the phoneme and syllable layers would be more similar than in other layers for same-vowel sentences. The results confirmed our hypothesis: Hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels compared to those at the word, phrase and sentence levels. Figure 3C displays the scatter plot of the model activity at the five linguistic levels for each of the 20 four-syllable sentences, after dimensionality reduction using multidimensional scaling (MDS). We used color-coding to represent the activity of the five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The plot reveals that model representations at the phoneme and syllable levels are more dispersed for each sentence, while representations at the higher linguistic levels (word, phrase, and sentence) are more centralized. Additionally, similar phonemes tend to cluster together across the phoneme and syllable layers, indicating that the model captures a greater amount of information at these levels when the phonemes within the sentences are similar.

      Apart from the DNN model, we also included rate models, which simply mark 1 at each unit boundary across the 5 levels. We performed mTRF analyses with these rate models and found similar patterns to our ridge-regression results with the DNN (see Figure S2). This provides further evidence that the model reliably captures information across all five hierarchical levels.

      Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time.

      We agree that lower-level linguistic features may be distributed throughout the whole sentence; however, using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. Moreover, our model activity represents a “condensed final representation” at the five linguistic levels for the whole sentence, rather than a representation built incrementally during the sentence. We think the -100 to 300 ms time window relative to each sentence offset targets the exact moment when the full sentence has been comprehended and a “condensed final representation” of the whole sentence across the five linguistic levels has been formed in the brain. We have added this clarification on p.13 of the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some specifics and clarifications of my public review:

      Initially I was interpreting the R squared as a continuous measure of predicted EEG relative to actual EEG, based on an encoding model, but this does not appear to be correct. Thank you for pointing out that the y axis is z-scored R squared in your main ridge regression plots. However, I am not sure why/how you chose to represent this that way. It seems to me that a simple Pearson r would be most informative here (and in line with similar work, including Goldstein et al. 2022 that you mentioned). That way you preserve the sign of the relationships between the regressors and the EEG. With R squared, we have a different interpretation, which is maybe also ok, but I also don't see the point of z-scoring R squared. Another possibility is that when you say "z-transformed" you are referring to the Fisher transformation; is that the case? In the plots you say "normalized", so that sounds like a z-score, but this needs to be clarified; as I say, a simple Pearson r would probably be best.

      We did not use Pearson’s r, as in Goldstein et al. (2022), because our analysis did not involve a train-test split, which was central to their approach. In their study, the data were divided into training and testing sets, and a ridge regression model was trained on the training set. They then used the trained model to predict neural responses on the held-out test set, and calculated Pearson’s r to assess the correlation between the predicted and observed neural responses. As a result, their final metric of model performance was the correlation coefficient (r). In contrast, our analysis is more aligned with standard temporal response function (TRF) approaches. We did not perform a train-test split; instead, we computed the model fitting performance (R²) of the ridge regression model at each sensor and time point for each subject. At the group level, we conducted one-sample t-tests with spatiotemporal cluster-based correction on the R² values to determine which sensors and time windows showed significantly greater R² values than baseline. To establish a baseline, we z-scored the R² values across sensors and time points, effectively centering the distribution around zero. This normalization allowed us to interpret deviations from the mean R² as meaningful increases in model performance and provided a suitable baseline for the statistical tests. We have added this clarification on p.13 of the revised manuscript.
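The procedure described above (in-sample ridge R² at each sensor and time point, then z-scoring across sensors and time points) can be sketched as follows for a single simulated subject. This is an illustration under stated assumptions, not the authors' code: the data are random stand-ins, the feature count is reduced to 10 for brevity (the response mentions 150 PCA dimensions), and `ridge_r2` and `alpha` are hypothetical names and values.

```python
import numpy as np

def ridge_r2(X, y, alpha=1.0):
    """In-sample R^2 of a ridge fit of y (n,) on X (n, p), with no train-test split."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Closed-form ridge solution: (X'X + alpha * I)^-1 X'y
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    resid = yc - Xc @ w
    return 1.0 - (resid @ resid) / (yc @ yc)

rng = np.random.default_rng(0)
n_sentences, n_features, n_sensors, n_lags = 143, 10, 4, 9

# Stand-ins (assumption): model features per sentence, EEG per sentence/sensor/lag.
features = rng.standard_normal((n_sentences, n_features))
eeg = rng.standard_normal((n_sentences, n_sensors, n_lags))

# One R^2 value per sensor and lag.
r2 = np.array([[ridge_r2(features, eeg[:, s, t]) for t in range(n_lags)]
               for s in range(n_sensors)])

# z-score across all sensors and lags, centering the distribution on zero.
r2_z = (r2 - r2.mean()) / r2.std()
```

One consequence worth noting: after this normalization, a positive value means only that the fit at that sensor and lag is above the subject's own average fit, so group-level tests against zero are tests relative to that average rather than against a chance-level R².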

      Thank you for doing the TRF analysis, but where are the acoustic TRFs, analogous to the acoustic results for your HM-LSTM ridge analyses? And what tools did you use to do the TRF analysis? If it is something like the mTRF MATLAB toolbox, then it is also using ridge regression, as you have already done in your original analysis, correct? If so, then it is pretty much the same as your original analysis, just with more dense timepoints, correct? This is what I meant by referring to TRFs originally, because what you have basically done originally was to make a 9-point TRF (and then the plots and analyses are contrasts of pairs of those), with lags between -100 and 300 ms relative to the temporal alignment between the regressors and the EEG, I think (more on this below).

      Also with the new TRF analysis, you say that the regressors/predictors had "a value of 1 at each unit boundary offset". So this means you re-made these predictors to be discrete as I and reviewer 3 were mentioning before (rather than using the HM-LSTM model layer(s)), and also, that you put each phoneme/word/etc. marker at its offset, rather than its onset? I'm also confused as to why you would do this rather than the onset, but I suppose it doesn't change the interpretation very much, just that the TRFs are slid over by a small amount.

      We used the Python package Eelbrain (https://eelbrain.readthedocs.io/en/r0.39/auto_examples/temporal-response-functions/trf_intro.html) to conduct the multivariate temporal response function (mTRF) analyses. As we previously explained in our response to Reviewer 3, we did not apply mTRF to the acoustic features due to the high dimensionality of the input. Specifically, our acoustic representation consists of a 130-dimensional vector sampled every 10 ms throughout the speech stimuli (comprising a 129-dimensional spectrogram and a 1-dimensional amplitude envelope). This renders the 130 TRF weights to the acoustic features uninterpretable. However, we have now added TRF results from the 1-dimensional envelope for the attended and unattended speech, sampled every 10 ms.

      A similar constraint applied to the hidden-layer activations from our HM-LSTM model for the five linguistic features. After dimensionality reduction via PCA, each still resulted in 150-dimensional vectors, further preventing their use in mTRF analyses. To address this, we instead used binary predictors marking the offset of each linguistic unit (phoneme, syllable, word, phrase, sentence). These rate models are represented as five distinct binary time series, each aligned with the timing of the corresponding linguistic unit, making them well-suited for mTRF analysis. It is important to note that these rate predictors differ from the HM-LSTM-derived features: They encode only the timing of linguistic unit boundaries, not the content or representational structure of the linguistic input. Therefore, we do not consider the mTRF analyses to be equivalent to the ridge regression analyses based on HM-LSTM features.
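A binary rate predictor of the kind described above can be sketched in a few lines. The boundary times below are invented for illustration, the 10 ms step is taken from the sampling interval mentioned elsewhere in the response, and `rate_predictor` is a hypothetical helper name, not a function from Eelbrain or from the authors' code.

```python
def rate_predictor(boundaries_s, duration_s, step_s=0.01):
    """Return a binary time series with a 1 at each unit boundary and 0 elsewhere."""
    n_samples = int(round(duration_s / step_s))
    series = [0] * n_samples
    for t in boundaries_s:
        idx = int(round(t / step_s))
        if 0 <= idx < n_samples:
            series[idx] = 1
    return series

# Hypothetical boundary times (seconds) for two levels in a 1-second snippet.
word_boundaries = [0.25, 0.60, 0.95]
sentence_boundaries = [0.95]

word_rate = rate_predictor(word_boundaries, duration_s=1.0)
sentence_rate = rate_predictor(sentence_boundaries, duration_s=1.0)
```

Each level yields one such series, so five levels give five regressors that carry timing information only, which is why they are not interchangeable with the content-bearing model features.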

      For onset vs. offset, as explained in RE4, we labelled them “offsets” because our ridge regression with HM-LSTM features was aligned to sentence offsets rather than onsets (see RE4 and RE15 below for the rationale of using sentence offset). However, since each unit offset coincides with the next unit’s onset, and the rate model simply marks these transition points as 1, the “offset” and “onset” models yield identical mTRFs. To avoid confusion, we have relabeled “offset” as “boundary” in Figure S2.

      I'm still confused about offsets generally. Does this maybe mean that the EEG, and each predictor, are all aligned by aligning their endpoints, which are usually/always the ends of sentences? So e.g. all the phoneme activity in the phoneme regressor actually corresponds to those phonemes of the stimuli in the EEG time, but those regressors and EEG do not have a common starting time (one trial to the next maybe?), so they have to be aligned with their ends instead?

      We chose to use sentence offsets rather than onsets based on the structure of our input to the HM-LSTM model, where each input consists of a pair of sentences encoded in phonemes, such as “t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1” (“It can fly <sep> This is an airplane”). The two sentences are separated by a special <sep> token, and the model’s objective is to determine whether the second sentence follows the first, similar to a next-sentence prediction task. Since the model processes both sentences in full before making a prediction, the neural activations of interest should correspond to the point at which the entire sentence has been processed. To enable a fair comparison between the model’s internal representations and brain responses, we aligned our neural analyses with the sentence offsets, capturing the time window after the sentence has been fully perceived by the participant. Thus, we extracted epochs from -100 to +300 ms relative to each sentence offset, consistent with our model-informed design. If we align model activity with EEG data aligned to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at the time when participants have not heard the sentence yet. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation.

      We understand that it is a bit confusing why the regressor of each level is not aligned to their own offsets in the data. The hidden-layer activations of the HM-LSTM model corresponding to the five linguistic levels (phoneme, syllable, word, phrase, sentence) are consistently 150-dimensional vectors after PCA reduction. As a result, for each input sentence pair, the model produces five distinct hidden-layer activations, each capturing the representational content associated with one linguistic level for the whole sentence. We believe our -100 to 300 ms time window relative to sentence offset reflects a meaningful period during which the brain integrates and comprehends information across multiple linguistic levels.

      Being "time-locked to the offset of each sentence at nine latencies" is not something I can really find in any of the references that you mentioned, regarding the offset aspect of this method. Can you point me more specifically to what you are trying to reference with that, or further explain? You said that "predicting EEG signals around the offset of each sentence" is "a method commonly employed in the literature", but the example you gave of Goldstein 2022 is using onsets of words, which is indeed much more in line with what I would expect (not offsets of sentences).

      You are correct that Goldstein (2022) aligned model predictions to onsets rather than offsets; however, many studies in the literature also align model predictions with unit offsets, typically because they mark the point at which participants have already processed the relevant information (Brennan, 2016; Brennan et al., 2016; Gwilliams et al., 2024, 2025). Similarly, in our study, we aim to identify neural correlates for each model-derived feature. If we correlate model activity with EEG data aligned to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at the time when participants have not heard the sentence yet. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation. Although this limits our analysis to a subset of the data (143 sentences × 400 ms windows × 4 conditions), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We have added this clarification on p.12 of the revised manuscript.

      This new sentence does not make sense to me: "The regressors are aligned to sentence offsets because all our regressors are taken from the hidden layer of our HM-LSTM model, which generates vector representations corresponding to the five linguistic levels of the entire sentence".

      Thank you for the suggestion. We hope our responses in RE4, RE15 and RE16, along with our supplementary video, have now clarified the issue. We have deleted the sentence and provided a more detailed explanation on p.12 of the revised manuscript: The regressors are aligned to sentence offsets because our goal is to identify neural correlates for each model-derived feature of a whole sentence. If we align model activity with EEG data time-locked to sentence onsets, we would be finding neural responses to linguistic levels (from phoneme to sentence) of the whole sentence at the time when participants have not processed the sentence yet. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation. Although this limits our analysis to a subset of the data (143 sentences × 2 sections × 400 ms windows), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We understand that phonemes, syllables, words, phrases, and sentences differ in their durations. However, the five hidden activity vectors extracted from the model are designed to capture the representations of these five linguistic levels across the entire sentence. Specifically, for a sentence pair such as “It can fly <sep> This is an airplane,” the first 2048-dimensional vector represents all the phonemes in the two sentences (“t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1”), the second vector captures all the syllables (“ta_1 nəŋ_2 fei_1 <sep> zhə_4 shiii_4 fei_1 jii_1”), the third vector represents all the words, the fourth vector captures the phrases, and the fifth vector represents the sentence-level meaning.
In our dataset, input pairs consist of adjacent sentences from the stimuli (e.g., Sentence 1 and Sentence 2, Sentence 2 and Sentence 3, and so on), and for each pair, the model generates five 2048-dimensional vectors, each corresponding to a specific linguistic level. To identify the neural correlates of these model-derived features, each intended to represent the full linguistic level across a complete sentence, we focused on the EEG signal surrounding the completion of the second sentence rather than on incremental processing. Accordingly, we extracted epochs from -100 ms to +300 ms relative to the offset of the second sentence and performed ridge regression analyses using the five model features (reduced to 150 dimensions via PCA) at every 50 ms across the epoch.
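The epoching scheme described above (a window from -100 ms to +300 ms around each sentence offset, sampled every 50 ms) yields the nine latencies discussed in the reviews. A minimal sketch follows, assuming a 100 Hz signal and a toy one-channel recording; `latencies_ms` and `samples_at_latencies` are hypothetical helper names, and the sketch assumes the recording extends far enough on both sides of each offset.

```python
def latencies_ms(start_ms=-100, stop_ms=300, step_ms=50):
    """List the analysis latencies relative to sentence offset, in milliseconds."""
    return list(range(start_ms, stop_ms + 1, step_ms))

def samples_at_latencies(signal, offset_idx, sfreq_hz, lags_ms):
    """Pick the signal value at each latency relative to a sentence-offset sample."""
    picked = []
    for lag in lags_ms:
        idx = offset_idx + int(round(lag * sfreq_hz / 1000))
        picked.append(signal[idx])
    return picked

lags = latencies_ms()

# Toy one-channel "EEG" (assumption): the value at each sample is its own index.
signal = list(range(200))
epoch = samples_at_latencies(signal, offset_idx=100, sfreq_hz=100, lags_ms=lags)
```

Stacking one such nine-point epoch per sentence and sensor gives the response array on which a separate regression can be fitted at each latency.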

      More on the issue of sentence offsets: In response to reviewer 3's question about -100 - 300 ms around sentence offset, you said "Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentence." This does not make sense to me, so can you elaborate? It sounds like you are actually saying that you only analyzed 400 ms of each trial, but that cannot be what you mean.

      Yes, we analyzed only the 400 ms window surrounding each sentence offset. Although this represents just a subset of our data (143 sentences × 400 ms × 4 conditions), it precisely captures when full-sentence representations emerge against background speech. Because our model produces a single, condensed representation for each linguistic level over the entire sentence—rather than incrementally—we think it is more appropriate to align to the period surrounding sentence offsets. Additionally, extending the window (e.g. to 2 seconds) would risk overlapping adjacent sentences, since sentence lengths vary. Our focus is on the exact period when integrated, level-specific information for each sentence has formed in the brain, and our results already demonstrate different response patterns to different linguistic levels for the two listener groups within this interval. We have added this clarification on p.13 of the revised manuscript.

      In your mTRF analysis, you are now saying that the discrete predictors have "a value of 1" at each of the "boundary offsets", and those TRFs look very similar to your original plots. It sounds to me like you should not be referring to time zero in your original ridge analysis as "sentence offset". If what you mean is that sentence offset time is merely how you aligned the regressors and EEG in time, then your time zero still has a standard, typical TRF interpretation. It is just the point in time, or lag, at which the regressor(s) and EEG are aligned. So activity before zero is "predictive" and activity after zero is "reactive", to think of it crudely. So also in the text, when you say things like "50-150 ms after the sentence offsets", I think this is not really what you mean. I think you are referring to the lags of 50 - 150 ms, relative to the alignment of the regressor and the EEG.

      Thank you very much for the explanation. We agree that, in our ridge-regression time course, pre-zero lags index “predictive” processing and post-zero lags index “reactive” processing. Unlike TRF analysis, we applied ridge regression to our high-dimensional model features at nine discrete lags around the sentence offset. At each lag, we tested whether the regression score exceeded a baseline defined as the mean regression score across all lags. For example, finding a significantly higher regression score between 50 and 150 ms suggests that our regressor reliably predicted EEG activity in that time window. So here time zero refers to the precise moment of the sentence offset, not the alignment of the regressor and the EEG.

      I look forward to discussing how much of my interpretation here makes sense or doesn't, both with the authors and reviewers.

      Thank you very much for this very constructive feedback. We hope that we have addressed all your questions.

    1. eLife Assessment

      This study investigates low-affinity Ca2+ binding by WT calreticulin and mutant calreticulin associated with type I myeloproliferative neoplasms, as well as the impact on Ca2+ fluxes in suspension cultures of megakaryocyte-like cells in vitro in response to ER Ca2+ ATPase inhibitors that deplete endoplasmic reticulum (ER) Ca2+ store and open plasma membrane Ca2+ channels through STIM1-Orai interactions. The results are important in that they show that Ca2+ binding by calreticulin and store-operated Ca2+ entry are not fundamentally impacted by the type I deletion mutation in calreticulin, which rules out a direct effect of the calreticulin mutation on its own low-affinity Ca2+ binding and any broad impact on ER Ca2+ regulation. The strength of the data and methods used ranges from solid to convincing, although the use of suspension-based flow cytometric assays to investigate ER Ca2+ levels and Ca2+ entry can be challenged. High-affinity Ca2+ binding sites could be further considered, and possible confounding effects of Abl kinase activity in the megakaryocyte-like cell lines could be offset.

    2. Reviewer #1 (Public review):

      The authors attempted to compare the calcium-binding properties of wildtype calreticulin with those of a calreticulin deletion mutant (CRTDel52) associated with myeloproliferative neoplasms.

      The researchers conducted their study using advanced techniques. They found almost no difference in calcium binding between the two proteins and observed no impact on calcium signaling, specifically store-operated calcium entry (SOCE). The study also noted an increase in ER luminal calcium-binding chaperone proteins. Surprisingly, the authors selected flow cytometry as a technique for measurements of ER luminal calcium. Considering the limitations of this approach, it would be better to use alternative approaches. This is particularly important as previous reports, using cells from MPN patients, indicate reduced ER luminal calcium and effects on SOCE (Blood, 2020). How do the authors explain the difference between their results and previous findings about lower ER luminal calcium and changed SOCE in MPN patient cells expressing CRTDel52? Other studies have found that unfolded protein responses are activated in MPN cells with CRTDel52 calreticulin (see Blood, 2021), and increased UPR could account for higher levels of some ER-resident calcium-binding proteins observed here. Overall, it remains unclear how this work improves our understanding of MPN or clarifies calreticulin's role in MPN pathophysiology.

    3. Reviewer #2 (Public review):

      Summary:

      Tagoe and colleagues present a thorough analysis of the calcium (Ca2+) binding capacity of calreticulin (CRT), an endoplasmic reticulum (ER) Ca2+-buffer protein, using a mutant version (CRT del52) found in myeloproliferative neoplasms (MPNs). The authors use purified human CRT protein variants, CRT-KO cell lines, and an MPN cell line to elucidate the differing Ca2+ dynamics, both on the level of the protein and on cell-wide Ca2+-governed processes. In sum, the authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      First, the authors purify CRT protein and perform isothermal titration calorimetry to quantify the Ca2+ binding capacity of CRT. They use full-length human CRT, CRT del52, and two truncations of CRT (1-339 and 1-351, the former of which should lead to the entire loss of low-affinity Ca2+ binding). While CRT del52 has previously been shown to lead to a decrease in Ca2+ binding affinity in other models, the ITC data show that this is retained in CRT del52.

      Next, the authors utilize a CRT-KO cell line with subsequent addition of CRT protein variants to validate these findings with flow cytometric analysis. Cells were transfected with a ratiometric ER Ca2+ probe, and fluorescence indicates that CRT del52 is unable to restore basal ER Ca2+ levels to the same extent as CRT wild-type. To translate these findings to MPNs, the authors perform CRT-KO in a megakaryocytic cell line, where reconstitution with either CRT variant did not cause a difference in cytosolic calcium levels. The authors further test store-operated calcium entry (SOCE), an important process for maintaining ER Ca2+ levels, in these cells, and find that CRT-KO cells have lower SOCE activity, and that this can be slightly recovered with CRT addition.

      Finally, the authors ask whether other effects of CRT-KO/reconstitution can affect the cellular Ca2+ signaling pathway and levels. RNASeq analysis revealed that CRT-KO leads to an increase in various chaperone protein expressions, and that reconstitution with CRT del52 is unable to reduce expression to the same extent as reconstitution with CRT wildtype.

      Strengths:

      The authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      Weaknesses:

      (1) The authors should consider discussing the high-affinity Ca2+ binding site more in the introduction. Can they show a proof-of-concept experiment that validates that incubation of recombinant CRT reduces the function of that high-affinity Ca2+ binding site?

      (2) For Figure 2B, do you have an explanation for why the purified proteins run higher than predicted (48-52kDa) - are these proteins still tagged with pGB1?

      (3) The MEG-01 cell line has the BCR::ABL1 translocation, while CRT mutations are strictly found in BCR::ABL1 negative MPNs. Could these experiments be repeated in these cells treated with imatinib to decrease these effects, or see if basal MEG-01 Ca2+ levels/activity are changed with or without imatinib?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The researchers conducted their study using advanced techniques. They found almost no difference in calcium binding between the two proteins and observed no impact on calcium signaling, specifically store-operated calcium entry (SOCE). The study also noted an increase in ER luminal calcium-binding chaperone proteins. Surprisingly, the authors selected flow cytometry as a technique for measurements of ER luminal calcium. Considering the limitations of this approach, it would be better to use alternative approaches.

      The flow cytometric assay shows good responsiveness to conditions expected to alter ER calcium levels (Figure 4C), is high throughput compared to microscopy, and allows for averaging of signals across a large number of cells. This was thus our original method of choice.

      This issue matters because earlier research with MPN patient cells reported reduced ER luminal calcium levels and altered SOCE (Blood, 2020). How do the authors explain the difference between their results and previous findings about lower ER luminal calcium and changed SOCE in MPN patient cells expressing CRTDel52?

      We thank the reviewer for asking for these clarifications. The referenced study (Di Buduo et al. Blood, 135(2):133-143, 2020) first showed that thrombopoietin induces spontaneous cytosolic calcium spikes in cultured megakaryocytes, which are dependent on store-operated calcium entry (SOCE). In parallel, STIM1-ORAI interactions were induced by thrombopoietin. On the other hand, the addition of thrombopoietin caused the dissociation of STIM1-calreticulin interactions, based on proximity ligation assays. The implication is that signaling via thrombopoietin receptor (TPOR/MPL) activation induces the dissociation of calreticulin-STIM1 complexes and the formation of STIM1-ORAI complexes, which contribute to the measured spontaneous cytosolic calcium spikes. Different MPN mutations induced spontaneous calcium spikes in a thrombopoietin-independent manner, including the JAK2V617F mutation and the CALR type I and type II mutations. The study found that the number of megakaryocytes exhibiting spontaneous calcium spikes was enhanced in the context of both type I and type II CALR mutations compared to the JAK2V617F mutant. Correspondingly, the calreticulin-STIM1 interactions/cell were more significantly reduced for type I and type II CALR mutations compared to the JAK2V617F mutant. It was suggested that defective interactions between mutant calreticulin, ERp57, and STIM1 activated SOCE and generated spontaneous cytosolic calcium spikes. However, based on the findings with thrombopoietin, the spontaneous calcium spikes could simply result from thrombopoietin-independent MPL activation by the mutant calreticulin and JAK2V617F and downstream signaling. Importantly, the referenced studies did not directly measure ER luminal calcium. A number of undefined factors could account for the measured differences between the megakaryocytes from patients with calreticulin mutations vs. JAK2V617F. These include the relative mutant allele burdens, the extent of MPL activation, as well as genetic differences unrelated to calreticulin. Different from these experiments, through the use of purified proteins, our studies show that the Del52 mutant has calcium binding characteristics resembling those of the wild type protein. Additionally, through genetic manipulations in cell lines, our studies directly address the effects of calreticulin KO and its Del52 mutation upon ER luminal and cytosolic calcium levels, and cellular SOCE signals. We did not measure significant differences in any of these parameters between the KO cells and those reconstituted with wild type calreticulin or the Del52 mutant. As noted by the editors, these results show that Ca2+ binding by calreticulin and store-operated Ca2+ entry in a cell are not fundamentally impacted by the type I deletion mutation. On the other hand, in primary megakaryocytes, when co-expressed with MPL, the Del52 mutant, through its known ability to bind and activate TPOR/MPL, is expected to induce SOCE and calcium fluxes similar to those induced by thrombopoietin. These points will be clarified in the revised discussion.

      Other studies have found that unfolded protein responses are activated in MPN cells with CRTDel52 calreticulin (see Blood, 2021), and increased UPR could account for higher levels of some ER-resident calcium-binding proteins observed here.

      Multiple studies have suggested the induction of the unfolded protein response (UPR) in cells expressing MPN mutants of calreticulin. We do not know the specific signals that cause the upregulation of various calcium binding proteins in calreticulin-KO cells and cells expressing the Del52 mutant. Indeed, these could result from increased protein misfolding in cells with wild type calreticulin deficiency. Alternatively, the sensing of cellular calcium perturbations could induce their expression. Regardless of the precise mechanisms underlying the expression changes in calcium binding proteins, the upregulated factors are predicted to compensate for calreticulin deficiency and contribute to the maintenance of overall cellular calcium homeostasis. These points will be clarified in the revised discussion.

      Overall, it remains unclear how this work improves our understanding of MPN or clarifies calreticulin's role in MPN pathophysiology.

      The points discussed above as well as their implications for the understanding of calreticulin’s role in MPN pathophysiology will be clarified in the revised manuscript.

      Reviewer #2 (Public review):

      Tagoe and colleagues present a thorough analysis of the calcium (Ca2+) binding capacity of calreticulin (CRT), an endoplasmic reticulum (ER) Ca2+-buffer protein, using a mutant version (CRT del52) found in myeloproliferative neoplasms (MPNs). The authors use purified human CRT protein variants, CRT-KO cell lines, and an MPN cell line to elucidate the differing Ca2+ dynamics, both on the level of the protein and on cell-wide Ca2+-governed processes. In sum, the authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      First, the authors purify CRT protein and perform isothermal titration calorimetry to quantify the Ca2+ binding capacity of CRT. They use full-length human CRT, CRT del52, and two truncations of CRT (1-339 and 1-351, the former of which should lead to the entire loss of low-affinity Ca2+ binding). While CRT del52 has previously been shown to lead to a decrease in Ca2+ binding affinity in other models, the ITC data show that this is retained in CRT del52.

      Next, the authors utilize a CRT-KO cell line with subsequent addition of CRT protein variants to validate these findings with flow cytometric analysis. Cells were transfected with a ratiometric ER Ca2+ probe, and fluorescence indicates that CRT del52 is unable to restore basal ER Ca2+ levels to the same extent as CRT wild-type. To translate these findings to MPNs, the authors perform CRT-KO in a megakaryocytic cell line, where reconstitution with either CRT variant did not cause a difference in cytosolic calcium levels. The authors further test store-operated calcium entry (SOCE), an important process for maintaining ER Ca2+ levels, in these cells, and find that CRT-KO cells have lower SOCE activity, and that this can be slightly recovered with CRT addition.

      Finally, the authors ask whether other effects of CRT-KO/reconstitution can affect the cellular Ca2+ signaling pathway and levels. RNASeq analysis revealed that CRT-KO leads to an increase in various chaperone protein expressions, and that reconstitution with CRT del52 is unable to reduce expression to the same extent as reconstitution with CRT wildtype.

      Strengths:

      The authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      We thank the reviewer for the recognition that this study is important for our understanding of both normal and malignant cell biology.

      Weaknesses:

      (1) The authors should consider discussing the high-affinity Ca2+ binding site more in the introduction. Can they show a proof-of-concept experiment that validates that incubation of recombinant CRT reduces the function of that high-affinity Ca2+ binding site?

      In a previous study (Wijeyesakere et al. 2011 J. Biol Chem, 286 8771-8785), we showed that at a starting calcium concentration of 0 mM and with 3.3 mM injections of CaCl<sub>2</sub>, the measured K<sub>D</sub> value was 16.6 mM for calcium binding to wild type murine calreticulin (which has ~95% sequence identity with human calreticulin), corresponding to the high affinity site. On the other hand, at a starting calcium concentration of 50 mM and with 33 mM CaCl<sub>2</sub> injections, the measured K<sub>D</sub> value for calcium binding to wild type murine calreticulin was 590 mM (corresponding to the low affinity sites). Thus, we did not measure the high affinity sites when the starting calcium concentration was 50 mM. This point will be clarified in the revised manuscript.

      (2) For Figure 2B, do you have an explanation for why the purified proteins run higher than predicted (48-52kDa) - are these proteins still tagged with pGB1?

      Yes, the purified proteins shown in Figure 2B retained a GB1 tag. This point will be clarified in the revised manuscript.

      (3) The MEG-01 cell line has the BCR::ABL1 translocation, while CRT mutations are strictly found in BCR::ABL1 negative MPNs. Could these experiments be repeated in these cells treated with imatinib to decrease these effects, or see if basal MEG-01 Ca2+ levels/activity are changed with or without imatinib?

      Thank you for the important point. We will assess cytosolic calcium levels in MEG-01 cells with or without imatinib.

    1. eLife Assessment

      This important study combines a two-person joint hand-reaching paradigm with game-theoretical modeling to examine whether, and how, one's reflexive visuomotor responses are modulated by a partner's control policy and cost structure. The study provides a solid and novel set of behavioral findings suggesting that involuntary visuomotor feedback is indeed modulated in the context of interpersonal coordination. The work will be of interest to cognitive scientists studying the motoric and social aspects of action control.

    2. Reviewer #1 (Public review):

      Summary:

      Sullivan and colleagues examined the modulation of reflexive visuomotor responses during collaboration between pairs of participants performing a joint reaching movement to a target. In their experiments, the players jointly controlled a cursor that they had to move towards narrow or wide targets. In each experimental block, each participant had a different type of target they had to move the joint cursor to. During the experiment, the authors used lateral perturbation of the cursor to test participants' fast feedback responses to the different target types. The authors suggest participants integrate the target type and related cost of their partner into their own movements, which suggests that visuomotor gains are affected by the partner's task.

      Strengths:

      The topic of the manuscript is very interesting, and the authors are using well-established methodology to test their hypothesis. They combine experimental studies with optimal control models to further support their work. Overall, the manuscript is very timely and shows important findings - that the feedback responses reflect both our and our partner's tasks.

      Weaknesses:

      However, in the current version of the manuscript, I believe the results could also be interpreted differently, which suggests that the authors should provide further support for their hypothesis and conclusions.

      Major Comments:

      (1) Results of the relevant conditions:

      In addition to the authors' explanation regarding the results, it is also possible that the results represent a simple modulation of the reflexive response to a scaled version of cursor movement. That is, when the cursor is partially controlled by a partner, which also contributes to reducing movement error, it can also be interpreted by the sensorimotor system as a scaling of hand-to-cursor movement. In this case, the reflexes are modulated according to a scaling factor (how much do I need to move to bring the cursor to the target). I believe that a single-agent simulation of an OFC model with a scaling factor in the lateral direction can generate the same predictions as those presented by the authors in this study. In other words, maybe the controller has learned about the nature of the perturbation in each specific context, that in some conditions I need to control strongly, whereas in others I do not (without having any model of the partner). I suggest that the authors demonstrate how they can distinguish their interpretation of the results from other explanations.

      (2) The effect of the partner target:

      The authors presented both self and partner targets together. While the effect of each target type, presented separately, is known, it is unclear how presenting both simultaneously affects individual response. That is, does a small target with a background of the wide target affect the reflexive response in the case of a single participant moving? The results of Experiment 2, comparing the case of partner- and self-relevant targets versus partner-irrelevant and self-relevant targets, may suggest that the system acted based on the relevant target, regardless of the presence and instructions regarding the self-target.

      (3) Experiment instructions:

      It is unclear what the general instructions were for the participants and whether the instructions provided set the proposed weighted cost, which could be altered with different instructions.

      (4) Some work has shown that the gain of visuomotor feedback responses reflects the time to target and that this is updated online after a perturbation (Cesonis & Franklin, 2020, eNeuro; Cesonis and Franklin, 2021, NBDT; also related to Crevecoeur et al., 2013, J Neurophysiol). These models would predict different feedback gains depending on the distance remaining to the target for the participant and the time to correct for the jump, which is directly affected by the small or large targets. Could this time to target be used instead to explain the results? I don't believe that this is the case, but the authors should try to rule out other interpretations. This is maybe a minor point, but perhaps more important is the location (and time remaining) for each participant at the time of the jump. It appears from the figures that this might be affected by the condition (given the change in movement lengths, see Figure 3 B & C). If this is the case, then could some of the feedback gain be related to these parameters and not the model of the partner, as suggested? Some evidence to rule this out would be a good addition to the paper, for example the distance of each partner at the time of the perturbation. In addition, please analyze the synchrony of the two partners' movements.

    3. Reviewer #2 (Public review):

      Summary:

      Sullivan and colleagues studied the fast, involuntary, sensorimotor feedback control in interpersonal coordination. Using a cleverly designed joint-reaching experiment that separately manipulated the accuracy demands for a pair of participants, they demonstrated that the rapid visuomotor feedback response of a human participant to a sudden visual perturbation is modulated by his/her partner's control policy and cost. The behavioral results are well-matched with the predictions of the optimal feedback control framework implemented with the dynamic game theory model. Overall, the study provides an important and novel set of results on the fast, involuntary feedback response in human motor control, in the context of interpersonal coordination.

      Review:

      Sullivan and colleagues investigated whether fast, involuntary sensorimotor feedback control is modulated by the partner's state (e.g., cost and control policy) during interpersonal coordination. They asked a pair of participants to make a reaching movement to control a cursor and hit a target, where the cursor's position was a combination of each participant's hand position. To examine fast visuomotor feedback response, the authors applied a sudden shift in either the cursor (experiment 1) or the target (experiment 2) position in the middle of movement. To test the involvement of partner's information in the feedback response, they independently manipulated the accuracy demand for each participant by varying the lateral length of the target (i.e., a wider/narrower target has a lower/higher demand for correction when movement is perturbed). Because participants could also see their partner's target, they could theoretically take this information (e.g., whether their partner would correct, whether their correction would help their partner, etc.) into account when responding to the sudden visual shift. Computationally, the task structure can be handled using dynamic game theory, and the partner's feedback control policy and cost function are integrated into the optimal feedback control framework. As predicted by the model, the authors demonstrated that the rapid visuomotor feedback response to a sudden visual perturbation is modulated by the partner's control policy and cost. When their partner's target was narrow, they made rapid feedback corrections even when their own target was wide (no need for correction), suggesting integration of their partner's cost function. Similarly, they made corrections to a lesser degree when both targets were narrower than when the partner's target was wider, suggesting that the feedback correction takes the partner's correction (i.e., feedback control policy) into account.

      The strength of the current paper lies in the combination of clever behavioral experiments that independently manipulate each participant's accuracy demand and a sophisticated computational approach that integrates optimal feedback control and dynamic game theory. Both the experimental design and the data analysis are sound. While the main claim is well-supported by the results, the only current weakness is the lack of discussion of limitations and of an alternative explanation. Adding these points will further strengthen the paper.

    1. eLife Assessment

      This important study addresses a classic debate in visual processing, using a strong method applied to a rare clinical population to evaluate hierarchical models of visual object perception. The paper finds only partial support for the hierarchical model: as expected, neural responses in ventral visual cortex show increased representational selectivity for faces along the posterior-anterior axes, but the onsets of the signals do not show a temporal hierarchy, indicating more parallel processing. The iEEG dataset is impressive, but the evidence for lack of temporal hierarchy is incomplete: essential quality checks need to be performed, and statistical analyses adapted to ensure that the data and analyses would be able to reveal temporal hierarchy if it were present in the data.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript aims to test the idea that visual recognition (of faces) is hierarchically organized in the human ventral occipital-temporal cortex (VOTC). The paper proposes that if VOTC has a hierarchical organization, this should be seen in two independent features of the VOTC signal. First, hierarchy assumes that signals along the hierarchy increase in representational complexity. Second, hierarchy assumes a progressive increase in the onset time of the earliest neural response at each level of the hierarchy. To test these predictions, the authors extract high-frequency broadband signals from iEEG electrodes in a very large sample of patients (N=140). They find that face selectivity in these signals is distributed across the VOTC with increasing posterior-anterior face selectivity, hence providing evidence for the first prediction. However, they also find broadband activity to occur concurrently, therefore challenging the view of a serial hierarchy.

      Strengths:

      (1) The hypothesis (that VOTC is hierarchically organized) and predictions (that hierarchy predicts increases in representational complexity and increases in onset time) were clearly described.

      (2) The number of subjects sampled (140) is extremely large for iEEG studies that typically involve <10 subjects. Also, 444 face selective recording contacts provide a very nice sampling of the areas of interest.

      Weaknesses:

      (1) A control analysis where areas have known differences in response onset should be performed to increase confidence that the proposed analyses would reveal expected results when a difference in response onset was present across areas. From Figure 3, it can be seen that many electrodes are placed in earlier visual areas (V1-V3) that have previously been shown to have earlier broadband responses to visual images compared to VOTC (e.g. Martin et al., 2019, JNeurosci https://doi.org/10.1523/JNEUROSCI.1889-18.2018). The same analyses as in Figures 4 and 5 should be used comparing VOTC to early visual areas to confirm that the analyses would detect that V1-V3 have earlier onsets compared to VOTC.

      (2) It is unclear why correlating mean timeseries helps understand how much variance is shared between regions (Figure 4). Any variance between images is lost when averaging time series across all images, and this metric thus overestimates the variance shared between areas. Moreover, the finding that correlating time domain signals across VOTC areas does not differ from correlating signals within an area could be driven by this averaging. For example, if the same analysis was done on electrodes in left and right V1 when half of the images had contrast in the left hemifield and the other half had contrast in the right hemifield, the average signals may correlate extremely well, while this correlation falls apart on a trial-by-trial basis. These analyses therefore need to be evaluated on a trial-by-trial basis.
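
      The left/right V1 thought experiment in point (2) can be sketched as a small simulation. This is a toy illustration under stated assumptions, not the authors' pipeline: the response template, noise level, trial count, and all function names are hypothetical. Each trial drives only one of two sites, so the trial-averaged time courses correlate strongly while the single-trial correlations sit near zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_trials=200, n_time=100, noise_sd=0.5, seed=0):
    """Toy data: on each trial only ONE site receives the evoked response."""
    rng = random.Random(seed)
    # Hypothetical evoked response: a Gaussian bump peaking at sample 40.
    template = [math.exp(-((t - 40) / 10.0) ** 2) for t in range(n_time)]
    site_a, site_b = [], []
    for trial in range(n_trials):
        drive_a = trial % 2 == 0  # even trials drive site A, odd trials site B
        site_a.append([(template[t] if drive_a else 0.0) + rng.gauss(0, noise_sd)
                       for t in range(n_time)])
        site_b.append([(0.0 if drive_a else template[t]) + rng.gauss(0, noise_sd)
                       for t in range(n_time)])
    return site_a, site_b


def trial_mean(trials):
    """Average time course across trials."""
    n = len(trials)
    return [sum(column) / n for column in zip(*trials)]


site_a, site_b = simulate()
# Correlation of the trial-averaged time courses (the analysis being questioned).
corr_of_means = pearson(trial_mean(site_a), trial_mean(site_b))
# Average of the correlations computed separately on every trial.
mean_trial_corr = sum(pearson(a, b) for a, b in zip(site_a, site_b)) / len(site_a)
print(f"correlation of averages: {corr_of_means:.2f}")
print(f"mean single-trial correlation: {mean_trial_corr:.2f}")
```

      In this toy case the averaged signals look nearly identical even though the two sites never respond on the same trial, which is why a trial-by-trial evaluation is requested.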

      (3) Previous studies on visual processing in VOTC have shown that evoked potentials are more predictive of the onset of visual stimuli than broadband activity (e.g. Miller et al., 2016, PLOS CB, https://doi.org/10.1371/journal.pcbi.1004660). Testing the prediction from a hierarchical representation that signals along the VOTC increase in onset time should therefore include an evaluation of evoked potential onsets in addition to broadband signals.

      (4) Testing the second prediction, that the onset time of processing increases along the VOTC posterior to anterior path, is difficult using the iEEG broadband signal, because from a signal processing perspective, broadband signals are inherently temporally inaccurate, given that they are filtered. Any filtering in the signal introduces a certain level of temporal smoothing. The manuscript should clearly describe the level of temporal smoothing for the filter settings used.
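
      The smoothing concern in point (4) can be made concrete with a minimal sketch. The filter used in the manuscript is not known here, so a centered moving average stands in for any zero-phase smoothing step; the window length, threshold, and function names are assumptions chosen for illustration only.

```python
def centered_moving_average(signal, half_width):
    """Zero-phase smoothing: each sample becomes the mean of its neighbours."""
    n = len(signal)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half_width)
        hi = min(n, t + half_width + 1)
        window = signal[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def first_crossing(signal, threshold):
    """Index of the first sample above threshold, or None if there is none."""
    for t, value in enumerate(signal):
        if value > threshold:
            return t
    return None


true_onset = 100                                 # step occurs at sample 100
raw = [0.0] * true_onset + [1.0] * 100           # idealised response, no noise
smoothed = centered_moving_average(raw, half_width=10)

raw_onset = first_crossing(raw, threshold=0.01)
smoothed_onset = first_crossing(smoothed, threshold=0.01)
print(raw_onset, smoothed_onset)
```

      With a symmetric window, the smoothed trace starts rising half a window before the true step, so a threshold-based onset is reported early by roughly the half-width of the filter. Reporting the effective temporal extent of the filter would let readers judge how much of the early latency could arise this way.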

      (5) The onsets of neural activity in VOTC are surprisingly early: around 80-100 ms. This is earlier than what has previously been reported. For example, the cited Quian Quiroga et al. (2023) found single neuron responses to have the earliest onset around 125 ms (their Figure 3). Similarly, the cited Jacques et al., 2016b and Kadipasaoglu et al., 2017 papers also observe broadband onsets in VOTC after 100 ms. Understanding the temporal smoothing in the broadband signal, as well as showing that typical evoked potentials have latencies comparable to other work, would increase confidence that latencies are not underestimated due to factors in the analysis pipeline.

      (6) Understanding the extent to which neural processing in the VOTC is hierarchical is essential for building models of vision that capture processing in the human brain, and the data provides novel insight into these processes.

      For additional context, a schematic figure of the hierarchical view and a more parallel system described in the paragraph on models of visual recognition (lines 553) would help the reader interpret and understand the implications of the paper.

    3. Reviewer #2 (Public review):

      Summary:

      This very ambitious project addresses one of the core questions in visual processing related to the underlying anatomical and functional architecture. Using a large sample of rare and high-quality EEG recordings in humans, the authors assess whether face-selectivity is organised along a posterior-anterior gradient, with selectivity and timing increasing from posterior to anterior regions. The evidence suggests that it is the case for selectivity, but the data are more mixed about the temporal organisation, which the authors use to conclude that the classic temporal hierarchy described in textbooks might be questioned, at least when it comes to face processing.

      Strengths:

      A huge amount of work went into collecting this highly valuable dataset of rare intracranial EEG recordings in humans. The data alone are valuable, assuming they are shared in an easily accessible and documented format. Currently, the OSF repository linked in the article is empty, so no assessment of the data can be made. The topic is important, and a key question in the field is addressed. The EEG methodology is strong, relying on a well-established and high SNR SSVEP method. The method is particularly well-suited to clinical populations, leading to interpretable data in a few minutes of recordings. The authors have attempted to quantify the data in many different ways and provided various estimates of selectivity and timing, with matching measures of uncertainty. Non-parametric confidence intervals and comparisons are provided. Collectively, the various analyses and rich illustrations provide superficially convincing evidence in favour of the conclusions.

      Weaknesses:

      (1) The work was not pre-registered, and there is no sample size justification, whether for participants or trials/sequences. So a statistical reviewer should assess the sensitivity of the analyses to different approaches.

      (2) Frequentist NHST is used to claim lack of effects, which is inappropriate, see for instance:

      Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350. https://doi.org/10.1007/s10654-016-0149-3

      Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016). Is There a Free Lunch in Inference? Topics in Cognitive Science, 8(3), 520-547. https://doi.org/10.1111/tops.12214

      (3) In the frequentist realm, demonstrating similar effects between groups requires equivalence testing, with bounds (minimum effect sizes of interest) that should be pre-registered:

      Campbell, H., & Gustafson, P. (2024). The Bayes factor, HDI-ROPE, and frequentist equivalence tests can all be reverse engineered-Almost exactly-From one another: Reply to Linde et al. (2021). Psychological Methods, 29(3), 613-623. https://doi.org/10.1037/met0000507

      Riesthuis, P. (2024). Simulation-Based Power Analyses for the Smallest Effect Size of Interest: A Confidence-Interval Approach for Minimum-Effect and Equivalence Testing. Advances in Methods and Practices in Psychological Science, 7(2), 25152459241240722. https://doi.org/10.1177/25152459241240722

      (4) The lack of consideration for sample sizes, the lack of pre-registration, and the lack of a method to support the null (a cornerstone of this project to demonstrate equivalent onsets between areas), suggest that the work is exploratory. This is a strength: we need rich datasets to explore, test tools and generate new hypotheses. I strongly recommend embracing the exploration philosophy, and removing all inferential statistics: instead, provide even more detailed graphical representations (include onset distributions) and share the data immediately with all the pre-processing and analysis code.

      (5) Even if the work was pre-registered, it would be very difficult to calculate p-values conditional on all the uncertainty around the number of participants, the number of contacts and the number of trials, as they are random variables, and sampling distributions of key inferences should be integrated over these unknown sources of variability. The difficulty of calculating/interpreting p-values that are conditional on so many pre-processing stages and sources of uncertainty is traditionally swept under the rug, but nevertheless well documented:

      Kruschke, J.K. (2013) Bayesian estimation supersedes the t test. J Exp Psychol Gen, 142, 573-603. https://pubmed.ncbi.nlm.nih.gov/22774788/

      Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779-804. https://doi.org/10.3758/BF03194105

      (6) Currently, there is no convincing evidence in the article to clearly support the main claims.

      Bootstrap confidence intervals were used to provide measures of uncertainty. However, the bootstrapping did not take the structure of the data into account, collapsing across important dependencies in that nested structure: participants > hemispheres > contacts > conditions > trials.

      Ignoring data dependencies and the uncertainty from trials could lead to a distorted CI. Sampling contacts with replacement is inappropriate because it breaks the structure of the data, mixing degrees of freedom across different levels of analysis. The key rule of the bootstrap is to follow the data acquisition process, and therefore, sampling participants with replacement should come first. In a hierarchical bootstrap, the process can be repeated at nested levels, so that for each resampled participant, then contacts are resampled (if treated as a random variable), then trials/sequences are resampled, keeping paired measurements together (hemispheres, and typically contacts in a standard EEG experiment with fixed montage). The same hierarchical resampling should be applied to all measurements and inferences to capture all sources of variability. Selectivity and timing should be quantified at each contact after resampling of trials/sequences before integrating across hemispheres and participants using appropriate and justified summary measures.
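
      The nested resampling described above can be sketched as follows. This is a simplified illustration, not a recommended final analysis: the data layout (participant > contact > trials), the toy latencies, the use of the mean as the summary at every level, and all names are assumptions, and the hemisphere and condition levels are omitted for brevity.

```python
import random
import statistics


def nested_estimate(data, summary=statistics.mean):
    """Summarise trials per contact, contacts per participant, then participants."""
    return summary([
        summary([summary(trials) for trials in contacts.values()])
        for contacts in data.values()
    ])


def hierarchical_bootstrap_ci(data, n_boot=1000, seed=0, summary=statistics.mean):
    """95% percentile interval from resampling participants, contacts, trials.

    data: {participant_id: {contact_id: [trial values]}}
    """
    rng = random.Random(seed)
    participants = list(data)
    estimates = []
    for _ in range(n_boot):
        participant_level = []
        # Level 1: resample participants with replacement.
        for p in rng.choices(participants, k=len(participants)):
            contacts = list(data[p])
            contact_level = []
            # Level 2: resample contacts within the chosen participant.
            for c in rng.choices(contacts, k=len(contacts)):
                trials = data[p][c]
                # Level 3: resample trials within the chosen contact.
                resampled = rng.choices(trials, k=len(trials))
                contact_level.append(summary(resampled))
            participant_level.append(summary(contact_level))
        estimates.append(summary(participant_level))
    estimates.sort()
    return estimates[n_boot * 25 // 1000], estimates[n_boot * 975 // 1000 - 1]


def make_toy_data(seed=1):
    """Hypothetical onset latencies (ms) with participant-level offsets."""
    rng = random.Random(seed)
    data = {}
    for p in range(8):
        offset = rng.gauss(0, 10)
        data[f"P{p}"] = {
            f"C{c}": [100 + offset + rng.gauss(0, 15) for _ in range(20)]
            for c in range(rng.randint(3, 5))
        }
    return data


toy = make_toy_data()
point = nested_estimate(toy)
ci_low, ci_high = hierarchical_bootstrap_ci(toy)
print(f"estimate {point:.1f} ms, 95% interval [{ci_low:.1f}, {ci_high:.1f}]")
```

      Because participants are resampled first, between-participant variability widens the interval, which is the source of uncertainty that resampling contacts alone would leave out. Any robust summary (for example a trimmed mean) can be passed in place of the mean.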

      The authors already recognise part of the problem, as they provide within-participant analyses. This is a very good step, inasmuch as it addresses the issue of mixing-up degrees of freedom across levels, but unfortunately these analyses are plagued with small sample sizes, making claims about the lack of differences even more problematic--classic lack of evidence == evidence of absence fallacy. In addition, there seem to be discrepancies between the mean and CI in some cases: 15 [-20, 20]; 8 [-24, 24].

      (7) Three other issues related to onsets:

      (a) FDR correction typically doesn't allow localisation claims, similarly to cluster inferences:

      Winkler, A. M., Taylor, P. A., Nichols, T. E., & Rorden, C. (2024). False Discovery Rate and Localizing Power (No. arXiv:2401.03554). arXiv. https://doi.org/10.48550/arXiv.2401.03554

      Rousselet, G. A. (2025). Using cluster-based permutation tests to estimate MEG/EEG onsets: How bad is it? European Journal of Neuroscience, 61(1), e16618. https://doi.org/10.1111/ejn.16618

      (b) Percentile bootstrap confidence intervals are inaccurate when applied to means. Alternatively, use a bootstrap-t method, or use the percentile bootstrap in conjunction with a robust measure of central tendency, such as a trimmed mean.

      Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2021). The Percentile Bootstrap: A Primer With Step-by-Step Instructions in R. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920911881. https://doi.org/10.1177/2515245920911881

      (c) Defining onsets based on an arbitrary "at least 30 ms" rule is not recommended:

      Piai, V., Dahlslätt, K., & Maris, E. (2015). Statistically comparing EEG/MEG waveforms through successive significant univariate tests: How bad can it be? Psychophysiology, 52(3), 440-443. https://doi.org/10.1111/psyp.12335

      (8) Figure 5 and matching analyses: There are much better tools than correlations to estimate connectivity and directionality. See for instance:

      Ince, R. A. A., Giordano, B. L., Kayser, C., Rousselet, G. A., Gross, J., & Schyns, P. G. (2017). A statistical framework for neuroimaging data analysis based on mutual information estimated via a Gaussian copula. Human Brain Mapping, 38(3), 1541-1573. https://doi.org/10.1002/hbm.23471

      (9) Pearson correlation is sensitive to other features of the data than an association, and is maximally sensitive to linear associations. Interpretation is difficult without seeing matching scatterplots and getting confirmation from alternative robust methods.
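      Points (6) and (7b) above can be illustrated with a minimal sketch of a hierarchical percentile bootstrap that summarises participants with a 20% trimmed mean. This is not the authors' analysis: the data layout (participant > contact > trials), the function names, and the choice of per-contact and per-participant means are assumptions made for illustration. Each trial value is assumed to be a single number, for example a within-trial difference between paired conditions, so that paired measurements stay together.

```python
import numpy as np


def trimmed_mean(x, prop=0.2):
    """Mean after dropping the lowest and highest `prop` fraction of values."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(prop * x.size))
    return x[k:x.size - k].mean()


def hierarchical_bootstrap_ci(data, n_boot=2000, alpha=0.05,
                              resample_contacts=True, seed=0):
    """Percentile CI for the trimmed mean across participants.

    data: dict participant -> dict contact -> 1-D array of trial values
    (hypothetical layout, chosen for this sketch).
    """
    rng = np.random.default_rng(seed)
    participants = list(data)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Level 1: resample participants with replacement.
        p_idx = rng.integers(0, len(participants), size=len(participants))
        per_participant = []
        for p in p_idx:
            contacts = data[participants[p]]
            names = list(contacts)
            # Level 2: resample contacts only if they are treated as random.
            if resample_contacts:
                c_idx = rng.integers(0, len(names), size=len(names))
            else:
                c_idx = np.arange(len(names))
            per_contact = []
            for c in c_idx:
                trials = np.asarray(contacts[names[c]], dtype=float)
                # Level 3: resample trials within the contact.
                t_idx = rng.integers(0, trials.size, size=trials.size)
                per_contact.append(trials[t_idx].mean())
            per_participant.append(np.mean(per_contact))
        # Robust group summary, computed once per bootstrap sample.
        boot[b] = trimmed_mean(per_participant)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

      Setting `resample_contacts=False` corresponds to treating contacts as fixed, as in a standard montage; which levels to resample depends on how the data were acquired.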

    1. eLife Assessment

      This valuable study provides a practical computational framework for inferring latent neural states directly from calcium fluorescence recordings, bypassing the traditional need for a separate spike deconvolution step. The evidence supporting the method is solid, featuring rigorous validation across multiple latent variable model families (including HMM, GPFA, and LFADS) using both simulated and experimental data. However, the assessment of the method's generality would be further strengthened by application to a broader range of experimental datasets, such as recordings from different brain regions or using different calcium indicators.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors elegantly combined latent variable models (i.e., HMM, GPFA and dynamical system models) with a calcium imaging observation model (i.e., latent Poisson spiking and autoregressive calcium dynamics (AR)).
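      As a rough illustration of the kind of observation model described here, the sketch below simulates fluorescence from a latent log-rate through Poisson spiking and first-order autoregressive calcium dynamics. It is an assumption-laden toy, not the authors' implementation: the function name, the AR(1) order, the Gaussian noise term, and all parameter values are hypothetical.

```python
import numpy as np


def simulate_fluorescence(log_rate, gamma=0.95, amplitude=1.0, baseline=0.0,
                          noise_sd=0.1, dt=1.0 / 30.0, seed=0):
    """Toy generative chain: latent log-rate -> spikes -> calcium -> fluorescence.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    rate = np.exp(np.asarray(log_rate, dtype=float))  # spikes per second
    spikes = rng.poisson(rate * dt)                   # latent Poisson spike counts
    calcium = np.zeros(rate.size)
    prev = 0.0
    for t in range(rate.size):
        # AR(1) dynamics: exponential decay plus a jump for each spike.
        prev = gamma * prev + amplitude * spikes[t]
        calcium[t] = prev
    # Observed trace: baseline plus calcium plus Gaussian measurement noise.
    fluorescence = baseline + calcium + rng.normal(0.0, noise_sd, size=rate.size)
    return spikes, calcium, fluorescence


# Example: a slow sinusoidal latent driving one neuron for 10 s at 30 Hz.
time = np.arange(300) / 30.0
spikes, calcium, fluorescence = simulate_fluorescence(np.log(5.0) + np.sin(time))
```

      Fitting a latent variable model through such a chain means that the spikes are marginalised or inferred jointly with the latent, rather than fixed by a separate deconvolution step.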

      Strengths:

      Integrating a calcium observation model into existing latent variable models significantly improves the inference of latent neural states compared to existing approaches such as spike deconvolution or Gaussian assumptions.

      The authors also provide open-source access to their method for direct application to calcium imaging data analysis.

      Weaknesses:

      As acknowledged by the authors, their method is dependent on the quality of calcium trace extraction from fluorescence videos. It should be noted that this limitation also applies to alternative strategies.

      While the contribution of this study should prove useful for researchers using calcium imaging, the novelty is limited, as it consists of an integration of the calcium imaging model from Ganmor et al. 2016 with existing LVM frameworks.

    3. Reviewer #2 (Public review):

      Summary:

      This compelling study proposes a framework to implement latent variable models using population level calcium imaging data. The study incorporates autoregressive dynamics and latent Poisson spiking to improve inference of latent states across different model classes including HMMs, Gaussian Process Factor Analysis and nonlinear dynamical systems models. This approach allows for a more seamless integration of existing methods typically used with spiking data to apply on calcium imaging data. The authors test the model on piriform cortex recordings as well as a biophysical simulator to validate their methods. This approach promises to have wide usability for neuroscientists using large population level calcium imaging.

      Strengths:

      The strengths of this study are the flexibility in the choice of models and relatively easy adaptation to user-specific use cases.

      Weaknesses:

      The weakness of the study lies in its limited validation of biological calcium imaging data. Calcium dynamics in a task-specific context in a sensory brain region might be very different from slower dynamics in a region of integration. The biophysical properties of the data would also be dependent on the SNR of the imaging platform and the generation of calcium indicator being used.

    4. Reviewer #3 (Public review):

      Summary:

      S. Keeley & collaborators propose a computational approach to infer time-varying latent variables directly from calcium traces (for instance, obtained with 2p imaging) without the need for deconvolving the traces into spike trains in a preliminary, independent step. Their approach rests on 1 of 3 families of latent models: GPFA, HMM and dynamical systems - which they augment with an observation model that maps latent variables to fluorescence traces. They validate their approach on simulated and real data, showing that the approach improves latent variable inference and model fitting, compared to more traditional approaches (although not directly compared with the 2-step one; see below). They provide a GitHub repository with code to fit their models (which I have not tested).

      Strengths:

      The approach is sound and well-motivated. The authors are specialists in latent variable models. The manuscript is succinct, well-written, and the figures are clear. I particularly liked the diversity of latent models considered, in particular latent models with continuous (GPFA) vs. discrete (HMM) dynamics, which are useful for characterizing different types of neural computations. The validation on both simulated and real data is convincing.

      Weaknesses:

      The main weakness that I see is that the approach is tested only on a single real dataset (odor response dataset). The other model fits are obtained from simulated data. While the results are convincing, it would be useful to see the approach tested on other datasets, for instance, datasets with different brain areas, different behavioral conditions, or different calcium indicators. This would help assess the generality of the approach and its robustness to different experimental conditions.

      The other points below mostly pertain to clarifications and possible extensions of the approach, and to simple model recovery experiments that would help quantify the advantage of the proposed approach over more traditional ones.

      I have a question related to interpretability and diagnosis of model fits. One advantage of the two-step approach, (1) deconvolution => (2) latent variable inference, is that one can inspect the quality of the deconvolution step independently from the latent variable inference step. In the proposed approach, it seems more difficult to diagnose potential problems with model fitting. For instance, if the inferred latent variables are not interpretable, how can one determine whether this is due to a poor choice of latent model (e.g., HMM with too few states), or a poor fit of the observation model (e.g., wrong parameters for the calcium dynamics)? Are there any diagnostic tools that could help identify potential problems with model fitting?

      Could the authors comment on whether their approach allows for instance to compare different forms of latent models (e.g., HMM vs. GPFA) in terms of model evidence, cross-validated log-likelihood or other model comparison metrics? This would be useful to quantitatively determine which type of latent dynamics is more appropriate for a given dataset.

      The HMM part reveals a pretty large number of states, with one state being interpretable (evoked response). Shouldn't we expect a simpler scenario, with 2 states? I know this is a difficult question that is more general and common with HMM approaches, but it would be useful to discuss this point. For instance, would a hierarchical HMM (with a smaller number of "super-states") be more appropriate here?

      While it certainly makes sense that models accounting for the full transformation of latent => spikes => fluorescence data should outperform the two-step (1) deconvolution => (2) latent variable inference approach, the amount of improvement is not clear. A direct comparison (e.g., w/ parameter & model recovery metrics) between the two approaches on simulated data would be useful to quantify the advantage of the proposed approach over more traditional ones.

      It would be useful to discuss the possible extension of the approach to other types of data that are related to neural activity but have different observation models, e.g., voltage imaging, or neuromodulator sensors (e.g., GRAB-NE, dLight, etc). Do the authors see any specific challenges that would arise in these cases and that would need to be addressed in the future (other than changing the Poisson spiking part)?

    1. eLife Assessment

      Insects can act as vectors of plant diseases, so the study of insect-pathogen interactions is relevant for agriculture. This important study identifies in Diaphorina citri a dopamine receptor responsive to 'Candidatus Liberibacter asiaticus' infection, demonstrates direct regulation of this receptor by a microRNA, and integrates dopamine signaling into an established insect reproductive hormone framework. Multiple complementary experimental approaches convincingly support the findings, but key conclusions rely on correlative data and the mechanistic evidence for the proposed linear signaling cascade is incomplete. This work will be of interest for insect physiology and vector-pathogen biology, and more broadly for citrus agriculture.

    2. Reviewer #1 (Public review):

      I read this paper with great interest based on my experience in insect sciences. I have some minor comments (and recommendations) that I believe the authors should address.

      (1) The paper has an original biological question that is overly broad and mechanistically ambitious. The central biological question, namely how CLas infection enhances fecundity of Diaphorina citri via dopamine signaling, is clearly stated and well motivated by previous literature. However, my advice to the authors is that, while the general question is clear, the manuscript attempts to answer multiple mechanistic layers simultaneously. As a result, I feel that the biological narrative becomes diffuse, especially in later sections where DA, miRNA regulation, AKH signaling, and JH signaling are all proposed as parts of a single linear cascade. In summary, my key concern is that the paper often moves from correlation to causal hierarchy without fully disentangling whether these pathways act sequentially, in parallel, or redundantly. A more explicitly framed primary hypothesis (e.g., "DA-DcDop2 is necessary and sufficient for CLas-induced fecundity") may improve conceptual clarity.

      (2) On the novelty of the data, I feel they are moderately novel, with substantial confirmatory components. If I am correct, the novel contributions include the identification of DcDop2 as the DA receptor responsive to CLas infection in D. citri, the discovery that miR-31a directly targets DcDop2, which is supported by luciferase assays and RIP, and thirdly, the integration of dopamine signaling into the already-described CLas-AKH-JH-fecundity framework. My advice to the authors is to focus more on the manuscript's novelty, which lies more in pathway integration than in discovering fundamentally new biological phenomena. This is appropriate for a mechanistic paper, but should be framed as an extension of existing models rather than a paradigm shift.

      (3) On the conclusions, I recommend that the authors modify their statements a little. I feel that there are some overstated or insufficiently supported claims. For instance, the assertion that CLas "hijacks" the DA-DcDop2-miR-31a-AKH-JH cascade implies direct pathogen manipulation, but no CLas-derived effector or mechanism is identified. Also, the model suggests a linear signaling hierarchy, but the data largely show correlation and partial dependency rather than strict epistasis. Third, the term "mutualistic interaction" may be too strong, as host fitness costs outside fecundity (e.g., longevity, immunity) are not evaluated. In conclusion, I confirm that the data support a functional association, but mechanistic causality and evolutionary interpretation are somewhat overstated.

    3. Reviewer #2 (Public review):

      Summary:

      Nian and colleagues comprehensively apply metabolomics, molecular, and genetic approaches to demonstrate that CLas hijacks the DA/DcDop2-miR-31a-AKH-JH signaling cascade to enhance lipid metabolism and fecundity in D. citri, while concurrently promoting its own replication.

      Strengths:

      These findings provide solid evidence of a mutualistic interaction between CLas proliferation and ovarian development in the insect host. This insight significantly advances our understanding of the molecular interplay between plant pathogens and vector insects, and offers novel targets and strategies for HLB field management.

      Weaknesses:

      While the article investigates the involvement of dopamine signaling and specific microRNAs in enhancing fecundity and pathogen proliferation, it still needs to provide a detailed mechanistic understanding of these interactions. The precise molecular pathways and feedback mechanisms by which CLas manipulates dopamine signaling in Diaphorina citri remain unclear.

    1. eLife Assessment

      The authors address a hard question and propose a pipeline for using Large Language Models to reconstruct signalling networks as well as to benchmark future models. The findings are valuable for a defined subfield, as the proposed framework allows for assessing such approaches systematically. The overall support is solid, although the present evaluation remains limited in scope and would benefit from a wider range of networks and performance metrics.

    2. Reviewer #1 (Public review):

      Summary:

      Large language models (LLMs) have been developed rapidly in recent years and are already contributing to progress across scientific fields. The manuscript tries to address a specific question: whether LLMs can accurately infer signaling networks from gene lists. However, the evaluation is inadequate due to four major weaknesses described below. Despite these limitations, the authors conclude that current general-purpose LLMs lack adequate accuracy, which is already widely recognized. Its key contribution should instead be to provide concrete recommendations for the development of specialized LLMs for this task, which is completely absent. Developing such specific LLMs would be highly valuable, as they could substantially reduce the time required by researchers to analyze signaling networks.

      Strengths:

      The manuscript raises a good question: whether current LLMs can accurately generate signaling networks from gene lists.

      Weaknesses:

      (1) The authors evaluate LLM performance using only three signaling networks: "hypertrophy", "fibroblast", and "mechanosignaling". Given the large number of well-established signaling pathways available, this is not a comprehensive assessment. Moreover, the analysis need not be restricted to signaling networks. Other network types, including metabolic and transcriptional regulatory networks, are already accessible in well-known databases such as KEGG, Reactome, BioCyc, WikiPathways, and Pathway Commons. Including these additional networks would substantially strengthen the evaluation.

      (2) In LLM evaluation, the authors use the gene lists that exactly match those in their "ground truth" networks, thereby fixing the set of nodes and evaluating only the predicted edges. However, in practical research, the relevant genes or nodes are not fully known. A more realistic assessment would therefore include gene lists with both genes present in the ground-truth network and additional genes absent from it, to evaluate the ability of the LLM to exclude irrelevant genes.

      (3) The authors report only the recall/sensitivity of the LLM, without assessing specificity. In practical applications, if an LLM generates a large number of incorrect interactions that greatly exceed the correct ones, researchers may be misled or may lose confidence in the LLM output. Therefore, a comprehensive evaluation must include both sensitivity and specificity. Furthermore, it would be informative to check whether some of the "false positives" might in fact represent biologically plausible interactions that are absent from the manually curated "ground truth". Manually generated "ground truth" can overlook genuine interactions, and the ability of LLMs to recover such missing edges could be particularly valuable. This may even represent one of the most important potential contributions of LLMs.

      (4) It is widely known that applying differential equation models to highly complex biological networks, such as the three networks in the manuscript, is meaningless, because these systems involve a large number of parameters whose values can drastically alter the results. As Richard Feynman once said: "with four parameters I can fit an elephant, and with five I can make him wiggle his trunk." Thus, the evaluation of LLMs on "logic-based differential equation models" does not make much sense.

    3. Reviewer #2 (Public review):

      Summary:

      The authors evaluate whether commonly used LLMs (ChatGPT, Claude and Gemini) can reconstruct signalling networks and predict effects of network perturbations, and propose a pipeline for benchmarking future models. Across three phenotypes (hypertrophy, fibroblast signalling, and mechanosignalling), LLMs capture upstream ligand-receptor interactions and conserved crosstalk but fail to recover downstream transcriptional programmes. Logic-based simulations show that LLM-derived networks underperform compared to manually curated models. The authors also propose that their pipeline can be used for benchmarking future models aimed at reconstructing signalling networks.

      Strength:

      The authors compare the outcomes from three LLMs with three manually curated and validated models. Additionally, they have investigated gene network reconstruction in the context of three distinct phenotypes. Using logic-based modelling, the authors assessed how LLM-derived networks predict perturbation effects, providing functional validation beyond network overlap.

      Weaknesses:

      The authors have used legacy models for all three LLMs, and the study would benefit from testing the current versions of the LLMs (ChatGPT 5.2, Claude 4.5 and Gemini 2.5). Additional metrics such as node coverage, node invention, direction accuracy and sign accuracy would be useful to make robust comparisons across models.
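      The metrics requested by both reviewers can be made concrete with a short sketch that compares a predicted signed, directed network against a curated reference. The definitions below are assumptions for illustration and are not taken from the manuscript: networks are represented as dictionaries from (source, target) pairs to a sign of +1 or -1, and the node names in the example are placeholders rather than real genes.

```python
def network_metrics(reference, predicted):
    """Compare two signed, directed networks.

    reference, predicted: dict mapping (source, target) -> sign (+1 or -1).
    Metric definitions are illustrative assumptions, not the authors'.
    """
    ref_edges, pred_edges = set(reference), set(predicted)
    true_pos = ref_edges & pred_edges  # same pair, same direction
    ref_nodes = {node for edge in ref_edges for node in edge}
    pred_nodes = {node for edge in pred_edges for node in edge}
    ref_pairs = {frozenset(edge) for edge in ref_edges}
    # Predicted edges that link a pair connected in the reference, in either direction.
    pair_hits = [edge for edge in pred_edges if frozenset(edge) in ref_pairs]
    sign_hits = sum(reference[edge] == predicted[edge] for edge in true_pos)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "precision": ratio(len(true_pos), len(pred_edges)),
        "recall": ratio(len(true_pos), len(ref_edges)),
        "node_coverage": ratio(len(ref_nodes & pred_nodes), len(ref_nodes)),
        "node_invention": ratio(len(pred_nodes - ref_nodes), len(pred_nodes)),
        "direction_accuracy": ratio(len(true_pos), len(pair_hits)),
        "sign_accuracy": ratio(sign_hits, len(true_pos)),
    }


# Placeholder example: "E" is a node invented by the predicted network.
reference = {("A", "B"): +1, ("B", "C"): -1, ("C", "D"): +1}
predicted = {("A", "B"): +1, ("C", "B"): -1, ("C", "D"): -1, ("A", "E"): +1}
metrics = network_metrics(reference, predicted)
```

      Precision penalises invented edges, which recall alone cannot do. Specificity would additionally require defining the set of possible non-edges, which depends on whether the node list is fixed in advance.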

    1. eLife Assessment

      Important findings from this study include clear evidence of the impact of methylphenidate on cognitive control over Pavlovian biasing of actions and decision-making in humans, in a manner dependent on baseline working memory capacity. The design used drug dosing after learning, allowing a compelling test of the influence of catecholamines on decision processes independent of learning. The task is very well designed, using a combination of aversive and appetitive Pavlovian to Instrumental Transfer, and the in-depth behavioural analysis extends the authors' previous work, which will be of interest to those working in cognitive psychopharmacology. The results challenge the view that catecholamines operate by modulating behavioural invigoration alone.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian-to-instrumental transfer (PIT) task to parse decision making from instrumental influences. While the main pharmacological effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociates this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      Comments on revisions:

      I have no further recommendations or concerns.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses showed no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner, depending on individual working memory capacities. The authors interpret that as an effect on cognitive control of Pavlovian biasing of actions and decision-making more than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows the authors to precisely investigate the differential modulation of value-based decision making, depending on the context and environmental stimuli. Importantly, MPH was only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-comparisons between the PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      Weaknesses:

      Previous weaknesses regarding the neurobiological circuits underlying such effects and the possible role of dopamine vs noradrenaline have been clearly discussed in the new version of the manuscript.

      Comments on revisions:

      The authors answered my previous points. The changes to the manuscript clearly improve the clarity of the results and the strength of the study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      We thank the reviewer for highlighting the timing of the pharmacological intervention as a strength for this study and for the suggested improvements for clarification.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      (1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      Our findings call for a revision of theories on how catecholamines are involved in the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory to be able to incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      (2) Analytic clarity: what's c^2?

      C^2 appears to be a PDF conversion error: all chi-squares (χ²) were converted to C2. This is now corrected in our revision.

      Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret that as an effect on cognitive control of Pavlovian biasing of actions and decision making more than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of the differential modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-comparisons between PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      We thank the reviewer for highlighting the experimental design as a strength for this study and the suggested improvements for clarification.

      Weaknesses:

      As authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      We employ a pharmacological intervention within a randomized placebo controlled cross-over design, which allows for causal inferences with respect to the placebo-controlled intervention. Thus, the reported interactions of interest include correlations, but these are causally dependent on our intervention.

      Perhaps the reviewer refers to the implications of our findings for hypotheses regarding neural implementation of Pavlovian bias-generation. Indeed, based on our data we are not able to arbitrate between frontal and striatal accounts, due to the systemic nature of the pharmacological intervention. Thus, we agree with the reviewer that neuroimaging (in combination with for example brain stimulation) would be a valuable next step to identify the neural correlates to these pharmacological intervention effects, to dissociate between frontal and striatal basis of the effects. In the revision, as per our reply to reviewer 1, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      As recommended, we brought forward parts of the Discussion that clarify the originality of the current experiment to the introduction (page 4/5) and result section (page 8).

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      We have now clarified that working memory span was assessed for all participants on Day 2 prior to the start of instrumental training (as illustrated in Figure 1A). Importantly, this was done prior to ingestion of the drug or placebo (which subjects received after Pavlovian training, which followed the instrumental training). This design also precludes an assessment of the effects of MPH on working memory capacity.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

      We indeed focus our Discussion more on dopamine than on noradrenaline. Our revision now also discusses noradrenaline in light of our frontal control hypothesis and the recommendation, in future studies, to use a multi-drug design, incorporating, for example, a session with the drug atomoxetine, which modulates cortical catecholamines, but not striatal dopamine (Discussion, page 12).

      Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well-established cognitive task that measures the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inferences about the invigorating effects of the cues independently of any learning bias. Moreover, the authors employed a within-subject design to study the effect of the drug on 100 participants, which also allows detection of continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are involved in Pavlovian biases only by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as that account cannot explain the reversal of the effects of a valenced cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      We thank the Reviewer for highlighting the robustness of the methods and the importance of the results. We are glad to briefly address the concerns here and have incorporated our responses in the revision.

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study to which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although the model with impulsivity did not fit significantly better than the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.
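
      The size of the collinearity concern can be illustrated with a variance inflation factor (VIF). The following is a minimal sketch, not the authors' analysis: it assumes a simplified model with only these two predictors, in which case VIF = 1 / (1 - r^2), and it uses the correlation r = -0.16 reported above. The function name is hypothetical.

```python
# Variance inflation factor for a simplified model with two predictors.
# Assumption: only working memory span and impulsivity enter the model,
# so each predictor's VIF depends solely on their pairwise correlation.

def vif_two_predictors(r: float) -> float:
    """Return VIF = 1 / (1 - r^2) for a pair of predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)

r_wm_impulsivity = -0.16  # correlation reported in the response
vif = vif_two_predictors(r_wm_impulsivity)
print(f"VIF = {vif:.3f}")
```

      Values close to 1 indicate that coefficient variances are barely inflated; common rules of thumb only flag values above 5 or 10. The full GLMM contains additional factors and interactions, so this two-predictor figure is a lower bound on intuition rather than a complete diagnostic.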

      Most importantly, results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: χ<sup>2</sup> = 9.5, p=0.002). We will report these findings in the revised manuscript. We have now added the text to the Supplemental Results: Control analyses, page 28.

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

      We agree with the Reviewer that the lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum, as measured with [<sup>18</sup>F]-FDOPA PET imaging, lends support to the proposed hypothesis incorporating a broader perspective on Pavlovian bias generation than the dopaminergic direct/indirect pathway account (although it is possible that the association will hold in a larger sample when synthesis capacity is measured with [<sup>18</sup>F]-FMT PET imaging, which is sensitive to a different component of the metabolic pathway). We will indeed incorporate in our planned revision the findings from our group reported in van den Bosch et al. (2022).

      See Supplemental methods 2: Working memory and impulsivity assessment, page 26.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Theoretical clarity. Some aspects of the paper are ideally clear: Figure 1 clearly explains the paradigm. The general take-home message is clearly described in the last line of the abstract, the last line of the introduction, the first line of the discussion, and throughout other places in the discussion. Yet the authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      The discussion includes many possible theoretical interpretations of the findings, which is laudable, but many readers may get lost in this multitude (particularly anyone who isn't an RL/DA aficionado). The group's prior work (i.e. striatal hypothesis) is first described, followed by a rather complex breakdown of valence-action tendencies, then the seemingly preferred explanation for the current study (i.e. cognitive control hypothesis) is advanced as "an alternative account ...". This is followed by a third, more complex idea (i.e. cortico-striatal balance hypothesis), then the paper ends. A reader may be forgiven for skimming through this discussion and not having a clear idea of how to frame these effects. I think some subheaders would help, as well as clearer labeling of the theoretical interpretations in line with a more authoritative description of the authors' preferred interpretation of the empirical effects.

      Our findings call for a revision of theories on how catecholamines are involved in the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory to incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) through a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the final paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning clearer, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, pages 9-12).

      (2) All statistical effects are presented as c^2 with no df. The methods only describe LMER and make no mention of what the c^2 measure represents.

      The "c^2" appears to be a PDF conversion error: all chi-square symbols (χ<sup>2</sup>) were converted to "C2". This is now corrected in our revision.
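
      As a rough consistency check on the statistic reported earlier in this response (chi-square = 9.5, p = 0.002 for the four-way interaction), the sketch below computes the upper-tail probability of a chi-square variable. It assumes 1 degree of freedom, which the response does not state; that value is a guess appropriate for a single interaction term between two-level factors and one continuous covariate. The function name is hypothetical.

```python
import math

def chi2_upper_tail_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    With 1 df, X is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

# Statistic taken from the response; df = 1 is an assumption.
p_value = chi2_upper_tail_df1(9.5)
print(f"p = {p_value:.4f}")
```

      Under that assumption the result rounds to the reported p = 0.002. Reporting the degrees of freedom alongside each statistic would let readers run this check directly.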

      Reviewer #2 (Recommendations for the authors):

      Few minor points:

      Figure 2A is not cited in the text I think

      Checked and changed.

      Figure 2C: "C" is not present in the figure. Also, I could not see the data corresponding to the MPH-Approach context in the Neutral Pavlovian condition, but I think it is probably masked by another curve.

      Checked and changed. Indeed, the one curve is masked by the other curve.

      As I stated in the public review, a clarification or more detailed analysis of working memory performance depending on if it was measured under MPH or placebo could be a plus.

      Changed this (see public review reply).

      I did not see any statement about the availability of data but I may have missed it.

      Yes, the statement can be found:

      Methods, page 13: Data and code for the study are freely available at https://data.ru.nl/collections/di/dccn/DSC_3017031.02_734.

      Reviewer #3 (Recommendations for the authors):

      The authors should check that inclusion of impulsivity in the logistic mixed model is justified and if it is justified make sure that multicollinearity is not problematic.

      See our answer to the public review, reiterated below for convenience:

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study to which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although the model with impulsivity did not fit significantly better than the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.

      Most importantly, results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: χ<sup>2</sup> = 9.5, p=0.002). We will report these findings in the revised manuscript. We have now added the text to the Supplemental Results: Control analyses, page 28.

      I would recommend that the authors make clear that the effects of methylphenidate are dependent on working memory capacity in the first sentence of the penultimate paragraph of the introduction on page 4.

      Changed this accordingly, see Introduction, page 5.

      I would make sure that the text in the figures is readable without needing to enlarge the figures. I would also highlight the significant effects in the figures.

      We changed the font size accordingly and added significance statements to the caption, because depicting the significance of a four-way interaction including one continuous variable is not straightforward.

      The distributions of p(Go) by conditions such as in figure 1D or 2A are very intuitive. Figure 2B is very informative as it shows the continuous effects of working memory capacity on the PIT effect. I would add (in figure 2 or in the supplement) a plot of the p(Go) with a tertile split based on working memory. Considering that the correspondent analysis is being reported, having the plot would strengthen and simplify the understanding of the results.

      The continuous effects of working memory are based on WM values on the listening span ranging from 2.5-7, in steps of 0.5, resulting in 10 different values. A tertile split would result in binning these into two bins of three values, and one bin of four values. Given that all of the datapoints for this tertile split are already presented in the current figures, we strongly prefer not to include this additional figure.
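
      The bin arithmetic in this reply can be reproduced with a short sketch. It splits the ten distinct listening-span scores, not the participants, into three contiguous bins; which bin receives the extra value is an arbitrary choice made here, and the helper name is hypothetical.

```python
# Listening-span scores as described: 2.5 to 7.0 in steps of 0.5.
wm_values = [2.5 + 0.5 * i for i in range(10)]

def split_into_bins(values, n_bins):
    """Split sorted values into n_bins contiguous bins of near-equal size.

    Any remainder is assigned to the lowest bins, one extra value each.
    """
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins

tertiles = split_into_bins(wm_values, 3)
bin_sizes = [len(b) for b in tertiles]
print(bin_sizes)
```

      Ten values cannot be divided evenly by three, so one bin necessarily holds four score levels and the other two hold three. A split based on participants rather than score levels would depend on how many participants obtained each score, which is not given here.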

      I would add some sentences in the results section (and maybe in the discussion if needed) addressing the results that the effect of Valence by drug by WM span is only significant in the withdrawal context but not in the approach context.

      We have now emphasized in the Results section (page 8) that the drug effects were significant specifically in the withdrawal context.

    1. eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The simulations are well performed and clearly presented in the context of existing experimental datasets. This compelling study will be of interest to those studying gene expression in the context of chromatin.

    2. Reviewer #1 (Public review):

      This manuscript discusses from a theory point of view the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription.

      Comments on revisions: It's a good paper.

    3. Reviewer #2 (Public review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript "Cluster size determines morphology of transcription factories in human cells".

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well-prepared, and the text is mostly well-written save for a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data that are needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the suggested experiments in the Discussion section are actually done and their results can be exploited. In my judgement, unless these additional analyses are added to a level that crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is more fitted for a theoretical biophysics venue than for a biology journal such as eLife.

      Comments on revisions:

      The authors have addressed my comments with exemplary diligence, which has clarified all my major concerns. In all cases, either the relevant work was added, or it was explained in the form of a convincing argument why the suggested modifications were not implemented or not possible to implement.

      As a discretionary suggestion, the authors might consider using a title that even more directly highlights the, in my opinion, main take-away of this work. This is not because anything is incorrect about the current title, simply an even more to-the-point title might attract more readers. I would suggest something along the lines of

      "Cluster size-dependent demixing drives specialization of transcription factories"

      Overall, I congratulate the authors on their excellent work and appreciate the opportunity to engage with this manuscript during a very insightful review process.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90kb). These colored beads are considered transcription units (TUs). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlated) "gene expression" patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, it leads to the emergence of specialized and mixed clusters of different TFs. Human chromatin models with a realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.
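
      The layout of the toy polymer described in this summary can be sketched as a simple bead list. This is an illustration of the stated spacing only, not the authors' simulation code: the polymer length, the color names, the random seed, and the placement of the first TU at bead 0 are all assumptions, and the 3 kb bead size is inferred from "30 beads, or 90kb".

```python
import random

TU_SPACING_BEADS = 30  # one TU every 30 beads (from the summary)
KB_PER_BEAD = 3        # inferred: 90 kb / 30 beads

def build_toy_polymer(n_beads, colors, seed=0):
    """Return a list of bead types.

    None marks a plain chromatin bead; a color string marks a TU bead
    whose type is drawn at random from `colors`.
    """
    rng = random.Random(seed)
    polymer = [None] * n_beads
    for i in range(0, n_beads, TU_SPACING_BEADS):
        polymer[i] = rng.choice(colors)
    return polymer

# Hypothetical size and color set, chosen only for illustration.
polymer = build_toy_polymer(n_beads=3000, colors=["red", "green", "yellow"])
tu_positions = [i for i, bead in enumerate(polymer) if bead is not None]
spacing_kb = TU_SPACING_BEADS * KB_PER_BEAD
print(len(tu_positions), spacing_kb)
```

      In the actual model the beads would additionally carry polymer connectivity and interaction potentials with diffusing TF beads; the list above only captures where the colored TUs sit along the chain.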

      Strengths:

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The well-performed and clearly presented simulations will be of interest to those studying gene expression in the context of chromatin. While the study is generally solid, it could benefit from a more direct comparison with existing experimental data sets as well as further discussion of the limits of the underlying model assumptions.

      We thank the editors for their overall positive assessment. In response to the Referees’ comments, we have addressed all technical points, including a more detailed explanation of the methodology used to extract gene transcription from our simulations and its analogy with real gene transcription. Regarding the potential comparison with experimental data and our mixing–demixing transition, we have added new sections discussing the current state of the art in relevant experiments. We also clarify the present limitations that prevent direct comparisons, which we hope can be overcome with future experiments using the emerging techniques.

      Reviewer #1 (Public Review):

      This manuscript discusses from a theory point of view the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription. I would only advise the authors to polish and shorten their text to better highlight their key findings and make it more accessible to the reader.

      We thank the Referee for carefully reading our manuscript and recognizing its scientific value. As suggested, we tried to better highlight our key findings and make the text more accessible while addressing also the comments from the other Referees.

      Reviewer #2 (Public Review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript “Cluster size determines morphology of transcription factories in human cells”.

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well-prepared, and the text is mostly well-written save for a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data that are needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the suggested experiments in the Discussion section are actually done and their results can be exploited. In my judgement, unless these additional analyses are added to a level that crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is more fitted for a theoretical biophysics venue than for a biology journal such as eLife.

      We thank the Reviewer for their positive assessment of the soundness of our work and its contribution to the field. We have added a paragraph to the Conclusions highlighting the current state of experimental techniques and outlining near-term experiments that could be extended to test our predictions. We also emphasise that our analysis builds on state-of-the-art polymer models of chromatin and on quantitative experimental datasets, which we used both to construct the model and to validate its outcomes (gene activity). We hope this strengthened link to experiment will catalyse further studies in the field.

      Major points:

      (1) My first point concerns terminology. The Merriam-Webster dictionary describes morphology as the study of structure and form. In my understanding, none of the analyses carried out in this study actually address the form or spatial structuring of transcription factories. I see no aspects of shape, only size. Unless the authors want to assess actual shapes of clusters, I would recommend instead talking only about their size/extent. The title is, by the same argument, in my opinion misleading as to the content of this study.

      We agree with the Referee that the title could be misleading. In our study we characterized cluster size, which is a morphological descriptor, and cluster composition, which is not morphology per se but is referred to as such in the community in a broader sense. Nevertheless, to strengthen the message we have changed the title to: “Cluster size determines internal structure of transcription factories in human cells”.

      (2) Another major conceptual point is the choice of how a single TF:pol particle in the model relates to actual macromolecules that undergo clustering in the cell. What about the fact that even single TF factories still contain numerous canonical transcription factors, many of which are also known to undergo phase separation? Mediator, CDK9, Pol II just to name a few. This alone already represents phase separation under the involvement of different species, which must undergo mixing. This is conceptually blurred with the concept of gene-specific transcription factors that are recruited into clusters/condensates due to sequence-specific or chromatin-epigenetic-specific affinities. Also, the fact that even in a canonical gene with a “small” transcription factory there are numerous clustering factors takes even the smallest factories into a regime of several tens of clustering macromolecules. It is unclear to me how this reality of clustering and factory formation in the biological cell relates to the cross-over that occurs at approximately n=10 particles in the simulations presented in this paper.

      This is a good point. However, in our case we can look at the clustering of either transcription factors or transcription units. In an experimental situation, transcription units could be “coloured”, or assigned different types, by looking at different cell types, so that they can be classified as housekeeping, or cell-type independent, or cell-type specific. This is similar to how DHS can be clustered. In this way the mixing or demixing state can be identified by looking at the type of transcription unit, removing any ambiguity due to the fact that the same protein may participate in different TF complexes.

      (3) The paper falls critically short in referencing and exploiting for analysis existing literature and published data both on 3D genome organization as well as the process of cluster formation in relation to genomic elements. In terms of relevant literature, most of the relevant body of work from the following areas has not been included:

      (i) mechanisms of how the clustering of Pol II, canonical TFs, and specific TFs is aided by sequence elements and specific chromatin states

      (ii) mechanisms of TF selectivity for specific condensates and target genomic elements

      (iii) most crucially, existing highly relevant datasets that connect 3D multi-point contacts with transcription factor identity and transcriptional activity, which would allow the authors to directly test their hypotheses by analysis of existing data

      Here, especially the data under point (iii) are essential. The SPRITE method (cited but not further exploited by the authors), even in its initial form of publication, would have offered a data set to critically test the mixing vs. demixing hypothesis put forward by the authors. Specifically, the SPRITE method offers ordered data on k-mers of associated genomic elements. These can be mapped against the main TFs that associate with these genomic elements, thereby giving an account of the mixed / demixed state of these k-mer associations. Even a simple analysis sorting these associations by the number of associated genomic elements might reveal a demixing transition with increasing association size k. However, a newer version of the SPRITE method already exists, which combines the k-mer association of genomic elements with the whole transcriptome assessment of RNAs associated with a particular DNA k-mer association. This can even directly test the hypotheses the authors put forward regarding cluster size, transcriptional activation, correlation between different transcription units’ activation etc.
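
      The simple analysis suggested here, sorting multi-way associations by size k and asking how mixed each one is, can be sketched as follows. This is a hedged illustration only: the input format (one list of TF-type labels per association), the all-same-label definition of "demixed", and the example data are assumptions made for this sketch and do not correspond to any SPRITE file format or published result.

```python
from collections import defaultdict

def demixed_fraction_by_size(associations):
    """Map association size k to the fraction of associations of that size
    whose genomic elements all carry the same TF-type label.
    """
    totals = defaultdict(int)
    demixed = defaultdict(int)
    for labels in associations:
        k = len(labels)
        totals[k] += 1
        if len(set(labels)) == 1:  # every element has the same TF type
            demixed[k] += 1
    return {k: demixed[k] / totals[k] for k in sorted(totals)}

# Made-up input for illustration; these are not experimental data.
example_associations = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["A", "B", "A"],
    ["A", "B", "A", "B", "A"],
]
fractions = demixed_fraction_by_size(example_associations)
print(fractions)
```

      A decline of this fraction with increasing k would be the signature of the size-dependent transition under discussion. A graded measure, such as the share of the majority label, could replace the all-or-nothing criterion used here.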

      To continue, the Genome Architecture Mapping (GAM) method from Ana Pombo’s group has also yielded data sets that connect the long-range contacts between gene-regulatory elements to the TF motifs involved in these contacts, and even provides ready-made analyses that assess how mixed or demixed the TF composition at different interaction hubs is. I do not see why this work and data set are not even acknowledged. I also strongly suggest analyzing these data or, if they are already sufficiently analyzed, discussing them in the light of 3D interaction hub size (number of interacting elements) and TF motif composition of the involved genomic elements.

      Further, a preprint from the Alistair Boettiger and Kevin Wang labs from May 2024 also provides direct, single-cell imaging data of all super-enhancers, combined with transcription detection, assessing even directly the role of number of super-enhancers in spatial proximity as a determinant of transcriptional state. This data set and findings should be discussed, not in vague terms but in detailed terms of what parts of the authors’ predictions match or do not match these data.

      For these data sets, an analysis in terms of the authors’ key predictions must be carried out (unless the underlying papers already provide such final analysis results). In answering this comment, what matters to me is not that the authors follow my suggestions to the letter. Rather, I would want to see that the wealth of available biological data and knowledge that connects to their predictions is used to their full potential in terms of rejecting, confirming, refining, or putting into real biological context the model predictions made in this study.

      References for point (iii):

      - RNA promotes the formation of spatial compartments in the nucleus https://www.cell.com/cell/fulltext/S0092-8674(21)01230-7?dgcid=raven_jbs_etoc_email

      - Complex multi-enhancer contacts captured by genome architecture mapping https://www.nature.com/articles/nature21411

      - Cell-type specialization is encoded by specific chromatin topologies https://www.nature.com/articles/s41586-021-04081-2

      - Super-enhancer interactomes from single cells link clustering and transcription https://www.biorxiv.org/content/10.1101/2024.05.08.593251v1.full

      For point (i) and point (ii), the authors should go through the relevant literature on Pol II and TF clustering, how this connects to genomic features that support the cluster formation, and also the recent literature on TF specificity. On the last point, TF specificity, especially the groups of Ben Sabari and Mustafa Mir have presented astonishing results that seem highly relevant to the Discussion of this manuscript.

      We appreciate the Reviewer’s insightful suggestion that a comparison between our simulation results and experimental data would strengthen the robustness of our model. In response, we have thoroughly reviewed the literature on multi-way chromatin contacts, with particular attention to SPRITE and GAM techniques. However, we found that the currently available experimental datasets lack sufficient statistical power to provide a definitive test of our simulation predictions, as detailed below.

      As noted by the Reviewer, SPRITE experiments offer valuable information on the composition of high-order chromatin clusters (k-mers) that involve multiple genomic loci. A closer examination of the SPRITE data (e.g., Supplementary Material from Ref. [1]) reveals that the majority of reported statistics correspond to 3-mers (three-way contacts), while data on larger clusters (e.g., 8-mers, 9-mers, or greater) are sparse. This limitation hinders our ability to test the demixing-mixing transition predicted in our simulations, which occurs for cluster sizes exceeding 10.

      Moreover, the composition of the k-mers identified by SPRITE predominantly involves genomic regions encoding functional RNAs—such as ITS1 and ITS2 (involved in rRNA synthesis) and U3 (encoding small nucleolar RNA)—which largely correspond to housekeeping genes. Conversely, there is little to no data available for protein-coding genes. This restricts direct comparison to our simulations, where the demixing-mixing transition depends critically on the interplay between housekeeping and tissue-specific genes.

      Similarly, while GAM experiments are capable of detecting multi-way chromatin contacts, the currently available datasets primarily report three-way interactions [2,3].

      In summary, due to the limited statistical data on higher-order chromatin clusters [4], a quantitative comparison between our simulation results and experimental observations is not currently feasible. Nevertheless, we have now briefly discussed the experimental techniques for detecting multi-way interactions in the revised manuscript to reflect the current state of the field, mentioning most of the references that the Reviewer suggested.

      (4) Another conceptual point that is a critical omission is the clarification that there are, in fact, known large vs. small transcription factories, or transcriptional clusters, which are specific to stem cells and ”stressed cells”. This distinction was initially established by Ibrahim Cisse’s lab (Science 2018) in mouse Embryonic Stem Cells, and also is seen in two other cases in differentiated cells in response to serum stimulus and in early embryonic development:

      - Mediator and RNA polymerase II clusters associate in transcription-dependent condensates https://www.science.org/doi/10.1126/science.aar4199

      - Nuclear actin regulates inducible transcription by enhancing RNA polymerase II clustering https://www.science.org/doi/10.1126/sciadv.aay6515

      - RNA polymerase II clusters form in line with surface condensation on regulatory chromatin https://www.embopress.org/doi/full/10.15252/msb.202110272

      - If ”morphology” should indeed be discussed, the last paper is a good starting point, especially in combination with this additional paper: Chromatin expansion microscopy reveals nanoscale organization of transcription and chromatin https://www.science.org/doi/10.1126/science.ade5308

      We thank the Reviewer for pointing out the discussion about small and large clusters observed in stressed cells. Our study aims to provide a broader mechanistic explanation of the formation of mixed and demixed TF clusters depending on their size. However, to avoid confusion between our terminology and the classification already used for transcription factories in stem and stressed cells, we have now added some comments and references in the revised text.

      (5) The statement scripts are available upon request is insufficient by current FAIR standards and seems to be non-compliant with eLife requirements. At a minimum, all, and I mean all, scripts that are needed to produce the simulation outcomes and figures in the paper, must be deposited as a publicly accessible Supplement with the article. Better would be if they would be structured and sufficiently documented and then deposited in external repositories that are appropriate for the sharing of such program code and models.

      We fully agree with the Reviewer. We have now included in the main text a link to an external repository containing all the code required to reproduce and analyze the simulations.

      Recommendations for the authors:

      Minor and technical points

      (6) Red, green, and yellow (mix of green and red) is a particularly bad choice of color code, seeing that red-green blindness is the most common color blindness. I recommend changing the color code.

      We appreciate the Reviewer’s thoughtful comment regarding color accessibility. We fully agree that red–green combinations can pose challenges for color-blind readers. In our figures, however, we chose the red–green–yellow color scheme deliberately because it provides strong contrast and intuitive representation for different TF/TU types. To ensure accessibility, we optimized brightness and saturation within red-green schemes and we carefully verified that the chosen hues are distinguishable under the most common forms of color vision deficiency, i.e. trichromatic color blindness, using color-blindness simulation tools (e.g., Coblis).

      How is the dispersing effect of transcriptional activation and ongoing transcription accounted for or expected to affect the model outcome? This affects both transcriptional clusters (they tend to disintegrate upon transcriptional activation) as well as the large scale organization, where dispersal by transcription is also known.

      We thank the Reviewer for this very insightful question. The current versions of both our toy model and the more complex HiP-HoP model do not incorporate the effects of RNA polymerase elongation. Our primary goal was to develop a minimalistic framework that focuses on TF cluster formation and composition. Nevertheless, we find that this straightforward approach provides good agreement between simulations and Hi-C and GRO-seq experiments, lending confidence to the reliability of our results concerning TF cluster composition.

      We fully agree, however, that the effects of transcription elongation are an interesting topic for further exploration. For example, modeling RNA polymerases as active motors that continually drive the system out of equilibrium could influence the chromatin polymer conformation and the structure of TF clusters. Additionally, investigating how interactions between RNA molecules and nuclear proteins, such as SAF-A, might lead to significant changes in 3D chromatin organization and, consequently, transcription [5] is also an intriguing prospect. Although we do not believe that the main findings of our study, particularly regarding cluster composition and the mixed-demixed transition, would be affected by transcription elongation, we recognize the importance of this aspect. As such, we have now included some comments in the Conclusions section of the revised manuscript.

      “and make the reasonable assumption that a TU bead is transcribed if it lies within 2.25 diameters (2.25σ) of a complex of the same colour; then, the transcriptional activity of each TU is given by the fraction of time that the TU and a TF:pol lie close together.” How is that justified? I do not see how this is reasonable or not, if you make that statement you must back it up.

      As pointed out by the Referee, we consider a TU to be active if at least one TF is within a distance 2.25σ from that TU. This threshold is slightly larger than the TU-TF interaction cutoff distance, r<sub>c</sub> = 1.8σ. The rationale for this choice is to ensure that, in the presence of a TU cluster surrounded by TFs, TUs that are not directly in contact with a TF are still considered active. Nonetheless, we find that using slightly different thresholds, such as 1.8σ or 1.1σ, leads to comparable results, as shown in Fig. S11, demonstrating the robustness of our analysis.
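
      The activity criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis code used for the simulations: the function name `tu_activity`, the trajectory layout (one TU position and one list of same-colour TF positions per frame, in units of σ) and the toy coordinates are all hypothetical.

```python
import math


def tu_activity(tu_traj, tf_traj, threshold=2.25):
    """Fraction of frames in which at least one TF lies within `threshold` of the TU.

    tu_traj: list of (x, y, z) positions of one TU, one entry per frame.
    tf_traj: list of frames; each frame is a list of (x, y, z) positions
             of TFs of the same colour as the TU.
    Distances are in units of sigma (assumed layout, for illustration only).
    """
    if len(tu_traj) != len(tf_traj):
        raise ValueError("trajectories must have the same number of frames")
    if not tu_traj:
        raise ValueError("trajectories must contain at least one frame")
    active_frames = 0
    for tu_pos, tfs in zip(tu_traj, tf_traj):
        # The TU counts as transcribed in this frame if any TF is close enough.
        if any(math.dist(tu_pos, tf_pos) <= threshold for tf_pos in tfs):
            active_frames += 1
    return active_frames / len(tu_traj)


# Toy trajectory: a static TU and four frames of TF positions (made-up numbers).
tu = [(0.0, 0.0, 0.0)] * 4
tfs = [
    [(1.0, 0.0, 0.0)],                   # distance 1.0
    [(2.0, 0.0, 0.0)],                   # distance 2.0
    [(3.0, 0.0, 0.0)],                   # distance 3.0
    [(5.0, 0.0, 0.0), (0.0, 1.5, 0.0)],  # closest TF at distance 1.5
]
print(tu_activity(tu, tfs, threshold=2.25))  # 0.75
print(tu_activity(tu, tfs, threshold=1.8))   # 0.5
```

      In this toy case the second frame is counted as active only with the larger threshold, which is the situation the 2.25σ choice is meant to capture.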

      Clearly, close proximity in 1D genomic space favours formation of similarly-coloured clusters. This is not surprising, it is what you built the model to do. Should not be presented as a new insight, but rather as a check that the model does what is expected.

      We believed that this sentence already conveyed that the formation of single-color clusters driven by 1D genomic proximity is not a surprising outcome. However, we have now slightly rephrased it to better emphasize that this is not a novel insight.

      That said, we would like to highlight that while 1D genomic proximity facilitates the formation of clusters of the same color, the unmixed-to-mixed transition in cluster composition is not easily predictable solely from the TU color pattern. Furthermore, in simulations of real chromosomes, where TU patterns are dictated by epigenetic marks, the complexity of these patterns makes it challenging—if not impossible—to predict cluster composition based solely on the input data of our model.

      “…how closely transcriptional activities of different TUs correlate…” Please briefly state over what variable the correlation is carried out, is it cross correlation of transcription activity time courses over time? Would be nice to state here directly in the main text to make it easier for the reader.

      We have now included a brief description in the revised manuscript explaining how the transcriptional correlations were evaluated and how the correlation matrix was constructed.
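
      One plausible way to build such a correlation matrix is sketched below. It assumes, purely for illustration, that each TU has one activity value per sample (a sample being an independent run or a time window) and that the Pearson coefficient is used; the exact procedure is the one described in the revised manuscript, and the names `pearson` and `correlation_matrix` and the numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant; the coefficient is
    undefined in that case, and 0.0 is just a convention of this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(activity):
    """activity[s][k] is the activity of TU k in sample s; returns a TU-by-TU matrix."""
    n_tus = len(activity[0])
    columns = [[row[k] for row in activity] for k in range(n_tus)]
    return [[pearson(columns[i], columns[j]) for j in range(n_tus)]
            for i in range(n_tus)]


# Three samples, three TUs (made-up activities): TUs 0 and 1 rise and fall
# together, while TU 2 is active when the other two are not.
activity = [
    [0.9, 0.8, 0.1],
    [0.5, 0.4, 0.5],
    [0.1, 0.2, 0.9],
]
for row in correlation_matrix(activity):
    print([round(value, 2) for value in row])
```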

      “The second concerns how expression quantitative trait loci (eQTLs) work. Current models see them doing so post-transcriptionally in highly-convoluted ways [11, 55], but we have argued that any TU can act as an eQTL directly at the transcriptional level [11].” This text does not actually explain what eQTLs do. I think it should, in concise words.

      We agree with the Referee’s suggestion. We have revised the sentence accordingly and now provide a clear explanation of eQTLs upon their first mention. The revised paragraph now reads as follows:

      “The second concerns how expression quantitative trait loci (eQTLs), genomic regions that are statistically associated with variation in gene expression levels, function. While current models often attribute their effects to post-transcriptional regulation through complex mechanisms [6,7], we have previously argued that any transcriptional unit (TU) can act as an eQTL by directly influencing gene expression at the transcriptional level [7]. Here, we observe individual TUs up-regulating or down-regulating the activity of other TUs, hallmark behaviors of eQTLs that can give rise to genetic effects such as “transgressive segregation” [8]. This phenomenon refers to cases in which alleles exhibit significantly higher or lower expression of a target gene, and can be caused, for instance, by the creation of a non-parental allele with a specific combination of QTLs with opposing effects on the target gene.”

      “In the string with 4 mutations, a yellow cluster is never seen; instead, different red clusters appear and disappear (Fig. 2Eii)…” How should it be seen? You mutated away most of the yellow beads. I think the kymograph is more informative about the general model dynamics, not the effects of mutations. Might be more appropriate to place a kymograph in Figure 1.

      We agree with the Referee that the kymograph is the most appropriate graphical representation for capturing the effects of mutations. Panel 2E already refers to the standard case shown in Figure 1. We have now clarified this both in the caption and in the main text. In addition, we have rephrased the sentence—which was indeed misleading—as follows:

      “From the activity profiles in Fig. 2C, we can observe that as the number of mutations increases, the yellow cluster is replaced by a red cluster, with the remaining yellow TUs in the region being expelled (Fig. 2B(ii)). This behavior is reflected in the dynamics, as seen by comparing panels E(i) and E(ii): in the string with four mutations, transcription of the yellow TUs is inhibited in the affected region, while prominent red stripes—corresponding to active, transcribing clusters—emerge (Fig. 2E(ii)).” We hope that the comparison is now immediately clear to the reader.

      “…but this block fragments in the string with 4 mutations…” I don’t know or cannot see what is meant by ”fragmentation” in the correlation matrix.

      With the sentence “this block fragments in the string with 4 mutations” we mean that the majority of the solid red pixels within the black box become light-red or white once the mutations are applied. We have now added a clarification of this point in the revised manuscript.

      “Fig. 3D shows the difference in correlation between the case with reduced yellow TFs and the case displayed in Fig. 1E.” Can you just place two halves of the different matrices to be compared into the same panel? Similar to Fig. S5. Will be much easier to compare.

      We thank the Referee for this suggestion. We tried to implement this modification, and report the modified figure below (Author response image 1). As we can see, in the new figure it is difficult to spot the details we refer to in the main text, therefore we prefer to keep the original version of the figure.

      Author response image 1.

      Heatmap comparing activity correlations of TUs in the random string under normal conditions (top half) and with reduced yellow-TF concentration (bottom half).

      What is the omnigenic model? It is not introduced.

      We thank the Reviewer for highlighting this important point. The omnigenic model, first introduced by Boyle et al. in Ref. [6], was proposed to explain how complex traits, including disease risk, are influenced by a vast number of genes. According to this model, the genetic basis of a trait is not limited to a small set of core genes whose expression is directly related to the trait, but also includes peripheral genes. The latter, although not directly involved in controlling the trait, can influence the expression of core genes through gene regulatory networks, thereby contributing to the overall genetic influence on the trait. We have now added a few lines in the revised manuscript to explain this point.

      “Additionally, blue off-diagonal blocks indicate repeating negative correlations that reflect the period of the 6-pattern.” How does that look in a kymograph? Does this mean the 6 clusters of same color steal the TFs from the other clusters when they form?

      The intuition of the Referee is indeed correct. The finite number of TFs leads to competition among TUs of the same colour, resulting in anticorrelation: when a group of six nearby TUs of a given colour is active, other, more distant TUs of the same colour are not transcribing due to the lack of available TFs. As the Referee suggested, this phenomenon is visible in the kymograph showing TU activity. In Author response image 2, it can be observed that typically there is a single TU cluster for each of the three colours (yellow, green, and red). These clusters can be long-lived (e.g., the yellow cluster at the center of the kymograph) or may dissolve during the simulation (e.g., the red cluster at the top of the kymograph, which dissolves at t ∼ 600 × 10<sup>5</sup> τ<sub>B</sub>). In the latter case, TFs of the corresponding colour are released into the system and can bind to a different location, forming a new cluster (as seen with the red cluster forming at the bottom of the kymograph for t > 600 × 10<sup>5</sup> τ<sub>B</sub>). This point is further discussed at point 2.30 of this Reply, where additional graphical material is provided.

      Author response image 2.

      Kymograph showing the TU activity during a typical run in the 6-pattern case. Each row reports the transcriptional state of a TU during one simulation. Black pixels correspond to inactive TUs, red (yellow, green) pixels correspond to active red (yellow, green) TUs.

      “Conversely, negative correlations connect distant TUs, as found in the single-color model…” But at the most distal range, the negative correlation is lost again! Why leave this out? Your correlation curves show the same equilibration towards no correlation at very long ranges.

      As highlighted in Figure 5Ai, long-range negative correlations (grey segments) predominantly connect distant TUs of the same colour. This is quantified in Figure 5Bi: restricting to same-colour TUs shows that at large genomic separations the correlation is almost entirely negative, with small fluctuations at distances just below 3000 kbp where sampling is sparse; we therefore avoid further interpretation of this regime.

      “These results illustrate how the sequence of TUs on a string can strikingly affect formation of mixed clusters; they also provide an explanation of why activities of human TUs within genomic regions of hundreds of kbp are positively correlated [60].” This is a very nice insight.

      We thank the Reviewer for the very supportive comment.

      “To quantify the extent to which TFs of different colours share clusters, we introduce a demixing coefficient, θ<sub>dem</sub> (defined in Fig. 1).” This is not defined in Fig. 1 or anywhere else here in the main text.

      We thank the Referee for pointing this out. For a given cluster, the demixing coefficient is defined as

      θ<sub>dem</sub> = (n x<sub>i,max</sub> − 1)/(n − 1),

      where n is the number of colors, i indexes each color present in the model, and x<sub>i,max</sub> is the largest fraction of TFs of the same i-th color in a single TF cluster.

      The demixing coefficient is defined in the Methods section; therefore, we have replaced defined in Fig. 1 with see Methods for definition.
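
      As a worked illustration, the sketch below computes a demixing coefficient for a single cluster. It assumes the linear form θ<sub>dem</sub> = (n x<sub>max</sub> − 1)/(n − 1), which is consistent with the limits stated for this quantity (1 for a single-colour cluster, 0 for an equal mixture of all colours); the authoritative definition is the one in the Methods, and the function name and colour labels are hypothetical.

```python
from collections import Counter


def demixing_coefficient(cluster_colours, n_colours):
    """Demixing coefficient of one cluster, assuming the linear form
    (n * x_max - 1) / (n - 1), where x_max is the largest same-colour
    fraction of TFs in the cluster and n the number of colours in the model.
    """
    if n_colours < 2:
        raise ValueError("at least two colours are needed to define demixing")
    if not cluster_colours:
        raise ValueError("cluster must contain at least one TF")
    counts = Counter(cluster_colours)
    x_max = max(counts.values()) / len(cluster_colours)
    return (n_colours * x_max - 1) / (n_colours - 1)


# Made-up clusters in a three-colour model.
print(demixing_coefficient(["red"] * 5, 3))                         # 1.0 (fully demixed)
print(demixing_coefficient(["red", "red", "green", "yellow"], 3))   # 0.25
```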

      “Mixing is facilitated by the presence of weakly-binding beads, as replacing them with non-interacting ones increases demixing and reduces long-range negative correlations (Figure S3). Therefore, the sequence of strong and weak binding sites along strings determines the degree of mixing, and the types of small-world network that emerge. If eQTLs also act transcriptionally in the way we suggest [11], we predict that down-regulating eQTLs will lie further away from their targets than up-regulating ones.” Going into these side topics and minor points here is super distracting and waters down the message. Maybe first deal with the main conclusions on mixed vs demixed clusters in dependence on the strong and specific binding site patterns, before dealing with other additional points like the role of weak binding sites.

      Thank you for the suggestion. We have now changed the paragraph to highlight the main results. The new paragraph reads as follows: “These results on activity correlation and TF cluster composition suggest that, if eQTLs act transcriptionally as expected [7], down-regulating eQTLs are likely to be located further from their target genes than up-regulating ones. In addition, it is important to note that mixing is promoted by the presence of weakly binding beads; replacing these with non-interacting ones leads to increased demixing and a reduction in long-range negative correlations (Figure S3). More generally, our findings indicate that the presence of multiple TF colors offers an effective mechanism to enrich and fine-tune transcriptional regulation.”

      “…provides a powerful pathway to enrich and modulate transcriptional regulation.” Before going into the possible meaning and implications of the results, please discuss the results themselves first.

      See previous point.

      Figure 5B. Does activation typically coincide with spatial compaction of the binding sites into a small space or within the confines of a condensate? My guess would be that colocalization of the other color in a small space is what leads to the mixing effect?

      As the Reviewer correctly noted, the activity of a given TU is indeed influenced by the presence of nearby TUs of the same color, since their proximity facilitates the recruitment of additional TFs and enhances the overall transcriptional activity. In this context, the mixing effect is certainly affected by the 1D arrangement of TUs along the chromatin fiber. As emphasized in the revised manuscript, when domains of same-color TUs are present (as in the 6-pattern string), the degree of demixing is greater compared to the case where TUs of different colors alternate and large domains are absent (as in the 1-pattern string). This difference in the demixing parameter as a function of the 1D TU arrangement is clearly visible in Fig. S2B.

      “…euchromatic regions blue, and heterochromatic ones grey.” Please also explain what these color monomers mean in terms of non specific interactions with the TFs.

      Generally, in our simulation approach we assume euchromatin regions to be more open and accessible to transcription factors, whereas heterochromatin corresponds to more compacted chromatin segments [9]. To reflect this, we introduce weak, non-specific interactions between euchromatin and TFs, while heterochromatin interacts with TFs only through steric effects. To clarify this point, we have now slightly revised the caption of Fig. 6.

      “More quantitatively, Spearman’s rank correlation coefficient is 3.66 × 10<sup>−1</sup>, which compares with 3.24 × 10<sup>−1</sup> obtained previously using a single-colour model [11].” This comparison does not tell me whether the improvement in model performance justifies an additional model component. There are other, likelihood-based approaches to assess whether a model fits better to a relevant extent by adding a free model parameter. Can these be used for a more conclusive comparison? Besides, a correlation of 0.36 does not seem so good?

      We understand the Reviewer’s concern that the observed increase in the activity correlation may not appear to provide strong evidence for the improvement of the newly introduced model. However, within the context of polymer models developed to study realistic gene transcription and chromatin organization, this type of correlation analysis is a widely accepted approach for model validation. Experimental data commonly used for such validation include Hi-C maps, FISH experiments, and GRO-seq data [10,11]. The first two are typically employed to assess how accurately the model reproduces the 3D folding of chromatin; a comparison between experimental and simulated Hi-C maps is provided in the Supplementary Information (Fig. S5), showing a Pearson correlation of 0.7. GRO-seq or RNA-seq data, on the other hand, are used to evaluate the model’s ability to predict gene transcription levels. To date, the highest correlation for transcriptional activity data has been achieved by the HiP-HoP model at a resolution of 1 kbp [10], reporting a Spearman correlation of 0.6. Therefore, the correlation obtained with our 2-color model represents a good level of agreement when compared with the more complex HiP-HoP model. In this context, the observed increase in correlation—from 0.324 to 0.366—can be regarded as a modest yet meaningful improvement.
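
      For readers less familiar with this validation step, the sketch below shows how a Spearman rank correlation between simulated and experimental activities can be computed. It is a minimal standard-library illustration, not the pipeline used in the paper: the function names are hypothetical and the activity values are made up.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied entries sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Made-up example: simulated TU activities vs. experimental read counts.
simulated = [0.10, 0.40, 0.35, 0.80, 0.05]
experimental = [12, 30, 45, 90, 3]
print(round(spearman(simulated, experimental), 3))  # 0.9
```

      Because only ranks enter the calculation, the coefficient is insensitive to the different units of simulated activity (a fraction of time) and experimental signal (read counts).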

      “…consequently, use of an additional color provides a statistically significant improvement (p-value < 10<sup>−6</sup>, 2-sided t-test).” I do not follow this argument. Given enough simulation repeats, any improvement, no matter how small, will lead to statistically significant improvements.

      We agree that this sentence could be misleading. We have now rephrased it in a clearer manner specifying that each of the two correlation values is statistically significant alone, while before we were wrongly referring to the significance of the improvement.
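
      The dependence of significance on sample size raised by the Reviewer can be made concrete with the standard t-statistic for testing a correlation coefficient r against zero, t = r √((n − 2)/(1 − r²)). The sketch below is illustrative only: the sample sizes are arbitrary and are not the number of TUs analysed in the paper.

```python
import math


def correlation_t_statistic(r, n):
    """t-statistic for testing a correlation coefficient r (from n points)
    against the null hypothesis of zero correlation: r * sqrt((n-2)/(1-r^2)).
    """
    if n <= 2:
        raise ValueError("need more than two data points")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# The same correlation values become more 'significant' as n grows,
# so a small p-value alone does not measure the size of an effect.
for n in (10, 100, 1000):
    print(n,
          round(correlation_t_statistic(0.366, n), 2),
          round(correlation_t_statistic(0.324, n), 2))
```

      For fixed r the statistic grows roughly as √n, which is why the significance of each correlation taken alone says little about whether the difference between the two models is meaningful.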

      “Additionally, simulated contact maps show a fair agreement with Hi-C data (Figure S5), with a Pearson correlation r ∼ 0.7 (p-value < 10<sup>−6</sup>, 2-sided t-test).” Nice!

      We thank the Reviewer for the positive comment.

      “Because we do not include heterochromatin-binding proteins, we should not however expect a very accurate reproduction of Hi-C maps: we stress that here instead we are interested in active chromatin, transcription and structure only as far as it is linked to transcription.” Then why do you not limit your correlation assessment to only these regions to show that these are very well captured by your model?

      We thank the Reviewer for this insightful comment. Indeed, we could have restricted our investigation to active chromatin regions, as done in our previous works [11,12]. However, our intention in this section of the manuscript was to clarify that the current model is relatively simple and therefore not expected to achieve a very high level of agreement between experimental and simulated Hi-C maps. Another important limitation of the two-color model described in this section is the absence of active loop extrusion mediated by SMC proteins, which is known to play a central role in establishing TAD boundaries. Consequently, even if our analysis were limited to active chromatin regions, the agreement with experimental Hi-C maps would still remain lower than that obtained with more comprehensive models, such as HiP-HoP, which we use in the last section of the paper. We have now added a comment in the revised manuscript explicitly noting the lack of active loop extrusion in our 2-color model.

      “We also measure the average value of the demixing coefficient, θ<sub>dem</sub> (Materials and Methods). If θ<sub>dem</sub> = 1, this means that a cluster contains only TFs of one colour and so is fully demixed; if θ<sub>dem</sub> = 0, the cluster contains a mixture of TFs of all colors in equal number, and so is maximally mixed.” Repetitive.

      We have now rephrased the sentence in a more concise way.

      “…notably, this is similar to the average number of productively transcribing pols seen experimentally in a transcription factory [6].” That seems a bit fast and loose. The number of Polymerases can differ depending on state, type of factory, gene etc. and vary between anything from to a few hundreds of Polymerase complexes depending on definition of factory, and what is counted as active. Also, one would think that polymerases only make up a small part of the overall protein pool that constitutes a condensate, so it is unclear whether this is a pertinent estimate.

      Here we refer to the average size of what is normally referred to as a Pol II factory, not a generic nuclear condensate. These are the clusters that arise in our simulations. These structures emerge through microphase separation and have been well characterised (see, for instance, [13] for a recent review). For these structures, while there is a distribution of sizes, the average is well defined and corresponds to about 100 nm, which is very much in line with the size of the clusters we observe, both in terms of 3D diameter and number of participating proteins. Because of this size, the number of active complexes that can contribute cannot be significantly more than ∼ 10. These estimates are, we note, very much in line with super-resolution measurements of SAF-A clusters [14], which are associated with active transcription; hence it is reasonable to assume that they colocalise with RNA and polymerase clusters.

      “Conversely, activities of similar TUs lying far from each other on the genetic map are often weakly negatively correlated, as the formation of one cluster sequesters some TFs to reduce the number available to bind elsewhere.” This point is interesting, and I strongly suspect that this indeed happening. But I don’t think it was shown in the analysis of the simulation results in sufficient clarity. We need direct assessment of this sequestration, currently it’s only indirectly inferred.

      Indeed, this is the mechanism underlying the emergence of negative long-range correlations among TU activity values. As the Reviewer correctly pointed out, the competition for a finite number of TFs was only indirectly inferred in the original manuscript. To address this, we have now included a new figure explicitly illustrating this effect. In Fig. S12, we show the kymograph of active TUs (left panel), as in Fig. 2E(i) of the main text, alongside a new kymograph depicting the number of green TFs within a sphere of radius 10σ centered on each green TU (right panel). For simplicity, we focus here only on green TUs and TFs. It can be observed that, during the initial part of the simulation, green TFs are localized near genomic position ∼ 2000 (right panel), where green TUs are transcriptionally active (left panel). Toward the end of the simulation, TUs near genomic position ∼ 500 become active, coinciding with the relocation of TFs to this region and the depletion of the previous one.
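
      The quantity plotted in the TF-count kymograph can be sketched as follows. This is a hedged illustration rather than the script behind Fig. S12: the function name `tf_count_kymograph`, the input layout and the coordinates are hypothetical, and positions are assumed to be in units of σ.

```python
import math


def tf_count_kymograph(tu_frames, tf_frames, radius=10.0):
    """Number of TFs within `radius` of each TU, for every frame.

    tu_frames[t][k]: (x, y, z) position of TU k at frame t.
    tf_frames[t]:    list of (x, y, z) positions of same-colour TFs at frame t.
    Returns counts[k][t], i.e. one row per TU and one column per frame.
    """
    if len(tu_frames) != len(tf_frames):
        raise ValueError("TU and TF trajectories must have the same length")
    if not tu_frames:
        return []
    n_tus = len(tu_frames[0])
    counts = [[0] * len(tu_frames) for _ in range(n_tus)]
    for t, (tus, tfs) in enumerate(zip(tu_frames, tf_frames)):
        for k, tu_pos in enumerate(tus):
            counts[k][t] = sum(
                1 for tf_pos in tfs if math.dist(tu_pos, tf_pos) <= radius
            )
    return counts


# Two TUs far apart; the TFs sit mostly near TU 0 in the first frame and
# have all relocated next to TU 1 in the second frame (made-up coordinates).
tu_frames = [
    [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)],
]
tf_frames = [
    [(1.0, 0.0, 0.0), (2.0, 2.0, 0.0), (49.0, 0.0, 0.0)],
    [(48.0, 0.0, 0.0), (51.0, 1.0, 0.0), (52.0, 0.0, 0.0)],
]
print(tf_count_kymograph(tu_frames, tf_frames))  # [[2, 0], [1, 3]]
```

      With a fixed pool of TFs, a gain in one row is necessarily matched by a loss elsewhere, which is the sequestration effect discussed above.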

      In the definition for the demixing coefficient (equation 1), what does the index i stand for?

      Here i is an index denoting each of the colors present in the model. We have now specified the meaning of i after Eq. 1.

      Reviewer 3 (Public Review):

      In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90 kb). Each of these colored beads is considered a transcription unit (TU). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlated) “gene expression” patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, it leads to the emergence of specialized and mixed clusters of different TFs. Human chromatin models with a realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.

      Strengths.

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

      Weaknesses.

      Weakness of the work: The model has many assumptions. Some of the assumptions are a bit too simplistic. Concerns about the work are detailed below:

      We thank the Referee for this overall positive evaluation.

      The authors assume that when the diffusing beads (TFs) are near a TU, the gene expression starts. However, mammalian gene expression requires activation by enhancer-promoter looping and other related events. It is not a simple diffusion-limited event. Since many of the conclusions are derived from expression activity, will the results be affected by the lack of looping details?

      We do not need to assume promoter-enhancer contact: this emerges naturally through the bridging-induced phase separation and is indeed a key strength of our model. Even though looping is not assumed as key to transcriptional initiation, in practice the vast majority of events in which a TF is near a TU are associated with the presence of a cluster where regulatory elements are looped. Transcription in our case is therefore associated with the bridging-induced phase separation, and there is no lack of looping: looping is naturally associated with transcription as an emergent property of the model, not an assumption. Accordingly, both contact maps and transcriptional activity are well predicted by our model, both in the version described here and in the more sophisticated single-colour HiP-HoP model [10] (an important ingredient of which is the bridging-induced phase separation).

      Authors neglect protein-protein interactions. Without protein-protein interactions, condensate formation in natural systems is unlikely to happen.

      We thank the Reviewer for pointing out the absence of protein-protein interactions in our simulations. While we acknowledge this limitation, we would like to emphasize that experimental studies have not observed nuclear proteins forming condensates at physiological concentrations in the absence of DNA or chromatin. For example, studies such as Ryu et al. [15] and Shakya et al. [16] show that protein-protein interactions alone are insufficient to drive condensate formation in vivo. Instead, the presence of a substrate, such as DNA or chromatin, is essential to favor and stabilize the formation of protein clusters.

      In our simulations, we propose that protein liquid-liquid phase separation (LLPS) is driven by the presence of both strong and weak attractions between multivalent protein complexes and the chromatin filament. As stated in our manuscript, the mechanism leading to protein cluster formation is the bridging-induced attraction. This mechanism involves a positive feedback loop, where protein binding to chromatin induces a local increase in chromatin density, which then attracts more proteins, further promoting cluster formation.

      While we acknowledge that adding protein-protein interactions could be incorporated into our simulations, we believe this would need to be a weak interaction to remain consistent with experimental data. Additionally, incorporating such interactions would not alter the conclusions of our study.

      What is described in this paper is a generic phenomenon; many kinds of multivalent chromatin-binding proteins can form condensates/clusters as described here. For example, if we replace different color TUs with different histone modifications and different TFs with Hp1, PRC1/2, etc, the results would remain the same, wouldn’t they? What is specific about transcription factor or transcription here in this model? What is the logic of considering 3kb chromatin as having a size of 30 nm? See Kadam et al. (Nature Communications 2023). Also, DNA paint experimental measurement of 5kb chromatin is greater than 100 nm (see work by Boettiger et al.).

      We thank the Reviewer for this important observation, which we now address. To begin, we consider the toy model introduced in the first part of the manuscript, where TUs are randomly positioned rather than derived from epigenetic data. As the Reviewer points out, in this simplified context, our results reflect a generic phenomenon: the composition of clusters depends primarily on their size, independent of the specific types of proteins involved. However, the main goal of our work is to gain insights into apparently contradictory experimental findings, which show that some transcription factories consist of a single type of transcription factor, while others contain multiple types. This led us to focus on TF clusters and their role in transcriptional regulation and co-regulation of distant genes. Therefore, in the second part of the manuscript, we use DNase I hypersensitive site (DHS) data to position TUs based on predicted TF binding sites, providing a more biological framework. In both the toy model and the more realistic HiP-HoP model, we observe a size-dependent transition in cluster composition. However, we refrain from generalizing these results to clusters composed of other protein complexes, such as HP1 and PRC, as their binding is governed by distinct epigenetic marks (e.g. H3K9me3 and H3K27me3), which exhibit different genomic distributions compared to DHS marks.

      Finally, the mapping of 3kb to 30nm is an estimate which does not significantly impact our conclusions. The relationship between genomic distance (in kbp) and spatial distance (in nm) is highly dependent on the degree of chromatin compaction, which can vary across cell types and genomic context. As such, providing an exact conversion is challenging [17]. For example, in a previous work based on the HiP-HoP model [12] we compared simulated and experimental FISH measurements and found that 1kbp typically corresponds to 15–20nm, implying that 3kbp could span 45–60nm. Nevertheless, we emphasize that varying this conversion factor does not affect the core results or conclusions of our study. We have now included a clarification in the revised SI to highlight this point.
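
      As a rough illustration of the conversion discussed above, the following Python sketch turns a genomic length into a range of spatial sizes. The 15–20 nm per kbp figure is taken from the response; the function name and the purely linear scaling are assumptions made for illustration only, since real chromatin compaction is not linear across length scales.

```python
def kbp_to_nm_range(length_kbp, nm_per_kbp_low=15.0, nm_per_kbp_high=20.0):
    """Return a (low, high) spatial extent in nm for a genomic length in kbp.

    Hypothetical helper: assumes simple linear scaling, which is only a
    crude approximation of how chromatin size grows with genomic length.
    """
    if length_kbp < 0:
        raise ValueError("length_kbp must be non-negative")
    return length_kbp * nm_per_kbp_low, length_kbp * nm_per_kbp_high


# Example: spatial extent of a 3 kbp bead under the assumed scaling
low, high = kbp_to_nm_range(3)
print(f"3 kbp spans roughly {low:.0f}-{high:.0f} nm")
```

      Under these assumptions the 30 nm bead size sits below the estimated range, which is consistent with the authors' statement that the mapping is an estimate rather than an exact conversion.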

      Recommendations for the authors:

      Other points.

      Figure 1(D) caption says 2.25σ = 1.6 nanometer. Is this a typo? Sigma is 30nm.

      Yes, it was. As 1σ ∼ 30nm, we have 2.25σ = 2.25 · 30 nm = 67.5 nm ∼ 6.75 × 10<sup>−8</sup>m. We have now corrected the caption.

      Page 6, 2nd column, 3rd paragraph: it is written that θ<sub>dem</sub> is “defined in Fig. 1”. There is no θ<sub>dem</sub> defined in Fig. 1, is there? I can see it defined in Methods but not in Fig. 1.

      Correct, we replaced (defined in Fig.1) with (see Methods for definition).

      Page 6, column 2, 4th para: what do “correlations overlap” and “correlations diverge” mean?

      With reference to the plots in Fig. 5B, “overlap” and “diverge” simply refer to whether the same-colour (red curves) and different-colour (blue curves) correlation trends lie on top of each other or not. We have now clarified this point.

      What is the precise definition of correlation in Fig 5B (Y-axis)?

      In Fig.5B, correlation means Pearson correlation. We have now specified this point in the revised text and in the caption of Fig.5.
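
      For readers unfamiliar with the quantity, a minimal Python sketch of the Pearson correlation coefficient follows. What exactly is correlated in Fig. 5B is not restated here, so the example inputs (activities of two TUs across a handful of runs) are hypothetical and serve only to show the formula.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (covariance up to a factor of 1/n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squared deviations (standard deviations up to the same factor)
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical activities of two TUs over five simulation runs
tu_a = [0.10, 0.40, 0.35, 0.80, 0.60]
tu_b = [0.20, 0.50, 0.30, 0.90, 0.70]
print(round(pearson(tu_a, tu_b), 3))
```

      The coefficient ranges from −1 (perfect anticorrelation) to +1 (perfect correlation), with values near zero indicating no linear relationship.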

      References

      (1) S. A. Quinodoz, J. W. Jachowicz, P. Bhat, N. Ollikainen, A. K. Banerjee, I. N. Goronzy, M. R. Blanco, P. Chovanec, A. Chow, Y. Markaki et al., “Rna promotes the formation of spatial compartments in the nucleus,” Cell, vol. 184, no. 23, pp. 5775–5790, 2021.

      (2) R. A. Beagrie, A. Scialdone, M. Schueler, D. C. Kraemer, M. Chotalia, S. Q. Xie, M. Barbieri, I. de Santiago, L.-M. Lavitas, M. R. Branco et al., “Complex multi-enhancer contacts captured by genome architecture mapping,” Nature, vol. 543, no. 7646, pp. 519–524, 2017.

      (3) R. A. Beagrie, C. J. Thieme, C. Annunziatella, C. Baugher, Y. Zhang, M. Schueler, A. Kukalev, R. Kempfer, A. M. Chiariello, S. Bianco et al., “Multiplex-gam: genome-wide identification of chromatin contacts yields insights overlooked by hi-c,” Nature Methods, vol. 20, no. 7, pp. 1037–1047, 2023.

      (4) L. Liu, B. Zhang, and C. Hyeon, “Extracting multi-way chromatin contacts from hi-c data,” PLOS Computational Biology, vol. 17, no. 12, p. e1009669, 2021.

      (5) R.-S. Nozawa, L. Boteva, D. C. Soares, C. Naughton, A. R. Dun, A. Buckle, B. Ramsahoye, P. C. Bruton, R. S. Saleeb, M. Arnedo et al., “Saf-a regulates interphase chromosome structure through oligomerization with chromatin-associated rnas,” Cell, vol. 169, no. 7, pp. 1214–1227, 2017.

      (6) E. A. Boyle, Y. I. Li, and J. K. Pritchard, “An expanded view of complex traits: from polygenic to omnigenic,” Cell, vol. 169, no. 7, pp. 1177–1186, 2017.

      (7) C. Brackley, N. Gilbert, D. Michieletto, A. Papantonis, M. Pereira, P. Cook, and D. Marenduzzo, “Complex small-world regulatory networks emerge from the 3d organisation of the human genome,” Nat. Commun., vol. 12, no. 1, pp. 1–14, 2021.

      (8) R. B. Brem and L. Kruglyak, “The landscape of genetic complexity across 5,700 gene expression traits in yeast,” Proceedings of the National Academy of Sciences, vol. 102, no. 5, pp. 1572–1577, 2005.

      (9) M. Chiang, C. A. Brackley, D. Marenduzzo, and N. Gilbert, “Predicting genome organisation and function with mechanistic modelling,” Trends in Genetics, vol. 38, no. 4, pp. 364–378, 2022.

      (10) M. Chiang, C. A. Brackley, C. Naughton, R.-S. Nozawa, C. Battaglia, D. Marenduzzo, and N. Gilbert, “Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure,” Cell Genomics, vol. 4, no. 12, 2024.

      (11) A. Buckle, C. A. Brackley, S. Boyle, D. Marenduzzo, and N. Gilbert, “Polymer simulations of heteromorphic chromatin predict the 3d folding of complex genomic loci,” Mol. Cell, vol. 72, no. 4, pp. 786–797, 2018.

      (12) G. Forte, A. Buckle, S. Boyle, D. Marenduzzo, N. Gilbert, and C. A. Brackley, “Transcription modulates chromatin dynamics and locus configuration sampling,” Nature Structural & Molecular Biology, vol. 30, no. 9, pp. 1275–1285, 2023.

      (13) P. R. Cook and D. Marenduzzo, “Transcription-driven genome organization: a model for chromosome structure and the regulation of gene expression tested through simulations,” Nucleic acids research, vol. 46, no. 19, pp. 9895–9906, 2018.

      (14) M. Marenda, D. Michieletto, R. Czapiewski, J. Stocks, S. M. Winterbourne, J. Miles, O. C. Flemming, E. Lazarova, M. Chiang, S. Aitken et al., “Nuclear rna forms an interconnected network of transcription-dependent and tunable microgels,” BioRxiv, pp. 2024–06, 2024.

      (15) J.-K. Ryu, C. Bouchoux, H. W. Liu, E. Kim, M. Minamino, R. de Groot, A. J. Katan, A. Bonato, D. Marenduzzo, D. Michieletto et al., “Bridging-induced phase separation induced by cohesin smc protein complexes,” Science advances, vol. 7, no. 7, p. eabe5905, 2021.

      (16) A. Shakya, S. Park, N. Rana, and J. T. King, “Liquid-liquid phase separation of histone proteins in cells: role in chromatin organization,” Biophysical journal, vol. 118, no. 3, pp. 753–764, 2020.

      (17) A.-M. Florescu, P. Therizols, and A. Rosa, “Large scale chromosome folding is stable against local changes in chromatin structure,” PLoS computational biology, vol. 12, no. 6, p. e1004987, 2016.

    1. eLife Assessment

      This important study identifies a metal transporter in the plasma membrane of the obligate intracellular pathogen, Toxoplasma gondii. Using an array of different approaches, the authors convincingly demonstrate that this transporter mediates iron and zinc uptake and regulates diverse cellular processes, including parasite metabolism and differentiation. This work will be of broad interest to cell biologists and biochemists studying metal ion transport mechanisms.

    2. Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knock-down mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage.

      The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated by exogenous addition of iron or zinc. Finally, the authors used heterologous expression of ZFT in Xenopus oocytes and yeast mutants, highlighting the dual substrate specificity of the transporter. The ability of ZFT to transport both iron and zinc is thus supported by two experimental approaches in heterologous systems: first, expression of Toxoplasma ZFT compensates for a lack of zinc transport in yeast, demonstrating zinc transport; second, iron transport by ZFT was shown in the Xenopus oocyte model. Furthermore, phenotypic analyses suggest defects in iron availability upon ZFT depletion, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function.

      Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. The converging evidence, including changes in metal concentrations upon ZFT depletion, data on metal transport obtained in heterologous systems, and phenotypic changes linked to iron deficiency, presents a convincing case. Given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful.

      Comments on revisions:

      The revised manuscript has successfully addressed all of the key points raised in the initial review. Notably, the metal transport experiments in Xenopus oocytes now provide compelling evidence supporting the role of ZFT function. I congratulate the authors on their efforts and have no further concerns to raise.

    3. Reviewer #2 (Public review):

      Summary:

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite.

      Strengths:

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. The heterologous expression of ZFT in a Xenopus oocyte system, where ZFT was shown to transport iron and zinc, is an important addition to the study. The authors also build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion.

      Weaknesses:

      The inclusion of the data showing iron and zinc transport when ZFT is expressed in a Xenopus oocyte system alleviated one of the main weaknesses of the original paper: the lack of direct biochemical evidence that ZFT acted as an iron transporter.

    4. Reviewer #3 (Public review):

      Summary:

      Aghabi et al. set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements including X-ray fluorescence microscopy and inductively coupled plasma mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. In the manuscript's revision, the authors performed additional transport assays in Xenopus oocytes, providing further evidence for the transporter trafficking iron. Overall, the data by Aghabi et al. convincingly support that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes.

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration) as well as performing a yeast mutant complementation and transport assays in Xenopus oocytes expressing the T. gondii protein. This work is very thorough and clearly presented, leaving little doubt about this protein's function.

      Weaknesses:

      None. The authors have addressed all my previous queries/ concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knockdown mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage. The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated for by exogenous addition of iron or zinc. 

      While the manuscript does not directly investigate the transport function of ZFT through biochemical assays, the authors indirectly support the notion that ZFT can transport zinc by demonstrating its ability to compensate for a lack of zinc transport in a yeast heterologous system. Furthermore, phenotypic analyses suggest defects in iron availability, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function. Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. Although direct biochemical evidence for the transporter's substrate specificity and transport activity is lacking, the converging evidence, including changes in metal concentrations upon ZFT depletion, yeast complementation data, and phenotypic changes linked to iron deficiency, presents a convincing case. Some aspects of the results may appear somewhat unbalanced, particularly since iron transport could not be confirmed through heterologous complementation, while zinc-related phenotypes in the parasites have not been thoroughly explored (which is challenging given the limited number of zinc-dependent proteins characterized in Toxoplasma). Nevertheless, given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful. 

      We thank the reviewer for their assessment and would like to highlight that we now add direct biochemical characterisation in the new Figure 8, supporting our hypothesis and confirming iron transport by this protein.

      Reviewer #2 (Public review): 

      Summary: 

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite. 

      Strengths: 

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. Additionally, the authors build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion. 

      Weaknesses: 

      (1) Excess zinc was shown not to alter ZFT expression, but a cation chelator (TPEN) did lead to decreased expression. While TPEN is often used to reduce zinc levels, does it have any effect on iron levels? Could the reduction in ZFT after TPEN treatment be due to a reduction in the level of iron or another cation?

      We thank the reviewer for this comment. We agree that TPEN is a fairly unspecific cation chelator, so to determine whether its effects are due to removal of zinc or of other cations, we co-treated with TPEN and either zinc or iron. Co-incubation of TPEN and zinc prevented ZFT depletion, while TPEN+FAC had no effect compared to TPEN alone (new Figure 6h and i), strongly suggesting that the effects on ZFT abundance are linked to zinc and not just iron.

      (2) ZFT expression was found to be dynamic depending on the size of the vacuole, based on mean fluorescence intensity measurements. Looking at protein levels by Western blot at different times during infection would strengthen this finding. 

      We show here that ZFT expression is highly dynamic, depending on both the iron status of the host cell and the number of parasites per vacuole. However, validating this finding by western blot would be complex due to the highly unsynchronised nature of parasite replication and the large number (5x10<sup>6</sup> - 1x10<sup>7</sup> cells) of parasites required to visualise ZFT. Further, we show that ZFT is apparently internalised prior to degradation. For this reason, we have not attempted to validate this finding by western blotting at this time.

      (3) ZFT localization remained at the parasite periphery under low iron conditions. However, in the images shown in Figure S1c, larger vacuoles (containing 4-8 parasites) are shown for the untreated conditions, and single parasite-containing vacuoles are shown for the low iron condition. As ZFT localization is predominantly at the basal end of the parasite in larger PV and at the parasite periphery for smaller vacuoles, it would be better to compare vacuoles of similar size between the untreated and low-iron conditions.

      The reviewer brings up a good point: the concentration of iron chelator that we used here does not enable parasite replication, making an assessment of changes in localisation challenging. To address this, we have added new data using a much lower concentration of chelator (20 mM), which is still expected to impact the parasites (Hanna et al, 2025), but allows for replication. In this low iron environment, ZFT localisation remained significantly more peripheral (Fig. S1d,e), supporting our hypothesis that ZFT localisation is iron dependent, independent of vacuolar stage.

      Reviewer #3 (Public review): 

      Summary:

      Aghabi et al set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements, including X-ray fluorescence microscopy and inductively coupled mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. Overall, the data by Aghabi et al. reveal that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes. 

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration), as well as performing a yeast mutant complementation. This work is very thorough and clearly presented, leaving little doubt about this protein's function. 

      Weaknesses:

      This study offers no major novel insights into the biology of T. gondii. The transporter was already annotated as a zinc transporter (ToxoDB), was deemed essential (PMID: 27594426), and localized to the plasma membrane (PMID: 33053376). This study mostly confirms and validates these previous datasets. The authors identify three other proteins with a ZIP domain. Particularly, the role of TGME49_225530 is intriguing, as it is likely fitness-conferring (score: -2.8, PMID: 27594426) and has no subcellular localization assigned. Characterizing this protein as well, revealing its localization, and identifying if and how these transporters coordinate metal ion transport would have been worthwhile.

      We agree that the work presented here validates the previous datasets, and if that was all we had done, we agree that the biological insights would be limited. However, we have gone significantly beyond the predictions, demonstrating dynamic localisation changes, iron-mediated regulation, the lack of substrate-based complementation and validating transport activity of both zinc and iron. Although in silico predictions and screens can be informative, it remains important to validate biological functions experimentally. While we agree that characterisation of TGME49_225530 (as well as the other two annotated ZIP proteins) would be interesting, and will certainly form part of our future plans, it is significantly beyond the scope of the presented manuscript.

      Another weakness is the data related to the impact of ZFT downregulation on the apicoplast in Figure 4. The authors show that downregulation of ZFT causes an increase in elongated apicoplasts (Figure 4d). The subsequent panels seem to show that the parasites present a dramatic growth defect at that time point. This growth arrest can directly explain the elongated apicoplast, but does not allow any conclusion about an impact on the organelle. In any case, an assessment of 'delayed death' as presented in Figure 4c seems futile, since the many other processes affected by zinc and iron depletion likely cause a rapid death, masking any potential delayed death.

      We agree that, given the importance of iron and zinc to the parasite, we cannot differentiate parasite death due to apicoplast defects from death due to other causes. We have modified the discussion to reflect this, as below.

      “However, given the delayed phenotype typically seen upon apicoplast disruption, we cannot determine if this is a direct effect of ZFT, or a downstream consequence of metal depletion”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific Comments: 

      (1) The background on the typical sequence features that would identify Toxoplasma ZIP homologues should be expanded and clarified. While these proteins are likely quite divergent and may lack many conserved features, the manuscript currently does not provide enough detail to assess how similar (or different) TgZIPs are from well-characterized family members. Additionally, the justification for focusing on TGGT1_261720 (ZFT) over TGGT1_225530, as stated in the first paragraph of the results section, seems unclear. There is no predictive data supporting a potential plasma membrane localization for TGGT1_225530 (yet this cannot be excluded), and TGGT1_225530 appears to have more canonical metal-binding motifs. I believe that the fact that only TGGT1_261720 is iron-regulated should be sufficient justification for its selection, and this point could be emphasized more clearly. Furthermore, the discussion mentions a leucine residue that may be associated with broad substrate specificity, but this is not addressed in the initial comparative sequence analysis. These residues and the HK motif are not actually addressed in the Gyimesi et al. reference currently mentioned; thus this could be clarified and updated with references (such as PMID: 31914589) that provide more recent insights into key residues involved in metal selectivity in ZIP transporters.

      We thank the reviewer for this comment. To address these points:

      We agree that the iron-mediated regulation is sufficient for our focus on ZFT and have clarified the text to reflect this, as described above.

      We have also updated the references as suggested, our apologies for this oversight.

      We have further expanded the discussion, especially with reference to our new results using heterologous expression in oocytes (please see above).

      (2) Figure 1D, Figure 2A, C, H, Figure 3D, Figure 6F, H, corresponding text and paragraph 2 of the Discussion: It seems that most of the "non-specific bands" annotated in Figure 1D, which are lower molecular weight products, are not present in the parental cell line, suggesting they may not be non-specific after all. These bands also vary depending on the cell line (e.g., promoter used, see Figures 2H and 3D) or experimental conditions (e.g., iron excess or depletion). Given the dynamic localization of ZFT during intracellular development, it may be worth exploring whether these lower molecular weight bands represent degraded forms of TgZFT, possibly corresponding to the basally-clustered signal observed by immunofluorescence, with only the full-length protein associating with the plasma membrane. This possibility should be investigated or at least discussed further.

      While the lower bands are not present in the parental line, we do see them in other HA-tagged lines, especially when the expression of the tagged protein is low, as shown below (Author response image 1). We don’t currently have an explanation for these, but we can confirm that they do not change in abundance in parallel with the full-length protein, supporting our hypothesis that these bands are an artefact of the anti-HA antibody in our system. Although ZFT is clearly degraded (e.g. Fig. 1g), we currently do not believe these bands are ZFT C-terminal degradation products.

      Author response image 1.

      Western blot of ZFT-3HA<sub>zft</sub> and another HA-tagged unrelated cytosolic protein, demonstrating that the lower bands are most likely nonspecific.

      (3) It is unfortunate that ZFT could not complement a yeast iron transporter mutant cell line, as this would have provided a strong argument for ZFT's role in iron transport. The manuscript does not provide much detail about the Δfet2/3 yeast mutant line. Fet3 is the ferroxidase subunit, while Ftr1 is the permease subunit of the high-affinity iron transport complex in yeast. Fet2, however, appears to be Saccharomyces cerevisiae's VPS41 homolog. Therefore, is Δfet2/3 the most appropriate mutant to use, or would another mutant line (e.g., ΔFtr1) be a better choice? Additionally, while Figure 7 suggests a decrease in metal uptake upon ZFT depletion, it would be useful to test whether overexpression of ZFT leads to enhanced metal incorporation, perhaps using a FerroOrange assay. 

      We thank the reviewer for their comments, which we have answered below:

      The Δfet2/3 yeast mutant was a typo and has been corrected; our apologies. We did use the Δfet3/4 mutant line, based on previous successful experiments involving plant metal transporters (e.g. (DiDonato et al., 2004)).

      Unfortunately, we were unable to perform the FerroOrange assay in the overexpression line as this line is endogenously fluorescent in the same channel as FerroOrange.

      However, as detailed above we have now added significant new data, confirming our hypothesis that ZFT is an iron/zinc transporter through heterologous expression in Xenopus oocytes in the new figure 8. This provides direct evidence of transport of iron, and evidence that zinc can inhibit this transport, consistent with our hypothesis.  

      (4) The annotation of the blot in Figure 2H suggests that overexpressed ZFT-TY can only be detected in the absence of heat denaturation. However, this is not addressed in the text. Does heat denaturation also affect the detection of ZFT-3HA or the lower molecular weight products? This should be clarified in the manuscript. 

      Interestingly, ZFT is detectable after boiling at 95 °C for 5 minutes when expressed at endogenous (or near endogenous) levels in the ZFT-3HA<sub>sag1</sub> and ZFT-3HA<sub>zft</sub> tagged parasite lines. However, overexpression of ZFT leads to a loss of detection via western blot when boiled, although the protein is detectable without heat denaturation.

      A possible explanation for this is that overexpression may cause ZFT to misfold, making the protein more prone to aggregation following boiling, rendering it insoluble and unable to enter the gel. Moreover, heat aggregation can sometimes mask the epitope tags that the antibody must recognise, possibly explaining why ZFT is undetectable when overexpressed and exposed to boiling conditions, as has previously been observed for other transmembrane proteins (e.g. (Tsuji, 2020)).

      We have clarified this in the results section. Although we do not have a full explanation for this, we consider it important to share for others who may be looking at expression of these proteins.

      (5) Figure 3G: It might be helpful to include an uncropped gel profile to allow readers to visualize that the main product does indeed correspond to a potential dimeric form in the native PAGE. 

      This has now been added in Figure S3e, thank you for this suggestion.

      (6) The investigation of the impact of ZFT depletion on the apicoplast could be improved. The authors suggest that ZFT knockdown inhibits apicoplast replication based on a modest increase in elongated organelles, but the term "delayed death" is not appropriate in that case, as it is typically linked to a loss of the organelle. This is not observed here and is also illustrated by the unchanged CPN60 processing profile. So, clearly, there seems to be no strong morphological effect on the apicoplast early on after ZFT depletion. On the other hand, the authors dismiss any impact on TgPDH-E2 lipoylation (which is iron-dependent) based on the fact that the lipoylated form of the protein is still detected by Western blot. However, closer inspection of the blot in Figure 4B suggests that the intensity of the annotated TgPDH-E2 signal is reduced compared to the -ATc condition (although there might be differences in protein loading, as indicated by the control) or even with the mitochondrial 2-oxoglutarate dehydrogenase-E2, whose lipoylation is presumably iron-independent (see PMID: 16778769). This experiment should be repeated, and the results quantified properly in case something was missed, and the duration of depletion conditions perhaps extended further. Of note, it would also be worthwhile to revisit size estimations, as the displayed profiles seem inconsistent with the typical sizes of lipoylated proteins detected with the anti-lipoyl antibody (e.g., ~100 kDa for PDH-E2, ~60 kDa for branched-chain 2-oxo acid dehydrogenase, and ~40 kDa 2-oxoglutarate dehydrogenase).

      We thank the reviewer for this comment. We agree that there is no strong defect on the apicoplast in the first lytic cycle and we have modified the language to remove reference to delayed death, as given the magnitude of changes associated with loss of iron and zinc, we cannot be certain about the role of the apicoplast.

      Based on this suggestion, we have now quantified the levels of lipoylation of PDH-E2, BDCK-E2 and OGDH-E2 and include this in Figure S4b, c, d. Supporting our other results, we do not see a significant change in PDH-E2 lipoylation upon ZFT knockdown. However, although OGDH-E2 lipoylation is unchanged (Figure S4c), we interestingly do see a significant increase in BDCK-E2 lipoylation (Figure S4d). This process is not expected to be directly iron-related, as mitochondrial lipoylation occurs through scavenging rather than synthesis; however, it speaks to the larger mitochondrial disruption that we see. We now consider this further in the discussion.

      Regarding the sizes, we thank the reviewer for bringing this up. Our apologies, this was due to an error in the annotation, which we have now corrected in the figure.

      (7) In the third paragraph of the discussion, the authors mention the inability to complement ZFT loss by adding exogenous metals. One argument is the potential lack of metal access to the parasitophorous vacuole (PV). Although largely unexplored, this point could be expanded further in the discussion, as the issue of metal transport to the parasite involves not only the parasite plasma membrane but also the PV membrane. Additionally, the authors mention the absence of functional redundancy in transporters, but it would be helpful to discuss potential stage-specific or differential expression of other ZIP candidates. Transcriptomic data available on Toxodb.org could provide useful insights into this, and experimental approaches, such as RT-PCR, could be used to assess the expression of these candidates in the absence of ZFT. 

      On the issue of metals crossing the PV membrane, we agree that while we do not currently know the mechanisms of metal transport within the infected host cell, we do have experimental confirmation that the concentration and form of the metals that we are using can impact the parasites. We show that metal treatment inhibits parasite growth (e.g. Figure 3k-n, Figure 6a-d) and we can detect the increased metals through our experiments using FerroOrange and FluroZine (Figure 7a, c). In these experiments, parasites were treated intracellularly and so we can confirm that, regardless of the mechanism, iron and zinc can reach the parasite. While entry of metals across the PV is an intriguing question, it is beyond the scope of the present work, which focuses on the role of the selected transporter.

      We agree that a more detailed discussion of the other ZIP transporters is warranted. We have extended this section of the discussion although for now, we cannot determine the role of the other ZIP transporters in Toxoplasma.

      (8) In the discussion, the authors mention that « Inhibition of respiration has previously been linked to bradyzoite conversion ». To strengthen their point, the authors could mention that mitochondrial Fe-S mutants, as well as mutants affecting mitochondrial translation or the mitochondrial electron transport chain, also initiate bradyzoite conversion (PMID: 34793583). This would reinforce the connection between mitochondrial dysfunction and stage conversion. 

      This is an excellent point and we have added this to the discussion as follows:

      “Inhibition of mitochondrial Fe-S biogenesis or mitochondrial respiration have both previously been linked to bradyzoite conversion (Pamukcu et al., 2021; Tomavo and Boothroyd, 1995), however we do not yet know the signalling factors linking iron, zinc or mitochondrial function to bradyzoite differentiation”.

      (9) As a general comment on manuscript formatting, providing page and line numbers would significantly improve the manuscript's readability and allow reviewers to more easily reference specific sections. This would help address the minor issues of typos (e.g., multiple occurrences of "promotor"). I suggest a careful read-through to correct these issues. 

      We thank the reviewer for this comment and in the resubmitted version we have corrected these issues. 

      Reviewer #2 (Recommendations for the authors): 

      (1) In the alignment (Figure 1a), the BPZIP sequence is from which organism (genus, species)? It would be helpful to include this information in the figure legend.

      Apologies for this oversight. This figure and section have been reworked and the species name (Bordetella bronchiseptica) added.

      (2) In reference to Figure 1a, the authors state, "Interestingly, all parasite ZIP-domain proteins examined have a HK motif at the M2 metal binding". I was wondering if by "all" the authors mean Toxoplasma and Plasmodium falciparum (shown in Figure 1a) or did the authors also look at other apicomplexan parasites such as Cryptosporidium or Neospora? Is this a general feature of apicomplexan parasites? 

      We looked at this, and the HK motif in the M2 binding site is conserved in Neospora, Cryptosporidium, and even the digenic gregarine Porospora cf. gigantea. However, in the more distantly related Chromera we find an HH motif at the same position. This suggests that the HK motif is present in the Apicomplexa, but not conserved in the free-living Alveolata. Although we cannot speculate on the role of this motif currently, its role in metal import in Apicomplexa does deserve future scrutiny. To reflect this finding we have modified Figure 1a and the text.

      (3) In Figure 1e, to better visualize the ZFT-3HA staining at the basal pole, it would be better to omit the DAPI staining from the merged image. It is difficult to see the ZFT staining in the image of the large vacuole.

      We have removed the DAPI from this image to improve clarity.

      (4) Based on the "delayed-death" phenotype of the apicoplast, it is not surprising that no defects were observed in CPN60 processing or protein lipoylation. Have the authors considered measuring these phenotypes after a further round of growth (as was done for visualizing apicoplast morphology)? 

      We agree that changes in apicoplast function are often only seen in the second round of replication. However, here we wanted to check whether ZFT depletion led to immediate changes in the function of the organelle, which was not the case. It is highly likely that after the second round we would see significant defects in apicoplast function; however, given the immediate importance of iron and zinc to many processes within the parasite, we believe that these experiments would be complicated to interpret.

      (5) Depleting ZFT led to a reduction in expression levels for the mitochondrial Fe-S protein SDHB but not for a cytosolic Fe-S protein. Is it expected that less intracellular iron (via depleted ZFT) would differentially affect mitochondrial versus cytosolic Fe-S proteins? 

      Previous studies (e.g., Maclean et al., 2024; Renaud et al., 2025) have shown that upon direct inhibition of the cytosolic Fe-S pathway, ABCE1 is fairly stable and levels can persist for 2-3 days post treatment. However, our recent work has shown that rapid and acute depletion of iron directly (through treatment with a chelator) can lead to ABCE1 levels decreasing within 24h (Hanna et al., 2025). In the case of ZFT knockdown, due to the more gradual reduction in iron levels seen (e.g. Figure 7j), we believe the parasites are prioritising key Fe-S pathways (e.g. essential proteostasis through ABCE1), probably while remodelling metabolism (as seen in our Seahorse assays). However, many proteins are expected to be directly impacted by the iron and zinc restriction that these parasites experience, and different protein classes are expected to behave differently in these conditions.

      Reviewer #3 (Recommendations for the authors): 

      (1) Is the effect on the plaque size between T7S4-ZFT (-aTc) in regular and 'high iron' conditions significant? The authors show convincingly that the plaque size is smaller due to the swapped promoter and the resulting overexpression of ZFT. But is the effect aggravated in high iron? This would be expected if excess iron were the problem.

      The plaque sizes are significantly smaller in the T7S4-ZFT line under high iron compared to the untreated condition, and compared to the parental untreated line. However, if we normalise plaque size to untreated conditions for both lines, there is not a significant change in plaque size in high iron between the parental and T7S4-ZFT. This is possibly due to the concentration of iron used (200 mM), which may not be optimal to see this effect, or the time taken for plaque assays (6-7 days), which may allow the excess iron to be stored by the host cells, changing the effective concentration of parasite exposure.
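      The normalisation argument in this response can be illustrated with a short sketch. All plaque areas below are hypothetical placeholder values in arbitrary units, chosen only to show the arithmetic; they are not measurements from the study. The function name `relative_plaque_size` is likewise an illustrative assumption.

```python
def relative_plaque_size(treated_area, untreated_area):
    """Plaque area under treatment as a fraction of the same line's untreated area."""
    return treated_area / untreated_area

# Hypothetical mean plaque areas (arbitrary units), NOT data from the study.
parental_untreated, parental_high_iron = 10.0, 6.0
t7s4_untreated, t7s4_high_iron = 5.0, 3.0

# In absolute terms, the T7S4-ZFT plaques in high iron are smaller than
# both untreated conditions.
smaller_than_own_untreated = t7s4_high_iron < t7s4_untreated
smaller_than_parental_untreated = t7s4_high_iron < parental_untreated

# After normalising each line to its own untreated condition, the relative
# effect of high iron is the same in both lines in this hypothetical case.
parental_relative = relative_plaque_size(parental_high_iron, parental_untreated)
t7s4_relative = relative_plaque_size(t7s4_high_iron, t7s4_untreated)

print(f"parental relative size in high iron: {parental_relative:.2f}")
print(f"T7S4-ZFT relative size in high iron: {t7s4_relative:.2f}")
```

      In this toy case a line that starts with smaller plaques still ends up with the smallest absolute plaques in high iron, even though the proportional reduction is identical in both lines, which is the distinction the response draws.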

      (2) I struggle to understand the intracellular growth assay in Figure 5b. Here, T7S4-ZFT parasites show 25 % of vacuoles with more than 8 parasites (labelled 8+). But such large vacuoles are not observed in the parental strain. It appears as if the inducible strain grows faster even though it was earlier shown to have a fitness defect (see Figure 3j). Can you please clarify?

      This is a result of the rapid growth of the parental line: some vacuoles in this line had lysed and initiated a new round of replication at this time point, while we saw no evidence at any time point that ZFT-depleted parasites were able to lyse the host cell. However, the initial (24-48h post ATc addition) replication rate of the ZFT KD remains similar to that of the parental line. In this panel, we wanted to emphasize that the major phenotype we see upon ZFT depletion is vacuole disorganisation, which we believe is linked to the start of differentiation into bradyzoites.

      (3) Did the authors perform an IFA in addition to the Western blot to localize the 2nd Ty-tagged ZFT copy? It seems important to validate that the protein correctly localizes to the plasma membrane. 

      We have done so and now include these data in Figure S2b. Overexpression of ZFT-Ty localises to internal structures (probably vesicles) with some signal at the periphery, however, this limited expression at the periphery is sufficient to mediate the phenotypes that we see.

      (4) First sentence of the abstract and introduction: The authors speak of metabolism and cellular respiration as though they are two different processes. Is respiration not part of metabolism? 

      This is an excellent point. We wanted to distinguish mitochondrial respiration from general cellular metabolism, but this was not clear. We have now changed this in the introduction to the following:

      “Iron, and other transition metals such as zinc, manganese and copper, are essential nutrients for almost all life, playing vital roles in biological processes such as DNA replication, translation, and metabolic processes including mitochondrial respiration (Teh et al., 2024)”

      (5) 2nd paragraph of the introduction: toxoplasmosis is written capitalized but should be lower case.

      This has been corrected.

      (6) Figure 4j legend: change 'shits parasites to a more quiescent stage' to 'shifts parasites'.

      This has been corrected, our apologies.

      (7) Please correct the following sentence: 'These data demonstrate ZFT depletion leads to the expression of the bradyzoite-specific markers BAG1 and DBL.' DBL is not expressed by the parasite. It is a lectin that binds to the sugars in the cyst wall.

      We have now modified this in the text. The sentence now reads: “These data show that ZFT depletion leads to the expression of the bradyzoite marker BAG1 and the production of the cyst wall, as detected by DBL”.

      (8) In the section on yeast complementation with TgZFT, the authors write: 'Based on this success, we also attempted to complement...'. Please consider changing 'Success' to something more neutral.

      We have modified the text to now read: “Based on these results, we also attempted to complement”…

      (9) In the discussion, the authors write: 'We see a delayed phenotype on the apicoplast, suggesting that metal import is also required in this organelle, although no apicoplast metal transporters have yet been identified.' Please consider the study Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites (PMID: 38163252).

      We thank the reviewer for the note and have modified the text to include this and the reference. Please see below:

      “Iron is known to be required in the apicoplast (Renaud et al., 2022), zinc also may be required, as the fitness-conferring Plasmodium zinc transporter ZIP1 is transiently localised to the apicoplast (Shrivastava et al., 2024), although the functional relevance of this localisation has not yet been established”.

      (10) The authors write: 'Iron is known to be required in the apicoplast (Renaud et al., 2022), although a potential role for zinc in this organelle has not yet been established.' The role for zinc in the apicoplast may not have been shown formally, but surely among its hundreds of proteins, and those involved in replication and transcription, there are some that depend on zinc...?

      Yes, we agree it would make sense; however, multiple searches using ToxoDB and the datasets from Chen et al. (2025) were unable to find any apicoplast-localised proteins with zinc-binding domains. We cannot exclude that zinc is in the apicoplast, and the results from Plasmodium (Shrivastava et al., 2024) may suggest that it is; however, we currently do not have any evidence for its role within this organelle.

      References

      DiDonato, R.J., Roberts, L.A., Sanderson, T., Eisley, R.B., Walker, E.L., 2004. Arabidopsis Yellow Stripe-Like2 (YSL2): a metal-regulated gene encoding a plasma membrane transporter of nicotianamine-metal complexes. Plant J 39, 403–414. https://doi.org/10.1111/j.1365-313X.2004.02128.x

      Hanna, J.C., Shikha, S., Sloan, M.A., Harding, C.R., 2025. Global translational and metabolic remodelling during iron deprivation in Toxoplasma gondii. https://doi.org/10.1101/2025.08.11.669662

      Maclean, A.E., Sloan, M.A., Renaud, E.A., Argyle, B.E., Lewis, W.H., Ovciarikova, J., Demolombe, V., Waller, R.F., Besteiro, S., Sheiner, L., 2024. The Toxoplasma gondii mitochondrial transporter ABCB7L is essential for the biogenesis of cytosolic and nuclear iron-sulfur cluster proteins and cytosolic translation. mBio 15, e00872-24. https://doi.org/10.1128/mbio.00872-24

      Pamukcu, S., Cerutti, A., Bordat, Y., Hem, S., Rofidal, V., Besteiro, S., 2021. Differential contribution of two organelles of endosymbiotic origin to iron-sulfur cluster synthesis and overall fitness in Toxoplasma. PLoS Pathog 17, e1010096. https://doi.org/10.1371/journal.ppat.1010096

      Renaud, E.A., Maupin, A.J.M., Berry, L., Bals, J., Bordat, Y., Demolombe, V., Rofidal, V., Vignols, F., Besteiro, S., 2025. The HCF101 protein is an important component of the cytosolic iron–sulfur synthesis pathway in Toxoplasma gondii. PLoS Biol 23, e3003028. https://doi.org/10.1371/journal.pbio.3003028

      Shrivastava, D., Jha, A., Kabrambam, R., Vishwakarma, J., Mitra, K., Ramachandran, R., Habib, S., 2024. Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites. ACS Infect. Dis. 10, 155–169. https://doi.org/10.1021/acsinfecdis.3c00426

      Teh, M.R., Armitage, A.E., Drakesmith, H., 2024. Why cells need iron: a compendium of iron utilisation. Trends in Endocrinology & Metabolism 35, 1026–1049. https://doi.org/10.1016/j.tem.2024.04.015

      Tomavo, S., Boothroyd, J.C., 1995. Interconnection between organellar functions, development and drug resistance in the protozoan parasite, Toxoplasma gondii. International Journal for Parasitology 25, 1293–1299. https://doi.org/10.1016/0020-7519(95)00066-B

    1. eLife Assessment

      This important study provides new insights into how Staphylococcus aureus adapts to disulfide stress through the redox-sensitive regulator Spx, which coordinates nutrient uptake, cysteine import, redox homeostasis, and bacterial growth. While the authors present compelling evidence supporting the central role of Spx in managing disulfide stress, several aspects require further clarification. In particular, the precise mechanisms regulating cysteine uptake and the proposed link between disulfide stress responses and iron limitation would benefit from additional explanation and experimental or conceptual justification.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      This manuscript presents a thoughtful and well-executed analysis of how S. aureus adapts to disulfide stress using a redox-sensitive regulator, Spx, as a lynchpin to coordinate nutrient uptake, redox balance, and growth. The work is strengthened by a systematic and complementary experimental approach that combines genetic, biochemical, and physiological measurements. The authors carefully test alternative explanations and build a coherent model linking stress sensing to downstream metabolic consequences. Several results, particularly those connecting cysteine uptake to growth defects, provide convincing support for the proposed trade-off. Overall, the authors largely achieve their aims, and the evidence generally supports the central conclusions. The conceptual framework and experimental approaches should be of broad interest to researchers studying S. aureus physiology and pathogenesis and to those studying bacterial stress responses and metabolic trade-offs.

      Weaknesses:

      Clarifying several interpretive points would further strengthen confidence in the proposed model. Some conclusions rely on data presentations or experimental designs that are not immediately clear to the reader. In particular, aspects of the protein stability analysis, global regulatory comparisons, and assays linking cysteine uptake to iron limitation would benefit from clearer justification and more precise interpretation. In addition, certain conclusions could be more carefully framed to reflect partial rather than complete rescue effects.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript titled "Activation of the Spx redox sensor counters cysteine-driven Fe(II) depletion under disulfide stress" by Hall and colleagues describes that an active redox switch is required for surviving under the diamide-induced disulfide stress. Furthermore, the SpxC10A mutant exhibits transcriptional dysregulation of genes involved in thiol maintenance and disulfide repair. The authors further demonstrate a role for Spx in regulating the uptake of L-cysteine, which otherwise leads to the chelation of intracellular iron and thus the repression of growth.

      Strengths:

      The authors demonstrate that the SpxC10A mutant accumulates high levels of thiols, leading to the chelation of intracellular iron and subsequent repression of the SpxC10A mutant's growth.

      Weaknesses:

      The authors did not show a direct regulation of L-cysteine uptake through CymR.

    4. Reviewer #3 (Public review):

      Summary:

      The paper from Hall et al. reports the effects of an altered function spx allele on the physiology of S. aureus. Since Spx is essential in this organism, the authors compare WT with a spx C10A allele that retains Spx functions that are independent of the formation of a C10-C13 disulfide. However, the major role of Spx in maintaining disulfide homeostasis in this organism appears to be reduced by this mutation, including a reduction (relative to WT) in the DIA-induction of thioredoxin, thioredoxin reductase, and BSH biosynthesis and reduction enzymes.

      Strengths:

      Based on a wide range of studies, the authors develop a model in which Spx is required for adaptation to disulfide stress, and this adaptation involves (in part) induction of both cystine/Cys uptake and the Fur regulon. Overall, the results are compelling, but further efforts to clarify the presentation will aid readers in being able to follow this very complicated story.

      Weaknesses:

      (1) More details are needed on how relative growth is defined and calculated (e.g., line 145 and Figure 1C). The raw data (growth curves) should be included when reporting relative growth so that readers can see what changed (lag, growth rate, final OD?). Later in the paper, the authors refer to "the diamide-induced growth delay of the spxC10A mutant" (line 379), but this is not apparent from the presented data.

      (2) Are the spx C10A, spx C13A, and spx C10A,C13A all really equivalent? In all cases, the Spx protein is presumably made (as confirmed for C10A in panel 1D). However, the only evidence to suggest that they are equivalent is the similar growth effects in panel 1C, and (as noted above), this data presentation can mask differences in how the mutations affect protein activity.

      (3) Figure 1D and Figure 1 Supplement 2 report results related to the effect of diamide treatment on protein half-life (t1/2). Only single results are shown for both panels, and the conclusions do not seem to be statistically robust. For example, Figure 1 Supplement 2 concludes that the t1/2 of Spx C10A is 3.38 min (this should be labeled correctly in the figure legend as the red line) and that of WT Spx is 8.69 min. However, Figure 1D suggests that the protein levels at time 0 may not be equivalent, and this is lost in the data processing. Indeed, there are significant differences in Spx levels at time 0 between - and + DIA, which is curious. Further, the authors' conclusion relies very heavily on line-fitting that includes a final point that has very low signal intensity (as judged from Figure 1D) and therefore is likely the least reliable of all the data. It might be worth showing curve fitting for multiple gels. Regardless of the overfitting of the data, the general conclusion that Spx is partially stabilized against proteolysis by ClpXP, and that the C10A mutant is reduced in stabilization, is probably correct.

      (4) Figure 2 concludes that despite differences in the mRNA profiles between WT and spx C10A after 15 min. of DIA treatment, the overall level of responsiveness of the bacillithiol pool is unchanged. The authors find it "surprising" that the BSH pool responds normally despite some differences in gene expression. This is not surprising. The major events visualized in panel 2D are the chemical oxidation of BSH to BSSB and, presumably, the re-reduction by Bdr(YpdA). While it is seen that BSH synthesis (bshC) and ypdA expression may be less induced by DIA in the C10A mutant (2C), there is no evidence that the basal levels are different prior to stress. Therefore, the chemical oxidation and enzymatic re-reduction might be expected to occur at similar rates, as observed.

      (5) Line 215. For the reason stated above, there is no reason to invoke Cys uptake as needed for the reduction of BSSB. Further, since CySS (presumably an abbreviation for cystine) is imported, this itself can contribute to disulfide stress.

      (6) Line 235. Following on the above point, "diamide-induced disulfide stress increased L-CySS uptake in the spxC10A mutant to re-establish the BSH redox equilibrium." This is counterintuitive since L-CySS is itself a disulfide and is thought to be reduced to 2 L-Cys in cells by BSH (leading to an increase in BSSB, not a reduction). Is there a known cystine reductase? Could cystine or L-Cys be affecting gene regulation (e.g., through CymR or Spx or ?)? Cystine can also lead to mixed disulfide formation (e.g., could it modify Spx on C13?).

      (7) Line 247. "a functional Spx redox switch allows S. aureus to avoid this trade-off and maintain thiol homeostasis without excessive L-CySS uptake." Can the authors expand on how this is thought to work? Does Spx normally affect cystine uptake? I thought this was CymR? I am not following the logic here.

      (8) Line 258. "The fur mutant, which is known to accumulate iron...". My understanding is that fur mutant strains typically have higher bioavailable (free) Fe pools. This is seen in E. coli, for example, using EPR methods. However, they also often have lower total Fe due to the iron-sparing response, which represses the expression of abundant, Fe-rich proteins. Please provide a reference that supports this statement that in S. aureus fur mutants have higher total iron per cell.

      (9) Figure 4. For the reasons stated above (point 1), it is hard to interpret data presented only as "Rel. Growth". Perhaps growth curve data could be included in a supplement.

      (10) The interpretation of Figure 4 is complicated. It is not clear that there is necessarily a change in bioavailable Fe pools, although it does seem clear that Fe homeostasis is perturbed. It has been shown that one strong effect of DIA on B. subtilis physiology is to oxidize the BSH pool to BSSB (as shown also here), and this leads to a mobilization of Zn (buffered by BSH). Elevated Zn pools can inactivate some Fe(II)-dependent enzymes, which could account for the rescue by Fe(II) supplementation. Zn(II) can also dysregulate PerR and likely Fur regulons.
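      The concern in point (3), that a half-life obtained by line-fitting can lean heavily on a faint final time point, can be sketched numerically. The example below is a minimal illustration on synthetic, noise-free data: the time points, the true half-life of 8 min, and the two-fold under-reading of the last band are all assumptions made for illustration, not values from the manuscript, and the fitting routine is a generic log-linear least-squares fit rather than the authors' procedure.

```python
import math

def fit_half_life(times, intensities):
    """Least-squares line through (t, ln I); half-life = ln(2) / decay rate."""
    logs = [math.log(i) for i in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx  # negative for a decaying signal
    return -math.log(2) / slope

# Synthetic chase: band intensity relative to t = 0, true half-life 8 min.
times = [0, 5, 10, 20, 40]  # minutes (hypothetical sampling scheme)
true_half_life = 8.0
clean = [math.exp(-math.log(2) * t / true_half_life) for t in times]

# Same series, but the last and faintest band is under-read two-fold
# (a hypothetical quantification error near the detection limit).
perturbed = clean[:-1] + [clean[-1] * 0.5]

half_life_clean = fit_half_life(times, clean)
half_life_perturbed = fit_half_life(times, perturbed)
half_life_without_last = fit_half_life(times[:-1], perturbed[:-1])

print(f"all points, clean:        {half_life_clean:.2f} min")
print(f"all points, last misread: {half_life_perturbed:.2f} min")
print(f"last point excluded:      {half_life_without_last:.2f} min")
```

      Because the final time point lies furthest from the mean sampling time, it has the greatest leverage on the fitted slope, so a single misread faint band shifts the estimated half-life noticeably. Fitting replicate gels, or reporting the fit with and without the last point, would show how much an estimate depends on it.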

    1. eLife Assessment

      Optical tweezers have been instrumental to the determination of mechanical parameters of molecular motors. This study by Takamatsu et al. reports key mechanical parameters of kinesin KIF1A using fluorescence microscopy, wherein the motor is tethered to a DNA nanospring, without the use of an optical trapping apparatus, which represents an exciting development. The approach and the findings reported change current thinking about KIF1A‑mediated transport, with potential implications for understanding human disease. The findings are important and the strength of the evidence is compelling.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses a novel DNA origami nanospring to measure the stall force and other mechanical parameters of the kinesin-3 family member, KIF1A, using light microscopy. The key is to use SNAP tags to tether a defined nanospring between a motor-dead mutant of KIF5B and the KIF1A to be interrogated. The mutant KIF5B binds tightly to a subunit of the microtubule without stepping, thus creating resistance to the processive advancement of the active KIF1A. The nanospring is conjugated with 124 Cy3 dyes, which allows it to be imaged by fluorescence microscopy. Acoustic force spectroscopy was used to measure the relationship between the extension of the nanospring (NS) and force as a calibration. Two different fitting methods are described to measure the length of the extension of the NS from its initial diffraction-limited spot. By measuring the extension of the NS during an experiment, the authors can determine the stall force. The attachment duration of the active motor is measured from the suppression of lateral movement that occurs when the KIF1A is attached and moving. There are numerous advantages of this technology for the study of single molecules of kinesin over previous studies using optical tweezers. First, it can be done using simple fluorescence microscopy and does not require the level of sophistication and expense needed to construct an optical tweezer apparatus. Second, the force that is experienced by the moving KIF1A is parallel to the plane of the microtubule. This regime can be achieved using a dual-beam optical tweezer set-up, but in the more commonly used single-beam set-up, much of the force experienced by the kinesin is perpendicular to the microtubule. Recent studies have shown markedly different mechanical behaviors of kinesin when interrogated by the two different optical tweezer configurations. The data in the current manuscript are consistent with those obtained using the dual-beam optical tweezer set-up.

      In addition, the authors study the mechanical behavior of several mutants of KIF1A that are associated with KIF1A-associated neurological disorder (KAND).

      Strengths:

      The technique should be cheaper and less technically challenging than optical tweezer microscopy to measure the mechanical parameters of molecular motors. The method is described in sufficient detail to allow its use in other labs. It should have a higher throughput than other methods.

      Weaknesses:

      The experimenter does not get a "real-time" view of the data as it is collected, which you get from the screen of an optical tweezer set-up. Rather, you have to put the data through the fitting routines to determine the length of the nanospring in order to generate the graphs of extension (force) vs time. No attempts were made to analyze the periods where the motor is actually moving to determine step-size or force-velocity relationships.

      Comments on revisions:

      I am satisfied with the revision made by the authors in response to my first round of criticisms.

    3. Reviewer #2 (Public review):

      Summary:

      This work is important in my view because it complements other single-molecule mechanics approaches, in particular optical trapping, which inevitably exerts off-axis loads. The nanospring method has its own weaknesses (individual steps cannot be seen), but it brings new clarity to our picture of KIF1A and will influence future thinking on the kinesins-3 and on kinesins in general.

      Strengths:

      By tethering single copies of the kinesin-3 dimer under test via a DNA nanospring to a strong binding mutant dimer of kinesin-1, the forces developed and experienced by the motor are constrained into a single axis, parallel to the microtubule axis. The method is imaging-based which should improve accessibility. In principle, at least, several single-motor molecules can be simultaneously tested. The arrangement ensures that only single molecules can contribute. Controls establish that the DNA nanospring is not itself interacting appreciably with the microtubule. Forces are convincingly calibrated and reading the length of the nanospring by fitting to the oblate fluorescent spot is carefully validated. The excursions of the wild type KIF1A leucine zipper-stabilised dimer are compared with those of neuropathic KIF1A mutants. These mutants can walk to a stall plateau, but the force is much reduced. The forces from mutant/WT heterodimers are also reduced.

      Weaknesses:

      The tethered nanospring method has some weaknesses; it only allows the stall force to be measured in the case that a stall plateau is achieved, and the thermal noise means that individual steps are not apparent. The nanospring does not behave like a Hookean spring - instead, linearly increasing force is reported by exponentially smaller extensions of the nanospring under tension. The estimated stall force for KIF1A (3.8 pN) is in line with measurements made using three-bead optical trapping, but those earlier measurements were not of a stall plateau, but rather of limiting termination (detachment) force, without a stall plateau.

      Comments on revisions:

      The authors have successfully addressed my previous criticisms.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank Reviewer #1 for the careful reading of our manuscript and for the constructive comments. We have provided responses to each of the comments below.

      We greatly appreciate Reviewer #1’s accurate public review of our study on the kinesin motor using the DNA origami nanospring (NS). With respect to the strengths, we fully agree with Reviewer #1’s comments. Regarding the weakness, we would like to respond as follows.

      It is true that, unlike optical tweezers, our method does not provide real-time data display. Optical tweezers enable real-time observation and manipulation of kinesin molecules at arbitrary time points. Achieving real-time observation and manipulation is indeed an important challenge for the future development of the NS technique. On the other hand, Iwaki et al. (our co-corresponding author) has already investigated dynamic properties of motor proteins under load, such as step size and force–velocity relationship of myosin VI using NS. We are now preparing high spatiotemporal resolution microscopy experiments on the KIF1A system to measure its step size and force–velocity relationship, which inherently require such resolution.

      Reviewer #2 Public Review

      We appreciate the constructive comments of Reviewer #2, which have strengthened both the presentation and interpretation of our results.

      We would like to thank Reviewer #2 for providing a highly accurate assessment of the strengths of our experiments. Regarding the weaknesses, we would like to respond as follows. First, Iwaki et al. (our co-corresponding author) have already succeeded in observing the stepping motion of myosin VI using the nanospring (NS) in their previous work. We are also currently preparing high spatiotemporal resolution microscopy experiments to observe the stepping motion of KIF1A in our system. Second, while it is true that the NS does not follow Hooke’s law, it is possible to design and construct NSs with an appropriate dynamic range by tuning the spring constant to match the forces exerted by protein molecules. Finally, we agree that our first observation of the stall plateau in KIF1A using the NS is a meaningful achievement. However, with respect to the suggestion that “increasing validity requires also studying kinesin-1,” we have a somewhat different perspective. The validity of the NS method has already been thoroughly examined in the previous work on myosin VI by Iwaki et al., where results were compared with those obtained using optical tweezers. Moreover, the focus of this manuscript is on KAND caused by KIF1A mutations. From this perspective, although we appreciate the suggestion, we consider it important to keep the present study focused on KIF1A and its implications for KAND.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors detect the attachments that occur during a processive run by KIF1A by monitoring the suppression of the angular fluctuations of the fluorescent signal and plot this, for example, in Figure 3a as the Length of the NS (which presumably is a readout of force) vs time. This interval includes the time when the KIF1A is actively moving along the MT and when it is stalled. It would be interesting to know the actual stall time of the motor in order to be able to calculate a detachment rate constant. For attachment periods such as the first example highlighted in pink in Figure 3a, the stall time is pretty much equal to the attachment time since the motor is moving so fast and the stall period is so long. However, for short attachment times such as the fifth pink interval shown in this same figure or the traces with the mutant KIF1As in Figure 4 this is not so. Can the authors institute a program to identify the periods where the motor has stretched the NS spring to the point where it stalls, and then calculate this time in order to do an exponential fit to the "dwell time distribution"?

      By introducing another criterion (see Methods, “Rate of relative increase in NS’s length”), the attachment duration was separated into the two time regions noted by the reviewer. After reanalyzing all the data, we evaluated only the stall duration this time. As a result, the estimated stall-force values became more reliable and accurate. The dwell time analysis was performed and included in the supplementary material for WT KIF1A, for which sufficient data were available.
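
      As an illustration of the exponential fit to the dwell time distribution requested in comment (1), the following is a minimal sketch, not the authors' analysis. It assumes that stall durations follow a single-exponential distribution, in which case the maximum-likelihood detachment rate is the reciprocal of the mean stall duration. The function name and the example durations are hypothetical, and the sketch ignores any minimum detectable duration or truncation by the end of the recording.

```python
import math


def fit_detachment_rate(stall_durations_s):
    """Maximum-likelihood rate for a single-exponential dwell time model.

    Assumes every stall duration is fully observed (no censoring).
    Returns (rate in 1/s, approximate standard error of the rate).
    """
    n = len(stall_durations_s)
    if n == 0:
        raise ValueError("at least one stall duration is required")
    if any(t <= 0 for t in stall_durations_s):
        raise ValueError("stall durations must be positive")
    mean_duration = sum(stall_durations_s) / n
    rate = 1.0 / mean_duration      # MLE of the exponential rate constant
    std_err = rate / math.sqrt(n)   # asymptotic standard error of the MLE
    return rate, std_err


# Hypothetical stall durations in seconds, for illustration only.
example_durations = [2.0, 4.0, 6.0, 8.0]
k_detach, k_err = fit_detachment_rate(example_durations)
print(f"detachment rate = {k_detach:.3f} +/- {k_err:.3f} per second")
```

      With real data, a correction for the shortest resolvable stall would likely be needed before interpreting the fitted rate as a detachment rate constant.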

      (2) The histogram of stall events in Figure 3b is quite broad. Please discuss.

      The newly added distributions from individual molecules (Fig. 3b) show that the variety in the stall force distribution is not due to multiple molecules, but is primarily an intrinsic property of single KIF1A molecules reflecting the complex kinetics of KIF1A under load, including occasional backward steps and reattachments. In addition, because the nanospring is a non-linear spring, a disadvantage is that even small fluctuations in extension can result in a substantial deviation in the measured stall force. These points have been added to the Discussion section.
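
      The point about the non-linear spring can be made concrete with a small sketch. It assumes a purely hypothetical saturating calibration curve, in which extension approaches a maximum exponentially as force increases; the functional form and the parameter values (`l_max_nm`, `f_c_pn`) are illustrative assumptions and are not the calibration measured in the study. Under such a curve, the same uncertainty in extension translates into a larger force error near the stall force than at low force.

```python
import math

# Hypothetical calibration parameters, chosen only for illustration.
L_MAX_NM = 500.0   # assumed limiting extension of the nanospring
F_C_PN = 2.0       # assumed characteristic force scale


def extension_from_force(force_pn, l_max_nm=L_MAX_NM, f_c_pn=F_C_PN):
    """Assumed saturating force-extension curve (not the measured one)."""
    return l_max_nm * (1.0 - math.exp(-force_pn / f_c_pn))


def force_from_extension(ext_nm, l_max_nm=L_MAX_NM, f_c_pn=F_C_PN):
    """Inverse of the assumed curve: convert an extension to a force."""
    if not 0.0 <= ext_nm < l_max_nm:
        raise ValueError("extension must lie in [0, l_max_nm)")
    return -f_c_pn * math.log(1.0 - ext_nm / l_max_nm)


def force_error(force_pn, delta_ext_nm):
    """Force overestimate caused by an extension error of delta_ext_nm."""
    ext = extension_from_force(force_pn)
    return force_from_extension(ext + delta_ext_nm) - force_pn


# Same 10 nm extension error at a low force and near a 3.8 pN stall.
err_low = force_error(1.0, 10.0)
err_high = force_error(3.8, 10.0)
print(f"force error at 1.0 pN: {err_low:.3f} pN")
print(f"force error at 3.8 pN: {err_high:.3f} pN")
```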

      (3) Figure 3c, it is clear that for attachment times greater than 5s the attachment duration is independent of the Lstall, but this is not so clear for the short attachment durations. Some of this may relate to the fact that you're measuring attachment durations and not stall or dwell times as described in my first comment. Do you feel this is due to less precision in measuring the "attachment duration" during the short attachments, or just simply that more data is needed here? I assume that you do not want to imply that there is a load-dependence of the attachment durations here? Perhaps an expanded view of the data set from 0-10 seconds would clarify. 

      As described in our response to comment (1), the stall durations were separated from the attachment durations. This improved the measurement accuracy and revealed that the stall duration and Lstall are uncorrelated (Fig. 3c). We appreciate this constructive comment.

      Reviewer #2 (Recommendations for the authors):

      (1) Off-axis forces are described as 'upward', 'perpendicular', and 'horizontal'. Consider referring to off-axis force, and if necessary, defining the direction of the force(s) relative to the axis of the immobilised MT. If necessary, a cartoon of XYZ axes might be added to F1c? 

      An XZ axis was added to the schematic in Fig. 1c.

      (2) If I understand correctly, stall forces are calculated by averaging the entire region in which the angular fluctuation is reduced below a threshold. In cases like the 3rd and 7th events on the trace in F1a, this will reduce the average. Perhaps consider separately averaging the later time points in each stall event? Perhaps also consider correlating the angular fluctuation signals and the spring length signal? Some fluctuations during stall plateaus might indicate slip back and re-engage events? 

      Instead of separately averaging the later time points in each stall event, we separated the stall force duration from the overall attachment duration (Fig. 3). This allowed us to obtain more accurate stall force values. The relationship between the NS length and the angular fluctuation during KIF1A slip-back events differed among individual stall events, and no clear trend was observed. Two representative examples are shown in the Author response image 1.

      Author response image 1.

      (3) Please describe all relevant methods fully instead of referencing previous work. For example, nanospring preparation refers readers to reference 21 (which in turn references an earlier paper).

      We revised the Methods section to include the procedures described in the previous reference, and we added the sequence information of the DNA origami to the supplementary information.

      (4) Were any experiments tried at reduced ATP concentration?

      (5) Were any data obtained from WT KIF5B? For kinesin-1, stall plateau forces of >7 pN are obtained.

      This study focused on comparing the stall forces of wild-type and KAND-related mutant KIF1A molecules under physiological ATP conditions, as our main goal was to characterize the disease-relevant phenotypes. Experiments at reduced ATP concentrations and with WT KIF5B are indeed important future directions but are beyond the scope of the present study. These follow-up experiments are currently in progress.

      (6) In Figure 1b, consider showing the attachment to the mutant KIF5B, and reversing the orientation so it corresponds to Figure 1c.

      KIF1A and KIF5B share the same binding method, so to indicate that the schematic in Fig. 1b represents both, we replaced ‘KIF1A’ with ‘Kinesin’.

      (7) In Figure 3d, add force axis. In general, please re-check all force axes. In Supplement S3, the stall plateau labels appear well above their corresponding axis ticks. In Figure 4, several mutants appear to be stalling at well over 5 pN, yet Table 1 gives a much lower value. Presumably, this reflects averaging effects?

      We added the force axis to Fig. 3d. Besides, we corrected Fig. S3 and Fig. 4 because there were errors in the conversion from length to force. As the reviewer pointed out, the apparent discrepancy between the force values in Fig. 4 and Table 1 arises mainly from averaging effects.

    1. eLife Assessment

      This study presents a valuable human stem cell-derived organoid model that captures key morphological and cellular features of spinal cord development and provides evidence for a YAP-dependent mechanism of lumen formation relevant to secondary neurulation. Overall, the evidence is convincing, using strong and validated approaches consistent with the current state of the art, including systematic protocol optimisation across multiple cell lines and quantitative analysis of tissue architecture. However, some claims regarding precise anterior-posterior and dorsoventral spinal cord identity, as well as several novelty claims, are at times overstated and would benefit from more direct validation and more careful positioning. The work will be of interest to developmental biologists and researchers studying neural tube defects.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Blanco-Ameijeiras et al. present an organoid-based model of the caudal neural tube that builds upon established principles from embryonic development and prior organoid work. By systematically testing and refining signaling conditions, the authors generate caudal progenitor populations that self-organize into neuroepithelia with molecular features consistent with secondary neurulation. Bulk-RNA sequencing supports the emergence of caudal neural identities, and the authors further examine cellular features such as apico-basal polarity and interkinetic nuclear migration. Finally, they provide evidence for a conserved, YAP-dependent mechanism of tube formation specific to secondary neurulation. The manuscript provides valuable methodological resources, including troubleshooting guidance that will be especially useful for the field. While this work represents a significant advance toward modeling human spinal cord development - particularly the process of secondary neurulation - the claims of complete caudalization and full AP-axis representation require additional experimental support and clarification.

      Strengths:

      (1) Methodological clarity and transparency: The first figure and accompanying text provide an exemplary explanation of protocol optimization and troubleshooting. This transparency - showing approaches that failed as well as those that succeeded - sets a high standard for reproducibility and will be highly beneficial to laboratories aiming to adopt or build upon this model.

      (2) Testing across multiple cell lines: Multiple hPSC and hiPSC lines were evaluated, strengthening the robustness and generalizability of the reported protocol.

      (3) Biological relevance: The focus on secondary neurulation fills a notable gap in current human organoid models of spinal cord development. The identification of YAP-dependent mechanisms in tube formation is a valuable insight with potential translational relevance.

      (4) Resource creation: The detailed parameters and signaling regimes will serve as a resource for the spinal cord and organoid communities.

      Weaknesses:

      (1) The manuscript over-interprets bulk RNA-seq data to make strong claims on the organoid AP patterning and caudalization. Bulk sequencing provides population-level averages and cannot confirm that individual organoids represent discrete AP levels. To support claims of generating every AP identity, the authors must perform staining or in situ hybridization for HOX genes on individual organoids. Further, the current interpretation of CDX2 as marking "very distal" identity is inaccurate in vitro; CDX2 marks caudal progenitors across the spinal cord axis. The language should be revised accordingly.

      (2) The claim of being the first organoid system to model secondary neurulation overlooks prior work showing HOXC9 in human organoids (Xue et al., Nature 2024; Libby et al., Development 2021), which would reflect the beginning of secondary neurulation. While this system may indeed be the first isolated secondary neurulation organoid model that expresses HOXD9/10 - a meaningful advance - bulk RNA-seq alone is insufficient to support the exclusivity of this claim. Additional single-organoid-level spatial analyses (via immunofluorescence or in situ hybridisation) and frequency quantification of regional identities are required to fully characterize the system.

      (3) Similarly, as written, there are overstatements taken from the bulk RNA sequencing to determine dorsal-ventral identity. Although dorsal markers are present, the dataset also contains ventral-associated genes (PAX6, SP8, NKX6-1, NKX6-2, PRDM12). To claim a "dorsal-only" identity, the authors should perform PAX7 immunostaining to demonstrate dorsalization of the entire organoid tissue.

      (4) The studies identifying YAP as a key driver of lumen fusion in Figure 6 are important and should be extended to the apical organoid system to demonstrate that this is truly a feature of secondary neurulation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Blanco-Ameijeiras and colleagues present the use of stem cells to create human spinal cord organoids that recapitulate anterior-posterior identity, with a large focus on posterior fates. In particular, the authors show robust transcriptional landscape specification that reflects certain anterior-posterior spinal cord development.

      Recapitulation of spinal cord development is essential to understand the fundamentals of developmental defects in a systematic manner. This work provides a broad approach to test certain aspects of neural tube morphogenesis, particularly posterior and dorsal identities. Perhaps the shorter protocol is an interesting upgrade for current standards, and the mechanical interpretation provides good proof of concept work that aligns with the need to better understand neural tube mechanobiology.

      Strengths:

      The manuscript addresses a major gap by focusing on posterior spinal cord identity and secondary neurulation, a phase that is less well captured by existing neural tube organoid models (although some do recapitulate that). The manuscript situates the approach within vertebrate development and human embryology.

      Morphometric quantifications are well described and provide a dynamic, cell-level interpretation, which is a true strength of the work. They establish metrics that can later be used to compare modulations and pathway disruption.

      The protocols are well described and documented.

      Weaknesses:

      Some key data lacks proper quantification to robustly support the claims. For example, it is not clear how many organoids in total are counted in Figure 1D to derive the % of organoids expressing certain markers (e.g. SOX2 or BRA).

      Some claims are overstated. In the manuscript, the organoids show primarily dorsal and posterior identities under the current conditions, yet the discussion sometimes reads as if a more complete dorsoventral recapitulation is achieved. Therefore, one can either demonstrate ventral patterning (e.g., SHH / FOXA2) or reduce the claims about spinal cord identity, which, given the results, are more specific to a particular region.

      The mention of anterior organoids seems to distract the reader from the important work, which primarily focuses on the posterior identity. Further, it is not understood why SOX2 identity is reduced by Day 7 in Figure 1D. Since SOX2 in the manuscript is considered a neural marker (although also pluripotency along with NANOG, etc.), a further explanation should be provided. The author should also test the presence of PAX6, which is one of the earliest neuroectoderm markers in humans (Zhang X. et al., Cell Stem Cell 2010).

      The authors position the work as a substantial addition to the field. The work is very much welcomed; however, some claims align with an interpretation that leads the readers to understand a novelty that is beyond the work presented here. For example, in certain instances in the intro, the manuscript conveys that this work consists of the first recapitulation of spinal cord fates anterior or posterior, while other works (Rifes P. Nature Cell Biology 2020, Xue X. Nature 2024) recapitulate dorsoventral and anterior-posterior patterning and identity (albeit not of secondary neurulation) through controlled gradients of WNT and RA activity. To clearly position the importance of this work, the intro should focus on secondary neurulation and posterior identities.

      In a similar fashion, the claim that "Importantly though, to our knowledge these are the first neural organoids exhibiting a robust spinal cord transcriptome identity" is not very well understood when other neural tube organoid systems (including spinal cord identities) have been exhaustively profiled at the single cell level (Rifes P. Xue X. Abdel Fattah A.). Further explanation is therefore needed.

      The mechanical angle is important and adds to the large body of research that traces NT morphogenesis to mechanics. However, the YAP localization images can be much improved. Lower magnification images are needed to show the entire organoid to robustly convince the reader of the correct and varying localization of the YAP protein. The authors should also check for YAP-associated genes in their bulk RNA sequencing.

      The quantification of the YAP analysis in a total of 23 and 18 cells in the two conditions and in 7 organoids is by no means enough to draw a conclusion about YAP localization, and an increase in the number of cells is needed. Moreover, the use of dasatinib as an inhibitor for YAP is great, but there is no evidence shown that in this culture system, the inhibitor actually inhibits YAP. As such, IF images are required to confirm cytosolic YAP. Additionally, the authors can try other inhibitors (such as verteporfin) since most inhibitors are broadband.

      Given the mechanically oriented conclusions, other relevant works have shown posteriorized and ventralized neural tube organoids using RA and SHH activation, which were also mechanically stimulated via actuation, such as work done from the Ranga lab (Nature comm. 2021/2023). Although not strictly related to YAP, the therein molecular profiling, mechanical stimulation, lumen measurements, and NTD-like phenotype using PCP-mutated genes make these important relevant mentions since the current work adds important aspects with YAP analysis.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Blanco-Ameijeiras and collaborators describe the 3D differentiation of human pluripotent stem cells into the posterior spinal cord. The authors first test the exposure of different combinations of extrinsic signals to generate human neural organoids with distinct antero-posterior identities, as shown by bulk transcriptome analysis. They show that neural organoids, whether anterior or posterior, display tissue architecture, organisation and dynamics resembling the in vivo situation. Increasing the size of initial cell aggregates leads to the formation of a single lumen through a multi-lumen stage and a process of cell intercalation, mimicking the situation that they recently described for chick secondary neurulation (Gonzalez-Gobartt et al. Dev Cell. 2021 PMID: 33878300). The authors go on to show that, as in chick, YAP is involved in the resolution of multiple lumens into a single lumen. They conclude that their human organoid approach faithfully models human secondary neurulation, which may be instrumental in unravelling the mechanisms of human neural tube defects.

      Strengths:

      Overall, this is an important study demonstrating that lumen formation in human spinal organoids recapitulates key aspects of secondary neurulation observed in animal models. This organoid approach may be instrumental in unravelling the mechanisms of human neural tube defects.

      Weaknesses:

      The significance of the findings is tempered by several limitations. While the authors show convincing evidence that organoids undergo lumen formation with similar morphological, cellular and molecular features as seen in chick in their previous work (Gonzalez-Gobartt et al. Dev Cell. 2021 PMID: 33878300), whether this is linked to their caudal spinal cord identity is unclear.

    1. eLife Assessment

      In this valuable study, the authors performed cell-specific ribosome pulldown to identify gene expression (translatome) differences in the anterior (INT1) vs middle & posterior (INT2-9) cells of the C. elegans intestine, under fed, starved, or refeeding conditions. The data generated will be very helpful to the C. elegans community, and the evidence supporting the conclusions of the study is assessed to be solid. Some methodological caveats remain and are discussed.

    2. Reviewer #1 (Public review):

      Summary

      In this study, the authors have performed tissue-specific ribosome pulldown to identify gene expression (translatome) differences in the anterior vs posterior cells of the C. elegans intestine. They have performed this analysis in fed and fasted states of the animal. The data generated will be very useful to the C. elegans community, and the role of pyruvate shown in this study will result in interesting follow-up investigations.

      However, several strong claims made in the study are solely based on in silico predictions and are not supported by experimental evidence.

      Strengths:

      Several studies in the past have predicted different functions of the anterior (INT1) vs posterior (INT2-9) epithelial cells of the C. elegans intestine based on their anatomy and ultrastructure, but detailed characterization of differences in gene expression between these cell types (and whether indeed these are different 'cell types') was lacking prior to this study. The genes and drivers identified to be exclusively expressed in the anterior vs posterior segments of the intestine will be very helpful to selectively modulate different parts of the C. elegans intestine in future studies.

      Another strength of this study is the careful experimental design to test how the anterior vs posterior cell types of the intestine respond differently to food deprivation and recovery after return to food. These comparisons between 'states' of a cell in different physiological conditions are difficult to pick up in single-cell analyses due to low sequencing depth, which can fail to identify subtle modulation of gene expression.

      The TRAP-associated bulk RNA-seq approach used in this study is more suitable for such comparisons and provides additional information on post-transcriptional regulation during metabolic stress.

      A key finding of this study is that pyruvate levels modulate the translation state of anterior intestinal cells during fasting. Characterization of pyruvate metabolism genes, especially of the enzymes involved in its mitochondrial breakdown, provides novel insights into how gut epithelial cells respond to the acute absence of food.

      Weaknesses:

      Unlike previous TRAP-seq studies (PMID: 30580965, 36044259, 36977417) that reported sequencing data for both input and IP samples, this study only reports the sequencing data for IP samples. Since biochemical pulldowns are variable across replicates, it is difficult to know if the observed differences between different conditions are due to biological factors or differences in IP efficiency. More importantly, since two different TRAP lines were utilized in this study and a large proportion of the results focus on the differences between the translational profiles of INT1 vs INT2-9 cells, it is essential to know if the IP worked with similar efficiency for both TRAP strains that likely have different expression levels of the HA-tagged ribosomal protein. One way to estimate this would be to perform qRT-PCR of genes that are known to be enriched in all intestinal cells and determine whether their fold-enrichment over housekeeping genes (normalized to input) is similar in INT1 vs INT2-9 TRAP strains and across the fed vs fasted conditions. The authors, in fact, mention variability across biological replicates, due to which certain replicates were excluded from their WGCNA analysis.
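
      The qRT-PCR check suggested above could be computed along the following lines. This is a minimal sketch that assumes the standard delta-delta-Ct approach with roughly 100% amplification efficiency; the function name and the Ct values are hypothetical and do not come from the study.

```python
def ip_fold_enrichment(ct_target_ip, ct_housekeeping_ip,
                       ct_target_input, ct_housekeeping_input):
    """Fold-enrichment of a target over a housekeeping gene in the IP,
    normalized to the same ratio in the input (delta-delta-Ct method).

    Assumes each PCR cycle doubles the product (about 100% efficiency).
    """
    delta_ct_ip = ct_target_ip - ct_housekeeping_ip
    delta_ct_input = ct_target_input - ct_housekeeping_input
    delta_delta_ct = delta_ct_ip - delta_ct_input
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values for a pan-intestinal gene in one TRAP strain.
fold = ip_fold_enrichment(
    ct_target_ip=20.0, ct_housekeeping_ip=22.0,
    ct_target_input=24.0, ct_housekeeping_input=23.0,
)
print(f"fold-enrichment (IP over input) = {fold:.1f}")
```

      Comparable values for the INT1 and INT2-9 strains, and across fed and fasted conditions, would argue that the pulldowns worked with similar efficiency.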

      It appears that GFP expression is also detectable in INT2 (in addition to strong expression in INT1 in Fig.1A). Compared to INT3-9, which looks red, INT2 cells appear yellow, suggesting that the expression patterns of the two TRAP drivers are not mutually exclusive, which changes the interpretation of many of the results described in the study.

      Some parts of the study overemphasize the differences between the INT1 vs INT2-9 cell types, which is a biased representation of the results. For example, the authors specifically point out that 270 genes are differentially expressed in opposite directions in INT1 vs INT2-9 cell types during acute (30 min) fasting without mentioning the 1,268 genes that are differentially expressed in the same direction. They also do not mention here that 96% of the genes are differentially expressed in the same direction in INT1 and INT2-9 cell types after prolonged (180 min) fasting, suggesting that the divergent translational responses of these cell types are only observed in the first 30 minutes of food deprivation. Similar results have also been reported for the effect of fasting on locomotory and feeding behaviors, where 30 min of fasting produces more variable effects, which become more consistent after longer periods of fasting (PMID: 36083280). Hence, the effects of brief food deprivation should be interpreted with caution.

      Many of the interpretations of this study primarily rely on pathway enrichment analyses, which are based on the known function of genes. The function of uncharacterized genes that were found to be differentially expressed in INT1 vs INT2-9 cell types, e.g., the ShKT proteins, was not explored in this study. In addition, overreliance on pathway enrichment tools (instead of functional validation) has resulted in several conflicting findings. For example, one of the main messages of this study is that INT1 cells specialize in immune and stress response in response to fasting, which relies on pathway analysis in Figs 5E and 5F. However, pathway analysis at a different time point (shown in Figure S5A) indicates that INT2-9 cells show a much stronger increase in translation of stress and pathogen-responsive genes compared to INT1 cells. Hence, some of the results should be interpreted as different translational effects in INT1 vs INT2-9 cells after different lengths of food deprivation, without making broad claims about selective pathways being affected only in specific cell types.

      The authors have compared their TRAP-seq results with genes enriched in the anterior and posterior intestine clusters from a previously published whole-animal adult scRNA dataset (PMID: 37352352). They claim that their TRAP-seq results are in agreement with the findings of the scRNA study. However, among the 10 genes from the 'posterior intestine' scRNA cluster in Fig.S1E, six are downregulated in the INT1 vs INT2-9 comparison, while four are upregulated. Hence, there is no clear agreement between the two studies in terms of the top enriched genes in the anterior vs posterior intestine, which should be considered for cross-study comparisons in the future.

      The authors describe in the manuscript that they have performed INT1-specific RNAi for two C-type lectin genes that are upregulated during fasting. Due to a recent expansion of C-type lectin genes in C. elegans, there is a high chance of off-target effects of RNAi that is designed for members of this gene family. More trustworthy results could have been obtained using CRISPR-based loss-of-function alleles for these genes, one of which is publicly available. Also, the authors do not provide any explanation for why knockdown of these stress-response genes, which are activated in INT1 cells in response to food deprivation, results in improved resistance to pathogens. This, in fact, suggests a role of INT1 cells in increasing pathogen susceptibility, and not pathogen resistance, during food deprivation.

      Many of the studies in this field (e.g., references 2-4 in this article) have investigated the effects of food deprivation ranging from 4 hr to 24 hr, which results in activation of starvation responses in C. elegans. In contrast, the authors have used shorter time periods of fasting (30 min and 180 min), and most of their follow-up experiments have used 30 min of food deprivation. Previous work has shown that the effects of food deprivation can either accumulate over time (i.e., the effect gets stronger with longer food deprivation) or can be transient (i.e., only observed briefly after removal of food and not observed during long-term food deprivation). Starvation-induced transcription factors such as DAF-16/FoxO and HLH-30 show strong translocation to the nucleus only after 30 min of fasting. Though gene expression changes in all stages of food deprivation are of biological relevance, the authors have missed the opportunity to explore whether increased INS-7 secretion from the anterior intestine is dependent on these starvation-induced transcription factors (which can be easily tested using loss-of-function alleles) or is due to other fast-acting regulatory mechanisms induced due to the absence of food contents in the gut lumen. A previous study (PMID: 40991693) has shown that DAF-16 activation during prolonged starvation shuts down insulin peptide secretion from the intestinal epithelial cells. Hence, it is not clear if increased INS-7 secretion is only a feature of short-term food deprivation or is also a signature of long-term starvation (e.g., at 8 hr or 16 hr timepoints). Since most of the INS-7 secretion data in this study are for 30 min of fasting, it remains unknown whether the discovered regulators of INS-7 secretion can be generalized for extended food deprivation that triggers major metabolic changes, such as fat loss (e.g., conditions shown in Figure 1D).

      Two previous studies (PMID: 18025456, 40991693) have shown a strong reduction in the expression of ins-7 in the anterior intestine using GFP-based reporters (both promoter fusions and endogenous CRISPR-generated) and in whole-animal RNA-seq data from starved animals. These results are in contrast to the increased INS-7 secretion from INT1 cells during fasting that is reported in this study. The authors here have reported that INS-7 translation is higher in INT1 compared to INT2-9 during fed, acute fasted, and chronic fasted conditions, but they have not shown whether INS-7 translation is upregulated during acute and chronic fasting in INT1 cells in their TRAP-seq analysis. Knowing whether increased INS-7 secretion during acute fasting is due to increased transcription, translation, or secretion of INS-7 is crucial to resolve the discrepancy between these studies.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors set out to understand whether the discrete segments of the C. elegans intestine were specialized to carry out distinct functions during an animal's exposure and adaptation to a fast-changing nutrient environment. To achieve this, the authors used a method called translating ribosome affinity purification (TRAP), which provides a snapshot of which genes are being translated into proteins (and therefore functionally prioritized by the animal) under different fasting and re-feeding conditions. By expressing the TRAP constructs in two distinct segments of the intestine (INT1 and INT2-9), the authors were able to identify how these segments responded to changing nutrient availability.

      Already under steady-state nutrient conditions, the authors found that INT1 and INT2-9 appeared to have different 'tasks', with INT1 expressing more immune- and stress-response related genes. Exposing animals to different regimens of starvation and refeeding also showed marked differences between the intestinal segments, and the gene expression patterns in INT1 were consistent with INT1 cells playing an integrative role in linking nutrient cues to the secretion of insulin molecules that regulate fat metabolism with food intake. In summary, the data presented catalogue, for the first time, gene expression differences between two areas of the intestine suspected to play different roles and, through clever experiments, link these gene expression changes to responses to nutrient availability.

      Strengths:

      The data presented catalogue - for the first time and in a careful manner - gene expression differences between two areas of the intestine. They strongly support the presence of intriguing differences between two areas of the intestine in immune, metabolic, and stress-response regulation, and link these gene expression changes to the responses of these regions to nutrient availability.

      Weaknesses:

      The conclusions of this paper are mostly well-supported by data, but the relevance of the changing gene expression patterns could be better clarified and extended in the discussion.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Liu and colleagues utilize TRAP-seq to profile the repertoire of actively translated mRNAs in different intestinal cell types (anterior INT1 vs. posterior INT2-9 cells) in C. elegans. A key goal of this study was to identify transcripts differentially expressed/translated between these intestinal cell subtypes in the context of animals being well fed or subjected to acute (30 minutes) or chronic (3 hours) starvation, followed by refeeding.

      The authors identify a number of differentially expressed genes across all of the conditions tested. They then provide an initial survey of the landscape of translatome changes through Weighted Gene Co-expression Network Analysis (WGCNA), and some high-level functional surveys via Gene Ontology (GO) term analysis and protein domain analysis. The authors validate the enriched expression patterns of some of their identified candidate genes using fluorescent promoter fusion reporters, confirming INT1-specific expression. The authors further implicate several other candidate genes in pathogen avoidance and in the response to nutritional cues by knocking them down specifically in INT1 cells by RNAi. Finally, the authors identify pyruvate as a major nutrient signal coming from the bacterial diet that suppresses the release of a key insulin peptide (INS-7), and identify some of the genes expressed in INT1 that are required for this response.

      Strengths:

      (1) Good use of and justification for TRAP-seq, because scRNA-seq would be difficult under the varied conditions used (starvation, refeeding).

      (2) The manuscript is generally clear to read, and the data are generally well-presented with good supporting data that includes replicates, sample sizes, error measurements, and associated statistics.

      (3) The dataset will be an interesting resource to mine for future studies focusing on mechanisms of how particular intestinal cell types respond to different environmental signals.

      Weaknesses:

      (1) A limitation of TRAP-seq, although powerful, is that only relative comparisons can be made between genotypes/conditions to identify differentially-expressed genes, rather than assessing whether a given gene is expressed at a certain level in a cell type under a certain condition. This limitation is due to the non-specific association of sticky RNA species with the beads during the immunoprecipitation step. This is a minor point, however, and the authors do a nice job of focusing their analysis on differentially expressed transcripts in the current study.

      (2) Another limitation of the current study is that the experiments testing the role of candidate genes identified by the profiling experiments do not delve deeper into a mechanistic understanding of the phenotypes being studied. At present, the results are thus viewed more as a genomics-based screen with some limited follow-up on interesting hits. However, this reviewer appreciates that, in the context of the work presented, the profiling data together with some validation are an excellent starting point for future mechanistic studies elaborating on these interesting candidates.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The main goal of the study was to survey the dynamic responses at the level of actively translated mRNAs of the INT1 vs INT2-9 cells in response to metabolic challenge.

      Overall, the authors use established methods to perform their genome-wide analysis, and the set of differentially regulated genes is enriched for expected molecular functions and forms coherent networks in anticipated pathways.

      The validation experiments (promoter::GFP fusion reporters, INT1-specific knockdowns of highly regulated genes) further corroborate the quality of the TRAP-seq datasets generated.

      I have a few points for the authors that would further strengthen this work:

      (1) The authors rightfully focus on the top differentially-regulated candidates, but it's unclear at present how far down their fold change list would lead to expression pattern validations. It would be useful to test a few more promoter::GFP fusion reporters at different enrichment/fold-change/statistical cutoffs.

      (2) Although the INT1-specific RNAi provides a convenient strategy for rapidly perturbing and testing genes of interest for phenotypes, independently validating the knockdowns with genetic mutants or, alternatively (if genes are essential), degron alleles would strengthen the conclusions.
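
      As an illustration of point (1) above, candidates for additional reporter validation could be drawn from several fold-change tiers rather than only from the top of the list. The sketch below is a minimal example under stated assumptions: the gene names, fold changes, adjusted p-values, and bin edges are all hypothetical placeholders, not values from the manuscript, and picking the lowest adjusted p-value within each tier is only one possible selection rule.

```python
def stratify_candidates(genes, edges, per_bin):
    """Group genes into |log2FC| tiers and pick the most significant in each.

    genes   : list of (name, log2_fold_change, adjusted_p) tuples
    edges   : ascending lower bounds of the |log2FC| tiers (hypothetical cutoffs)
    per_bin : number of candidates to return per tier
    """
    bins = {lower: [] for lower in edges}
    for name, log2fc, padj in genes:
        # A gene belongs to the highest tier whose lower bound it reaches.
        eligible = [lower for lower in edges if abs(log2fc) >= lower]
        if eligible:
            bins[max(eligible)].append((padj, name))
    # Within each tier, keep the genes with the smallest adjusted p-values.
    return {
        lower: [name for _, name in sorted(members)[:per_bin]]
        for lower, members in bins.items()
    }


# Hypothetical differential-expression table (placeholder values only).
example_genes = [
    ("geneA", 5.2, 1e-10),
    ("geneB", 4.1, 1e-6),
    ("geneC", 2.5, 1e-4),
    ("geneD", -2.1, 1e-8),
    ("geneE", 1.2, 0.01),
    ("geneF", 0.4, 0.2),
]

picks = stratify_candidates(example_genes, edges=[1, 2, 4], per_bin=1)
print(picks)
```

      Testing one or two reporters per tier in this way would show how far down the ranked list the enrichment calls remain reliable.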

      Impact:

      The TRAP-seq data and list of differentially-expressed candidate genes will form an interesting set of high-priority candidates to study for their role in the reception and transduction of nutritional cues in response to food status and pathogens. This data will thus benefit the C. elegans community of researchers studying the mechanisms governing these phenomena.

    1. eLife Assessment

      In this useful paper, the authors present a comprehensive method for the purification of recombinant Snake Venom Metalloproteinases (SVMPs) using the MultiBac expression system, explain the self-activation of the enzymes by Zn2+ incubation, and establish high-throughput screening (HTS) techniques. The authors addressed a key problem: producing a substantial amount of pure and enzymatically active SVMPs required for structural and functional studies. Altogether, this work builds a solid foundation for the large-scale production of active SVMPs for future biochemical and structural characterization as well as for drug discovery, albeit leaving certain caveats about the universal applicability of the described methodology for the production of any recombinant SVMPs.

    2. Reviewer #1 (Public review):

      Summary:

      The authors Hall et al. establish a purification method for snake venom metalloproteinases (SVMPs). By generating a generic approach to purify this divergent class of recombinant proteins, they enhance the field's accessibility to larger quantities of SVMPs with confirmed activity and, for some, characterized kinetics. In some cases, the recombinant protein displayed comparable substrate specificity and substrate recognition compared to the native enzyme, providing convincing evidence of the authors' successful recombinant expression strategy. Beyond describing their route towards protein purification, they further provide evidence for self-activation upon Zn2+ incubation. They further provide insights on how to design high-throughput screening (HTS) methods for drug discovery and outline future perspectives for the in-depth characterization of these enzyme classes to enable the development of novel biomedical applications.

      Strengths:

      The study is well-presented and structured in a compelling way. The purification strategy results in highly pure protein products, well characterized by size exclusion chromatography and SDS-PAGE and confirmed by mass spectrometry analysis. Further, a significant portion of the manuscript focuses on enzyme activity, thereby validating function. Particularly convincing is the comparability between recombinant and native enzymes; this is successfully exemplified by insulin B digestion. By testing the fluorogenic substrate, the authors provide evidence that their production method for recombinant protein can open up possibilities in HTS. Since their purification method can be applied to three structurally variable SVMP classes, this demonstrates the robust nature of the approach.

      Weaknesses:

      The universal applicability of the approach could be emphasized more clearly. The potential for this generic protocol for recombinant SVMP zymogen production to be adapted to other SVMPs is somewhat obscured by the detailed optimization steps. A general schematic overview would strengthen the manuscript, presented as a final model, to illustrate how this strategy can be extended to other targets with similar features. Such a schematic might, for example, outline the propeptide fusion design, including its tags, relevant optimizations during expression, lysis, purification (e.g., strategies for metal ion removal and maintenance of protease inactivity), as well as the controllable auto-activation.

      The product obtained from the purification protocol appears to be a heterogeneous mixture of self-activated and intact protein species. The protocol would benefit from improved control over the self-activation process. The Methods section does not indicate whether removal of residual metal ions was attempted during the purification, which could influence premature activation. Additionally, the authors do not discuss whether the shift to pH 8 in the purification process is necessary from the initial steps onwards, given that a lower pH would be expected to maintain enzyme latency.

      The characterization of PIII activity using the fluorogenic peptide effectively links the project to its broader implications for drug design. However, the absence of comparable solutions for PI and PII classes limits the overall scope and impact of the finding.

      Overall, the authors successfully purified active SVMP proteins of all three structurally diverse classes in high quality and provided convincing evidence throughout the manuscript to support their claims. The described method will be of use for a broader community working with self-activating and cytotoxic proteases.

    3. Reviewer #2 (Public review):

      Summary:

      The aim of the study by Hall et al. was to establish a generic method for the production of Snake Venom Metalloproteases (SVMPs). These have been difficult to purify in the mg quantities required for mechanistic, biochemical, and structural studies.

      Strengths:

      The authors have successfully applied the MultiBac system and describe with a high level of detail the downstream purification methods applied to purify the SVMP PI, PII, and PIII. The paper carefully presents the non-successful approaches taken (such as expression of mature proteins, the use of protease inhibitors, prodomain segments, and co-expression of disulfide-isomerases) before establishing the construct and expression conditions required. The authors finally convincingly describe various activity assays to demonstrate the activity of the purified enzymes in a variety of established SVMP assays.

      Weaknesses:

      The manuscript suffers from a lack of thoroughness and of stringent scientific procedure in the methodology and the characterization of the generated enzymes.

      As an example, further characterization of the generated protein fragments in Figure 3 by intact mass spectrometry would have aided accurate mass determination rather than relying on SEC elution volumes against a standard. Protein shape and charge can affect migration in SEC. Also, the analysis of N-linked glycosylation demonstrates some reactivity of PIII to PNGase F, but fails to conclude whether one or more sites are occupied, or whether other types of glycosylation are present. Again, intact mass experiments would have resolved such issues.

      The activity assays in Figure 4 are not performed consistently, with kinetic assays and degradation assays performed for some, but not all, enzymes, and there is no Echis ocellatus comparison in Figure 4h. Overall, whilst not affecting the main conclusion, this leaves the reader with an impression of preliminary data being presented. For consistency, application of the same assays to all enzymes (high-grade purified) would have provided the reader with a fuller picture.

      Overall, the data presented demonstrate a very credible path for the production of active SVMPs for further downstream characterization. The generality of the approach to all SVMPs from different snakes remains to be demonstrated by the community, but if generally applicable, the method will enable numerous studies with the aim of either utilizing SVMPs as therapeutic agents or enabling the generation of specific anti-venom reagents, such as antibodies or small molecule inhibitors.

    4. Reviewer #3 (Public review):

      Summary:

      The presented study describes the long journey towards the expression of members of the SVMP toxin family from snake venom, which are toxins of major importance in a snakebite scenario. Because their functional analysis has in the past relied on challenging isolation, heterologous expression of the toxins offers a potential solution to some major obstacles hindering a better understanding of toxin pathophysiology. Through a series of laborious and elegantly crafted experiments, including the reporting of various failed attempts, the authors establish the expression of all three SVMP subtypes and prove their activity in bioassays. The expression is carried out as naturally occurring zymogens that autocleave upon exposure to zinc, which is a novel modus operandi for yielding fusion proteins and also sheds some new light on the potential mechanism that snakes use to activate enzymatic toxins from zymogenic preforms.

      Strengths:

      The manuscript draws from an extensive portfolio of well-reasoned and hypothesis-driven experiments that lead to a stepwise solution. The wet-lab data generated are outstanding, although not all experiments along this rocky road to victory were successful. A major strength of the paper is that, translationally speaking, it opens up novel routes for biodiscovery since a first reliable platform for expression of an understudied, yet potent toxin class is established. The discovered strategy to pursue expression as zymogens could see broad application in venom biotechnology, where several toxin types are pending successful expression. The work further provides better insights into how snake toxins are processed.

      Weaknesses:

      The manuscript contains several chapters reporting failed experiments, which makes it difficult to follow in places. The reporting of experimental details, especially sample sizes and replicates, could be optimised. At the time of writing, it remains unclear whether the glycosylations detected on a PIII SVMP could have an impact on the bioactivities measured, which is a major aspect, and future follow-ups should clarify this. Finally, the work, albeit of critical importance, would benefit from a more down-to-earth evaluation of its findings, as various persistent obstacles still need to be overcome.

      Major comments to the manuscript:

      (1) Lines 148-149: "indicating that expressing inactivated SVMPs could be a viable, although inefficient, approach". I think this text serves a good purpose to express some thoughts on the nature of how the current draft is set up. It is quite established that various proteases cause extreme viability losses to their expression host (whether due to toxicity, but surely also because of metabolic burden), which is why their expression as inactive fusion proteins is the default strategy in all cases I have thus far seen. I believe that, especially in venom studies, this is of importance given the increased toxicity often targeting cellular integrity, and especially here, because Echis are known to feed on arthropods at younger life history stages, making it very likely that some venom components are especially active against insects and other invertebrates. With that in mind, I would argue that exploring their production in inactive form is the obvious strategy one would come up with and not really the conclusion of a series of (well-conducted and scientifically sound!) experiments. For me, the insight of inactive expression is largely confirmatory of what is established, unless I miss something in the authors' rationale. If yes, it would be important to clarify that in the online version.

      (2) Line 173: Here, AlphaFold 3 was used, whereas in previous sections (e.g., line 153, line 210), it was AlphaFold 2. I suggest using one release across the manuscript.

      (3) Lines 252-254: I fully agree, the PIII SVMP is glycosylated. Glycosylation is an important mediator of snake venom activity, and several works have described its importance in the field. This raises the question of which glycosylations have been introduced here in the SVMP, and whether they correspond to those found in snakes. This is important as insects use thousands of N- and O-glycosylations to modulate the activity of their proteome, many of which are specific to insects. If some of these were integrated into the SVMP, this could have an impact on downstream bioassays and also on antigenicity (the surface would be somewhat different from that of natural toxins, causing different selection).

      (4) General comment for the bioassays: It would be good to specify the replicates again and report the data, including standard deviations.

      Discussion:

      I think the data generated in the study are very valuable and will be instrumental for pushing the frontiers in SVMP research, but I would still like to see a bit of modesty in the discussion. As I have pointed out above, it is unclear which effect the glycosylations may have (i.e., are the glycosylations found reminiscent of natural ones?), despite their being functionally important. Also, yes, isolation of SVMPs is challenging, but the reality is that their expression is equally challenging, as evidenced by the heaps of presented negative data (with which I have no problems; I think reporting such data is actually important). So far, the "generic" protocol has been used to express one member per structural class of Echis SVMP, but no evidence is provided that it would work equally well on other members from taxonomically more distant snakes (e.g., the PIII known from Naja oxiana). It is very likely, but at the time of writing, purely speculative. Lastly, the reality is also that expression in insect cells can only be carried out by highly specialized labs (even in the expression world, as most laboratories work with bacterial or fungal hosts), whereas the isolation can be attempted in most venom labs. That said, production in insect cells also has economic repercussions, as it will be very challenging to generate yields that are economically viable versus other systems, which is pivotal because the authors talk about bioprospecting and the toxins used in snakebite agent research. Again, I believe the paper is highly important and excellently crafted, but I think the discussion especially should see some refinement to address the drawbacks and to evaluate the paper's findings with more modesty.

    1. eLife Assessment

      The authors used genetic mutations in VANGL2 to study cell morphological changes during differentiation of hPSCs and understand the mechanisms underlying neural tube closure defects. The findings are important as they establish a quantitative, reproducible 2D human iPSC-to-neural-progenitor platform for analyzing cell-shape dynamics during differentiation. The convincing evidence provided, combined with the relative simplicity of the model and its tractability as a patient-specific and reverse genetic platform, make it attractive.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ampartzidis et al. report the establishment of an iPSC-derived neuroepithelial model to examine how mutations from spina bifida patients disrupt fundamental cellular properties that underlie neural tube closure. The authors utilize an adherent neural induction protocol that relies on dual SMAD inhibition to differentiate three previously established iPSC lines with different origins and reprogramming methods. The analysis is comprehensive and outstanding, demonstrating reproducible differentiation, apical-basal elongation, and apical constriction over an 8-day period among the 3 lines. In inhibitor studies, it is shown that apical constriction is dependent on ROCK and generates tension, which can be measured using an annular laser ablation assay. Since this pathway is dependent on PCP signaling, which is also implicated in neural tube defects, the authors investigated whether VANGL2 is required by generating 2 lines with a pathogenic patient-derived sequence variant. Both lines showed reduced apical constriction and reduced tension in the laser ablation assays. The authors then established lines obtained from amniocentesis, including 2 control and 2 spina bifida patient-derived lines. These remarkably exhibited different defects. One line showed defects in apical-basal elongation, while the other showed defects in neural differentiation. Both lines were sequenced to identify candidate variants in genes implicated in NTDs. While no smoking gun was found in the line that disrupts neural differentiation (as is often the case with NTDs), compound heterozygous MED24 variants were found in the patient whose cells were defective in apical-basal elongation. Since MED24 has been linked to this phenotype, this finding is especially significant.

      Some details are missing regarding the method to evaluate the rigor and reproducibility of the study.

      Major Comments:

      It is mentioned throughout the manuscript that 3 plates were evaluated per line. I believe these are independently differentiated plates. This detail is critical concerning rigor and reproducibility. This should be clearly stated in the Methods section and in the first description of the experimental system in the Results section for Figure 1.

      For the patient-specific lines - how many lines were derived per patient?

      Was the Vangl2 variant introduced by prime editing? Base editing? The details of the methods are sparse.

      Significance:

      This paper is significant not only for verifying the cell behaviors necessary for neural tube closure in a human iPSC model, but also for establishing a robust assay for the functional testing of NTD-associated sequence variants. This will not only demonstrate that sequence variants result in loss of function but also determine which cellular behaviors are disrupted.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' work focuses on studying cell morphological changes during differentiation of hPSCs into neural progenitors in a 2D monolayer setting. The authors use genetic mutations in VANGL2 and patient-derived iPSCs to show that (1) human phenotypes can be captured in the 2D differentiation assay, and (2) VANGL2 in humans is required for neural contraction, which is consistent with previous studies in animal models. The results are solid and convincing, the data are quantitative, and the manuscript is well written. The 2D model they present successfully addresses the questions posed in the manuscript. However, the broad impact of the model may be limited, as it does not contain NNE cells and does not exhibit tissue folding or tube closure, as seen in neural tube formation. Patient-derived lines are derived from amniotic fluid cells, and the experiments are performed before birth, which I find to be a remarkable achievement, showing the future of precision medicine.

      Major comments:

      (1) Figure 1. The authors use F-actin to segment cell areas. Perhaps this could be done more accurately with ZO-1, as F-actin cables can cross the surface of a single cell. In any case, the authors need to show a measure of segmentation precision: segmented image vs. raw image plus a nuclear marker (DAPI, H2B-GFP), so we can check that the number of segmented cells matches the number of nuclei.

      (2) Lines 156-166. The authors claim that changes in gene expression precede morphological changes. I am not convinced this is supported by their data. Fig. 1g (epithelial thickness) and Fig. 1k (PAX6 expression) seem to have similar dynamics. The authors can perform a cross-correlation between the two plots to see which Δt gives maximum correlation. If Δt < 0, then it would suggest that gene expression precedes morphology, as they claim. Fig. 1j shows that NANOG drops before the morphological changes, but loss of NANOG is not specific to neural differentiation and therefore should not be related to the observed morphological changes.

      (3) Figure 2d. The laser ablation experiment in the presence of ROCK inhibitor is clear, as I can easily see the cell outlines before and after the experiment. In the absence of ROCK inhibitor, the cell edges are blurry, and I am not convinced the outline that the authors drew is really the cell boundary. Perhaps the authors can try to ablate a larger cell patch so that the change in area is more defined.

      (4) Figure 2d. Do the cells become thicker after recoil?

      (5) Figure 3. The authors mention their previous study in which they show that Vangl2 is not cell-autonomously required for neural closure. It will be interesting to study whether this is also the case in the present human model by using mosaic cultures.

      (6) Lines 403-415. The authors report poor neural induction and neuronal differentiation in GOSB2. As far as I understand, this phenotype does not represent the in vivo situation. Thus, it is not clear to what extent the in vitro 2D model describes the human patient.

      (7) The experimental feat to derive cell lines from amniotic fluid and to perform experiments before birth is, in my view, heroic. However, I do not feel I learned much from the in vitro assays. There are many genetic changes that may cause the in vivo phenotype in the patient. The authors focus on MED24, but there is not enough convincing evidence that this is the key gene. I would like to suggest overexpression of MED24 as a rescue experiment, but I am not sure this is a single-gene phenotype. In addition, the fact that one patient line does not differentiate properly leads me to think that the patient lines do not strengthen the manuscript, and that perhaps additional clean mutations might contribute more.
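
      The cross-correlation suggested in comment (2) above could be set up along the following lines. This is a minimal sketch under stated assumptions: the two series are synthetic placeholders standing in for daily PAX6 expression and epithelial thickness (not the authors' measurements), the function names are illustrative, and the sign convention is chosen so that a negative best lag means expression leads morphology, as in the comment. With only a handful of time points, any lag estimate from real data would be coarse.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def lagged_correlation(expression, morphology, max_lag):
    """Correlate morphology[t] with expression[t + lag] for each lag.

    A negative lag pairs morphology with EARLIER expression values, so a
    maximum at lag < 0 is what one would expect if expression changes first.
    """
    n = len(expression)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [
            (expression[t + lag], morphology[t])
            for t in range(n)
            if 0 <= t + lag < n
        ]
        if len(pairs) >= 3:  # too few overlapping points is not informative
            e, m = zip(*pairs)
            result[lag] = pearson(e, m)
    return result


def best_lag(expression, morphology, max_lag):
    """Return the lag with the highest correlation, ignoring NaN entries."""
    corr = lagged_correlation(expression, morphology, max_lag)
    valid = {lag: r for lag, r in corr.items() if r == r}
    return max(valid, key=valid.get)


# Synthetic placeholder series for days 0-8: morphology is a copy of
# expression delayed by two days.
expression = [0, 0, 1, 3, 6, 8, 9, 9, 9]
morphology = [0, 0, 0, 0, 1, 3, 6, 8, 9]

print(best_lag(expression, morphology, max_lag=3))
```

      If the best lag on the real curves were zero or positive, the data would not support the claim that gene expression changes precede the morphological ones.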

      Significance:

      This study establishes a quantitative, reproducible 2D human iPSC-to-neural-progenitor platform for analyzing cell-shape dynamics during differentiation. Using VANGL2 mutations and patient-derived iPSCs, the work shows that (1) human phenotypes can be captured in a 2D differentiation assay and (2) VANGL2 is required for neural contraction (apical constriction), consistent with animal studies. The results are solid, the data are quantitative, and the manuscript is well written. Although the planar system lacks non-neural ectoderm and does not exhibit tissue folding or tube closure, it provides a tractable baseline for mechanistic dissection and genotype-phenotype mapping. The derivation of patient lines from amniotic fluid and execution of experiments before birth is a remarkable demonstration that points toward precision-medicine applications, while motivating rescue strategies and additional clean genetic models. However, overall, I did not learn anything substantively new from this manuscript; the conclusions largely corroborate prior observations rather than extend them. In addition, the model was unsuccessful in one of the two patient-derived lines, which limits generalizability and weakens claims of patient-specific predictive value.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Ampartzidis et al., significantly extends the human induced pluripotent stem cell system originally characterized by the same group as a tool for examining cellular remodeling during differentiation stages consistent with those of human neural tube closure (Ampartzidis et al., 2023). Given that there are no direct ways to analyze cellular activity in human neural tube closure in vivo, this model represents an important platform for investigating neural tube defects which are a common and deleterious human developmental disease. Here, the authors carefully test whether this system is robust and reproducible when using hiPSC cells from different donors and pluripotency induction methods and find that despite all these variables the cellular remodeling programs that occur during early neural differentiation are statistically equivalent, suggesting that this system is a useful experimental substrate. Additionally, the carefully selected donor populations suggest these aspects of human neural tube closure are likely to be robust to sexual dimorphism and to reasonable levels of human genetic background variation, though more fully testing that proposition would require significant effort and be beyond the scope of the current work. Subsequent to this careful characterization, the authors next tested whether this system could be used to derive specific insights into cell remodeling during early neural differentiation. First, they used a reverse genetics approach to knock in a human point mutation in the critical regulator of planar cell polarity and apical constriction, Vangl2. Despite being identified in a patient, this R353C variant has not been directly functionally tested in a human system. The authors find that this variant, despite showing normal expression and phospho-regulation, leads to defects consistent with a failure in apical constriction, a key cell behavior required to drive curvature change during cranial closure. 
Finally, the authors test the utility of their hiPSC platform to understand human patient-specific defects by differentiating cells derived from two clinical spina bifida patients. The authors identify that one of these patients is likely to have a significant defect in fully establishing early proneural identity as well as defects in apicobasal thickening. While early remodeling occurs normally in the other patient, the authors observe significant defects in later neuronal induction and maturation. In addition, using whole exome sequencing the authors identify candidate variant loci that could underlie these defects.

      Major comments:

      (1) One of my few concerns with this work is that the relative constriction of the apical surface with respect to the basal surface is not directly quantified for any of the experiments. This worry is slightly compounded by the 3D reconstructions in Figure 1h, and the observation that overall cell volume is reduced and cell height increased simultaneously to area loss. Additionally, the net impact of apical constriction in tissues in vivo is to create local or global curvature change, but all the images in the paper suggest that the differentiated neural tissues are an uncurved monolayer even missing local buckles. I understand that these cells are grown on flat adherent surfaces limiting global curvature change, but is there evidence of localized buckling in the monolayer? While I believe, along with the authors, that their phenotypes are likely failures in apical constriction, I think they should work to strengthen this conclusion. I think the easiest way (and hopefully using data they already have) would be to directly compare apical area to basal area on a cell-wise basis for some number of cells. Given the heterogeneity of cells, perhaps 30-50 cells per condition/line/mutant would be good? I am open to other approaches; this just seems like it may not require additional experiments.

      (2) Another slight experimental concern I have regards the difference in laser ablation experiments detailed in Figure 3h-i from those of Figure 2d-e. It seems like WT recoil values in 3h-i are more variable and of a lower average than the earlier experiments, and given that it appears significance is reached mainly by impact of the lower values, can the authors explain if this variability is expected to be due to heterogeneity in the tissue, i.e. some areas have higher local tension? If so, would that correspond with more local apical constriction?

      Significance:

      Overall, I am enthusiastic about this work and believe it represents a significant step forward in the effort to establish precision medicine approaches for diagnoses of the patient-specific causative cellular defects underlying human neural tube closure defects. This work systematizes an important and novel tool to examine the cellular basis of neural tube defects. While other hiPSC models of neural tube closure capture some tissue level dynamics, which this model does not, they require complex microfluidic approaches and have limited accessibility to direct imaging of cell remodeling. Comparatively, the relative simplicity of the reported model and the work demonstrating its tractability as a patient-specific and reverse genetic platform make it unique and attractive. This work will be of interest to a broad cross section of basic scientists interested in the cellular basis of tissue remodeling and/or the early events of nervous system development as well as clinical scientists interested in modeling the consequences of patient specific human genetic deficits identified in neural tube defect pregnancies.

    5. Author response:

      General Statements

      In this manuscript we characterize an exquisitely reproducible model of iPSC differentiation into neuroepithelial cells, use it to mechanistically study cell shape changes and planar cell polarity signaling activation during this transition, then apply it to identify patient-specific cell deficiencies in both forward and reverse genetic screens as a powerful tool for patient stratification in personalized medicine. To our knowledge, we provide the first evidence of a human pathogenic mutation directly impairing apical constriction, an evolutionarily conserved behavior of epithelial cells which is the subject of intense research.

      We are very pleased with the balanced and rigorous reviews generated through Review Commons, which we have already used to improve our manuscript. Reviewer 1 highlights that our study “is significant not only for verifying the cell behaviors necessary for neural tube closure in a human iPSC model, but also for establishing a robust assay for the functional testing of NTD-associated sequence variants.” Reviewer 2 agrees that “results are solid and convincing, the data are quantitative, and the manuscript is well written”, and that our “derivation of patient lines from amniotic fluid and execution of experiments before birth is a remarkable demonstration that points toward precision-medicine applications, while motivating rescue strategies and additional clean genetic models.” Reviewer 3 is “enthusiastic about this work and believe it represents a significant step forward in the effort to establish precision medicine approaches for diagnoses of the patient-specific causative cellular defects underlying human neural tube closure defects.” 

      Below, we have replied to each of the reviewers’ comments.

      Description of the planned revisions

      R2.2. Lines 156-166. The authors claim that changes in gene expression precede morphological changes. I am not convinced this is supported by their data. Fig. 1g (epithelial thickness) and Fig. 1k (PAX6 expression) seem to have similar dynamics. The authors can perform a cross-correlation between the two plots to see which Δt gives maximum correlation. If Δt < 0, then it would suggest that gene expression precedes morphology, as they claim. Fig. 1j shows that NANOG drops before the morphological changes, but loss of NANOG is not specific to neural differentiation and therefore should not be related to the observed morphological changes.

      We are happy to do this analysis fully in revision. Our initial analysis performing cross-correlation between apical area and CDH2 protein in one line shows the highest cross-correlation at Δt = -1, suggesting neuroepithelial CDH2 increases before apical area decreases. In contrast, the same analysis comparing apical area versus PAX6 shows Δt = 0, suggesting concurrence. This analysis will be expanded to include the other markers we quantified and the manuscript text amended accordingly. We are keen to undertake additional experiments to test whether these cells swap their key cadherins (CDH1 and CDH2) before they begin to undergo morphological changes (see the response to Reviewer 3’s minor comment 1 immediately below).
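
      The lag analysis discussed above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the sign convention (a negative lag means the marker changes before the morphology) and the synthetic time courses are all assumptions made for the example. Because apical area falls as the marker rises, the lag is chosen by the largest absolute correlation.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(marker, morphology, max_lag):
    """Correlate marker[t + lag] with morphology[t] for each lag.

    Assumed convention: a negative lag pairs each morphology value with an
    earlier marker value, i.e. the marker leads the morphology.
    """
    n = len(marker)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            a, b = marker[:n + lag], morphology[-lag:]
        else:
            a, b = marker[lag:], morphology[:n - lag]
        result[lag] = pearson(a, b)
    return result

def best_lag(correlations):
    """Lag with the strongest (positive or negative) correlation."""
    return max(correlations, key=lambda lag: abs(correlations[lag]))

# Synthetic, hypothetical time courses (not data from the study):
# the marker rises around day 4, apical area falls one day later.
days = list(range(9))
marker = [1 / (1 + math.exp(-(d - 4))) for d in days]
apical_area = [1 - 1 / (1 + math.exp(-(d - 5))) for d in days]

correlations = lagged_correlation(marker, apical_area, max_lag=3)
print(best_lag(correlations), correlations)
```

      With only a handful of time points, each lag is estimated from very few pairs, so the position of the peak is best read as suggestive rather than conclusive.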

      R3.1(Minor) There seems to be a critical window at day 5 of the differentiation protocol, both in terms of cell morphology and the marker panel presented in Figure 1i. Do the authors have any data spanning the hours from day 5 to 6? If not, I don't think they need to generate any, but I do think this is a very interesting window worthy of further discussion for a couple of reasons. First, several studies of mouse neural tube closure have shown that various aspects of cell remodeling are temporally separable. For example, between Grego-Bessa et al 2016 and Brooks et al 2020 we can infer that apicobasal elongation rapidly increases starting at E8.5, whereas apical surface area reduction and constriction are apparent somewhat earlier at E8.0. I think it would be interesting to see if this separability is conserved in humans. Second, is there a sense of how the temporal correlation between the pluripotent and early neural fate marker data presented here corroborates or contradicts the emerging set of temporally resolved RNA-seq data sets of mouse development at equivalent early neural stages?

      Cell shape analysis between days 5 and 6 has now been added (see the response to point 2.1 below). As the reviewer predicted, this is a transition point when apical area begins to decrease and apicobasal elongation begins to increase.

      We also thank the reviewer for this prompt to more closely compare our data to the previous mouse publications, which we have added to the discussion. The Grego-Bessa 2016 paper appears to show an increase in thickness between E7.75 and E8.5, but these are not statistically compared. Previous studies showed rapid apicobasal elongation during the period of neural fold elevation, when neuroepithelial cells apically constrict. This has now been added to the discussion: 

      Discussion: “In mice, neuroepithelial apicobasal thickness is spatially-patterned, with shorter cells at the midline under the influence of SHH signalling[14,77,78]. Apicobasal thickness of the cranial neural folds increases from ~25 µm at E7.75 to ~50 µm at E8.5[79]: closely paralleling the elongation between days 2 and 8 of differentiation in our protocol. The rate of thickening is non-uniform, with the greatest increase occurring during elevation of the neural folds[80], paralleled in our model by the rapid increase in thickness between days 4-6 as apical areas decrease. Elevation requires neuroepithelial apical constriction and these cells’ apical area also decreases between E7.75 and E8.5 in mice[79], but we and others have recently shown that this reduction is both region and sex-specific[14,81]. Specifically, apical constriction occurs in the lateral (future dorsal) neuroepithelium: this corresponds with the identity of the cells generated by the dual SMAD inhibition model we use[56]. More recently, Brooks et al[82] showed that the rapid reduction in apical area from E8-E8.5 is associated with cadherin switching from CDH1 (E-cadherin) to CDH2 (N-cadherin). This is also directly paralleled in our human system, which shows low-level co-expression of CDH1 and CDH2 at day 4 of differentiation, immediately before apical area shrinks and apicobasal thickness increases.”

      Prompted by the in vivo data in Brooks et al (2025)[82], we are keen to further explore the timing of CDH1/CDH2 switching versus apical constriction with new experimental data in revisions.

      R3.2(Minor) 2) Can the authors elaborate a bit more on what is known regarding apicobasal thickening and pseudo-stratification and how their work fits into the current understanding in the discussion? This is a very interesting and less well studied mechanism critical to closure, which their model is well suited to directly address. I am thinking mainly of the Grego-Bessa et al., 2016 work on PTEN, though interestingly the work of Ohmura et al., 2012 on the NUAK kinases also shows reduced tissue thickening (and apical constriction) and I am sure I have missed others. Given that the authors identify MED24 as a likely candidate for the lack of apicobasal thickening in one of their patient-derived lines, is there any evidence that it interacts with any of the known players?

      We have now added further discussion on the mechanisms by which the neuroepithelium undergoes apicobasal elongation. Nuclear compaction is likely to be necessary to allow pseudostratification and apicobasal elongation. The reviewer’s comment has led us to realise that diminished chromatin compaction is a potential outcome of MED24 down-regulation in our GOSB2 patient-derived line. Figure 4D suggests the nuclei of our MED24-deficient patient-derived line are less compacted than control equivalents and we propose to quantify nuclear volume in more detail to explore this possibility.

      Additionally, we have already expanded our discussion as suggested by the reviewer:

      Discussion: “Mechanistic separability of apical constriction and apicobasal elongation is consistent with biomechanical modelling of Xenopus neural tube closure showing that both are independently required for tissue bending[61]. Nonetheless, neuroepithelial apical constriction and apicobasal elongation are co-regulated in mouse models: for example, deletion of Nuak1/2[83], Cfl1[84], and Pten[79] all produce shorter neuroepithelium with larger apical areas. Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produce a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium.

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68]. Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85]. As general regulators of polymerase activity, MED proteins have the potential to alter the timing or level of expression of many other genes, including those already known to influence pseudostratification or apicobasal elongation. MED depletion also causes redistribution of cohesin complexes[86] which may impact chromatin compaction, reducing nuclear volume during differentiation.”

      R3.3(Minor) 3) Is there any indication that Vangl2 is weakly or locally planar polarized in this system? Figure 2F seems to suggest not, but Supplementary Figure 5 does show at least more supracellular cable like structures that may have some polarity. I ask because polarization seems to be one of the properties that differs along the anteroposterior axis of the neural plate, and I wonder if this offers some insight into the position along the axis that this system most closely models?

      VANGL2 does not appear to be planar polarised in this system. This is similar to the mouse spinal neuroepithelium, in which apical VANGL2 is homogeneous but F-actin is planar polarised (Galea et al, Disease Models and Mechanisms 2018). We do observe local supracellular cable-like enrichments of F-actin in the apical surface of iPSC-derived neuroepithelial cells:

      Author response image 1.

      Preliminary identification of apical supracellular cables suggestive of local polarity. Top: F-actin staining shown in inverted grey LUT highlighting enrichment along directionally-polarised cell borders (blue arrows). Bottom: Staining orientation (blue ~ X axis, red ~ Y axis) based on OrientationJ analysis illustrating localised organisation of F-actin enrichment.

      We propose to compare the length of F-actin cables and coherency of their orientation at the start and end of neuroepithelial differentiation, and in wild-type versus VANGL2-mutant epithelia.
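
      For readers unfamiliar with orientation coherency, the sketch below shows one common definition based on the image structure tensor: coherency is near 1 when intensity gradients share a single direction and near 0 when no direction dominates. This is a simplified whole-image version written for illustration only. The function name and test images are hypothetical, and OrientationJ itself evaluates the tensor locally within a Gaussian window, so values from this sketch are not interchangeable with its output.

```python
import math

def coherency(image):
    """Structure-tensor coherency of a 2D image given as a list of rows.

    Gradients are central differences on interior pixels. The tensor
    entries are summed over the whole image (a simplifying assumption).
    """
    rows, cols = len(image), len(image[0])
    jxx = jyy = jxy = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2
            gy = (image[r + 1][c] - image[r - 1][c]) / 2
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    trace = jxx + jyy
    if trace == 0:
        return 0.0  # flat image: no orientation information
    # Equivalent to (lambda_max - lambda_min) / (lambda_max + lambda_min)
    return math.sqrt((jxx - jyy) ** 2 + 4 * jxy ** 2) / trace

size = 21
# Parallel stripes: every gradient points along the same axis.
stripes = [[math.sin(c) for c in range(size)] for _ in range(size)]
# Radially symmetric bowl: gradients point in all directions equally.
bowl = [[(r - 10) ** 2 + (c - 10) ** 2 for c in range(size)] for r in range(size)]

print(coherency(stripes), coherency(bowl))
```

      Comparing such a value between conditions only makes sense when the images share pixel size, field of view and analysis window.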

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Major points

      (1) It is mentioned throughout the manuscript that 3 plates were evaluated per line. I believe these are independently differentiated plates. This detail is critical concerning rigor and reproducibility. This should be clearly stated in the Methods section and in the first description of the experimental system in the Results section for Figure 1.

      These experimental details have now been clarified. Unless otherwise stated, all findings were confirmed in three independently differentiated plates from the same line or at least one differentiation from each of three lines. 

      Methods: Unless otherwise stated, for each iPSC line three independently differentiated plates were generated and analysed, with each plate representing a separate differentiation experiment performed on different days.

      (2) For the patient-specific lines - how many lines were derived per patient?

      This has now been clarified in the methods. Microfluidic reprogramming of a small number of amniocytes produces one line per patient representing a pool of clones. Subcloning from individual cells would not be possible within the timeframe of a pregnancy. 

      Methods: For patient-specific iPSC lines, one independent iPSC line was obtained per patient following microfluidic mmRNA reprogramming.

      (3) Was the Vangl2 variant introduced by prime editing? Base editing? The details of the methods are sparse.

      We have now expanded these details:

      Methods: “VANGL2 knock-in lines were generated using CRISPR-Cas9 homology-directed repair editing by Synthego (SO-9291367-1). The guide sequence was AUGAGCGAAGGGUGCGCAAG and the donor sequence was CAATGAGTACTACTATGAGGAGGCTGAGCATGAGCGAAGGGTGTGCAAGAGGAGGGCCAGGTGGGTCCCTGGGGGAGAAGAGGAGAG.

      Sequence modification was confirmed by Sanger sequencing before delivery of the modified clones, and Sanger sequencing was repeated after expansion of the lines (Supplementary Figure 5) as well as SNP arrays (Illumina iScan, not shown) confirming genomic stability.”
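
      As a reader-side consistency check (not part of the authors' or the vendor's workflow), the quoted guide and donor sequences can be compared directly. The sketch below converts the guide RNA to DNA, finds the donor window that matches it best, and lists the positions at which they differ. The helper names are hypothetical. For the sequences quoted above, the best window differs from the guide at a single base (C in the guide, T in the donor), which would be consistent with a CGC (arginine) to TGC (cysteine) codon change if that base is the first position of the codon. The reading frame is not stated in the text, so that interpretation is an assumption.

```python
def mismatch_positions(a, b):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def best_window(query, target):
    """Offset in target whose window has the fewest mismatches to query."""
    k = len(query)
    offsets = range(len(target) - k + 1)
    return min(offsets, key=lambda o: len(mismatch_positions(query, target[o:o + k])))

# Sequences copied from the Methods text quoted above.
GUIDE_RNA = "AUGAGCGAAGGGUGCGCAAG"
DONOR = ("CAATGAGTACTACTATGAGGAGGCTGAGCATGAGCGAAGGGTGTGCAAGAGGAGGGCC"
         "AGGTGGGTCCCTGGGGGAGAAGAGGAGAG")

guide_dna = GUIDE_RNA.replace("U", "T")
offset = best_window(guide_dna, DONOR)
window = DONOR[offset:offset + len(guide_dna)]
mismatches = [(i, guide_dna[i], window[i])
              for i in mismatch_positions(guide_dna, window)]

print(window)
print(mismatches)  # (position in guide, guide base, donor base)
```

      Only the forward strand is searched here; a more general check would also scan the reverse complement of the donor.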

      Author response image 2.

      Snapshot of Illumina iScan SNP array showing absence of chromosomal duplications or deletions in the CRISPR-modified VANGL2-knockin lines or their congenic control.

      (4) Suggested text changes.

      Some additional suggestions for improvement.

      The abstract could be more clearly written to effectively convey the study's importance. Here are some suggestions:

      Line 26: Insert "apicobasal" before "elongation" - the way it is written, I initially interpreted it as anterior-posterior elongation.

      Line 29: Please specify that the lines refer to 3 different established parent iPSC lines with distinct origins and established using different reprogramming methods, plus 2 control patient-derived lines. - The reproducibility of the cell behaviors is impressive, but this is not captured in the abstract.

      Line 32: add that this mutation was introduced by CRISPR-Cas9 base/prime editing.

      The last sentence of the abstract states that the study only links apical constriction to human NTDs, but the study also reveals defects in neural differentiation and apical-basal elongation. The introduction could also use some editing.

      Line 71: insert "that pulls actin filaments together" after "power strokes"

      Line 73: "apically localized," do you mean "mediolaterally" or "radially"?

      Line 75: Can you specify that PCP components promote "mediolaterally orientated" apical constriction?

      Line 127: Specify that NE functions, including apical-basal elongation and neurodifferentiation, are disrupted in patient-derived models

      All have now been corrected.

      Reviewer #2:

      Major comments:

      (1) Figure 1. The authors use F-actin to segment cell areas. Perhaps this could be done more accurately with ZO-1, as F-actin cables can cross the surface of a single cell. In any case, the authors need to show a measure of segmentation precision: segmented image vs. raw image plus a nuclear marker (DAPI, H2B-GFP), so we can check that the number of segmented cells matches the number of nuclei.

      We used ZO-1 to quantify apical areas of the VANGL2-knockin lines in Figure 3. Segmentation of neuroepithelial apical areas based on F-actin staining is commonplace in the field (e.g. in the Brooks et al 2022 paper cited by another reviewer), and is generally robust because the cell junctions are much brighter than any apical fibres not associated with the apical cortex. However, we accept that at earlier stages of differentiation there may be more apical fibres when cells are cuboidal. We have therefore repeated our analysis of apical area using ZO-1 staining as suggested, analysing a more temporally-detailed time course in one iPSC line. This new analysis confirms our finding of lack of apical area change between days 2-4 of differentiation, then progressive reduction of apical area between days 4-8, further validating our system. Including nuclear images is not helpful because of the high nuclear index of pseudostratified epithelia (e.g. see Supplementary Figure 7), which means that nuclei overlap along the apicobasal axis. Individual nuclei cannot be related to their apical surface in projected images.

      (3) Figure 2d. The laser ablation experiment in the presence of ROCK inhibitor is clear, as I can easily see the cell outlines before and after the experiment. In the absence of ROCK inhibitor, the cell edges are blurry, and I am not convinced the outline that the authors drew is really the cell boundary. Perhaps the authors can try to ablate a larger cell patch so that the change in area is more defined.

      The outlines on these images are not intended to show cell boundaries, but rather link landmarks visible at both timepoints to calculate cluster (not cell) change in area. This is as previously shown in Galea et al Nat Commun 2021 and Butler et al J Cell Sci 2019. We have now amended the visualisation of retraction to make representation of differences between conditions more intuitive. 
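
      The landmark-based measurement described above can be illustrated with a short sketch. It assumes that the same landmarks are traced in the same order before and after ablation and treats them as the vertices of a polygon, whose area is computed with the shoelace formula. The function names and coordinates are hypothetical and are not taken from the cited papers.

```python
def polygon_area(points):
    """Area of a simple polygon from ordered (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def percent_area_change(before, after):
    """Percentage change in cluster area between two landmark sets."""
    area_before = polygon_area(before)
    return 100 * (polygon_area(after) - area_before) / area_before

# Hypothetical landmark coordinates (arbitrary units), same order at both timepoints.
before = [(0, 0), (10, 0), (10, 10), (0, 10)]
after = [(-1, -1), (11, -1), (11, 11), (-1, 11)]

print(polygon_area(before), polygon_area(after))
print(percent_area_change(before, after))
```

      Because the polygon is defined by landmarks rather than cell boundaries, the result describes the cluster as a whole, matching the distinction drawn in the response.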

      (4) Figure 2d. Do the cells become thicker after recoil?

      This is unlikely because the ablated surface remains in the focal plane. Unfortunately, we are unable to image perpendicularly to the direction of ablation to test whether their apical surface moves in Z even by a very small amount. This has now been clarified in the results:

      Results: “The ablated surface remained within the focal plane after ablation, indicating minimal movement along the apical-basal axis.”

      (6) Lines 403-415. The authors report poor neural induction and neuronal differentiation in GOSB2. As far as I understand, this phenotype does not represent the in vivo situation. Thus, it is not clear to what extent the in vitro 2D model describes the human patient.

      The GOSB2 iPSC line we describe does represent the in vivo situation in Med24 knockout mouse embryos, but is clearly less severe because we are still able to detect MED24 protein expressed in this line. We do not have detailed clinical data of the patient from which this line was obtained to determine whether their neurological development is normal. However, it is well established that some individuals who have spina bifida also have abnormalities in supratentorial brain development. It is therefore likely that abnormalities in neuron differentiation/maturation are concomitant with spina bifida. Our findings in the GOSB2 line complement earlier studies which also identified deficiencies in the ability of patient-derived lines to form neurons, but were unable to functionally assess neuroepithelial cell behaviours we studied. This has now been clarified in the discussion:

      Discussion: “Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produce a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium.

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68].

      Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85].”

      (7) The experimental feat to derive cell lines from amniotic fluid and to perform experiments before birth is, in my view, heroic. However, I do not feel I learned much from the in vitro assays. There are many genetic changes that may cause the in vivo phenotype in the patient. The authors focus on MED24, but there is not enough convincing evidence that this is the key gene. I would like to suggest overexpression of MED24 as a rescue experiment, but I am not sure this is a single-gene phenotype. In addition, the fact that one patient line does not differentiate properly leads me to think that the patient lines do not strengthen the manuscript, and that perhaps additional clean mutations might contribute more.

      We appreciate the reviewer’s praise of our personalised medicine approach and fully agree that neural tube defects are rarely monogenic. The patient lines we studied were not intended to provide mechanistic insight, but rather to demonstrate the future applicability of our approach to patient care. Our vision is that every patient referred for fetal surgery of spina bifida will have amniocytes (collected as part of routine cystocentesis required before surgery) reprogrammed and differentiated into neuroepithelial cells, then neural progenitors, to help stratify their postnatal care. One could also picture these cells becoming an autologous source for future cell-based therapies if they pass our reproducible analysis pipeline as functional quality control. This has now been clarified in the discussion:

      Discussion: “The multi-genic nature of neural tube defect susceptibility, compounded by uncontrolled environmental risk factors (including maternal age and parity[102]), mean that patient-derived iPSC models are unlikely to provide mechanistic insight. They do provide personalised disease models which we anticipate will enable functional validation of genetic diagnoses for patients and their parents’ recurrence risk in future pregnancies, and may eventually stratify patients’ postnatal care. We also envision this model will enable quality control of patient-derived cells intended for future autologous cell replacement therapies, as is being developed in post-natal spinal cord injury[103]. Thus, the highly reproducible modelling platform we evaluate – which is robust to differences in iPSC reprogramming method, sex and ethnicity – represents a valuable tool for future mechanistic insights and personalised disease modelling applications.”

      Significance:

      In addition, the model was unsuccessful in one of the two patient-derived lines, which limits generalizability and weakens claims of patient-specific predictive value.

      We disagree with the reviewer that “the model was unsuccessful in one of the two patient-derived lines”. The GOSB1 line demonstrated deficiency of neuron differentiation independently of neuroepithelial biomechanical function, whereas the GOSB2 line showed earlier failure of neuroepithelial function. We also do not, at this stage, make patient-specific predictive claims: this will require longer-term matching of cell model findings with patient phenotypes over the next 5-10 years.

      Reviewer #3:

      Major comments

      (1) One of my few concerns with this work is that the relative constriction of the apical surface with respect to the basal surface is not directly quantified for any of the experiments. This worry is slightly compounded by the 3D reconstructions in Figure 1h, and the observation that overall cell volume is reduced and cell height increased simultaneously to area loss. Additionally, the net impact of apical constriction in tissues in vivo is to create local or global curvature change, but all the images in the paper suggest that the differentiated neural tissues are an uncurved monolayer even missing local buckles. I understand that these cells are grown on flat adherent surfaces limiting global curvature change, but is there evidence of localized buckling in the monolayer? While I believe, along with the authors, that their phenotypes are likely failures in apical constriction, I think they should work to strengthen this conclusion. I think the easiest way (and hopefully using data they already have) would be to directly compare apical area to basal area on a cell-wise basis for some number of cells. Given the heterogeneity of cells, perhaps 30-50 cells per condition/line/mutant would be good? I am open to other approaches; this just seems like it may not require additional experiments.

      As the reviewer observes, our cultures cannot bend because they are adhered on a rigid surface. The apical and basal lengths of the cultures will therefore necessarily be roughly equal in length. Some inwards bending of the epithelium is expected at the edges of the dish, but these cannot be imaged. The live imaging we show in Figure 2 illustrates that, just as happens in vivo, apical constriction is asynchronous. This means not all cells will have ‘bottle’ shapes in the same culture. We now illustrate the evolution of these shapes in more detail in Supplementary Figure 1.

      Additionally, the reviewer’s comment motivated us to investigate local buckles in the apical surface of our cultures when their apical surfaces are dilated by ROCK inhibition. We hypothesised that the very straight apical surface in normal cultures is achieved by a balance of apical cell size and tension with pressure differences at the cell-liquid interface. Consistent with our expectation, the apical surface of ROCK-inhibited cultures becomes wrinkled (Supplementary Figure 4). The VANGL2-KI lines do not develop this tortuous apical surface (as shown in Figure 3), which is to be expected given their modification is present throughout differentiation, unlike the acute dilation caused by ROCK inhibition.

      This new data complements our visualisation of apical constriction in live imaging, apical accumulation of phospho-myosin, and quantification of ROCK-dependent apical tension as independent lines of evidence that our cultures undergo apical constriction. 

      (2) Another slight experimental concern I have regards the difference in laser ablation experiments detailed in Figure 3h-i from those of Figure 2d-e. It seems like WT recoil values in 3h-i are more variable and of a lower average than the earlier experiments, and given that it appears significance is reached mainly by impact of the lower values, can the authors explain if this variability is expected to be due to heterogeneity in the tissue, i.e. some areas have higher local tension? If so, would that correspond with more local apical constriction?

      There is no significant difference in recoil between the control lines in Figures 2 and 3, although the data in Figure 3 are more variable (necessitating more replicates: none were excluded). We also showed laser ablation recoil data in Supplementary Figure 10, in which we did identify a graphing error. This has now been corrected, and the corrected data also show no significant difference in recoil from the other control groups, as shown in Author response image 3.

      Author response image 3.

      Recoil following laser ablation is not significantly different between different experiments. X axis labels indicate the figure panel in which each set of ablation data is shown. Each point represents an independent differentiation dish.

      (4)(Minor) I think some of the commentary on the strengths and limitations of the model found in the Results section should be collated and moved to the discussion in a single paragraph. For example, this could also briefly touch on/compare to some of the other models utilizing hiPSCs (These are mentioned briefly in the intro, but this comparison could be elaborated on a bit after seeing all the great data in this work).

      These changes have now been made:

      Discussion: “Some of these limitations, potentially including inclusion of environmental risk factors, can be addressed by using alternative iPSC-derived models[93,94]. For example, if patients have suspected causative mutations in genes specific to the surface (non-neural) ectoderm, such as GRHL2/3, 3D models described by Karzbrun et al[49] or Huang et al[95] may be informative. Characterisation of surface ectoderm behaviours in those models is currently lacking. These models are particularly useful for high-throughput screens of induced mutations[95], but their reproducibility between cell lines, necessary to compare patient samples to non-congenic controls, remains to be validated. Spinal cell identities can be generated in human spinal cord organoids, although these have highly variable morphologies[96,97]. As such, each iPSC model presents limitations and opportunities, to which this study contributes a reductionist and highly reproducible system in which to quantitatively compare multiple neuroepithelial functions.”

      (5) While the authors are generally good about labeling figures by the day post smad inhibition, in some figures it is not clear either from the images or the legend text. I believe this includes supplemental figures 2,5,6,8, and 10 (apologies if I simply missed it in one or more of them)

      These have now been added.

      (6) The legend for Figure 2 refers to a panel that is not present and the remaining panel descriptions are off by a letter. I'm guessing this is a versioning error as the text itself seems largely correct, but it may be good to check for any other similar errors that snuck in

      This has now been corrected.

      (7) The cell outlines in Figure 3d are a bit hard to see both in print and on the screen, perhaps increase the displayed intensity?

      This has now been corrected.

      Description of analyses that authors prefer not to carry out

      R2.5. Figure 3. The authors mention their previous study in which they show that Vangl2 is not cell-autonomously required for neural closure. It will be interesting to study whether this is also the case in the present human model by using mosaic cultures.

      The reviewer is correct that this is one of the exciting potential future applications of our model, which will first require us to generate stable fluorescently-tagged lines (to identify those cells which lack VANGL2). We will also need to extensively analyze controls to validate that mixing fluo-tagged and untagged lines does not alter the homogeneity of differentiation, or apical constriction, independently of VANGL2 deletion. As such, the reviewer is suggesting an altogether new project which carries considerable risk and will require us to secure dedicated funding to undertake.

      R3.8(Minor) The authors show a fascinating piece of data in Supplementary Figure 1, demonstrating that nuclear volume is halved by day 8. Do they have any indication if the DNA content remains constant (e.g., integrated DAPI density)? I suppose it must, and this is a minor point in the grand scheme, but this represents a significant nuclear remodeling and may impact the overall DNA accessibility.

      We agree with the reviewer that the reduction in nuclear volume is important data both because it informs understanding of the reduction in total cell volume, and because it suggests active chromatin compaction during differentiation. Unfortunately, the thicker epithelium and superimposition of nuclei in the differentiated condition means the laser light path is substantially different, making direct comparisons of intensity uninterpretable. Additionally, the apical-most nuclei will mostly be in G2/M phase due to interkinetic nuclear migration. As such, the comparison of DAPI integrated density between epithelial morphologies would not be informative (Author response image 4).

      Author response image 4.

      Lateral views of DAPI-stained nuclei on Days 2 and 8 of differentiation. Note the rapid loss of staining intensity below the apical pseudo-row of nuclei on Day 8. This intensity change is likely due to the apical nuclei being in G2/M phase and therefore having more DNA, and rapid loss of 405nm wavelength signal at depth.

    1. eLife Assessment

      The authors describe an interesting approach to studying the dynamics and function of membrane proteins in different lipid environments. The fundamental findings have theoretical and practical implications beyond the study of EGFR to all membrane signalling proteins. The evidence supporting the conclusions is compelling, based on the use of a nanodisk system to study membrane proteins in vitro, combined with state-of-the-art single-molecule FRET. The work will be of broad interest to cell biologists and biochemists.

    2. Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate but not all. The authors have tested this bimodal model statistically and, although adding a third component is a better fit, I agree with the authors that it cannot be justified statistically, given the data. Further work beyond the scope of this study would be needed to try to define further states.

    3. Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function. The manuscript describes a comprehensive study on the analysis of membrane protein function in context of different lipid environments.

      Weaknesses:

      As the implemented strategy is relatively new, some uncertainties in the interpretation of the data consequently remain. However, using state-of-the-art techniques, the authors support their results by appropriate data and sufficient controls in the revised manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full-length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations, which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but its use is not sufficiently justified (statistically or otherwise) in this work in its current form. The experiments with varying cholesterol in particular appear to suggest an alternate model with longer fluorescent lifetimes. More justification of these interpretations of the central experiment of this work would strengthen the paper.

      We thank the reviewer for highlighting the strengths of the study, including the use of nanodiscs, single-molecule FRET, and MD simulations to probe full-length EGFR in controlled membrane environments.

      We agree that statistical justification is important for interpreting the distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two Gaussians (µ = 2.64 and 3.43 ns) are not separable, implying they represent one broad distribution rather than two states.

      Author response table 1.

      Both the two- and three-Gaussian models include a low-value component (µ = ~1.3 ns), but the apparent improvement of the three-Gaussian model arises only from splitting the central population into two overlapping Gaussians. Thus, while the BIC favors the three-Gaussian model statistically, Ashman’s D demonstrates that the central peak should not be interpreted as bimodal. Therefore, when all the distributions are fit globally, the data are best explained as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework.
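
      The two criteria discussed above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the component means are those quoted in the response, but the widths, log-likelihoods, sample size and parameter counts are hypothetical placeholders, since Author response table 1 is not reproduced here. BIC is computed as k ln(n) - 2 ln(L), and Ashman's D as sqrt(2)|µ1 - µ2| / sqrt(σ1² + σ2²), with D > 2 as the usual threshold for clean separation.

```python
import math


def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion; the lower value is preferred."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D for two Gaussian components (D > 2 suggests distinct peaks)."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1 ** 2 + sigma2 ** 2)


# Hypothetical fit summaries (NOT the authors' values): a k-component
# Gaussian mixture is assumed to have 3k - 1 free parameters.
n_points = 1000
bic_two = bic(log_likelihood=-1500.0, n_params=5, n_points=n_points)
bic_three = bic(log_likelihood=-1470.0, n_params=8, n_points=n_points)

# Means (ns) are quoted in the response; the widths (ns) are assumed.
d_central = ashman_d(2.64, 0.60, 3.43, 0.60)
d_low_vs_central = ashman_d(1.3, 0.30, 2.7, 0.60)

print(f"BIC two-Gaussian:   {bic_two:.1f}")
print(f"BIC three-Gaussian: {bic_three:.1f}")
print(f"Ashman's D, 2.64 vs 3.43 ns: {d_central:.2f}")
print(f"Ashman's D, 1.3 vs 2.7 ns:   {d_low_vs_central:.2f}")
```

      With these placeholder inputs the three-Gaussian fit has the lower BIC, while the two central components fall below D = 2. That is the same qualitative pattern the response describes: a statistically preferred third component that is nonetheless not separable from its neighbour.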

      We have clarified this in the revised manuscript by adding a section in the Methods (page 26) titled Model Selection and Statistical Analysis, which describes the results of the global two- versus three-Gaussian fits evaluated using BIC and Ashman’s D. Additional details of these analyses are also provided in response to Reviewer #1, Question 8 (Recommendations for the authors).

      Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function.

      We thank the reviewer for noting the strengths of our approach, particularly the use of complementary techniques and the development of a new pipeline to study lipid effects on membrane protein function.

      Weaknesses:

      Due to the relative novelty of the approach, a number of concerns remain.

      (1) I am a little skeptical about the good correlation of the nanodisc compositions with the liposome compositions. I would rather have expected a kind of clustering of individual lipid types in the liposome membrane, in particular of cholesterol. This should then result in an uneven distribution upon nanodisc assembly, i.e., in a notable variation of lipid composition in the individual nanodiscs. Could this be ruled out by the implemented assays, or can just the overall lipid composition of the complete nanodisc fraction be analyzed?

      We monitored insertion of anionic lipids into nanodiscs by performing zeta potential measurements, which report on surface charge, and cholesterol insertion by Laurdan fluorescence, which reports on membrane order. Both assays provide information at the ensemble level, not single-nanodisc resolution. We clarified this in the Methods section (see below).

      Cholesterol clustering is well documented in ternary systems with saturated lipids and sphingolipids [Veatch, Biophys J., 2003; Risselada, PNAS, 2008]. However, in unsaturated POPC-cholesterol mixtures such as those used here, cholesterol primarily alters bilayer order and large-scale segregation is not typically observed. The addition of POPS to the POPC-cholesterol mixture perturbs cholesterol-induced ordering, lowering the likelihood of cholesterol-rich domains [Kumar, J. Mol. Graphics Modell., 2021].

      Lipid heterogeneity between nanodiscs would be expected to give rise to heterogeneity in hydrodynamic properties, including potentially broadening the dynamic light scattering (DLS) distributions. However, the full width at half maximum (FWHM) values from the DLS measurements (see Author response table 2) do not indicate a broadening with cholesterol. Statistical testing (Mann-Whitney U test for non-normal data) showed no significant difference between samples with and without cholesterol (p = 0.486; n = 4 per group). While the sample size is small making firm conclusions challenging, these results suggest that large-scale heterogeneity is unlikely.
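
      To make the limits of a four-versus-four comparison concrete, the sketch below runs an exact two-sided Mann-Whitney U test by enumerating every relabeling of the pooled values. The FWHM numbers are hypothetical and are not the authors' data. The response does not say whether an exact or asymptotic variant was used, so the exact enumeration is an assumption made for illustration.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count (x_i, y_j) pairs with x_i > y_j; ties count as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test via full enumeration of relabelings."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2.0  # expected U under the null hypothesis
    u_obs = u_statistic(x, y)
    extreme = 0
    total = 0
    for idx in combinations(range(n1 + n2), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n1 + n2) if i not in chosen]
        total += 1
        # Count relabelings at least as far from the null centre as observed.
        if abs(u_statistic(gx, gy) - center) >= abs(u_obs - center):
            extreme += 1
    return u_obs, extreme / total


# Hypothetical DLS FWHM values in nm (NOT the authors' measurements).
fwhm_with_chol = [10.2, 11.5, 12.1, 12.5]
fwhm_without_chol = [10.8, 11.9, 12.6, 13.4]

u, p = mann_whitney_exact(fwhm_with_chol, fwhm_without_chol)
print(f"U = {u:.1f}, exact two-sided p = {p:.3f}")
```

      With n = 4 per group there are only 70 possible relabelings, so the smallest attainable two-sided p-value is 2/70 (about 0.029). This hypothetical configuration (U = 5) gives 34/70, about 0.486. The coarse set of attainable p-values is one reason a non-significant result at this sample size is only weak evidence against heterogeneity, as the response itself acknowledges.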

      Author response table 2.

      In the case of POPS lipids, clustering of POPS in EGFR-embedded nanodiscs is a recognized property of receptor-lipid interactions. Molecular dynamics simulations have shown that POPS, although constituting only 30% of the inner leaflet, accounts for ~50% of the lipids directly contacting EGFR [Arkhipov, Cell, 2013], underscoring that anionic lipids are preferentially recruited to the receptor’s immediate environment.

      For nanodiscs containing cholesterol and anionic lipids, our smFRET experiments were designed to isolate the effect of EGF binding. The nanodisc population is the same in the ± EGF conditions, as EGF was introduced just prior to performing smFRET experiments and not during nanodisc assembly. Thus, for a given lipid composition, any observed differences between ligand-free and ligand-bound states reflect conformational changes of EGFR.

      Methods, page 23, “Zeta potential measurements to quantify surface charge of nanodiscs: Data were processed using the instrument’s Malvern DTS software to obtain the mean zeta-potential value. This ensemble measurement reports the average surface charge of the nanodisc population, verifying incorporation of anionic POPS lipids.”

      Methods, page 23, “Fluorescence measurements with Laurdan to confirm cholesterol insertion into nanodiscs: The excitation spectrum was recorded by collecting the emission at 440 nm and the emission spectrum was recorded by exciting the sample at 385 nm. Laurdan fluorescence provides an ensemble readout of membrane order and confirms cholesterol incorporation into the nanodisc population. While Laurdan does not resolve the composition of individual nanodiscs, prior work has shown that POPC–cholesterol mixtures are miscible without forming cholesterol-rich domains[91,92], thus the observed ordering changes likely reflect the intended input cholesterol content at the ensemble level.”

      (91) Veatch, S. L. & Keller, S. L. Separation of liquid phases in giant vesicles of ternary mixtures of phospholipids and cholesterol. Biophysical journal, 85(5), 3074-3083 (2003).

      (92) Risselada, H. J. & Marrink, S. J. The molecular face of lipid rafts in model membranes. Proceedings of the National Academy of Sciences 105(45), 17367–17372 (2008).

      (2) Both templates have been added simultaneously, with a 100-fold excess of the EGFR template. Was this the result of optimization? How is the kinetics of protein production? As EGFR is in far excess, a significant precipitation, at least in the early period of the reaction, due to limiting nanodiscs, should be expected. How is the oligomeric form of the inserted EGFR? Have multiple insertions into one nanodisc been observed?

      We thank the reviewer for these insightful questions. Yes, the EGFR:ApoA1∆49 template ratio of 100:1 was empirically determined through optimization experiments now shown in the revised Supplementary Fig. 3. Cell-free reactions were performed across a range of EGFR:ApoA1∆49 template ratios (1:2 to 1:200) and sampled at different time points (2-19 hours). As shown in the gels, EGFR expression increased with higher template ratios and longer reaction times up to ~9 hours, while ApoA1 expression became clearly detectable only after 6 hours. Based on these results, we selected an EGFR:ApoA1∆49 ratio of 100:1 and 8-hour reaction time as the optimal condition, which yielded sufficient full-length EGFR incorporated into nanodiscs for ensemble and single-molecule experiments.

      In cell-free systems, protein yield does not scale directly with DNA template concentration, as translation efficiency is limited by factors such as ribosome availability and co-translational membrane insertion [Hunt, Chem. Rev., 2024; Blackholly, Front. Mol. Biosci., 2022]. Consistent with this, we observed that ApoA1∆49 is produced at higher levels than EGFR despite the lower DNA input (Supplementary Fig. 2b). Providing an excess EGFR template prevents the reaction from becoming limited by scaffold availability and helps compensate for the fact that, as a large multi-domain receptor, EGFR expression can yield truncated as well as full-length products. This strategy ensures that sufficient full-length receptors are available for nanodisc incorporation. We will clarify this in the Methods section (see below).

      We observed little to no visible precipitation under the reported cell-free conditions, likely for the following reasons: (i) EGFR and ApoA1∆49 are co-expressed in the cell-free reaction, and ApoA1∆49 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink; and (ii) ApoA1∆49 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      The sample contains donor-labeled EGFR (snap surface 594) together with acceptor-labeled lipids (cy5-labeled PE doped in the nanodisc). We assess the oligomerization state of EGFR in nanodiscs using single-molecule photobleaching of the donor channel. Snap surface 594 is a benzyl guanine derivative of Atto 594 that reacts with the SNAP tag with near-stoichiometric efficiency [Sun, Chembiochem, 2011]. Most molecules (~75%) exhibited a single photobleaching step, consistent with incorporation of a single EGFR per nanodisc [Srinivasan, Nat. Commun., 2022]. A minority of traces (~15%) showed two photobleaching steps and ~10% of traces showed three or more photobleaching steps, consistent with occasional multiple insertions. For all smFRET analysis, we restricted the dataset to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.

      Methods, page 20, “Production of labeled, full-length EGFR nanodiscs: Briefly, the E. coli slyD lysate, in vitro protein synthesis E. coli reaction buffer, amino acids (-Methionine), Methionine, T7 Enzyme, protease inhibitor cocktail (Thermo Fisher Scientific), RNase inhibitor (Roche) and DNA plasmids (20 µg of EGFR and 0.2 µg of ApoA1∆49) were mixed with different lipid mixtures. The DNA template ratio of EGFR:ApoA1∆49 = 100:1 was empirically chosen by testing different ratios on SDS-PAGE gels and selecting the condition that maximized full-length EGFR expression in DMPC lipids (Supplementary Fig. 3).”

      (3) The IMAC purification does not discriminate between EGFR-filled and empty nanodiscs. Does the TEM study give any information about the composition of the particles (empty, EGFR monomers, or EGFR oligomers)? Normalizing the measured fluorescence, i.e., the total amount of solubilized receptor, with the total protein concentration of the samples could give some data on the stoichiometry of EGFR and nanodiscs.

      Negative-stain TEM was performed to confirm nanodisc formation and morphology, but this method does not resolve whether a given disc contains EGFR. To directly assess receptor stoichiometry, we instead relied on single-molecule photobleaching of snap surface 594-labeled EGFR (see response to Point 2). These experiments showed that the majority of nanodiscs contain a single receptor, with a minority containing two receptors. For all smFRET analyses, we restricted data to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.

      We did not normalize EGFR fluorescence to total protein concentration because the bulk protein fraction after IMAC purification includes both receptor-loaded and empty nanodiscs. The latter contribute to ApoA1∆49 mass but do not contain receptors and including them would underestimate receptor occupancy. Importantly, the presence of empty nanodiscs does not affect our measurements as photobleaching and single-molecule FRET analyses selectively report only on receptor-containing nanodiscs. This clarification has been added to the Methods.

      Methods, page 26, “Fluorescence Spectroscopy: Traces with a single photobleaching step for the donor and acceptor were considered for further analysis. Regions of constant intensity in the traces were identified by a change-point algorithm[95]. Donor traces were assigned as FRET levels until acceptor photobleaching. The presence of empty nanodiscs does not influence these measurements, as photobleaching and single-molecule FRET analyses selectively report on receptor-containing nanodiscs.”

      (4) The authors generally assume a 100% functional folding of EGFR in all analyzed environments. While this could be the case, with some other membrane proteins, it was shown that only a fraction of the nanodisc solubilized particles are in functional conformation. Furthermore, the percentage of solubilized and folded membrane protein may change with the membrane composition of the supplied nanodiscs, while non-charged lipids mostly gave rather poor sample quality. The authors normalize the ATP binding to the total amount of detectable EGFR, and variations are interpreted as suppression of activity. Would the presence of unfolded EGFR fractions in some samples with no access to ATP binding be an alternative interpretation?

      We agree that not all nanodisc-embedded EGFR molecules may be fully functional and that the fraction of folded protein could vary with lipid composition. In our ATP-binding assay, EGFR detection relies on the C-terminal SNAP-tag fused to an intrinsically disordered region. Successful labeling requires that this segment be translated, accessible, and folded sufficiently to accommodate the SNAP reaction, which imposes an additional requirement compared to the rigid, structured kinase domain where ATP binds. Misfolded or truncated EGFR molecules would therefore likely fail to label at the C-terminus. These factors strongly imply that our assay predominantly reports on receptor molecules that are intact and well folded.

      Additionally, our molecular dynamics simulations at 0% and 30% POPS support the experimental ATP-binding measurements (Fig. 2c, d). This consistency between the experimental and simulated evidence, including at 0% POPS where reduced receptor folding might be expected, suggests that the observed lipid-dependent changes are more likely due to modulation of the functional receptor rather than receptor misfolding. We have clarified these points by adding the following text:

      Results, page 7, “Role of anionic lipids in EGFR kinase activity: In the presence of EGF, increasing the anionic lipid content decreased the number of contacts from 71.8 ± 1.8 to 67.8 ± 2.4, indicating increased accessibility, again in line with the experimental findings. Because detection of EGFR relies on labeling at the C-terminus and ATP binding requires an intact kinase domain, the ATP-binding assay reports on receptors that are properly folded and competent for nucleotide binding. The consistency between experimental results and MD simulations suggests that the observed lipid-dependent changes are more likely due to modulation of functional EGFR than to artifacts from misfolding.”

      Reviewer #1 (Recommendations for the authors):

      The experimental program presented here is excellent, and the results are highly interesting. My enthusiasm is dampened by the presentation in places which is confusing, especially Figure 3, which contains so many of the results. I also have some reservations about the bimodal interpretation of the lifetime data in Figure 3.

      We thank the reviewer for their positive assessment of our experimental approach and results. In the revised version, we have improved figure organization and readability by adding explicit labels for lipid composition and EGF presence/absence in all lifetime distributions, moving key supplementary tables into main text, and reorganizing the supplementary figures as Extended Data Figures following eLife’s format. Figures and tables now appear in the order in which they are referenced in the text to further improve readability.

      Regarding the bimodal interpretation of the lifetime distribution, we have performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC) and Ashman’s D analysis, which supported the bimodal interpretation. Details of this analysis are provided in our response to comment (8) below and included in the manuscript.

      Specific comments below:

      (1) Abstract -"Identifying and investigating this contribution have been challenging owing to the complex composition of the plasma membrane" should be "has".

      We have corrected this error in the revised manuscript.

      (2) Results - p4 - some explanation of what POPC/POPS are would be helpful.

      We have added the text below discussing POPC and POPS.

      Results, page 4, “POPC is a zwitterionic phospholipid forming neutral membranes, whereas POPS carries a net negative charge and provides anionic character to the bilayer[56]. Both PC and PS lipids are common constituents of mammalian plasma membranes, with PC enriched in the outer leaflet and PS in the inner leaflet[22].”

      (22) Lorent, J. H., Levental, K. R., Ganesan, L., Rivera-Longsworth, G., Sezgin, E., Doktorova, M., Lyman, E. & Levental, I. Plasma membranes are asymmetric in lipid unsaturation, packing and protein shape. Nature Chemical Biology 16, 644–652 (2020).

      (56) Her, C., Filoti, D. I., McLean, M. A., Sligar, S. G., Ross, J. A., Steele, H. & Laue, T. M. The charge properties of phospholipid nanodiscs. Biophysical journal 111(5), 989–998 (2016).

      (3) Figure 2b - it would be easier to compare if these were plotted on top of each other. Are we at saturating ATP binding concentration or below it? Also, please put a key to say purple - absent and orange +EGF on the figure. I am also confused as to why, with no EGF, ATP binding is high with 0% POPS, but low when EGF is present, but that then reverses with physiological lipid content.

      While we agree that a direct comparison would be easier, the ATP-binding experiments for the ± EGF conditions were actually performed independently on separate SDS-PAGE gels, which unfortunately precludes such a comparison. We have added a color key to clarify the -EGF and +EGF datasets.

      The experiments were carried out at 1 µM of the fluorescently labeled ATP analogue (atto647N-γ ATP). Reported kinetic measurements for the isolated EGFR kinase domain indicate a Km of 5.2 µM, suggesting that our experimental concentration is below, but close to, the saturating range, ensuring sensitivity to changes in accessibility of the binding site rather than saturating all available receptors.
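
      As a rough guide to how far 1 µM sits from saturation, the sketch below evaluates the standard hyperbolic occupancy [S] / (Km + [S]). It assumes simple reversible equilibrium binding and that the Km reported for ATP on the isolated kinase domain also applies to the labeled analogue in full-length EGFR. Neither assumption is established in the response, and the analogue is described as binding irreversibly, so the number is illustrative only.

```python
def fractional_occupancy(ligand_um, km_um):
    """Hyperbolic (Michaelis-Menten-type) occupancy: [S] / (Km + [S])."""
    if ligand_um < 0 or km_um <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return ligand_um / (km_um + ligand_um)


# Values quoted in the response: 1 µM analogue, Km = 5.2 µM.
occupancy = fractional_occupancy(ligand_um=1.0, km_um=5.2)
print(f"Estimated occupancy at 1 µM: {occupancy:.2f}")
```

      Under these assumptions the estimated occupancy is about 0.16, which is in the sub-saturating regime where the signal remains sensitive to changes in binding-site accessibility.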

      We have revised the manuscript to clarify these details by including the following text:

      Results, page 6, “To investigate how the membrane composition impacts accessibility, we measured ATP binding levels for EGFR in membranes with different anionic lipid content. 1 µM of fluorescently-labeled ATP analogue, atto647N-γ ATP, which binds irreversibly to the active site, was added to samples of EGFR nanodiscs with 0%, 15%, 30% or 60% anionic lipid content in the absence or presence of EGF.”

      Methods, page 24, “ATP binding experiments: Full-length EGFR in different lipid environments was prepared using cell-free expression as described above. 1 µM of snap surface 488 (New England Biolabs) and atto647N-labeled gamma ATP (Jena Bioscience) was added after cell-free expression and incubated at 30 °C, 300 rpm for 60 minutes. 1 µM of atto647N-γ ATP was used, corresponding to a concentration near the reported Km of 5.2 µM for ATP binding to the isolated EGFR kinase domain[93], ensuring sensitivity to lipid-dependent changes in ATP accessibility.”

      (ii) Nucleotide binding is suppressed under basal conditions, likely to ensure that the catalytic activity is promoted only upon EGF stimulation.

      The molecular dynamics simulations at 0% and 30% POPS further support this interpretation, showing that anionic lipids modulate the accessibility of the ATP-binding site in a manner consistent with experimental trends (Fig. 2c and 2d).

      We have clarified these points in the main text with the following additions:

      Results, page 6, “In the presence of EGF, ATP binding overall increased with anionic lipid content with the highest levels observed in 60% POPS bilayers. In the neutral bilayer, ligand seemed to suppress ATP binding, indicating anionic lipids are required for the regulated activation of EGFR.”

      Results, page 7, “In the absence of EGF, increasing the anionic lipid content from 0% POPS to 30% POPS increased the number of ATP-lipid contacts from 58.6 ± 0.7 to 74.4 ± 1.2, indicating reduced accessibility, consistent with the experimental results and suggesting anionic lipids are required for ligand-induced EGFR activity.”

      (93) Yun, C. H., Mengwasser, K. E., Toms, A. V., Woo, M. S., Greulich, H., Wong, K. K., Meyerson, M. & Eck, M. J. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. PNAS, 105(6), 2070–2075 (2008).

      (4) Figure 2d - how was the 16A distance arrived at?

      We thank the reviewer for pointing this out. The 16 Å cutoff was chosen based on the physical dimensions of the ATP analogue used in the experiments. Specifically, the largest radius of the atto647N-γ ATP molecule is ~16.9 Å, which defines the maximum distance at which lipid atoms could sterically obstruct access of ATP to the binding pocket. Accordingly, in the simulations, contacts were defined as pairs of coarse-grained atoms between lipid molecules and the residues forming the ATP-binding site (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831) separated by less than 16 Å.

      We have rewritten the rationale for selecting the 16 Å cutoff in the Methods section to improve clarity.

      Methods, page 28, “Coarse-grained, Explicit-solvent Simulations with the MARTINI Force Field: We analyzed our simulations using WHAM[108,109] to reweight the umbrella biases and compute the average values of various metrics introduced in this manuscript. Specifically, we calculated the distance between Residue 721 and Residue 1186 (EGFR C-terminus) of the protein. To quantify the accessibility of the ATP-binding site, we calculated the number of contacts between lipid molecules and the residues forming the ATP-binding pocket (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831)[110]. Close contact between the bilayer and these residues would sterically hinder ATP binding; thus, the contact number serves as a proxy for ATP-site accessibility. The cutoff distance for defining a contact was set to 16 Å, corresponding to the largest molecular radius of the fluorescent ATP analogue (atto647N-γ ATP, 16.96 Å[111]). Accordingly, we defined a contact as a pair of coarse-grained atoms, one from the lipid membrane and one from the ATP binding site, within a mutual distance of less than 16 Å.”

      (5) Figure 2e-h - I think a bar chart/violin plot/jitter plot would make it easier to compare the peak values. The statistics in the table should just be quoted in the text as value +/- error from the 95% confidence interval. The way it is written currently is confusing, as it implies that there is no conformational change with the addition of EGF in neutral lipids, but there is ~0.4nm one from the table. I don't understand what you mean by "The larger conformational response of these important domains suggests that the intracellular conformation may play a role in downstream signaling steps, such as binding of adaptor proteins"?

      We thank the reviewer for these suggestions. For the smFRET lifetime distributions (Figure 2j, k; previously Figure 2e, f), we have now included jitter plots of the donor lifetimes in the Supplementary Figure 11 to facilitate direct visual comparison of the median and distribution widths for each lipid composition and ±EGF conditions. The distance distributions for the ATP to C-terminus in Figure 2e, f (previously Figure 2g, h) were obtained from umbrella-sampling simulations that calculate free-energy profiles rather than raw, unbiased distance values. Because the sampling is guided by biasing potentials, individual distance values cannot be used to construct violin or jitter plots. We therefore present the simulation data only as probability density distributions, which best reflect the equilibrium distributions derived from them.

      We have also revised the text to report the median ± 95% confidence interval, improving clarity and consistency with the statistical table.

      Results, page 9: “In the neutral bilayer (0% POPS), the distribution in the absence of EGF peaks at 8.1 nm (95% CI: 8.0–8.2 nm) and the distribution in the presence of EGF peaks at 8.6 nm (95% CI: 8.5–8.7 nm) (Table 1, Supplementary Table 1). In the physiological regime of 30% POPS nanodiscs, the peak of the donor lifetime distribution shifts from 9.1 nm (95% CI: 8.9–9.2 nm) in the absence of EGF to 11.6 nm (95% CI: 11.1–12.6 nm) in the presence of EGF (Table 1, Supplementary Table 1), which is a larger EGF-induced conformational response than in neutral lipids.”
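
      The response does not state how the 95% confidence intervals were obtained. As one common possibility, a percentile bootstrap of the median can be sketched as follows. The function name `bootstrap_median_ci`, the resample count, the fixed seed, and the toy distances are all assumptions for illustration and are not taken from the manuscript.

```python
import random
import statistics


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap.

    `values` is a non-empty sequence of measurements; the interval
    covers roughly (1 - alpha) of the resampled medians.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n))
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return statistics.median(values), medians[lo_idx], medians[hi_idx]


# Hypothetical distances in nm, for illustration only.
toy_distances = [8.0, 8.1, 8.1, 8.2, 8.0, 8.3, 8.1, 7.9, 8.2, 8.1]
median, lower, upper = bootstrap_median_ci(toy_distances)
```

      Reporting the result as "median (95% CI: lower–upper)" matches the format used in the revised Results text.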

      Finally, we have rephrased the sentence in question for clarity. The revised text now reads:

      Results, page 9: “The larger conformational response observed in the presence of anionic lipids suggests that these lipids enhance the responsiveness of the intracellular domains to EGF, potentially ensuring interactions between C-terminal sites and adaptor proteins during downstream signaling.”

      (6) "r, highlighting that the charged lipids can enhance the conformational response even for protein regions far away from the plasma membrane" - is it not that the neutral membrane is just very weird and not physiological that EGFR and other proteins don't function properly?

      We agree with the reviewer that completely neutral (0% POPS) membranes are not physiological and likely do not support the native organization or activity of EGFR. We have revised the text to clarify that the 30% POPS condition represents a more native-like lipid environment that restores or stabilizes the expected conformational response, rather than "enhancing" it. The revised sentence now reads:

      Results, page 10: “Both experimental and computational results show a larger EGF-induced conformational change in the partially anionic bilayer, consistent with the notion that a partially anionic lipid bilayer provides a more native environment that supports proper receptor activation, compared to the non-physiological neutral membrane.”

      (7) "snap surface 594 on the C-terminal tail as the donor and the fluorescently-labeled lipid (Cy5) as the acceptor (Supplementary Fig. 2, 11)." Why not refer to Figure 3a here to make it easier to read?

      We have added the reference to Figure 3a, and we thank the Reviewer for the suggestion.

      (8) Figure 3 - the bimodality in many of these plots is dubious. It's very clear in some, i.e. 0% POPS +EGF, but not others. Can anything be done to justify bimodality better?

      We agree that statistical justification is important for interpreting lifetime distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two of the Gaussians are not separable, implying they represent one broad distribution rather than two discrete states. Therefore, when all the distributions are fit globally, the data are best described as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. We better justified our choice of model by incorporating the results of the global two- vs three-Gaussian fits with BIC and Ashman’s D analysis in the revised manuscript.

      Methods, page 27: “Model Selection and Statistical Analysis

      Global fitting of lifetime distributions was performed across all experimental conditions using maximum likelihood estimation. Both two-Gaussian and three-Gaussian distribution models were evaluated as described previously[62]. Model performance was compared using the Bayesian Information Criterion (BIC),[101] which balances model likelihood and complexity according to

      BIC = -2 ln L + k ln n

      where L is the likelihood, k is the number of free parameters, and n is the number of single-molecule photon bunches across all experimental conditions. A lower BIC value indicates a statistically better model[101]. The separation between Gaussian components was subsequently assessed using Ashman’s D, where a score above 2 indicates good separation[102]. For two Gaussian components i and j with means µi, µj and standard deviations σi, σj,

      Dij = √2 |µi − µj| / √(σi² + σj²)

      where Dij represents the distance metric between Gaussian components i and j. All fitted parameters, likelihood values, BIC scores, and Ashman’s D values are summarized in Supplementary Table 5.”

      (101) Schwarz, G. Estimating the dimension of a model. The Annals of Statistics, 461–464 (1978).

      (102) Ashman, K. M., Bird, C. M. & Zepf, S. E. Detecting bimodality in astronomical datasets. The Astronomical Journal 108(6), 2348–2361 (1994).
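
      The two model-selection metrics described in this Methods excerpt can be illustrated with a short Python sketch. It uses the BIC expression as quoted and Ashman's D in its standard form, D = √2 |µ1 − µ2| / √(σ1² + σ2²). The log-likelihoods, parameter counts, sample size, and component widths below are hypothetical placeholders; only the component means (1.3, 2.6, 2.7, and 3.4 ns) echo values mentioned in the response.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D for two Gaussians; D > 2 suggests well-separated peaks."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1**2 + sigma2**2)


# Hypothetical fit summaries (not values from the manuscript).
bic_two = bic(log_likelihood=-1050.0, n_params=5, n_obs=1000)
bic_three = bic(log_likelihood=-1020.0, n_params=8, n_obs=1000)
prefers_three = bic_three < bic_two

# Means follow the response; the widths are assumed for illustration.
d_far = ashman_d(1.3, 0.3, 2.7, 0.5)   # short vs long lifetime component
d_near = ashman_d(2.6, 0.5, 3.4, 0.6)  # the two long-lifetime components
```

      With these placeholder numbers the three-Gaussian model has the lower BIC, yet the two long-lifetime components give D below 2 while the short and long components give D above 2. This mirrors the logic of the response: a model can be preferred by BIC even though two of its components are not separable as distinct peaks.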

      (9) Figure 3c - can you better label the POPS/POPC on here?

      We thank the reviewer for this suggestion. In the revised manuscript, Figure 3b (previously Figure 3c) has been updated to label the lipid composition corresponding to each smFRET distribution to make the comparison across conditions easier to follow.

      (10) Figure 3g - it looks like cholesterol causes a shift in both the peaks, such that the previous open and closed states are not the same, but that there are 2 new states. This is key as the authors state: "Remarkably, high anionic lipids and cholesterol content produce the same EGFR conformations but with opposite effects on signaling-suppression or enhancement." But this is only true if there really are the same conformational states for all lipid/cholesterol conditions. Again, the bimodal models used for all conditions need to be justified.

      We appreciate the reviewer’s insightful comment. We agree that the interpretation of the lifetime distributions depends on whether cholesterol and anionic lipids modulate existing conformational states or create new ones. To test this, we performed global fits of all distributions using the two- and three-Gaussian models and compared them using the Bayesian Information Criterion (BIC) and Ashman’s D, the results of which are described in detail in response to (8) above.

      Both fitting models, two- and three-Gaussian, identified the same short lifetime component (µ = 1.3 ns), suggesting this reflects a well-separated conformation. While the three-Gaussian model gave a lower BIC, Ashman’s D analysis indicated that two of the three components (µ = 2.6 ns and 3.4 ns) are not statistically separable, suggesting they represent a single broad conformational population rather than distinct states. If instead these two components reflected distinct states present under different conditions, Ashman’s D analysis would have found the opposite result. This supports our interpretation that high cholesterol and high anionic lipid content produce similar conformational ensembles with opposite effects on signaling output.

      Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework. We have clarified this rationale in the revised manuscript and added the results of the BIC and Ashman’s D analysis to support this interpretation.

      (11) Why are we jumping about between figures in the text? Figure 1d is mentioned after Figure 2. Also, DMPC is shown in the figures way before it is described in the text. It is very confusing. Figure 3 is so compact. I think it should be spread out and only shown in the order presented in the text. Different parts of the figure are referred to seemingly at random in the text. Why is DMPC first in the figure, when it is referred to last in the text?

      Following the Reviewer’s comment, we have revised the figure order and layout to improve readability and ensure consistency with the text. The previous Figures 1d-f, which introduce the single-molecule fluorescence setup, are now Figure 2g-i, positioned immediately before the first single-molecule FRET experiments (Fig 2j, k). The DMPC distribution in Figure 3 has been moved to the Supplementary Information (Supplementary Fig. 17), where it is shown alongside POPC, as these datasets are compared in the section “Mechanism of cholesterol inhibition of EGFR transmembrane conformational response”. The smFRET distributions in Figure 3 are now presented in the same sequence as they are discussed in the text, and the figure has been spread out for better clarity.

      (12) Throughout, I find the presentation of numerical results, their associated error, and whether they are statistically significantly different from each other confusing. A lot of this is in supplementary tables, but I think these need to go in the main text.

      To improve clarity and ensure that key quantitative results are easily accessible, we have moved the relevant supplementary tables to the main text. Specifically, the following tables have been incorporated into the main manuscript:

      (i) Median distance between the ATP binding site and the EGFR C-terminus, or between membrane and EGFR C-terminus from smFRET measurements (previously Supplementary Table 1; now main Table 1)

      (ii) Median distance between the membrane and the EGFR C-terminus in different anionic lipid environments (previously Supplementary Table 4; now main Table 2)

      (iii) Median distance between the membrane and the EGFR C-terminus in different cholesterol environments (previously Supplementary Tables 8 and 12; now combined as main Table 3)

      (13) Supplementary figures - in general, there is a need to consider how to combine or simplify these for eLife, as they will have to become extended data figures.

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have reorganized the supplementary figures into extended data figures in accordance with eLife’s format. Specifically:

      - Supplementary Figs. 1–7 are now grouped as Extended Data Figures for Figure 1 in the main text. They are now Figure 1 - figure supplements 1–7.

      - Supplementary Figs. 8–11 are now grouped as Extended Data Figures for Figure 2. They are now Figure 2 - figure supplements 1–4.

      - Supplementary Figs. 12–17 are now grouped as Extended Data Figures for Figure 3. They are now Figure 3 - figure supplements 1–6.

      (14) Supplementary Figure 2 - label what the two bands are in the EGFR and pEGFR sets at the bottom of panel c.

      We thank the reviewer for this comment. The two bands shown in the EGFR and pEGFR blots in Supplementary Fig. 2d (previously Supplementary Fig. 2c) correspond to replicate samples under identical conditions. We have now labeled the lanes as “Rep 1” and “Rep 2” in the revised figure and clarified this in the figure legend.

      Supplementary Figure 2, page 31: “(d) Western blots were performed on labelled EGFR in nanodiscs. Anti-EGFR Western blots (left) and anti-phosphotyrosine Western blots (right) tested the presence of EGFR and its ability to undergo tyrosine phosphorylation, respectively, consistent with previous experiments on similar preparations[18, 54, 55]. The two lanes in each blot correspond to replicate samples under identical conditions.”

      (15) Supplementary Figures 3+4 - a bar chart/boxplot or similar would be easier for comparison here.

      In the revised version, we have replaced the histograms with jitter plots showing the nanodisc size distributions for each condition in supplementary figures 4 and 5 (previously supplementary figures 3 and 4). The plots display individual measurements with a horizontal line indicating the mean size (mean ± standard deviation values provided in the caption).

      (16) Supplementary Figures 10, 12, 13, 15, 16 - I would jitter these.

      We have incorporated jitter plots for the relevant datasets in Supplementary Figures 11, 13, 15, 16 and 17 (previously supplementary figures 10, 12, 13, 15 and 16) to provide a clearer visualization of the data distributions and median values.

      Reviewer #2 (Recommendations for the authors):

      (1) Reactions were performed in 250 µL volumes. What is the average yield of solubilized EGFR in those reactions? Are there differences in the EGFR solubilization with the various lipid mixtures?

      The amount of solubilized EGFR produced in each 250 µL cell-free reaction was below the reliable detection limit for quantitative absorbance assays. At these protein levels, little to no EGFR precipitation was observed for all lipid compositions. Although exact yields could not be determined, fluorescence-based detection confirmed the presence of functional, nanodisc-incorporated EGFR suitable for smFRET and ensemble fluorescence experiments. We observed variability in total yield between independent reactions within the same lipid composition, which is common for cell-free systems, but no consistent trend attributable to lipid composition.

      (2) Figure S2: It would be better to have a larger overview of the particles on a grid to get a better impression of sample homogeneity.

      TEM images showing a larger field of view have been added for each lipid composition in Supplementary Figures 4 and 5.

      (3) Figure 2b: It appears that there is some variation in the stoichiometry of ApoA1 and EGFR within the samples. Have equal amounts of each sample been analyzed? Are there, in addition, some precipitates of EGFR? It would further be good to have a negative control without expression to get more information about the additional bands in Figure S2b. As they do not appear in the fluorescent gel, it is unlikely that they represent premature terminations of EGFR.

      The fluorescence intensity from the bound ATP analogue (Atto 647N-ATP) and from the snap surface 488 label, which binds stoichiometrically to the SNAP tag at the EGFR C-terminus, was measured for each sample. The relative amount of ATP binding was quantified for each sample by normalizing to the EGFR content (Figure 2b). This normalization accounts for the different amounts of EGFR produced in each condition.

      We did not observe any visible precipitation under the reported cell-free conditions, likely due to the following reasons:

      (i) EGFR and ApoA1 are co-expressed in the cell-free reaction, and ApoA1 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink

      (ii) ApoA1 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      A control cell-free reaction containing only ApoA1∆49 (1 µg) and no EGFR template, analyzed after affinity purification, showed a single prominent band at ~25 kDa (gel image below), corresponding to ApoA1, along with faint background bands typical of Ni-NTA purification from cell lysates. These weak, non-specific bands likely arise from co-purification of endogenous E. coli proteins.

      The ApoA1∆49-only control gel has now been included as part of the supplementary figure 2.

      (4) Figure S2c: It would be better to show the whole lanes to document the specificity of the antibodies. Anti-Phosphor antibodies are frequently of poor selectivity. In that case, a negative control with corresponding tyrosine mutations would be helpful.

      We have updated Figure S2d (previously Figure S2c) to include the full gel lanes to better illustrate the specificity of both the total EGFR and phospho-EGFR (Y1068) antibodies. The results show a single clear band at the expected molecular weight for EGFR, confirming antibody specificity.

      (5) The Results section already contains quite some discussion. I would thus recommend combining both sections.

      We thank the reviewer for the suggestion. We have now created a results and discussion section to better reflect the content of these paragraphs, with the previous discussion section now a subsection focused on implications of these results.

    1. eLife Assessment

      This valuable paper advances understanding of the role of the HGF receptor, MET, in cancer cell invasion by demonstrating HGF-induced coordinated trafficking of MET and metalloprotease MT1-MMP into invadopodia. The results are generally solid, but there are concerns about the cell biology and whether the trafficking mechanism is clinically relevant. It's also unclear whether this is a general mechanism or specific to triple-negative breast cancer cells. The paper will be of interest to cancer cell biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This study identifies a mechanism responsible for the accumulation of the MET receptor in invadopodia, following stimulation of Triple-negative breast cancer (TNBC) cells with HGF. HGF-driven accumulation and activation of MET in invadopodia causes the degradation of the extracellular matrix, promoting cancer cell invasion, a process here investigated using gelatin-degradation and spheroid invasion assays.

      Mechanistically, HGF stimulates the recycling of MET from RAB14-positive endosomes to invadopodia, increasing their formation. At invadopodia, MET induces matrix degradation via direct binding with the metalloprotease MT1-MMP. The delivery of MET from the recycling compartment to invadopodia is mediated by RCP, which facilitates the colocalization of MET to RAB14 endosomes. In this compartment, HGF induces the recruitment of the motor protein KIF16B, promoting the tubulation of the RAB14-MET recycling endosomes to the cell surface. This pathway is critical for the HGF-driven invasive properties of TNBC cells, as it is impaired upon silencing of RAB14.

      Strengths:

      The study is well-organized and executed using state-of-the-art technology. The effects of MET recycling in the formation of functional invadopodia are carefully studied, taking advantage of mutant forms of the receptor that are degradation-resistant or endocytosis-defective.

      Data analyses are rigorous, and appropriate controls are used in most of the assays to assess the specificity of the scored effects. Overall, the quality of the research is high.

      The conclusions are well-supported by the results, and the data and methodology are of interest for a wide audience of cell biologists.

      Weaknesses:

      The role of the MET receptor in invadopodia formation and cancer cell dissemination has been intensively studied in many settings, including triple-negative breast cancer cells. The novelty of the present study mostly consists of the detailed molecular description of the underlying mechanism based on HGF-driven MET recycling. The question of whether the identified pathway is specific for TNBC cells or represents a general mechanism of HGF-mediated invasion detectable in other cancer cells is not addressed or at least discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Khamari and colleagues investigate how HGF-MET signaling and the intracellular trafficking of the MET receptor tyrosine kinase influence invadopodia formation and invasion in triple-negative breast cancer (TNBC) cells. They show that HGF stimulation enhances both the number of invadopodia and their proteolytic activity. Mechanistically, the authors demonstrate that HGF-induced, RAB4- and RCP-RAB14-KIF16B-dependent recycling routes deliver MET to the cell surface specifically at sites where invadopodia form. Moreover, they report that MET physically interacts with MT1-MMP - a key transmembrane metalloproteinase required for invadopodia function- and that these two proteins co-traffic to invadopodia upon HGF stimulation.

      Although the HGF-MET axis has previously been implicated in invadopodia regulation (e.g., by Rajadurai et al., Journal of Cell Science 2012), studies directly linking ligand-induced MET trafficking with the spatial regulation of MT1-MMP localization and activity have been lacking.

      Overall, the manuscript addresses a relevant and timely topic and provides several novel insights. However, some sections require clearer and more concise writing (details below). In addition, the quality, reliability, and robustness of several data sets need to be improved.

      Strengths:

      A key strength of the study is the novel demonstration that HGF-mediated, RAB4- and RAB14-dependent recycling of MET delivers this receptor, together with MT1-MMP, to invadopodia -highlighting a previously unrecognized mechanism, regulating the formation and proteolytic function of these invasive structures. Another strong point is the breadth of experimental approaches used and the substantial amount of supporting data. The authors also include an appropriate number of biological replicates and analyze a sufficiently large number of cells in their imaging experiments, as clearly described in the figure legends.

      Weaknesses:

      (1) Inappropriate stimulation times for endocytosis and recycling assays.

      The experiments examining MET endocytosis and recycling following HGF stimulation appear to use inappropriate incubation times. After ligand binding, RTKs typically undergo endocytosis within minutes and reach maximal endosomal accumulation within 5-15 minutes. Although continuous stimulation allows repeated rounds of internalization, the temporal dynamics of MET trafficking should be examined across shorter time points, ideally up to 1 hour (e.g., 15, 30, and 60 minutes). The authors used 2-, 3-, or 6-hour HGF stimulation, which, in my opinion, is far too long to study ligand-induced RTK trafficking.

      (2) Low efficiency of MET silencing in Figure S1I.

      The very low MET knockdown efficiency shown in Figure S1I raises concerns. Given the potential off-target effects of a single shRNA and the insufficient silencing level, it is difficult to conclude whether the reduction in invadopodia number in Figure 1F is genuinely MET-dependent. The authors later used siRNA-mediated silencing (Figure S5C), which was more effective. Why was this siRNA not used to generate the data in Figure 1F? Why did the authors rely on the inefficient shRNA C#3?

      (3) Missing information on incubation times and inconsistencies in MET protein levels.

      The figure legends do not indicate how long the cells were incubated with HGF or the MET inhibitor PHA665752 prior to immunoblotting. This information is crucial, particularly because both HGF and PHA665752 cause a substantial decrease in the total MET protein level. Notably, such a decrease is absent in MDA-MB-231 cells treated with HGF in the presence of cycloheximide (Figure S2F). The authors should comment on these inconsistencies.

      Additionally, the MET bands in Figure S1J appear different from those in Figure S1C, and MET phosphorylation seems already high under basal conditions, with no further increase upon stimulation (Figure S1J). The authors should address these issues.

      (4) Insufficient representation and randomization of microscopic data.

      For microscopy, only single representative cells are shown, rather than full fields containing multiple cells. This is particularly problematic for invadopodia analysis, as only a subset of cells forms these structures. The authors should explain how they ensured that image acquisition and quantification were randomized and unbiased. The graphs should also include the percentage of cells forming invadopodia, a standard metric in the field. Furthermore, some images include altered cells - for example, multinucleated cells - which do not accurately represent the general cell population.

      (5) Use of a single siRNA/shRNA per target.

      As noted earlier, using only one siRNA or shRNA carries the risk of off-target effects. For every experiment involving gene silencing (MET, RAB4, RAB14, RCP, MT1-MMP), at least two independent siRNAs/shRNAs should be used to validate the phenotype.

      (6) Insufficient controls for antibody specificity.

      The specificity of MET, p-MET, and MT1-MMP staining should be demonstrated in cells with effective gene silencing. This is an essential control for immunofluorescence assays.

      (7) Inadequate demonstration of MET recycling.

      MET recycling should be directly demonstrated using the same approaches applied to study MT1-MMP recycling. The current analysis - based solely on vesicles near the plasma membrane - is insufficient to conclude that MET is recycled back to the cell surface.

      (8) Insufficient evidence for MET-MT1-MMP interaction.

      The interaction between MET and MT1-MMP should be validated by immunoprecipitation of endogenous proteins, particularly since both are endogenously expressed in the studied cell lines.

      (9) Inconsistent use of cell lines and lack of justification.

      The authors use two TNBC cell lines: MDA-MB-231 and BT-549, without providing a rationale for this choice. Some assays are performed in MDA-MB-231 and shown in the main figures, whereas others use BT-549, creating unnecessary inconsistency. A clearer, more coherent strategy is needed (e.g., present all main findings in MDA-MB-231 and confirm key results in BT-549 in supplementary figures).

      (10) Inconsistency in invadopodia numbers under identical conditions.

      The number of invadopodia formed in Figure 1E is markedly lower than in Figure 1C, despite identical conditions. The authors should explain this discrepancy.

      (11) Questionable colocalization in some images.

      In some figures - for example, Figure 2G - the dots indicated by arrows do not convincingly show colocalization. The authors should clarify or reanalyze these data.

      (12) Abstract, Introduction, and Discussion require substantial rewriting.

      (a) The abstract should be accessible to a broader audience and should avoid using abbreviations and protein names without context.

      (b) The introduction should better describe the cellular processes and proteins investigated in this study.

      (c) The discussion currently reads more like an extended summary of results. It lacks deeper interpretation, comparison with existing literature, and consideration of the broader implications of the findings.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study identifies a mechanism responsible for the accumulation of the MET receptor in invadopodia, following stimulation of Triple-negative breast cancer (TNBC) cells with HGF. HGF-driven accumulation and activation of MET in invadopodia causes the degradation of the extracellular matrix, promoting cancer cell invasion, a process here investigated using gelatin-degradation and spheroid invasion assays.

      Mechanistically, HGF stimulates the recycling of MET from RAB14-positive endosomes to invadopodia, increasing their formation. At invadopodia, MET induces matrix degradation via direct binding with the metalloprotease MT1-MMP. The delivery of MET from the recycling compartment to invadopodia is mediated by RCP, which facilitates the colocalization of MET to RAB14 endosomes. In this compartment, HGF induces the recruitment of the motor protein KIF16B, promoting the tubulation of the RAB14-MET recycling endosomes to the cell surface. This pathway is critical for the HGF-driven invasive properties of TNBC cells, as it is impaired upon silencing of RAB14.

      Strengths:

      The study is well-organized and executed using state-of-the-art technology. The effects of MET recycling in the formation of functional invadopodia are carefully studied, taking advantage of mutant forms of the receptor that are degradation-resistant or endocytosis-defective.

      Data analyses are rigorous, and appropriate controls are used in most of the assays to assess the specificity of the scored effects. Overall, the quality of the research is high.

      The conclusions are well-supported by the results, and the data and methodology are of interest for a wide audience of cell biologists.

      We sincerely thank the reviewer for their positive feedback and for considering our study to be well executed and rigorous. The valuable suggestions and comments will certainly improve the understanding of the role of the RAB14-RCP-KIF16B axis in MET trafficking and breast cancer invasion. Below, we address each of the reviewer's concerns and suggestions point by point.

      Weakness:

      The role of the MET receptor in invadopodia formation and cancer cell dissemination has been intensively studied in many settings, including triple-negative breast cancer cells. The novelty of the present study mostly consists of the detailed molecular description of the underlying mechanism based on HGF-driven MET recycling. The question of whether the identified pathway is specific for TNBC cells or represents a general mechanism of HGF-mediated invasion detectable in other cancer cells is not addressed or at least discussed.

      We thank the reviewer for raising this point. We want to clarify that in TNBCs, the lack of the hormone receptors (progesterone receptor and estrogen receptor) and of HER2 makes the overexpression of EGFR and MET crucial in terms of prognosis and treatment (PMID: 27655711, 25368674). Hence, the study of MET signalling and trafficking is more relevant for TNBCs than for other cancer cells. We will add an explanation in the discussion section in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Khamari and colleagues investigate how HGF-MET signaling and the intracellular trafficking of the MET receptor tyrosine kinase influence invadopodia formation and invasion in triple-negative breast cancer (TNBC) cells. They show that HGF stimulation enhances both the number of invadopodia and their proteolytic activity. Mechanistically, the authors demonstrate that HGF-induced, RAB4- and RCP-RAB14-KIF16B-dependent recycling routes deliver MET to the cell surface specifically at sites where invadopodia form. Moreover, they report that MET physically interacts with MT1-MMP - a key transmembrane metalloproteinase required for invadopodia function- and that these two proteins co-traffic to invadopodia upon HGF stimulation.

      Although the HGF-MET axis has previously been implicated in invadopodia regulation (e.g., by Rajadurai et al., Journal of Cell Science 2012), studies directly linking ligand-induced MET trafficking with the spatial regulation of MT1-MMP localization and activity have been lacking.

      Overall, the manuscript addresses a relevant and timely topic and provides several novel insights. However, some sections require clearer and more concise writing (details below). In addition, the quality, reliability, and robustness of several data sets need to be improved.

      Strengths:

      A key strength of the study is the novel demonstration that HGF-mediated, RAB4- and RAB14-dependent recycling of MET delivers this receptor, together with MT1-MMP, to invadopodia - highlighting a previously unrecognized mechanism regulating the formation and proteolytic function of these invasive structures. Another strong point is the breadth of experimental approaches used and the substantial amount of supporting data. The authors also include an appropriate number of biological replicates and analyze a sufficiently large number of cells in their imaging experiments, as clearly described in the figure legends.

      We greatly appreciate the reviewer's positive assessment and acknowledgment of the novelty and relevance of our study. Below, we carefully address the comments and concerns raised, and we will strengthen the reliability and robustness of the study by revisiting the data, providing additional analyses where required, and clarifying methodological details.

      Weakness:

      (1) Inappropriate stimulation times for endocytosis and recycling assays. The experiments examining MET endocytosis and recycling following HGF stimulation appear to use inappropriate incubation times. After ligand binding, RTKs typically undergo endocytosis within minutes and reach maximal endosomal accumulation within 5-15 minutes. Although continuous stimulation allows repeated rounds of internalization, the temporal dynamics of MET trafficking should be examined across shorter time points, ideally up to 1 hour (e.g., 15, 30, and 60 minutes). The authors used 2-, 3-, or 6-hour HGF stimulation, which, in my opinion, is far too long to study ligand-induced RTK trafficking.

      We understand the reviewer’s concern regarding the HGF stimulation time points for endocytosis and recycling. We want to highlight that, to study the recycling/surface delivery of MET in response to HGF, we performed TIRF microscopy-based imaging in which images were taken within 1 h of HGF addition (Fig. 2I). Additionally, we will incorporate surface biotinylation to show the recycling of MET, as suggested in comment 7. Moreover, we observed the effect of HGF on gelatin degradation and invadopodia formation after 3 h of HGF stimulation, and we were curious to know where MET resides under prolonged ligand stimulation. Hence, to study the localization of MET to invadopodia or to endocytic markers, the cells were stimulated with HGF for 2-3 hours.

      (2) Low efficiency of MET silencing in Figure S1I. The very low MET knockdown efficiency shown in Figure S1I raises concerns. Given the potential off-target effects of a single shRNA and the insufficient silencing level, it is difficult to conclude whether the reduction in invadopodia number in Figure 1F is genuinely MET-dependent. The authors later used siRNA-mediated silencing (Figure S5C), which was more effective. Why was this siRNA not used to generate the data in Figure 1F? Why did the authors rely on the inefficient shRNA C#3?

      We understand the concern raised by the reviewer. We want to emphasize that we have employed three different approaches to investigate the effect of MET silencing/inhibition on invadopodia formation. (i) A MET kinase inhibitor, PHA665752, which reduces invadopodia formation (Fig. 1D, E). (ii) Silencing with shRNA: since the level of MET silencing with the shRNA was not sufficient, cells were stained for MET as a readout of silencing, images of cells with depleted MET expression were captured, and invadopodia numbers were quantified (Fig. 1F). (iii) Using the SMARTpool siRNA against MET, we have shown the MT1-MMP-containing invadopodia in Fig. S5E, which provides further evidence of the role of MET in invadopodia activity. An additional graph showing invadopodia formation after siRNA-mediated MET silencing will be added to the revised figure.

      (3) Missing information on incubation times and inconsistencies in MET protein levels. The figure legends do not indicate how long the cells were incubated with HGF or the MET inhibitor PHA665752 before immunoblotting. This information is crucial, particularly because both HGF and PHA665752 cause a substantial decrease in the total MET protein level. Notably, such a decrease is absent in MDA-MB-231 cells treated with HGF in the presence of cycloheximide (Figure S2F). The authors should comment on these inconsistencies. Additionally, the MET bands in Figure S1J appear different from those in Figure S1C, and MET phosphorylation seems already high under basal conditions, with no further increase upon stimulation (Figure S1J). The authors should address these issues. 

      We apologise for the unintentional omission of the experimental details on HGF and drug incubation times, which will be incorporated into the figure legends. The blot will be replaced with a more appropriate representative image.

      Regarding the decreased MET level in the drug-treated condition: the literature suggests that the MET inhibitor PHA665752 also promotes MET degradation, corroborating our result shown in Fig. S1J (PMID: 15788682, 18327775). Further, in Fig. S1J, the phosphorylation of MET relative to the total MET level is higher in the HGF-treated condition. We will add the quantification to the revised manuscript for clarity.

      Next, in Fig. S1C, the rabbit anti-MET (CST, D1C2 XP) antibody was used, which binds to a C-terminal motif of MET and identifies both the 170 kDa and the 140 kDa proteins, representing the uncleaved and cleaved forms of MET. In Fig. S1J, the mouse anti-MET (CST, L6E7) antibody was used, which binds to an N-terminal motif of MET and recognizes only the 140 kDa protein.
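
      The point about relative phosphorylation can be illustrated with a minimal sketch. It assumes the planned quantification is a simple ratio of p-MET band intensity to total MET band intensity, expressed as a fold change over the untreated lane; the response does not specify the exact procedure, and the function names and band intensities below are hypothetical placeholders, not values from the study.

```python
def relative_phosphorylation(p_met, total_met):
    """Ratio of phospho-MET to total MET band intensity (arbitrary units)."""
    if total_met <= 0:
        raise ValueError("total MET intensity must be positive")
    return p_met / total_met


def fold_over_control(treated_ratio, control_ratio):
    """Express a treated-lane ratio as a fold change over the control lane."""
    if control_ratio <= 0:
        raise ValueError("control ratio must be positive")
    return treated_ratio / control_ratio


# Hypothetical densitometry values, for illustration only.
untreated = relative_phosphorylation(p_met=1.0, total_met=1.0)
hgf = relative_phosphorylation(p_met=0.9, total_met=0.5)

# The p-MET band is slightly weaker after HGF, yet total MET has dropped
# further, so the phosphorylated fraction of the remaining receptor is higher.
fold = fold_over_control(hgf, untreated)
print(f"relative p-MET/MET, fold over untreated: {fold:.2f}")
```

      With numbers of this kind, a blot in which the p-MET band does not visibly increase can still correspond to a higher phosphorylated fraction once total MET is taken into account.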

      (4) Insufficient representation and randomization of microscopic data. For microscopy, only single representative cells are shown, rather than full fields containing multiple cells. This is particularly problematic for invadopodia analysis, as only a subset of cells forms these structures. The authors should explain how they ensured that image acquisition and quantification were randomized and unbiased. The graphs should also include the percentage of cells forming invadopodia, a standard metric in the field. Furthermore, some images include altered cells - for example, multinucleated cells - which do not accurately represent the general cell population.

      We thank the reviewer for raising this point. The single-cell images are shown for clarity and to visualize the subcellular features; however, the conclusions are based on the quantitative analysis of multiple cells collected from multiple frames (at least 30 frames per condition). We would like to highlight that image acquisition was performed over random fields of a coverslip. In the graphs shown in Fig. 1B, 1C, 4F, S1F, S1H, S5J’, it can be seen that there are frames in which no degradation or invadopodia formation occurred, and these have also been taken into account. For a better representation of the population of cells forming invadopodia, a graph showing the percentage of cells forming invadopodia will be added to the figure.

      (5) Use of a single siRNA/shRNA per target. As noted earlier, using only one siRNA or shRNA carries the risk of off-target effects. For every experiment involving gene silencing (MET, RAB4, RAB14, RCP, MT1-MMP), at least two independent siRNAs/shRNAs should be used to validate the phenotype.

      We would like to clarify that we are using SMARTpool siRNA, which contains 4 individual siRNAs for the target gene. The literature suggests that using a pool of siRNAs reduces off-target effects compared to using single oligos for gene silencing (PMID: 14681580, 33584737, 24875475).

      While SMARTpool siRNA minimizes the off-target effect, it does not eliminate the possibility of it. To confirm that the observed phenotypes are specifically attributable to the genes investigated in this study, we will perform additional experiments using two independent siRNAs targeting RCP and RAB14. RAB4 is known to be associated with MET trafficking (PMID: 21664574, 30537020), and we have taken RAB4 as a positive control. Hence, we feel the suggested experiment is not required to support the conclusion made regarding RAB4.

      For MET, we have used shRNA and an inhibitor to show the effect of MET inhibition/perturbation in the invadopodia-associated activity, which validates the observations of siRNA-mediated gene silencing.

      We have shown the effect of MT1-MMP depletion on invadopodia formation using a CRISPR-based gene knock-out study, and another study from our group has shown the effect using siRNA (PMID: 31820782), which supports our MT1-MMP KO cell observation.    

      (6) Insufficient controls for antibody specificity. The specificity of MET, p-MET, and MT1-MMP staining should be demonstrated in cells with effective gene silencing. This is an essential control for immunofluorescence assays.

      MET immunofluorescence staining in the MET-depleted condition has been provided in Fig. 1F, and an immunoblot for the siRNA-mediated gene silencing has been provided in Fig. S5C. We will add the entire field of view to show the MET silencing in Fig. 1F.

      The inhibition of MET kinase activity using PHA665752 abolished MET phosphorylation, as shown in Fig. S1J. In line with Joffre et al., Fig. 3C and S2I show increased Tyr 1234/1235 phosphorylation of the M1250T MET mutant (PMID: 21642981). Further, studies have shown the specificity of the antibody by immunoblotting and immunofluorescence using MET inhibitors (PMID: 21973114, 41009793).

      For MT1-MMP, an immunoblot showing significant depletion of the MT1-MMP protein level by the SMARTpool siRNA is provided in Fig. S5L. Further, MT1-MMP silencing has been validated by immunofluorescence in the following studies (PMID: 22291036, 21571860, 20505159).

      (7) Inadequate demonstration of MET recycling. MET recycling should be directly demonstrated using the same approaches applied to study MT1-MMP recycling. The current analysis - based solely on vesicles near the plasma membrane - is insufficient to conclude that MET is recycled back to the cell surface.

      We appreciate the reviewer’s suggestion for an alternative approach to show MET trafficking. We aim to demonstrate MET trafficking using a biochemical approach, which will be included in the revised version. 

      (8) Insufficient evidence for MET-MT1-MMP interaction. The interaction between MET and MT1-MMP should be validated by immunoprecipitation of endogenous proteins, particularly since both are endogenously expressed in the studied cell lines.

      We thank the reviewer for pointing out the lack of evidence for the MET-MT1-MMP interaction at the endogenous level. We have carried out immunoprecipitation of endogenous MET to validate the interaction with MT1-MMP. However, we could not capture the interaction of these proteins at endogenous levels. We hypothesize that the interaction between MT1-MMP and MET may be weak, with a high K<sub>d</sub> value, and accordingly it was difficult to co-precipitate endogenous MT1-MMP with MET. The immunoblot will be added to the revised manuscript and discussed.

      (9) Inconsistent use of cell lines and lack of justification. The authors use two TNBC cell lines: MDA-MB-231 and BT-549, without providing a rationale for this choice. Some assays are performed in MDA-MB-231 and shown in the main figures, whereas others use BT-549, creating unnecessary inconsistency. A clearer, more coherent strategy is needed (e.g., present all main findings in MDA-MB-231 and confirm key results in BT-549 in supplementary figures).

      MDA-MB-231 and BT-549 are two well-characterized TNBC cell lines, which are being used extensively to study breast cancer cell invasion. These two cell lines also show overexpression of MET, making them suitable model cell lines for our study. 

      MDA-MB-231 has a lower transfection efficiency than BT-549. Additionally, MET is a difficult gene to transfect, making it hard to perform experiments with MET overexpression in MDA-MB-231. Though most of the experiments have been performed in both cell lines, a few of the studies have been performed only in BT-549 cells. Further, in the main figures we have focused on displaying the different approaches taken to validate an observation, which led to data being shown in distinct cell lines.

      Showing observations in different cell lines is also a practice followed by multiple authors in the past (PMID: 39751400, 41079612, 25049275, 22366451).

      (10) Inconsistency in invadopodia numbers under identical conditions. The number of invadopodia formed in Figure 1E is markedly lower than in Figure 1C, despite identical conditions. The authors should explain this discrepancy.

      We sincerely thank the reviewer for pointing out the inconsistency in invadopodia numbers across the two experiments. Fig. 1C has two conditions: untreated (UT) and HGF-treated. The untreated condition contains serum-free medium without any stimulation, whereas in Fig. 1D, E we added vehicle (DMSO), since the drug is resuspended in DMSO. This difference in treatment is likely to be responsible for the decreased numbers of invadopodia in Fig. 1E.

      (11) Questionable colocalization in some images. In some figures - for example, Figure 2G - the dots indicated by arrows do not convincingly show colocalization. The authors should clarify or reanalyze these data.

      We thank the reviewer for the valuable comment. The apparent lack of convincing colocalization is likely due to the relatively lower fluorescence intensity of MET at these structures. We will add the line intensity plots for the indicated puncta to show the intensity of both channels in the figure.

      To quantify the colocalization of two channels, we used the automated image analysis software motiontracking (motiontracking.mpi-cbg.de), which is detailed in the Methods section. Motiontracking considers objects to be colocalized only if the overlapping area between the two channels is more than 35%. Lastly, the apparent colocalization is corrected for random colocalization, which is estimated from random permutation of object colocalization. This makes object-based colocalization more reliable than intensity-based colocalization.
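
      The object-based scheme described above can be sketched as follows. This is an illustrative toy and not the motiontracking implementation: it assumes objects are given as sets of pixel coordinates, takes the 35% overlap threshold from the response, and stands in for the random-colocalization correction with random cyclic shifts of the second channel, which is an assumption made here for simplicity. All function names are hypothetical.

```python
import random


def overlap_fraction(obj, mask_b):
    """Share of an object's pixels that are also positive in channel B."""
    return len(obj & mask_b) / len(obj)


def colocalized_fraction(objects_a, mask_b, min_overlap=0.35):
    """Fraction of channel-A objects overlapping channel B by more than min_overlap."""
    hits = sum(1 for obj in objects_a if overlap_fraction(obj, mask_b) > min_overlap)
    return hits / len(objects_a)


def shift_mask(mask, dr, dc, height, width):
    """Cyclically shift a set of (row, col) pixels within a height x width image."""
    return {((r + dr) % height, (c + dc) % width) for r, c in mask}


def corrected_colocalization(objects_a, mask_b, height, width, n_perm=200, seed=0):
    """Observed colocalization minus the mean value obtained after random shifts."""
    rng = random.Random(seed)
    observed = colocalized_fraction(objects_a, mask_b)
    chance = 0.0
    for _ in range(n_perm):
        shifted = shift_mask(mask_b, rng.randrange(height), rng.randrange(width),
                             height, width)
        chance += colocalized_fraction(objects_a, shifted)
    return observed - chance / n_perm


# Toy 20 x 20 image: the first object is fully covered by channel B,
# the second overlaps by only 1 of 4 pixels (25%) and is not counted.
obj1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
obj2 = {(10, 10), (10, 11), (11, 10), (11, 11)}
mask_b = obj1 | {(10, 10)}

observed = colocalized_fraction([obj1, obj2], mask_b)
corrected = corrected_colocalization([obj1, obj2], mask_b, height=20, width=20)
print(observed, corrected)
```

      The threshold makes the call per object rather than per pixel, so a dim but spatially coincident punctum still counts, while the subtraction of the shifted-mask baseline removes overlap that would be expected from object density alone.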

      (12) Abstract, Introduction, and Discussion require substantial rewriting. a) The abstract should be accessible to a broader audience and should avoid using abbreviations and protein names without context. b) The introduction should better describe the cellular processes and proteins investigated in this study. c) The discussion currently reads more like an extended summary of results. It lacks deeper interpretation, comparison with existing literature, and consideration of the broader implications of the findings.

      We thank the reviewer for this suggestion. We will modify the abstract, introduction, and discussion as per the suggestion.

    1. eLife Assessment

      This study presents valuable findings by demonstrating that specific GPCR subtypes induce distinct extracellular vesicle miRNA signatures, highlighting a potential novel mechanism for intercellular communication with implications for receptor pharmacology within the field. The evidence is solid, however, more experiments are needed to determine whether the distinct extracellular vesicle miRNA signatures result from GPCR-dependent miRNA expression or GPCR-dependent incorporation of miRNAs into extracellular vesicles.

    2. Reviewer #1 (Public review):

      Summary:

      GPCRs affect the EV-miRNA cargoes

      Strengths:

      Novel idea of GPCRs-mediated control of EV loading of miRNAs

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. The mechanistic detail is completely lacking.

      Comments on revisions:

      The revised version of the manuscript falls short of the required standard by lacking additional experiments. Some of the conditions for acceptability could have been met only through clarifying uncertainties via further experiments, which, unfortunately, have not been conducted.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      No significant change in EV quantity or size following GPCR activation.

      Each GPCR triggered a distinct EV miRNA expression profile.

      miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Comments on revisions:

      All the comments have been taken into account. I wish the authors success in their future research.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCRs-mediated control of EV loading of miRNAs.

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. The mechanistic detail is lacking.

      We appreciate the reviewer's acknowledgment of the novelty of this study. We agree with the reviewer that further mechanistic insights would strengthen the manuscript. The mechanisms by which miRNAs are sorted into EVs remain poorly understood. Various factors, including RNA-binding proteins, sequence motifs, and cellular location, can influence this sorting process (Garcia-Martin et al., 2022; Liu & Halushka, 2025; Villarroya-Beltri et al., 2013; Yoon et al., 2015). Ago2, a key component of the RNA-induced silencing complex, binds to miRNA and facilitates miRNA sorting. Ago2 has been found in EVs and can be regulated by cellular signaling pathways. For instance, McKenzie et al. demonstrated that KRAS-dependent activation of MEK-ERK can phosphorylate Ago2, thereby regulating the sorting of specific miRNAs into EVs (McKenzie et al., 2016). In differentiated PC12 cells, Gαq activation leads to the formation of Ago2-associated granules, which selectively sequester unique transcripts (Jackson et al., 2022). Investigating the effects of GPCRs, G proteins, and GPCR signaling on Ago2 expression, location, and phosphorylation state could provide valuable insights into how GPCRs regulate specific miRNAs within EVs. We have expanded on these potential mechanisms and future research in the discussion section (pages 16-17).

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Thank you for the feedback. We acknowledge that validating the target genes of the top candidate miRNAs is an important next step. In response to the reviewer's concerns, we have expanded the discussion of future research in the manuscript (pages 19-20). Although this initial study is primarily descriptive, it establishes a novel conceptual link between GPCR signaling and EV-mediated communication.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

      We appreciate the valuable suggestions from the reviewer. EV miRNAs and cellular miRNAs may exhibit distinct profiles, as miRNAs can be selectively sorted into or excluded from EVs (Pultar et al., 2024; Teng et al., 2017; Zubkova et al., 2021). Investigating the difference between cellular miRNA levels and EV miRNA cargo would provide insight into the mechanism of miRNA sorting and the functions of miRNAs in the recipient cells. The expression of cellular miRNAs is a highly dynamic process. To accurately compare miRNA expression levels, profiling of EV miRNA and cellular miRNA should be conducted simultaneously. However, as this was an exploratory study, we were unable to measure the cellular miRNAs without conducting the entire experiment again.
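
      The comparison the reviewer proposes could be sketched as a per-miRNA enrichment score. This is a minimal illustration that assumes EV and cellular abundances were measured in parallel and normalized to the same scale; neither the score nor the cutoff is taken from the manuscript, and the miRNA names and values below are hypothetical placeholders.

```python
import math


def ev_enrichment(ev_level, cell_level):
    """log2 ratio of EV to cellular abundance; > 0 suggests preferential export."""
    if ev_level <= 0 or cell_level <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(ev_level / cell_level)


def classify(score, cutoff=1.0):
    """Label a miRNA by an assumed two-fold enrichment cutoff (log2 of 1.0)."""
    if score >= cutoff:
        return "EV-enriched"
    if score <= -cutoff:
        return "cell-retained"
    return "unbiased"


# Hypothetical normalized abundances as (EV, cell) pairs, for illustration only.
profiles = {
    "miR-X": (8.0, 2.0),
    "miR-Y": (1.0, 4.0),
    "miR-Z": (3.0, 3.0),
}

for name, (ev, cell) in profiles.items():
    score = ev_enrichment(ev, cell)
    print(name, round(score, 2), classify(score))
```

      A score of this kind only becomes interpretable when both pools are profiled from the same cultures at the same time point, which is the constraint the authors raise in their response.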

      Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      We are encouraged that the reviewer recognized the novelty, methodological rigor, and significance of our work. We recognize the limitations of our current model system and emphasize the need to test additional GPCR families and cell lines in future studies, as detailed in the discussion section (page 19, second paragraph).

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      We appreciate the feedback. We recognize the importance of validating the function of the top candidate miRNAs in the recipient cells, and this will be included in future studies (pages 19-20).

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Thank you for the comment. EV isolation and purification are major challenges in EV research. Current isolation techniques are often ineffective at separating vesicles produced by different biogenetic pathways. Furthermore, the lack of specific markers to differentiate EV subtypes adds to this complexity. We recognize that the presence of various subpopulations can complicate the interpretation of EV cargos. In our study, we used a combined approach of ultrafiltration followed by size-exclusion chromatography to achieve a balance between EV purity and yield. We adhere to the MISEV2023 (Minimal Information for Studies of Extracellular Vesicles 2023) guidelines by reporting detailed isolation methods, assessing both positive and negative protein markers, and characterizing EVs by electron microscopy to confirm vesicle structure, as well as by nanoparticle tracking analysis to verify particle size distribution (Welsh et al., 2024). Following these guidelines supports the quality of our study and makes our findings easier to compare with those of other studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Suggestions for Future Research:

      (1) Functionally validate top candidate miRNAs in recipient cells.

      We acknowledge that validating the target genes of the top candidate miRNAs is a crucial next step. In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (pages 19-20).

      (2) Investigate other GPCR families and repeat in primary or disease-relevant cell lines.

      The inclusion of different GPCRs and cell lines is suggested as an area for further investigation in the discussion (page 19).

      (3) Apply similar approaches in in vivo models or patient samples to assess clinical relevance.

      In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (pages 19-20).

      References

      Garcia-Martin, R., Wang, G., Brandão, B. B., Zanotto, T. M., Shah, S., Kumar Patel, S., Schilling, B., & Kahn, C. R. (2022). MicroRNA sequence codes for small extracellular vesicle release and cellular retention. Nature, 601(7893), 446-451. https://doi.org/10.1038/s41586-021-04234-3

      Jackson, L., Rennie, M., Poussaint, A., & Scarlata, S. (2022). Activation of Gαq sequesters specific transcripts into Ago2 particles. Sci Rep, 12(1), 8758. https://doi.org/10.1038/s41598-022-12737-w

      Liu, X.-M., & Halushka, M. K. (2025). Beyond the Bubble: A Debate on microRNA Sorting Into Extracellular Vesicles. Laboratory Investigation, 105(2), 102206. https://doi.org/10.1016/j.labinv.2024.102206  

      McKenzie, A. J., Hoshino, D., Hong, N. H., Cha, D. J., Franklin, J. L., Coffey, R. J., Patton, J. G., & Weaver, A. M. (2016). KRAS-MEK Signaling Controls Ago2 Sorting into Exosomes. Cell Rep, 15(5), 978-987. https://doi.org/10.1016/j.celrep.2016.03.085

      Pultar, M., Oesterreicher, J., Hartmann, J., Weigl, M., Diendorfer, A., Schimek, K., Schädl, B., Heuser, T., Brandstetter, M., Grillari, J., Sykacek, P., Hackl, M., & Holnthoner, W. (2024). Analysis of extracellular vesicle microRNA profiles reveals distinct blood and lymphatic endothelial cell origins. J Extracell Biol, 3(1), e134. https://doi.org/10.1002/jex2.134

      Teng, Y., Ren, Y., Hu, X., Mu, J., Samykutty, A., Zhuang, X., Deng, Z., Kumar, A., Zhang, L., Merchant, M. L., Yan, J., Miller, D. M., & Zhang, H.-G. (2017). MVP-mediated exosomal sorting of miR-193a promotes colon cancer progression. Nature Communications, 8(1), 14448. https://doi.org/10.1038/ncomms14448  

      Villarroya-Beltri, C., Gutiérrez-Vázquez, C., Sánchez-Cabo, F., Pérez-Hernández, D., Vázquez, J., Martin-Cofreces, N., Martinez-Herrera, D. J., Pascual-Montano, A., Mittelbrunn, M., & Sánchez-Madrid, F. (2013). Sumoylated hnRNPA2B1 controls the sorting of miRNAs into exosomes through binding to specific motifs. Nat Commun, 4, 2980. https://doi.org/10.1038/ncomms3980

      Welsh, J. A., Goberdhan, D. C. I., O'Driscoll, L., Buzas, E. I., Blenkiron, C., Bussolati, B., Cai, H., Di Vizio, D., Driedonks, T. A. P., Erdbrügger, U., Falcon-Perez, J. M., Fu, Q. L., Hill, A. F., Lenassi, M., Lim, S. K., Mahoney, M. G., Mohanty, S., Möller, A., Nieuwland, R., . . . Witwer, K. W. (2024). Minimal information for studies of extracellular vesicles (MISEV2023): From basic to advanced approaches. J Extracell Vesicles, 13(2), e12404. https://doi.org/10.1002/jev2.12404

      Yoon, J. H., Jo, M. H., White, E. J., De, S., Hafner, M., Zucconi, B. E., Abdelmohsen, K., Martindale, J. L., Yang, X., Wood, W. H., 3rd, Shin, Y. M., Song, J. J., Tuschl, T., Becker, K. G., Wilson, G. M., Hohng, S., & Gorospe, M. (2015). AUF1 promotes let-7b loading on Argonaute 2. Genes Dev, 29(15), 1599-1604. https://doi.org/10.1101/gad.263749.115  

      Zubkova, E., Evtushenko, E., Beloglazova, I., Osmak, G., Koshkin, P., Moschenko, A., Menshikov, M., & Parfyonova, Y. (2021). Analysis of MicroRNA Profile Alterations in Extracellular Vesicles From Mesenchymal Stromal Cells Overexpressing Stem Cell Factor. Front Cell Dev Biol, 9, 754025. https://doi.org/10.3389/fcell.2021.754025

    1. eLife Assessment

      This valuable study presents a thorough analysis of protein abundance changes caused by amino acid substitutions, using structural context to improve predictive accuracy. By deriving substitution response matrices based on solvent accessibility, the authors demonstrate that simple structural features can predict abundance effects with accuracy comparable to complex methods such as free energy calculations. The strength of the evidence is convincing, supported by robust experimental design and comprehensive analyses.

    2. Reviewer #1 (Public review):

      Significance:

      While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the effect of a mutation on the "abundance" measurement of VAMP-seq.

      Public Review:

      Of course, there is always another layer of the onion: VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here, the authors' goal is to create simple models that can act as a baseline for two main purposes:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a step towards considering static structural information when interpreting VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript) - and shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.
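
      To make the idea of a burial-stratified, asymmetric substitution matrix concrete, the following is a minimal illustrative sketch, not the authors' implementation. The function names, the relative solvent accessibility (RSA) cutoff of 0.2, and the use of a simple mean to aggregate scores are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_matrices(variants, rsa_cutoff=0.2):
    """Average abundance scores per (wild type, mutant) pair, split by burial.

    variants: iterable of (wt, mut, rsa, score) tuples, where rsa is the
    relative solvent accessibility of the mutated position.
    rsa_cutoff is a hypothetical threshold, not a value taken from the paper.
    Returns {"buried": {(wt, mut): mean_score}, "exposed": {...}}.
    """
    sums = {"buried": defaultdict(float), "exposed": defaultdict(float)}
    counts = {"buried": defaultdict(int), "exposed": defaultdict(int)}
    for wt, mut, rsa, score in variants:
        context = "buried" if rsa < rsa_cutoff else "exposed"
        # Keys are ordered pairs, so (L, D) and (D, L) are stored separately:
        # this is what makes the matrix asymmetric.
        sums[context][(wt, mut)] += score
        counts[context][(wt, mut)] += 1
    return {
        context: {pair: sums[context][pair] / counts[context][pair]
                  for pair in sums[context]}
        for context in sums
    }


def predict(matrices, wt, mut, rsa, rsa_cutoff=0.2):
    """Look up the predicted score for a substitution; None if never observed."""
    context = "buried" if rsa < rsa_cutoff else "exposed"
    return matrices[context].get((wt, mut))
```

      In this sketch, a prediction for a new variant is simply the matrix entry for its substitution type in the matching burial class, which is what makes such a model easy to interpret as a baseline.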

      Comments on revision:

      We have no further comments on this manuscript.

    3. Reviewer #3 (Public review):

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance through the lens of protein structural information (residue solvent accessibility, secondary structure type) to derive combinations of context-specific substitution matrices that predict variant impact on protein abundance. They are careful to point out that the aim of the study is not to produce a new best abundance predictor, but to showcase the degree of prediction afforded simply by utilizing structural information.

      Both the derived matrices and the underlying 'training' data are comprehensively evaluated. The authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structure-unaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility and secondary structure. The capacity for the approach to produce generalizable matrices is explored through training data combinations, highlighting factors such as the variable quality of the experimental MAVE data and the biochemical differences between the protein targets themselves, which can lead to limitations. Despite this, the authors demonstrate their simple matrix approach is generally on par with dedicated protein stability predictors in abundance effect evaluation, and even outperforms them in a niche of solvent accessible surface mutations, revealing their matrices provide orthogonal abundance-specific signal. More importantly, the authors further develop this concept to creatively show their matrices can be used to identify surface residues that have buried-like substitution profiles, which are shown to correspond to protein interface residues, post-translational modification sites, functional residues or putative degrons.
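
      The idea of flagging surface residues with buried-like substitution profiles can be sketched as follows. This is a hedged illustration under stated assumptions rather than the procedure used in the paper: the function names are hypothetical, and the choice of mean absolute error as the comparison metric is an assumption.

```python
def mean_abs_error(observed, matrix, wt):
    """Mean absolute difference between observed and matrix-predicted scores.

    observed: {mutant_residue: measured_score} for one position.
    matrix:   {(wt, mut): predicted_score} for one burial class.
    Returns None when no substitution is shared between the two.
    """
    diffs = [abs(score - matrix[(wt, mut)])
             for mut, score in observed.items()
             if (wt, mut) in matrix]
    return sum(diffs) / len(diffs) if diffs else None


def is_buried_like(observed, wt, buried_matrix, exposed_matrix):
    """Return True if an exposed position's profile fits the buried matrix better.

    The comparison metric (mean absolute error) is an illustrative assumption.
    """
    err_buried = mean_abs_error(observed, buried_matrix, wt)
    err_exposed = mean_abs_error(observed, exposed_matrix, wt)
    if err_buried is None or err_exposed is None:
        return False  # not enough overlap to make a call
    return err_buried < err_exposed
```

      Positions flagged this way would then be candidates for closer inspection, for example as possible interface residues, modification sites, or degrons, in line with the categories the authors report.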

      The paper makes a strong and well-supported main point, demonstrating the widespread utility of the authors' approach, empowered through protein structural information and cutting-edge MAVE datasets. This work creatively utilizes a simple concept to produce a highly interpretable tool for protein abundance prediction (and beyond), which is inspiring in the age of impenetrable machine learning models.