Jun 2026
www.biorxiv.org

α/β-Hydrolase domain-containing 6 (ABHD6) accelerates the desensitization and deactivation of TARP γ-2-containing AMPA receptors

1
1. Public_Reviews 05 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public Review):
 
 Summary:
 
 This research sheds light on the nuanced role of ABHD6 in the regulation of AMPARs, highlighting its interaction with TARP γ-2 as a critical factor in modulating receptor-gating kinetics. It is crucial to understand that while ABHD6 alone does not alter AMPAR kinetics, its presence alongside TARP γ-2 leads to accelerated deactivation and desensitization of AMPARs, impacting synaptic transmission dynamics.
 
 Strengths:
 
 Important findings in the research include:
 
 ABHD6 does not affect the gating kinetics of GluA1 and GluA2(Q) homomeric receptors independently.
 
 In the presence of TARP γ-2, ABHD6 accelerates deactivation and desensitization of these receptors, regardless of their splicing or editing isoforms.
 
 The effect is consistent for both homomeric GluA1 and GluA2(Q) receptors and heteromeric GluA1i/GluA2(R)i-G receptors.
 
 The recovery from desensitization of GluA1 with the flip splicing isoform is slowed by ABHD6 in the presence of TARP γ-2.
 
 We are grateful for the reviewer's positive comments. It is very encouraging to receive a comment such as “This research sheds light on the nuanced role of ABHD6 in the regulation of AMPARs”.
 
 Weaknesses:
 
 However, the study focuses on specific receptor subunits and isoforms, which may not fully represent the diversity of AMPAR compositions found in vivo (e.g. though the authors have claimed that TARP γ-2 failed to increase GluA3-induced currents significantly, the effect on GluA4 or the explanation was missing). Further research is needed to explore the implications of these findings in more complex neuronal environments.
 
 We thank the reviewer for raising this point. To investigate whether ABHD6 regulates AMPAR kinetics in neurons, we recorded glutamate-induced currents at –70 mV in ABHD6 knockout neurons. We found that ABHD6 knockout neurons exhibited significantly slower deactivation and desensitization kinetics (Fig. 6, Table EV7.1, EV7.2). Regarding the diversity of AMPAR subunit compositions, we obtained consistent results for GluA4, which is expressed at higher levels in the cerebellum and brainstem (Fig. 7, EV7, Table EV8.1, EV8.2). Specifically, we observed that ABHD6 accelerates the deactivation and desensitization of homomeric GluA4–TARP γ-2 complexes.
 
 Reviewer #2 (Public Review):
 
 Summary:
 
 Cong et al. investigated the regulatory effects of ABHD6 on AMPARs. The authors performed adequate electrophysiology recordings to show the exact pattern of this regulation and covered major critical points.
 
 Strengths:
 
 The authors have performed high-quality ephys recordings and examined all potential regulatory aspects of ABHD6 on AMPARs. This is important for understanding AMPAR function.
 
 We greatly appreciate the reviewer’s positive comments on our manuscript and the recognition of our high-quality ephys recordings.
 
 Weaknesses:
 
 (1) The authors discussed CNIH-2 extensively in lines 92-110 of the introduction; however, they did not perform related experiments. I suggest they move this part to the discussion, where they also discuss the roles of CNIH.
 
 We thank the reviewer for the suggestion. Accordingly, we have moved the discussion of CNIH‑2 to the Discussion section (lines 355–372) of the revised manuscript: “Other key modulators include cornichon family AMPA receptor auxiliary proteins (CNIH-2/3) and GSG1L, which generally slow receptor kinetics in heterologous expression systems (Kato et al., 2010; Schwenk et al., 2012), although their effects in neurons can be context-dependent (Gu et al., 2016; Mao et al., 2017). Additional diversity arises from synapse-enriched proteins such as SynDIG4 and CKAMP44, which exert complex and sometimes opposing effects on different kinetic parameters (Matt et al., 2018; Khodosevich et al., 2014). This diversity comes from the known co-assembly of AMPA receptor subunits (the pore-forming GluA subunit) with three classes of auxiliary proteins, collectively comprising 21 components, most of which are secretory or transmembrane proteins. Importantly, multiple auxiliary subunits (e.g., TARP γ-8 and CNIH-2) can co-assemble within a single AMPAR complex, and their combined presence modulates functional outcomes in ways not predicted by individual subunits alone, underscoring a combinatorial regulatory logic (Shi et al., 2010; Yu et al., 2021; Herring et al., 2013). Given that native synaptic AMPARs predominantly exist as GluA2-containing hetero-oligomers (e.g., GluA1/2, GluA2/3), although homo-oligomers have also been structurally validated, understanding how novel auxiliary proteins such as ABHD6 integrate into this complex framework becomes paramount (Lu et al., 2009; Wenthold et al., 1996; Zhao et al., 2016; Malinow and Malenka, 2002).”
 
 (2) The authors need to report the "n" for all the experiments they have presented in this manuscript. How many cells were recorded in each condition? How many batches? This information has to be in all of the figure legends, but it is missing everywhere except Fig. 4.
 
 We thank the reviewer for pointing out this omission. We have added the number of cells and the corresponding batches to every figure and table in the revised manuscript.
 
 (3) One question is what the physiological meanings of this regulatory effect are. The authors may consider adding some discussions.
 
 We thank the reviewer for the suggestion. In the revised manuscript, we have included a discussion of the physiological implications of this regulatory effect in lines 386–412, as follows: “Although there is no direct evidence indicating that ABHD6 and TARP γ-2 bind to each other, both are known to associate with AMPA receptors, suggesting the possibility of indirect or regulatory interactions. For example, their relationship could be transient, condition-dependent, or mediated through mechanisms such as conformational changes or steric hindrance (Gill et al., 2011b; Sumioka, 2013; Wei et al., 2017). Studies have reported that scaffold proteins participate in the binding, anchoring, maintenance, and removal of AMPA receptors, either through direct interaction with receptors or through indirect binding via auxiliary subunits (Danielson et al., 2014). Additionally, we extended the same experimental approach to AMPA receptors containing the GluA1 flip subtype together with TARP γ-8. Our results demonstrate that this ABHD6-dependent regulatory mechanism also applies to other TARP family members, including TARP γ-8 (Figure 7, EV7, Table EV9.1, EV9.2). Our findings indicate that ABHD6 plays a critical negative regulatory role on AMPA receptor function. It suppresses synaptic current amplitude and accelerates the deactivation and desensitization kinetics in a TARP γ-2-dependent manner. By shortening synaptic response duration and reducing total charge transfer, ABHD6 may thereby restrain neuronal excitability and narrow the temporal window for synaptic integration. Loss of ABHD6 function (as observed in our knockout neurons, which exhibit slowed kinetics) could promote excitatory hyperactivity. Thus, as a key “molecular brake” on synaptic excitability, dysregulation of ABHD6 may directly contribute to the pathogenesis of neurological disorders. Insufficient braking function may lead to excessive synaptic transmission, strongly correlating with hyperexcitability conditions such as epilepsy. Conversely, overly potent braking might result in synaptic dysfunction, potentially contributing to early synaptic impairment in cognitive disorders like Alzheimer’s disease. Overall, our research highlights ABHD6 as a promising target for novel therapeutic strategies in neurological disorders and provides a solid theoretical foundation for further investigation in this field.”
 
 (4) About statistics. The authors need to add more details and make sure their statistics are sound. For example, they also need to check the equality of variances. In their Table EVs, where the P values are reported, the authors need to report which statistics they have used (one-way ANOVA, K-W test, or others) and the exact post-hoc test type for each comparison. For one-way ANOVA, report the F values together with the P values in all figure legends.
 
 We appreciate your thoughtful advice. Accordingly, we have added a description of the statistical strategy to the revised manuscript in lines 530–536: “Data were first assessed for normality using the D’Agostino–Pearson test (n < 50) or the Kolmogorov–Smirnov test (n > 50), and for equality of variances using the Brown–Forsythe ANOVA test. Depending on the outcome of these tests, data were analyzed by parametric (one-way ANOVA) or non-parametric methods (Kruskal–Wallis test) followed by Tukey's Honest Significant Difference (HSD) test as a post hoc analysis to determine specific differences among groups. Correlation was evaluated with Pearson correlation analysis. Values of P < 0.05 were considered statistically significant.”
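
 The decision workflow quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' analysis code: it assumes NumPy and SciPy (1.8 or later, for `tukey_hsd`), treats n = 50 as the large-sample case because the quoted text does not cover it, implements the Brown–Forsythe variance check as Levene's test centred on the median, and runs Tukey's HSD only in the parametric branch. The function names and the simulated groups are hypothetical placeholders, not data from the study.

```python
import numpy as np
from scipy import stats


def looks_normal(x, alpha=0.05):
    """Normality check: D'Agostino-Pearson for n < 50, Kolmogorov-Smirnov otherwise."""
    x = np.asarray(x, dtype=float)
    if x.size < 50:
        p = stats.normaltest(x).pvalue  # D'Agostino-Pearson omnibus test
    else:
        # Assumption: compare against a normal with the sample's own mean and SD.
        p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue
    return p > alpha


def compare_groups(groups, alpha=0.05):
    """Pick and run an omnibus test for a dict of {condition: values}."""
    samples = [np.asarray(v, dtype=float) for v in groups.values()]
    k = len(samples)
    n_total = sum(s.size for s in samples)
    all_normal = all(looks_normal(s, alpha) for s in samples)
    # Brown-Forsythe test of equal variances: Levene's test centred on the median.
    equal_var = stats.levene(*samples, center="median").pvalue > alpha

    if all_normal and equal_var:
        res = stats.f_oneway(*samples)
        return {
            "test": "one-way ANOVA",
            "statistic": float(res.statistic),  # the F value the reviewer asks for
            "df": (k - 1, n_total - k),
            "p": float(res.pvalue),
            "posthoc_p": stats.tukey_hsd(*samples).pvalue,  # k x k matrix
        }
    res = stats.kruskal(*samples)
    return {
        "test": "Kruskal-Wallis",
        "statistic": float(res.statistic),  # the H value
        "df": (k - 1,),
        "p": float(res.pvalue),
    }


# Simulated placeholder groups (arbitrary units), for illustration only.
rng = np.random.default_rng(0)
demo = {
    "A": rng.normal(10.0, 1.0, 30),
    "B": rng.normal(15.0, 1.0, 30),
    "C": rng.normal(20.0, 1.0, 30),
}
summary = compare_groups(demo)
print(summary["test"], summary["statistic"], summary["df"], summary["p"])
```

 Returning the test name, statistic, and degrees of freedom alongside the P value makes it straightforward to report all of them in each figure legend, as the reviewer requests.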
 
 (5) Fig. 3J, the authors need to correct the label of the Y axis. It is shifted.
 
 We thank the reviewer for raising this point. We have corrected the Y-axis label of Fig. 3J in the revised manuscript.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations For The Authors):
 
 The manuscript is well-structured and the findings are presented clearly. While the study addresses multiple isoforms, a more detailed explanation of the isoform-specific effects observed, e.g. the unique behavior of the GluA2(Q)i-G isoform in terms of deactivation, would be beneficial.
 
 We thank the reviewer for pointing out these weaknesses. In response, we have added a discussion to the revised manuscript (lines 330–345) that addresses RNA editing as a key regulatory mechanism of AMPAR function beyond subunit composition and splicing variants: “Beyond subunit composition and splicing variants, the function of AMPARs is also finely regulated by RNA editing. Q/R editing enables the conversion of neutral to positively charged residues in the ion-selective filter of the channel, causing impermeability to divalent cations such as Ca²⁺. This not only alters channel conductance and current but also contributes to neuronal dysfunction and excitotoxicity (Kawahara et al., 2004; Kwak and Kawahara, 2004). R/G editing markedly influences receptor desensitization and recovery kinetics, and may modulate interactions with auxiliary proteins, thereby playing a critical role in synaptic plasticity and development (Stern-Bach et al., 1998; Coombs et al., 2012; Wright and Vissel, 2012). The conversion from R to G weakens inter-dimer interactions within the binding domains, leading to structurally more flexible receptors (Lomeli et al., 1994). Furthermore, R/G editing exhibits strong developmental regulation and varies across brain regions and cell types (Geiger et al., 1995). Therefore, in this study, we systematically examined the effect of ABHD6 on different flip/flop splice variants and R/G editing subtypes. Our results demonstrate that ABHD6 also suppresses currents in HEK 293T cells expressing flop splice variants and R/G-edited receptors.”
 
 The authors should consider discussing potential mechanisms underlying the interaction between ABHD6 and TARP γ-2 in greater depth. This could include hypotheses on how ABHD6 might be influencing TARP γ-2's modulation of AMPARs if applicable (though the authors have mentioned either the potential binding domain of ABHD6 to AMPARs or TARP γ-2 to AMPARs, the proposed direct interaction between ABHD6 and TARP γ-2 is unknown). It's also unclear whether the effect of ABHD6 is specific to TARP γ-2 or is general to other TARP family members.
 
 We appreciate your suggestion and used affinity chromatography to examine the interaction between ABHD6 and TARP γ-2. Our investigation revealed no direct evidence of physical binding between the two proteins. Accordingly, we have supplemented the discussion in the revised manuscript (lines 386–393) as follows: “Although there is no direct evidence indicating that ABHD6 and TARP γ-2 bind to each other, both are known to associate with AMPA receptors, suggesting the possibility of indirect or regulatory interactions. For example, their relationship could be transient, condition-dependent, or mediated through mechanisms such as conformational changes or steric hindrance (Gill et al., 2011b; Sumioka, 2013; Wei et al., 2017). Studies have reported that scaffold proteins participate in the binding, anchoring, maintenance, and removal of AMPA receptors, either through direct interaction with receptors or through indirect binding via auxiliary subunits (Danielson et al., 2014).”
 
 Expanding the discussion to include the potential physiological and pathophysiological implications of ABHD6's modulatory effects on AMPAR kinetics would provide a broader context for the findings.
 
 We thank the reviewer for the suggestion. In the revised manuscript, we discuss the physiological implications of this regulatory effect in lines 386–412: “Although there is no direct evidence indicating that ABHD6 and TARP γ-2 bind to each other, both are known to associate with AMPA receptors, suggesting the possibility of indirect or regulatory interactions. For example, their relationship could be transient, condition-dependent, or mediated through mechanisms such as conformational changes or steric hindrance (Gill et al., 2011b; Sumioka, 2013; Wei et al., 2017). Studies have reported that scaffold proteins participate in the binding, anchoring, maintenance, and removal of AMPA receptors, either through direct interaction with receptors or through indirect binding via auxiliary subunits (Danielson et al., 2014). Additionally, we extended the same experimental approach to AMPA receptors containing the GluA1 flip subtype together with TARP γ-8. Our results demonstrate that this ABHD6-dependent regulatory mechanism also applies to other TARP family members, including TARP γ-8 (Figure 7, EV7, Table EV9.1, EV9.2). Our findings indicate that ABHD6 plays a critical negative regulatory role on AMPA receptor function. It suppresses synaptic current amplitude and accelerates the deactivation and desensitization kinetics in a TARP γ-2-dependent manner. By shortening synaptic response duration and reducing total charge transfer, ABHD6 may thereby restrain neuronal excitability and narrow the temporal window for synaptic integration. Loss of ABHD6 function (as observed in our knockout neurons, which exhibit slowed kinetics) could promote excitatory hyperactivity. Thus, as a key “molecular brake” on synaptic excitability, dysregulation of ABHD6 may directly contribute to the pathogenesis of neurological disorders. Insufficient braking function may lead to excessive synaptic transmission, strongly correlating with hyperexcitability conditions such as epilepsy. Conversely, overly potent braking might result in synaptic dysfunction, potentially contributing to early synaptic impairment in cognitive disorders like Alzheimer’s disease. Overall, our research highlights ABHD6 as a promising target for novel therapeutic strategies in neurological disorders and provides a solid theoretical foundation for further investigation in this field.”
 
 Some typos:
 
 p7L144, might miss a word 'of' after 'properties';
 
 Thank you for your careful reading. We have corrected “the channel properties TARP γ-2-containing AMPA receptors” to “the channel properties of TARP γ-2-containing AMPA receptors” in the revised manuscript.
 
 p9L178, remove '.';
 
 Thank you for your careful reading. We have corrected the subheading “ABHD6 accelerated the deactivation of homomeric AMPAR-TARP γ-2 complexes.” to “ABHD6 accelerated the deactivation of homomeric AMPAR-TARP γ-2 complexes” in the revised manuscript.
 
 p9L195, might be 'deact' instead of 'deac';
 
 Thank you for your careful reading. We have corrected “τw, deac” to “τw, deact” in the revised manuscript.
 
 p12L276, might be a missing 'ABHD6' after 'whether'.
 
 Thank you for your advice. We have added “ABHD6” after “whether” in the revised manuscript.
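
 The symbol τw, deact in the corrections above is conventionally read as an amplitude-weighted deactivation time constant derived from a multi-exponential fit. The responses do not describe the fitting procedure, so the following is only a sketch under that assumption; the function name and the example amplitudes and time constants are hypothetical.

```python
def weighted_tau(amplitudes, taus):
    """Amplitude-weighted time constant: sum(A_i * tau_i) / sum(A_i)."""
    if len(amplitudes) != len(taus) or not amplitudes:
        raise ValueError("need one amplitude per time constant")
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("amplitudes must not sum to zero")
    return sum(a * t for a, t in zip(amplitudes, taus)) / total


# Hypothetical bi-exponential fit: 80% fast component (1.0 ms), 20% slow (5.0 ms).
tau_w = weighted_tau([0.8, 0.2], [1.0, 5.0])
print(f"tau_w = {tau_w:.2f} ms")
```

 Collapsing the fitted components into one weighted value gives a single number per cell that can be compared across conditions.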
 
 Reviewer #2 (Recommendations For The Authors):
 
 (1) Line 366, grammar mistake. The authors used the expression "In this study, we systematically studies", which should be "study" instead of "studies".
 
 Thank you for your advice. We have corrected “studies” to “study” in the revised manuscript.
 
 (2) Line 370, the author used the expression "However, previous studies also found poorly expressed but significant population of GluA1 homomeric receptors in the hippocampus". It looks like "poorly expressed" is somewhat contradictory to "significant". I suggest the authors revise this sentence.
 
 Thank you for your advice. We have deleted the statement in the revised manuscript.
 
 (3) Lines 407-409. The authors stated, "The flip and flop isoforms were cloned into an IRES-GFP expression vector using polymerase chain reaction (PCR). ...editing variants were generated using PCR". It is impossible to complete all of the cloning with PCR alone, especially with IRES-GFP. This must have been done with restriction enzymes, Gibson assembly, or another method. The authors probably amplified the isoforms by PCR and then inserted them into the vectors using other methods. The authors need to revise their statement and make it complete and clear.
 
 We thank the reviewer for their suggestion. In response, we have added a description of the expression vector construction to the revised manuscript in lines 431–437: “The flip and flop isoforms were cloned into an IRES-GFP expression vector using polymerase chain reaction (PCR). Q/R and R/G editing variants were generated by PCR-based cloning and FastCloning. GluA1 and TARP γ-2 were subcloned using EcoRI and SalI sites (Milstein et al., 2007), GluA2 and GluA3 were inserted with XhoI and SalI, and GluA4 was inserted with EcoRI and BamHI. All constructs were verified by restriction mapping and sequencing of PCR-amplified regions.”
 
 (4) It would help if the authors could show some WB blots or PCR results or other evidence that their transfection was successful, in particular with these many plasmid combinations.
 
 We thank the reviewer for raising this point. In response, we have included additional experiments in the revised manuscript in lines 138–142: “Immunofluorescence assays and Western blot analysis were performed on cells co-transfected with GluA1, TARP γ-2, and ABHD6. These experiments were conducted to verify co-transfection efficiency and corresponding protein expression. Immunofluorescence results confirmed a high degree of co‑localization among GluA1, TARP γ-2, and ABHD6 (Fig. EV1).”
 
Tags: AuthorResponse

Annotators: Public_Reviews

URL: biorxiv.org/content/10.1101/2024.06.20.599978v2
www.biorxiv.org

PIK3CA-related overgrowth spectrum (PROS) zebrafish models reveal pan-lineage developmental dysregulation

5
1. Public_Reviews 05 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This is an important study that establishes a zebrafish model of PIK3CA-related overgrowth syndrome. The imaging characterization of the mesodermal, particularly vascular, lesions of the model is compelling. The scRNA-Seq analysis is convincing, revealing key perturbations in the PIK3CA-mutation model, although deeper investigation of the exact mechanism leading to the lesions, as well as validation at different time points, could further strengthen the findings. This work will be of interest to medical biologists working on PROS, and potentially to a broader audience interested in non-cell-autonomous signaling of PIK3CA and its implications in other diseases.
 
 Summary
2. Public_Reviews 05 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Brunsdon et al. present a zebrafish model of mosaic PIK3CA activation to investigate mechanisms underlying PIK3CA-related overgrowth spectrum (PROS), with a particular focus on non-cell-autonomous mechanisms of tissue overgrowth. The study is timely and addresses an important gap in the understanding of how mosaic activation of PI3K signaling leads to tissue-specific developmental abnormalities.
 
 Using a Tol2-based mosaic expression system combined with single-cell transcriptomics, the authors provide evidence suggesting that mutant PIK3CA-expressing cells influence surrounding wild-type tissues through indirect signaling mechanisms, contributing to vascular malformations and tissue overgrowth.
 
 Overall, the work presents an interesting and potentially impactful model for studying mosaic PIK3CA-driven overgrowth and non-cell-autonomous signaling mechanisms. However, several aspects require clarification, additional controls, and improved presentation to strengthen the mechanistic conclusions and overall impact of the study.
 
 Strengths:
 
 This study addresses an important and timely question by investigating the mechanisms underlying mosaic PIK3CA activation in the context of PROS, a condition for which developmental mechanisms remain poorly understood. The use of a mosaic zebrafish model is particularly appropriate, as it closely reflects the mosaic nature of PIK3CA mutations observed in patients and allows the investigation of non-cell-autonomous effects.
 
 Another major strength of the study is the integration of single-cell transcriptomics, which provides valuable insight into potential signaling pathways involved in indirect tissue overgrowth and offers a rich dataset for hypothesis generation. The authors also propose an interesting conceptual framework in which PI3K-activated cells influence surrounding tissues through paracrine signaling, which could have broader implications beyond PROS and contribute to understanding mosaic developmental disorders more generally.
 
 Finally, the work has potential translational relevance, as identifying mechanisms driving mosaic PI3K activation and non-cell-autonomous signaling could inform future therapeutic strategies for PROS and related conditions.
 
 Weaknesses:
 
 Despite these strengths, several aspects of the study require clarification and additional experimentation.
 
 Major comments:
 
 (1) The Tol2-based system results in mosaic overexpression of mutant PIK3CA in the presence of endogenous wild-type PIK3CA, making it difficult to determine how co-expression of WT and mutant proteins influences the observed phenotypes. While mosaic expression is relevant to PROS, a complementary approach in which endogenous PIK3CA is knocked out prior to introducing mutant variants would allow clearer interpretation of mutant-specific effects.
 
 (2) The authors do not clearly describe the validation of editing or integration efficiency. It would be important for the authors to clarify whether sequencing was performed to confirm integration, to quantify the proportion of mosaic expression, and to measure transgene expression levels. These controls would strengthen confidence in the model and interpretation of the results.
 
 (3) The manuscript would benefit from rescue experiments to strengthen causal conclusions. It remains unclear whether the phenotypes induced by PIK3CA PROS variants can be rescued, either through expression of wild-type PIK3CA, pharmacological inhibition of PI3K signaling, or assessment of developmental reversibility. Such experiments would strengthen the link between PI3K activation and the observed phenotypes.
 
 (4) The authors propose candidate signaling molecules mediating non-cell-autonomous effects downstream of PI3K hyperactivation; however, these conclusions remain speculative, as no functional validation is provided. Testing selected candidate mediators identified in the RNA-seq dataset would significantly strengthen the mechanistic conclusions.
 
 Review 1
3. Public_Reviews 05 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 In this manuscript, Brunsdon et al. aim to study PIK3CA-related overgrowth spectrum (PROS) by establishing a mosaic zebrafish model with overexpression of pik3ca carrying hotspot mutations, coupled with an mScarlet+ reporter. Using fluorescence microscopy, the authors demonstrated that overexpression of pik3ca with a number of hotspot mutations led to mesodermal and particularly vascular malformations in the zebrafish model. Interestingly, they found a paucity of mScarlet+ mutant cells in the vascular lesions, consistent with the finding of low PIK3CA mutation burden in PROS tissue. Such data suggest a non-cell-autonomous effect of PIK3CA mutation. Following this logic, the authors performed single-cell RNA-Sequencing on zebrafish overexpressing WT pik3ca and mutant pik3ca at 19 hpf, and demonstrated widespread transcriptomic perturbations across multiple lineages, including lineage frequencies, key cell pathways, and cell-cell interactions. Importantly, they demonstrate that mScarlet+ cells carrying mutant pik3ca cluster separately from other cell types, do not demonstrate clear lineage identity, and have a general downregulation in signaling components.
 
 Overall, the conclusions in the manuscript are well-supported by the presented data. The imaging studies are particularly convincing. The transcriptomic analysis generated a list of potential pathways to further investigate and potentially target with future therapeutic interventions. Importantly, this study provides a valuable in vivo model of PROS that: 1) recapitulates key features of PROS (e.g., multiple mesodermal defects, paucity of mutation burden in lesions suggesting non-cell-autonomous interactions); 2) is scalable; and 3) offers direct visualization of lesion development, compatible with time-course live imaging. This model will be valuable to further understand PROS and potentially study other diseases where the PIK3CA pathway is altered (e.g., certain cancers).
 
 The following are not necessarily weaknesses of the data, but rather suggestions where the manuscript could be further strengthened:
 
 (1) The model recapitulates the variability of mesodermal lesions in PROS. It would be valuable to utilize this model to further study factors that are associated with the development of more severe lesions (e.g., by comparing samples with more severe lesions to those unaffected despite carrying the mutations, Figure 1F).
 
 (2) ScRNA-seq analysis could be enriched with a comparison between cells overexpressing mutant pik3ca vs. those overexpressing WT pik3ca.
 
 (3) In the scRNA-Seq analysis, it is curious that the C0 cluster, enriched with mScarlet+ cells, is found to have downregulated signaling interactions (Fig. 5C), yet exerts a widespread non-cell-autonomous effect. Meanwhile, there is also a noticeable loss of certain lineages (e.g., notochord, Figure 4E) and related cell-cell interactions (e.g., notochord-related interaction, Figure 5A). A deeper exploration of the basis of the non-cell-autonomous effect would be valuable.
 
 (4) The scRNA-Seq analysis was performed at one time point (19 hpf). Additional analysis (not necessarily by scRNA-Seq) at other time points to study whether findings at 19 hpf are persistent throughout development or undergo dynamic changes (e.g., cell fate/state of mSc+ mutant cells) would be helpful.
 
 (5) The scRNA-Seq analysis provides a valuable list of perturbed interactions that could be targeted by future therapeutic approaches. Validation of the scRNA-Seq findings with protein-level analysis, and studying the effect of targeting some of the pathways on the disease phenotype, would offer valuable data for the community.
 
 Review 2
4. Public_Reviews 05 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The study "PIK3CA-related overgrowth spectrum (PROS) zebrafish models reveal pan-lineage developmental dysregulation" presents important findings that extend significantly beyond a single subfield, bridging developmental biology, vascular medicine, and cancer-related PI3K signalling. By developing mosaic zebrafish models of PROS and combining live imaging with single-cell transcriptomics, the authors provide compelling evidence for a non-cell-autonomous mechanism of tissue overgrowth, a conceptual shift with meaningful therapeutic implications.
 
 Strengths:
 
 The evidence is overall convincing, with methodology appropriate and well-validated relative to the current state of the art; the integration of multiple approaches (in vivo modelling, scRNA-seq, ligand-receptor inference) strengthens the central claims. However, some aspects of the proposed non-cell-autonomous signalling mechanisms remain partly correlative, and direct functional validation of the rewired ligand-receptor interactions would further consolidate the conclusions.
 
 Weaknesses:
 
 The transgenic overexpression approach chosen by the authors represents a well-established and effective strategy for generating mosaic models in zebrafish. However, this approach introduces notable limitations: the lack of control over transgene dosage and unknown integration sites may generate non-physiological effects, potentially confounding the interpretation of key findings.
 
 The authors are certainly aware that alternative approaches (though technically more demanding) could be considered in future studies to further strengthen the model. For instance, a CRISPR/Cas9-mediated knock-in of the pik3ca-PROS allele at the endogenous locus (retaining upstream native regulatory elements with only a minimal promoter in the construct, co-expressed with a fluorescent reporter via P2A) could allow even more physiological, lineage-restricted expression while enabling direct visualisation of mutant cells. Mesodermal specificity could potentially be further refined by driving mosaic Cas9 expression under a pan-mesodermal tbx promoter, restricting editing to the relevant lineage while simultaneously marking mutant cells fluorescently, thus even more closely mimicking the post-zygotic mutational events characteristic of PROS. As a complementary strategy, blastula transplantation experiments using pik3ca-PROS donor cells (ideally co-expressing a distinct fluorescent marker such as mCherry) into fli1:GFP transgenic hosts could provide a powerful and technically consolidated approach to directly visualise and quantify non-cell-autonomous effects on host vasculature, with precise control over mutant cell burden. This combinatorial framework, separating donor mutant cells from host tissue in a two-colour imaging setup, could be particularly compelling for validating the ligand-receptor rewiring predicted by single-cell transcriptomics in future investigations.
 
 These reflections are offered in the spirit of prospective methodological development and do not diminish the value of the current work, which opens a valuable new avenue for therapeutic investigation, suggesting that targeting indirect overgrowth-propagating signals, alongside PI3K inhibition, deserves serious consideration.
 
 Review 3
5. Public_Reviews 05 Jun 2026
 
 in eLife
 
 Author response:
 
 eLife Assessment
 
 This is an important study that establishes a zebrafish model of PIK3CA-related overgrowth syndrome. The imaging characterization of the mesodermal, particularly vascular, lesions of the model is compelling. The scRNA-Seq analysis is convincing, revealing key perturbations in the PIK3CA-mutation model, although deeper investigation of the exact mechanism leading to the lesions, as well as validation at different time points, could further strengthen the findings. This work will be of interest to medical biologists working on PROS, and potentially to a broader audience interested in non-cell-autonomous signaling of PIK3CA and its implications in other diseases.
 
 We are delighted that the Editors and Reviewers consider the work of value and that it is interesting to a broad audience. We also appreciate and take on board the areas that the reviewers identify for improvement, and their suggestions on how this could be achieved.
 
 There are two major pieces of work suggested by the reviewers which we plan to carry out for this manuscript. The first of these is an additional scRNA-seq experiment at a later developmental stage when vascular malformations are established. Through comparison between pik3caPROS, pik3caWT and no-pik3ca injected controls, this would help answer whether the global lineage and transcriptional dysregulation observed at 19 hpf persists over time, and whether the largely inert ‘C0’ cluster of PROS mScarlet+ cells changes during development (Reviewer 2 comment 3).
 
 Secondly, we are already optimising rescue experiments with the specific Pik3ca inhibitor alpelisib, which is currently used as a therapy for PROS. Some troubleshooting has been required for the best delivery method and concentration for this to rescue vascular malformations in embryos, and to cause measurable decreases in PI3K signalling at the protein level through Akt and S6 pathways.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Brunsdon et al. present a zebrafish model of mosaic PIK3CA activation to investigate mechanisms underlying PIK3CA-related overgrowth spectrum (PROS), with a particular focus on non-cell-autonomous mechanisms of tissue overgrowth. The study is timely and addresses an important gap in the understanding of how mosaic activation of PI3K signaling leads to tissue-specific developmental abnormalities.
 
 Using a Tol2-based mosaic expression system combined with single-cell transcriptomics, the authors provide evidence suggesting that mutant PIK3CA-expressing cells influence surrounding wild-type tissues through indirect signaling mechanisms, contributing to vascular malformations and tissue overgrowth.
 
 Overall, the work presents an interesting and potentially impactful model for studying mosaic PIK3CA-driven overgrowth and non-cell-autonomous signaling mechanisms. However, several aspects require clarification, additional controls, and improved presentation to strengthen the mechanistic conclusions and overall impact of the study.
 
 We thank Reviewer 1 for their support of our work, and constructive and helpful comments.
 
 Strengths:
 
 This study addresses an important and timely question by investigating the mechanisms underlying mosaic PIK3CA activation in the context of PROS, a condition for which developmental mechanisms remain poorly understood. The use of a mosaic zebrafish model is particularly appropriate, as it closely reflects the mosaic nature of PIK3CA mutations observed in patients and allows the investigation of non-cell-autonomous effects.
 
 Another major strength of the study is the integration of single-cell transcriptomics, which provides valuable insight into potential signaling pathways involved in indirect tissue overgrowth and offers a rich dataset for hypothesis generation. The authors also propose an interesting conceptual framework in which PI3K-activated cells influence surrounding tissues through paracrine signaling, which could have broader implications beyond PROS and contribute to understanding mosaic developmental disorders more generally.
 
 Finally, the work has potential translational relevance, as identifying mechanisms driving mosaic PI3K activation and non-cell-autonomous signaling could inform future therapeutic strategies for PROS and related conditions.
 
 Weaknesses:
 
 Despite these strengths, several aspects of the study require clarification and additional experimentation.
 
 Major comments:
 
 (1) The Tol2-based system results in mosaic overexpression of mutant PIK3CA in the presence of endogenous wild-type PIK3CA, making it difficult to determine how co-expression of WT and mutant proteins influences the observed phenotypes. While mosaic expression is relevant to PROS, a complementary approach in which endogenous PIK3CA is knocked out prior to introducing mutant variants would allow clearer interpretation of mutant-specific effects.
 
 PROS/CLOVES patients co-express endogenous wild-type and mutant PIK3CA in affected cells, which in turn constitute only a small proportion of cells in affected tissues (Madsen et al. 2018). As our intent was strictly to model human PROS/CLOVES (an aim informed by support from and close collaboration with the CLOVES Syndrome Community, a key patient advocacy group), we designed our model to reflect this as closely as possible. It is not clear to us what translational end would be served by expressing mutants in a null background, interesting though this may be. Given our transgenic strategy, we did experiment with overexpressing wildtype pik3ca as a control for some experiments to test whether overexpression of pik3ca itself drives overgrowth phenotypes, without the presence of hotspot PROS mutations (Figure 3D, Supplementary Figure 1A). We found that ubiquitous or mesodermal overexpression of pik3caWT did not cause vascular malformations or cause the ectopic fli1:eGFP endothelial cell phenotype observed when overexpressing pik3caPROS variants. While not precisely addressing the reviewer’s comment, this adds to evidence that increased expression of wildtype pik3ca does not confound the observed gain of function phenotype in the PROS model.
 
 (2) The authors do not clearly describe the validation of editing or integration efficiency. It would be important for the authors to clarify whether sequencing was performed to confirm integration, to quantify the proportion of mosaic expression, and to measure transgene expression levels. These controls would strengthen confidence in the model and interpretation of the results.
 
 We used secondary transgenesis markers, such as the cardiac reporter cmlc2:GFP, as a visual readout of integration efficiency and confirmation of integration. For example, >50% GFP+ heart cells in an embryo indicates that Tol2 transgenesis has occurred efficiently, and so such embryos would be included in an experiment, whereas the presence of only 1 or 2 green cardiac cells would suggest that the level of transgene in the embryo is negligible, and so that embryo would be excluded from the experiment. Independently of this reporter, we showed an upregulation of pik3ca transcript in PROS mosaics compared to control by scRNA-seq (Figure 4D, Supplementary Figure 4A), confirming that the transgene produces a measurable upregulation of pik3ca.
 
 We agree that it would be optimal to quantify the transgene expression and copy number for each individual embryo. However, for experiments where phenotypes are scored, hundreds of embryos are injected each time. Therefore, although it would be valuable to quantify the transgene expression and transgene copy number in terms of finding its correlation to phenotype severity, it is not feasible to do this at this scale. In the future, we would like to refine our model to include more sophisticated inducible transgenic models, with stable integration sites to control for integration site/copy number variation. However, for this manuscript, the priority as set out by our charity funders was to generate and characterise a pik3caPROS model that could rapidly test different patient hotspot alleles as well as tissue-specific promoter drivers. Thus, we chose this simpler model for now, but we would be very interested in continuing this work with a more refined model for one or two mutations (See Reviewer comment 1).
 
 This heterogeneity in transgene dosage and expression levels will inevitably have introduced ‘noise’ into our data. We can account for this somewhat by large numbers of embryos injected per experiment and reproducibility across populations of zebrafish between experiments. We also note that this strategy reflects the heterogeneity in human PROS, with disease mosaicism, presentation, and severity being highly variable from person to person. Therefore, we don’t necessarily see this as a drawback for our current approach.
 
 (3) The manuscript would benefit from rescue experiments to strengthen causal conclusions. It remains unclear whether the phenotypes induced by PIK3CA PROS variants can be rescued, either through expression of wild-type PIK3CA, pharmacological inhibition of PI3K signaling, or assessment of developmental reversibility. Such experiments would strengthen the link between PI3K activation and the observed phenotypes.
 
 We agree this is an exciting direction and a great next step for this research to take. This work is currently ongoing, using the specific Pik3ca inhibitor alpelisib, and optimizing treatment conditions to ensure our experimental readouts are meaningful. Through phenotype scoring we do see a significant rescue in the severity of vascular malformations in PROS mosaic embryos. However, we didn’t feel this work was ready for the initial submission because (1) the concentrations we must add to the zebrafish medium by immersion are far higher than the doses needed for inhibition of PI3K signalling in human cell lines and (2) we do not see an obvious decrease in pAkt or pS6 levels by western blot analyses of embryos at alpelisib doses of up to 100 μM, for either short or long term exposure. This drug is poorly soluble in water, and so we are also experimenting with introducing it to embryos intravenously.
 
 (4) The authors propose candidate signaling molecules mediating non-cell-autonomous effects downstream of PI3K hyperactivation; however, these conclusions remain speculative, as no functional validation is provided. Testing selected candidate mediators identified in the RNA-seq dataset would significantly strengthen the mechanistic conclusions.
 
 We thank the reviewer for this suggestion, and it is indeed a long-term aim of our work to find better treatments for PROS by combining inhibition of PI3K signalling with other candidate mediators to treat overgrowth. Our scRNA-seq experiments suggest that Notch, Wnt and Ephrin signalling pathway components may contribute to disease, and so there is a lot of potential for treatment strategies. Once we have optimised treatment with alpelisib to rescue our disease phenotype in line with current mammalian models (see response to Comment 3 above), we will start to look at other candidate mediators alone or in conjunction with alpelisib. However, given the challenges we are facing with the alpelisib treatment, we may need to develop this work in a subsequent study.
 
 Reviewer #2 (Public review):
 
 In this manuscript, Brunsdon et al. aim to study PIK3CA-related overgrowth spectrum (PROS) by establishing a mosaic zebrafish model with overexpression of pik3ca carrying hotspot mutations, coupled with an mScarlet+ reporter. Using fluorescence microscopy, the authors demonstrated that overexpression of pik3ca with a number of hotspot mutations led to mesodermal and particularly vascular malformations in the zebrafish model. Interestingly, they found a paucity of mScarlet+ mutant cells in the vascular lesions, consistent with the finding of low PIK3CA mutation burden in PROS tissue. Such data suggest a non-cell-autonomous effect of PIK3CA mutation. Following this logic, the authors performed single-cell RNA sequencing on zebrafish overexpressing WT pik3ca and mutant pik3ca at 19 hpf, and demonstrated widespread transcriptomic perturbations across multiple lineages, including lineage frequencies, key cell pathways, and cell-cell interactions. Importantly, they demonstrate that mScarlet+ cells carrying mutant pik3ca cluster separately from other cell types, do not demonstrate clear lineage identity, and have a general downregulation in signaling components.
 
 Overall, the conclusions in the manuscript are well-supported by the presented data. The imaging studies are particularly convincing. The transcriptomic analysis generated a list of potential pathways to further investigate and potentially target with future therapeutic interventions. Importantly, this study provides a valuable in vivo model of PROS that: 1) recapitulates key features of PROS (e.g., multiple mesodermal defects, paucity of mutation burden in lesions suggesting non-cell-autonomous interactions); 2) is scalable; and 3) offers direct visualization of lesion development, compatible with time-course live imaging. This model will be valuable to further understand PROS and potentially study other diseases where the PIK3CA pathway is altered (e.g., certain cancers).
 
 We thank Reviewer 2 for their careful reading and support of our manuscript, and their helpful suggestions.
 
 The following are not necessarily weaknesses of the data, but rather suggestions where the manuscript could be further strengthened:
 
 (1) The model recapitulates the variability of mesodermal lesions in PROS. It would be valuable to utilize this model to further study factors that are associated with the development of more severe lesions (e.g., by comparing samples with more severe lesions to those unaffected despite carrying the mutations, Figure 1F).
 
 This is a very interesting question, and something that we have wondered ourselves. The clinical observation that PROS mutations cause pathology in mesodermal-derived tissues suggests that there is a lineage permissivity of PROS mutations. We plan to perform additional scRNA-seq experiments on later stage embryos (aligned with Figure 1) and hope to incorporate comparison of embryos with more severe lesions to those unaffected despite carrying pik3caPROS mutations.
 
 (2) ScRNA-seq analysis could be enriched with a comparison between cells overexpressing mutant pik3ca vs. those overexpressing WT pik3ca.
 
 The scRNA-seq experiment presented in this paper was limited by funding constraints at the time, and so we focussed on choosing samples that were likely to yield the most meaningful data. Ideally, we would have included a WT overexpression control in addition to an injected no-pik3ca control; however, as we did not observe any phenotypes associated with mosaic pik3caWT transgenic embryos (Supplementary Figure 1A, Figure 3D), we chose not to include this condition. We are grateful for subsequent funding that will allow us to perform an scRNA-seq experiment at a later timepoint, detailed below, where we plan to include this control.
 
 (3) In the scRNA-Seq analysis, it is curious that the C0 cluster, enriched with mScarlet+ cells, is found to have downregulated signaling interactions (Fig. 5C), yet exerts a widespread non-cell-autonomous effect. Meanwhile, there is also a noticeable loss of certain lineages (e.g., notochord, Figure 4E) and related cell-cell interactions (e.g., notochord-related interaction, Figure 5A). A deeper exploration of the basis of the non-cell-autonomous effect would be valuable.
 
 Thank you for this important comment. We agree that this finding is very interesting and warrants further investigation, although a definitive answer may be too difficult for this current revision. Using conventional differential expression analyses on our scRNA-seq data (such as was used in Figure 4), we could not find significant upregulation of many genes and pathways, and CellChat and NICHES analyses did suggest that signalling between C0 and other clusters was weak. Nevertheless, using the Decoupler package, we did find significant upregulation of some footprint signatures enriched in mScarlet+ vs mScarlet- cells in PROS mosaics (Supplementary Figure 4B), including PI3K and EGFR (as one would expect), but also apoptosis and UV response, suggesting that overexpression of pik3caPROS may cause cellular stress. Using NICHES, we also found Myc, Notch, Wnt and Ephrin ligand-receptor pairs to be upregulated in PROS mosaic C0 sending and receiving interactions compared to controls, which would be candidates for validation in subsequent studies (Supplementary Figure 4C). We will be interested to determine whether C0-like cells are present in older embryos in our scRNA-seq analysis, and whether they have similar signalling activity.
 
 (4) The scRNA-Seq analysis was performed at one time point (19 hpf). Additional analysis (not necessarily by scRNA-Seq) at other time points to study whether findings at 19 hpf are persistent throughout development or undergo dynamic changes (e.g., cell fate/state of mSc+ mutant cells) would be helpful.
 
 We agree that the inclusion of a later timepoint in our scRNA-seq experiment would be valuable in answering a lot of our questions about the fate of C0 cells and the persistence of the transcriptional dysregulation, including non-cell autonomous interactions that we see at 19 hpf. As mentioned above, we were constrained by time and funding for the original experiment but are now in a position to add to this work and address this point.
 
 (5) The scRNA-Seq analysis provides a valuable list of perturbed interactions that could be targeted by future therapeutic approaches. Validation of the scRNA-Seq findings with protein-level analysis, and studying the effect of targeting some of the pathways on the disease phenotype, would offer valuable data for the community.
 
 Thank you for this comment. We agree that this is an essential next step to take and is also a priority for our patient advocates. As mentioned above (Reviewer 1, point 4), we would like to be confident that alpelisib is on-target in our system first, and then we very much want to identify new therapeutic avenues to explore in this pre-clinical space.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The study "PIK3CA-related overgrowth spectrum (PROS) zebrafish models reveal pan-lineage developmental dysregulation" presents important findings that extend significantly beyond a single subfield, bridging developmental biology, vascular medicine, and cancer-related PI3K signalling. By developing mosaic zebrafish models of PROS and combining live imaging with single-cell transcriptomics, the authors provide compelling evidence for a non-cell-autonomous mechanism of tissue overgrowth, a conceptual shift with meaningful therapeutic implications.
 
 We thank Reviewer 3 for their time and thoughtful comments considering our work.
 
 Strengths:
 
 The evidence is overall convincing, with methodology appropriate and well-validated relative to the current state of the art; the integration of multiple approaches (in vivo modelling, scRNA-seq, ligand-receptor inference) strengthens the central claims. However, some aspects of the proposed non-cell-autonomous signalling mechanisms remain partly correlative, and direct functional validation of the rewired ligand-receptor interactions would further consolidate the conclusions.
 
 Weaknesses:
 
 The transgenic overexpression approach chosen by the authors represents a well-established and effective strategy for generating mosaic models in zebrafish. However, this approach introduces notable limitations: the lack of control over transgene dosage and unknown integration sites may generate non-physiological effects, potentially confounding the interpretation of key findings.
 
 Thank you for this important comment. We agree that there are limitations in our current model, and we are working to refine it such that we have temporal as well as spatial control over the expression of pik3caPROS.
 
 Our funding for the start of this study came from the CLOVES Syndrome Community charity, and in collaboration with them, we decided that for this work, our priority was to understand more about the disease mechanisms at disease onset, and also to be able to test multiple pik3ca hotspot mutations that affect patients. One question for families is whether the pik3ca hotspot mutations contribute differently to patient overgrowths. Our data here suggest that all mutations are able to promote overgrowth equally, and that differences between disease presentation in patients likely reflect the timing and cellular origins of the mutation.
 
 As a side note, together with the CLOVES Syndrome Community, we also felt that we wanted to test actual patient mutations, rather than artificial hyperactivated variants of Pik3ca such as the widely used p110a* allele (Hu et al. 1995; Venot et al. 2018), which can inform on important mechanisms of pathway dysregulation, but less so on actual patient-specific disease mutations.
 
 The authors are certainly aware that alternative approaches (though technically more demanding) could be considered in future studies to further strengthen the model. For instance, a CRISPR/Cas9-mediated knock-in of the pik3ca-PROS allele at the endogenous locus (retaining upstream native regulatory elements with only a minimal promoter in the construct, co-expressed with a fluorescent reporter via P2A) could allow even more physiological, lineage-restricted expression while enabling direct visualisation of mutant cells. Mesodermal specificity could potentially be further refined by driving mosaic Cas9 expression under a pan-mesodermal tbx promoter, restricting editing to the relevant lineage while simultaneously marking mutant cells fluorescently, thus even more closely mimicking the post-zygotic mutational events characteristic of PROS. As a complementary strategy, blastula transplantation experiments using pik3ca-PROS donor cells (ideally co-expressing a distinct fluorescent marker such as mCherry) into fli1:GFP transgenic hosts could provide a powerful and technically consolidated approach to directly visualise and quantify non-cell-autonomous effects on host vasculature, with precise control over mutant cell burden. This combinatorial framework, separating donor mutant cells from host tissue in a two-colour imaging setup, could be particularly compelling for validating the ligand-receptor rewiring predicted by single-cell transcriptomics in future investigations.
 
 These reflections are offered in the spirit of prospective methodological development and do not diminish the value of the current work, which opens a valuable new avenue for therapeutic investigation, suggesting that targeting indirect overgrowth-propagating signals, alongside PI3K inhibition, deserves serious consideration.
 
 Thank you for these excellent suggestions and feedback. We are keen to try to generate fish that more closely align with what is happening in patients. Two challenges we have faced include:
 
 (1) In our hands, the pik3ca promoter itself is not strong enough to drive fluorophore expression to an extent that we can observe fluorescent PROS cells in zebrafish. As a control, after we saw no fluorescence when attempting to knock in fluorophores at the 5’ end of endogenous pik3ca, we tried making a transgenic line using various lengths of pik3ca promoter regions driving GFP expression. Despite stable integration of the transgene, shown by a secondary transgene reporter inherited through to the F1 generation, we could not visualise GFP/mNeonGreen expression at any stage of development.
 
 (2) A drawback of the IRES approach we used here is that the fluorophore expression levels will be lower than using a short cleavable peptide sequence such as P2A. Unfortunately, the critical kinase region (and location of the orthologous hotspot codon 1048) is located only a few amino acids from the stop codon, and we found that the function of Pik3ca was likely impeded by the addition of several extra amino acids after the P2A cleaves itself.
 
 Despite these challenges, we hope to be able to generate models in future with more precise control over mutant cell burden.
 
 References
 
 Hu Q, Klippel A, Muslin AJ, Fantl WJ, Williams LT. 1995. Ras-dependent induction of cellular responses by constitutively active phosphatidylinositol-3 kinase. Science 268: 100-102.
 
 Madsen RR, Vanhaesebroeck B, Semple RK. 2018. Cancer-associated PIK3CA mutations in overgrowth disorders. Trends in Molecular Medicine 24: 856-870.
 
 Venot Q, Blanc T, Rabia SH, Berteloot L, Ladraa S, Duong JP, Blanc E, Johnson SC, Hoguin C, Boccara O et al. 2018. Targeted therapy in patients with PIK3CA-related overgrowth syndrome. Nature 558: 540-546.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2025.12.03.691507v2
www.biorxiv.org www.biorxiv.org

Single-cell spatial mapping reveals reproducible cell type organization and spatially-dependent gene expression in gastruloids

4
1. Public_Reviews 05 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  This work presents important findings on quantifying gene coexpression from spatial omics. These quantification methods have been applied to gastruloids to describe how genes are spatialised. The description of the quantification tools might be incomplete, which also weakens the biological message. Clearer formalization and justification of the quantification will improve the study.
  
  Summary
2. Public_Reviews 05 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors performed seqFISH in 26 gastruloids and performed a variety of computational analyses on these novel spatial data sets. Whilst the data are valuable and the computational concepts useful (exposure index, L-metric, ... ), the article falls short on novelty and is written in very clunky language, often with contradictory conclusions.
  
  Major issues:
  
  (1) The authors did well in explaining and detailing the provenance of data and the individual experiments performed. However, their 26 gastruloid data still constitute a very limited sampling from their total organoids: one experiment pooled 4 plates at an 80-94% success rate; 6 different aggregation experiments were done, making a total of 1843 gastruloids, of which 26 (~1-2%) were sampled. A simple IF stain of 2-3 markers in a bigger sample could have given a more accurate picture of specific domains of interest and their proximity. Regardless, more information should be given about the existing samples: variation across experimental batches, and differences between the 300-cell and 100-cell gastruloids that were used.
  
  (2) Language in the manuscript should be revised. Overall, the manuscript is very long and descriptive, and the written "impressions and beliefs" are often not adequately justified and can indeed be contradictory, e.g. in Section 1: the title states "cell types' locations ...are consistent", a few sentences down we find "there was substantial variation" and "within range of what would be considered a 'morphologically normal' gastruloid". "quite consistent", "compelling patterning", "we don't believe"... these types of expressions are best avoided and replaced with data, or used and bolstered with quantitative numbers such as percentages when a given cutoff is used. Another example: "location of each cell type relative to gastruloid morphology was quite consistent the posterior region ... mainly consisted in NMPs." Given T expression in the posterior, this result phrased as such appears quite inflated; in fact, looking at cell types in Figures S1, 2a/b/c, this reviewer would state they are anything but consistent, and indeed it takes sophisticated analyses to find a pattern (of sorts) beyond the coarse domains expected!
  
  (3) Figure 6 is one of the most valuable parts of the work, as the authors use the battery of analyses developed to investigate the variable and not-so-robust endothelial clusters in gastruloids. However, this investigation is still very preliminary, and it should be further linked with known biology. It is still unclear what the unique organization of this cell type is (circularity isn't convincing) and whether any signalling cues of adjacent cells could explain it. Is there any evidence that more mature endodermal cell types are generated (like the suggested "liver") to give rise to endothelial cells? It would certainly be interesting to perform IF for this cell type together with mesodermal and endodermal markers to validate seqFISH predictions on a bigger sample.
  
  (4) Figures 1c and 6b need statistical significance assessments.
  
  (5) The article should include an analysis of Hox colinearity expression in these gastruloids as a validation of the system.
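  The sampling fraction quoted in point (1) can be checked with a few lines of arithmetic. The sketch below uses only the two counts given in the review (1843 gastruloids made, 26 imaged); the variable names are illustrative and are not taken from the manuscript.

```python
# Counts quoted in point (1) of the review; variable names are illustrative.
total_gastruloids = 1843   # gastruloids made across the 6 aggregation experiments
imaged_gastruloids = 26    # gastruloids profiled by seqFISH

# Fraction of all gastruloids that ended up in the spatial atlas.
sampled_fraction = imaged_gastruloids / total_gastruloids
sampled_percent = round(100 * sampled_fraction, 1)

print(f"{sampled_percent}% of gastruloids were imaged")
```

  The result, about 1.4%, falls inside the "~1-2%" range the reviewer gives.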
  
  Review 1
3. Public_Reviews 05 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This manuscript presents an ambitious and technically challenging spatial-transcriptomic atlas of 26 gastruloids using seqFISH. The authors introduce quantitative metrics (mixing score, exposure index, L-metric / scL-metric, spatial L-metric, triplets) to characterize spatial organization at multiple scales. The dataset is valuable, and several analyses are original, particularly the rank-based L-metric family for mutual exclusivity.
  
  Strengths:
  
  The authors generate one of the most detailed spatial transcriptomic datasets of gastruloids to date. They propose creative computational metrics (L-metric/scL-metric) to quantify mutual exclusivity of gene expression without predefined thresholds, and they explore organizational principles from single-cell topology to cluster-level structure. Many observations align well with known gastruloid biology, such as posterior robustness and anterior variability. The writing is generally clear, and the figures are rich.
  
  Weaknesses:
  
  Several central claims rely on metrics whose computation and justification are insufficiently explained, making it difficult to assess how robust or interpretable the results are. Many choices in the analysis appear arbitrary or are insufficiently motivated (normalization schemes, choice of parameters such as the number of neighbors, the distance cutoffs, hierarchical clustering setup, and so on). The interpretations of spatial consistency, gene-program inference, and endothelial heterogeneity are plausible but might be stronger than the evidence currently supports.
  
  The manuscript would benefit from stronger benchmarking, quantification of uncertainty, and explicit controls for known artifacts in spatial transcriptomics (e.g., spillover, 2D slicing, cell type assignment entropy). The biological insights are promising, but since several depend on methodological assumptions that have not yet been demonstrated to be stable, they would benefit from clearer methodological explanation.
  
  The work is rich and could become a reference dataset. Clarifying and validating the quantitative methods will therefore considerably strengthen the impact and reliability of the conclusions.
  
  Review 2
4. Public_Reviews 05 Jun 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Triandafillou and colleagues report a single-cell resolved spatial atlas of gene expression of 26 gastruloids. While previous work had analyzed either single-cell gene expression or spatially coarse-grained patterns of gene expression (van den Brink et al, 2020), the authors here use multiplexed sequential RNA FISH (seqFISH) to create the first gastruloid atlas, which is simultaneously spatially and cellularly resolved. This atlas adds to a growing list of resources cataloging gastruloid development (see also Suppinger et al 2023).
  
  To analyze this dataset, the authors also describe a novel analytical framework. Their analysis centers around the 'L-metric', which measures the degree to which pairs of genes are either coexpressed or mutually exclusive. While this metric is similar to calculating correlations in gene expressions, it has important differences (including that it can, in principle, be asymmetric; although the authors symmetrize much of their analysis). In addition to the gene-centric L-metric analysis, the authors also analyze cells in their dataset according to the cell type entropy (an information-theoretical measure of confidence in cell type assignment) and the 'exposure index' (a measure of the similarity of nearest cellular neighbors).
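  To make the two cell-level measures mentioned above more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the review does not give their formulas, so the entropy is taken to be the Shannon entropy of a cell's type-assignment probabilities, and the neighbour measure is taken to be the fraction of a cell's k nearest neighbours that share its label. The function names, the choice of k, and the toy coordinates are all hypothetical. The L-metric is not sketched because its definition is not stated here.

```python
import math


def assignment_entropy(probs):
    """Shannon entropy (in bits) of one cell's cell-type probabilities.

    Assumption: low entropy means a confident assignment; the manuscript's
    exact definition may differ.
    """
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


def same_type_neighbour_fraction(points, labels, index, k=3):
    """Fraction of the k nearest neighbours of cell `index` sharing its label.

    Assumption: a simple stand-in for a neighbour-similarity measure; needs
    at least two cells. `points` holds (x, y) centroids.
    """
    x0, y0 = points[index]
    others = [i for i in range(len(points)) if i != index]
    # Sort the other cells by Euclidean distance to the query cell.
    others.sort(key=lambda i: math.hypot(points[i][0] - x0, points[i][1] - y0))
    nearest = others[:k]
    return sum(labels[i] == labels[index] for i in nearest) / len(nearest)


# Toy data: five cells with hypothetical positions and labels.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
labels = ["NMP", "NMP", "PSM", "SC", "SC"]

confident = assignment_entropy([1.0, 0.0, 0.0])   # one type certain
ambiguous = assignment_entropy([0.5, 0.5])        # two types equally likely
mixed_cell = same_type_neighbour_fraction(points, labels, index=0, k=2)
pure_cell = same_type_neighbour_fraction(points, labels, index=3, k=1)
```

  In this toy example the confident cell has zero entropy and the ambiguous one has one bit, while the first cell sits in a mixed neighbourhood and the fourth in a pure one. Both measures depend on choices such as k, which is the kind of parameter sensitivity Reviewer #2 asks the authors to justify.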
  
  Using this framework, the authors focus their analysis on two major features of development. The first is the differentiation of the bipotent neuromesodermal progenitor (NMP) cells in the posterior of the gastruloid into either presomitic mesoderm (PSM) or spinal cord (SC) lineages. They use L-metric analysis to compare overlap in marker genes used to separate NMP, PSM, and SC fates. They highlight that L-metric analysis can recover spatial patterns of gene expression (without explicit spatial information) and discern subtle features of marker genes beyond simple binning of cell types (e.g., that Epha5 expression in anterior NMPs may predict future SC differentiation).
  
  The second is the formation of endothelial (spatial) clusters within the gastruloid. The authors highlight two subtypes of endothelial clusters: (1) smaller clusters within the somitic anterior region, and (2) larger clusters associated with endoderm. While the authors discern some subtle differences in gene expression between these two clusters, their different spatial patterns suggest a potential physiological difference that would not be captured in traditional droplet microfluidic-based scRNAseq pipelines.
  
  Overall, this manuscript is a sophisticated and technically sound study that will provide a valuable beachhead for future studies of developmental patterning in gastruloids and organoids.
  
  Strengths:
  
  The major strengths of this study are the overall technical sophistication of the data set and analysis, as well as its potential generalizability to other developmental systems (both in vitro and in vivo). The data are extensively analyzed and reasonably interpreted, and this atlas makes good use of the variability in gastruloid development to extract the statistical structure of developmental processes. The L-metric offers a parameter-free tool to analyze transcriptomic datasets that could overcome the pitfalls of other approaches.
  
  Weaknesses:
  
  The major limitations of this study are the depth and novelty of the developmental processes studied. The authors provide very convincing proof-of-concept that their data set can recover known features of gastruloid development, including NMP differentiation and endothelial development. However, further analysis and/or investigation would be required to discover new principles of gastruloid development and patterning.
  
  Review 3

Tags

Review 3

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.07.14.664617v2
www.biorxiv.org

Ribosomal RNA methylation by GidB modulates discrimination of mischarged tRNA

3
1. Public_Reviews 04 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  This important study by Bi and colleagues employed a clever genetic screen to uncover the role of the GidB rRNA methylase in translation fidelity, under certain conditions, in Mycobacterium smegmatis. The evidence is solid, supporting the conclusion that the loss of GidB results in mistranslation. The work contributes to a more in-depth understanding of mycobacterial translation fidelity and will be of interest to microbiologists.
  
  Summary
2. Public_Reviews 04 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Protein synthesis - translation - involves repeated recognition and incorporation of aminoacyl-tRNAs by the ribosome. This process is a trade-off between the rate and accuracy of selection (for review see (Johansson et al, 2008; Wohlgemuth et al, 2011)). The ribosome does not just maximise the rate or the accuracy, it balances the two. Therefore, it is possible to select mutants that translate faster than the wt (but are sloppy) or that are very accurate (more than the wt) but translate slower. Slow translation is detrimental as it limits the rate of protein synthesis (and, therefore, growth), whereas error-prone mutants accumulate mis-translated proteins, which is detrimental for the cell.
  
  Bi and colleagues employ genetics, MIC measurements, reporter assays and structural biology to characterise the role of GidB rRNA methylase in translational accuracy in Mycobacterium smegmatis.
  
  Strengths:
  
  The genetics and phenotypic assays are convincing and establish the biological role of the methylase. The authors use a powerful set of complementary assays that convincingly demonstrates that the loss of GidB results in mistranslation.
  
  Weaknesses:
  
  Cryo-EM analysis of vacant 70S ribosomes is not sufficient for understanding the mechanisms underlying the accuracy defects in the gidB KO. Ideally, one should assemble and solve structurally near-cognate and non-cognate complexes.
  
  References:
  
  Johansson M, Lovmar M, Ehrenberg M (2008) Rate and accuracy of bacterial protein synthesis revisited. Curr Opin Microbiol 11: 141-147
  
  Wohlgemuth I, Pohl C, Mittelstaet J, Konevega AL, Rodnina MV (2011) Evolutionary optimization of speed and accuracy of decoding on the ribosome. Philos Trans R Soc Lond B Biol Sci 366: 2979-2986
  
  Review 1
3. Public_Reviews 04 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, Javid and colleagues worked to understand the molecular mechanisms involved in mistranslation in mycobacteria. They had previously discovered that mistranslation is an important mechanism underlying antibiotic tolerance in mycobacteria. Using a clever genetic screen they identify that deletion of gidB, a 16S ribosomal RNA methyltransferase, leads to lowered mistranslation (i.e. higher translational fidelity), but only in genetic backgrounds or environmental conditions that increase mistranslation rates.
  
  Strengths:
  
  The strengths of this manuscript are the clever genetic screen, the powerful mistranslation assays, and the clear writing and figures explaining a complex biological problem. Their identification of gidB as a factor important for mistranslation deepens our knowledge about this interesting phenomenon.
  
  We thank the Reviewer for their summary of our work and the strength of coupling specific mistranslation assays with the genetic screen approach.
  
  Weaknesses:
  
  The structural work at the end feels like both an afterthought in terms of the science and the writing. I would suggest re-writing that section to be clearer about what the figure says and does not say. For example, the caption of Figure 6 appears to be more informative than the text and refers to concepts not present in the main text. In general, I found this section to be the most difficult to understand.
  
  We have revised this section, including re-analysis of the structural data and completely new figures, as well as revised comments placing the findings in context with the other data. See revised Fig. 6.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Protein synthesis - translation - involves repeated recognition and incorporation of aminoacyl-tRNAs by the ribosome. This process is a trade-off between the rate and accuracy of selection (for review see (Johansson et al, 2008; Wohlgemuth et al, 2011)). The ribosome does not just maximise the rate or the accuracy, it balances the two. Therefore, it is possible to select mutants that translate faster than the wt (but are sloppy) or that are very accurate (more than the wt) but translate slower. Slow translation is detrimental as it limits the rate of protein synthesis (and, therefore, growth), whereas error-prone mutants accumulate mis-translated proteins, which is detrimental for the cell.
  
  Bi and colleagues employ genetics, MIC measurements, reporter assays, and structural biology to characterise the role of GidB rRNA methylase in translational accuracy in Mycobacterium smegmatis.
  
  Strengths:
  
  The genetics and phenotypic assays are convincing and establish the biological role of the methylase. The authors use a powerful set of complementary assays that convincingly demonstrate that the loss of GidB results in mistranslation.
  
  We thank the Reviewer for their recognition of the strengths of our work, including the combination of genetic screens and specific assays to demonstrate the contribution of GidB in specific translational fidelity in mycobacteria.
  
  Weaknesses:
  
  (1) It would be essential to provide information regarding the growth rate and, ideally, translation rates in the gidB KO and the isogenic WT. As translation balances accuracy and speed, only characterising the speed is not sufficient to understand the phenomenon.
  
  We have now performed these assays (New Fig. S6). (1) The growth rate of gidB1-KO is the same as the respective background (WT or HWS19) strain with functional GidB. (2) We have performed a measure of translational efficiency as a surrogate for speed (see PMID 32723820), New Fig. S7. As can be seen, deletion of GidB does not affect translation of Nluc luciferase in either the WT or HWS19 background, suggesting that discrimination of mischarged tRNAs (even in a context in which that is the dominant form of translational error) is not rate-limiting, and that this form of accuracy is distinct from ribosomal mRNA decoding. This is further corroborated by a new preprint from our group (https://www.biorxiv.org/content/10.1101/2024.10.20.619312v2) showing that a novel small molecule that also increases specific translational fidelity does not affect translational efficiency, suggesting that this is a conserved phenomenon in mycobacterial translation.
  
  (2) Cryo-EM analysis of vacant 70S ribosomes is not sufficient for understanding the mechanisms underlying the accuracy defects in the gidB KO. One should assemble and solve structurally near-cognate and non-cognate complexes. I believe the authors are over-interpreting the scant structural data they have. Furthermore, current representation makes it impossible to assess the resolution of the structure, especially in the areas of interest.
  
  While we agree with the Reviewer that structures of translating ribosomes will be most informative in elucidating the molecular mechanism(s) by which methylation (or not) by GidB contributes to mistranslation, those experiments are ongoing and beyond the scope of the current study. Unlike E. coli ribosomes, for which a plethora of mutant structures are available, there are very few structures of mycobacterial ribosomes beyond wild-type apo ribosomes. Therefore, we feel that the structures of apo mycobacterial ribosomes +/- GidB-mediated methylation are still of value, and a necessary “first step” for the mechanistic work alluded to above. Secondly, the apo ribosome structures still hint at potential mechanisms by which mistranslation and 16S rRNA methylation may impact on each other. As in the response to Reviewer #1 above, we have revised the text to increase clarity and coherence of this section.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2021.03.02.433644v5
www.biorxiv.org

Experimental verification of the error minimization theory using non-standard genetic codes constructed in vitro

5
1. Public_Reviews 04 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This valuable work addresses a longstanding question of how the extant genetic code came to be selected and conserved almost universally across life. Using a mutational approach and a small set of reporters, the authors demonstrate that the mutational impact was similar for non-standard genetic codes. The data provide solid support for the claim of having provided experimental verification of the error minimization theory.
 
 Summary
2. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review satisfactorily and toned down the comments as advised.]
 
 In this manuscript, the authors investigate the relationship between genetic codes and their robustness to single-point mutations. They construct ten alternative genetic codes by reassigning nine codons to Leu, Ser, or Ala, and assess mutational robustness using three reporter proteins subjected to error-prone PCR. This represents an interesting experimental approach to addressing the hypothesis that the standard genetic code is optimized for mutational robustness.
 
 Review 1
3. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 The study addresses a long-standing question in molecular biology and genetics: why has nature selected the current genetic code (SGC, or standard genetic code)? The authors have tested 'error minimization theory', one of the prevailing hypotheses to explain this. Their approach is to create a minimum genetic code (MGC) and its variants (3^9 theoretically possible codes). Using three parameters to quantify the effect of mutations (polarity, volume, and hydropathy), they computationally test the cost of these genetic codes (3^9) by simulations. Finally, they test this cost experimentally using an in vitro translation system with 10 selected genetic code variants with a range of costs (low to high). They use three randomly mutated reporter genes for this purpose: beta-galactosidase, luciferase, and mSG. They find no correlation between the cost of the genetic code and the reporters' output. Based on these observations, they suggest that error-minimization theory may not explain the current genetic code.
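
The size of the code space and the notion of a mutational cost can be sketched as follows. This is an illustration only, not the authors' calculation: the nine codon labels are placeholders rather than the codons reassigned in the study, only hydropathy (Kyte-Doolittle values) is used, and the cost is a simplified mean squared property change over codon pairs that differ by one nucleotide.

```python
from itertools import product

AMINO_ACIDS = ("Leu", "Ser", "Ala")
# Kyte-Doolittle hydropathy values, used here as the single example property.
HYDROPATHY = {"Leu": 3.8, "Ser": -0.8, "Ala": 1.8}
# Placeholder codon labels (NOT the codons reassigned in the study).
VACANT_CODONS = ("AAA", "AAC", "AAG", "ACA", "ACC", "ACG", "AGA", "AGC", "AGG")

def single_mutation_neighbors(a, b):
    """True if two codons differ at exactly one nucleotide position."""
    return sum(x != y for x, y in zip(a, b)) == 1

# All unordered codon pairs connected by a single point mutation.
NEIGHBOR_PAIRS = [
    (a, b)
    for i, a in enumerate(VACANT_CODONS)
    for b in VACANT_CODONS[i + 1:]
    if single_mutation_neighbors(a, b)
]

def toy_cost(code):
    """Mean squared hydropathy change over single-nucleotide codon pairs."""
    diffs = [(HYDROPATHY[code[a]] - HYDROPATHY[code[b]]) ** 2
             for a, b in NEIGHBOR_PAIRS]
    return sum(diffs) / len(diffs)

# Every assignment of Leu, Ser, or Ala to the nine codons: 3^9 codes.
all_codes = [dict(zip(VACANT_CODONS, assignment))
             for assignment in product(AMINO_ACIDS, repeat=len(VACANT_CODONS))]
costs = [toy_cost(code) for code in all_codes]
n_codes = len(all_codes)
```

Enumerating all assignments gives 3^9 = 19,683 codes. In this toy version a code that assigns the same amino acid to every codon has zero cost, and codes that place dissimilar amino acids on neighboring codons score higher.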
 
 The question they are asking is very exciting, and their approach is solid. The authors are very careful in their analyses and conclusions.
 
 Review 2
4. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 In this manuscript, Miyachi and Ichihashi investigate whether the arrangement of the genetic code affects mutational robustness. Using an in vitro minimal genetic code with vacant codons, they constructed 10 non-standard genetic codes by reassigning Ala, Ser, and Leu, generating codes with replacement costs that were generally higher than those of the standard genetic code across several amino acid property measures. They then tested how random mutations affected the activity of reporter proteins translated under these altered codes. Although error minimization theory predicts that higher-cost codes should make mutations more harmful, the authors report that protein function declined to a similar extent across all codes examined, suggesting that mutational robustness remains largely unchanged within the range of genetic code alterations tested here.
 
 Strengths:
 
 This is an interesting study that investigates one of the most fundamental and intriguing questions in molecular evolution: the emergence of the genetic code, which is nearly universal across nature. The in vitro approach is a powerful aspect of the work and provides an opportunity to examine this phenomenon experimentally at a depth that has previously been inaccessible.
 
 Review 3
5. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 In this manuscript, the authors investigate the relationship between genetic codes and their robustness to single-point mutations. They construct ten alternative genetic codes by reassigning nine codons to Leu, Ser, or Ala, and assess mutational robustness using three reporter proteins subjected to error-prone PCR. This represents an interesting experimental approach to addressing the hypothesis that the standard genetic code is optimized for mutational robustness.
 
 We sincerely thank the reviewer for the positive evaluation of our experimental approach. We are encouraged that the reviewer recognizes the value of constructing multiple non-standard genetic codes in vitro and using them to experimentally examine the relationship between genetic code arrangement and mutational robustness. In the revised manuscript, we have further clarified the scope of our experimental system and the interpretation of the results, particularly emphasizing that our conclusions concern the mutational robustness of individual reporter protein activity measured in an in vitro translation system.
 
 Major comment:
 
 While I find the experimental design valuable, I am not fully convinced by the authors' conclusion that "alterations of the genetic code within the ranges explored in this study have no significant effect on mutational robustness". The current analysis is based on the functional output of three individual reporter proteins. Given that cellular systems involve far more complex interactions, it would be more appropriate to limit this conclusion to mutational robustness at the level of individual protein activity, rather than making broader generalizations.
 
 We thank the reviewer for this important comment. We agree that our original wording was broader than what can be directly supported by the present experiments. Because our analysis is based on the functional outputs of three individual reporter proteins translated in a reconstituted in vitro system, the results do not directly address mutational robustness at the level of the cellular system, protein interaction networks, or organismal fitness.
 
 Accordingly, we have revised the manuscript to limit our conclusion to the mutational robustness of individual reporter protein activity. In the revised Abstract, Results, and Discussion, we now state that within the experimentally tested range of non-standard genetic codes, we did not detect a dependence of the mutation-induced decrease in reporter protein activity on mutational cost. We have also added a statement in the Discussion noting that cellular systems involve many additional layers, including protein–protein interactions, metabolic networks, quality-control systems, and growth selection, and that whether genetic code arrangement affects robustness at these higher biological levels remains an important question for future work.
 
 Specifically, we have added this explanation and the new experiment to the revised manuscript as follows.
 
 Abstract
 
 “This result provides direct experimental evidence that mutational robustness does not significantly change in individual reporter protein activity when the genetic code is altered within the range of mutational cost tested in this study…”
 
 Introduction
 
 “Random mutations decreased reporter protein function at similar levels across all genetic codes examined, implying that alterations of the genetic code within the ranges explored in this study have no significant effect on mutational robustness of individual protein activity.”
 
 Results
 
 “Taken together, these results indicate that mutational robustness of individual reporter protein function did not substantially differ among the genetic codes…”
 
 Discussion
 
 “…suggesting that mutational robustness of protein activity remained largely unchanged within at least the ranges of mutational cost tested in this study. It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question.”
 
 Specific comments
 
 (1) tRNA modification and expression efficiency (Page 5, line 131)
 
 The authors attribute the observed inefficiency to the lack of chemical modifications in the tRNAs used. However, gene expression efficiency can also be strongly influenced by DNA sequence design. To better support this claim, it would be helpful to compare luciferase activity when expressed using native E. coli tRNAs. This comparison could clarify whether the observed effects are due to tRNA modification status or other sequence-dependent factors.
 
 We thank the reviewer for this important suggestion. We agree that the translation efficiency of NanoLuc templates with 21-, 32-, and 46-codons may be affected not only by the chemical modification of tRNAs but also by sequence-dependent factors, such as codon context and mRNA structure.
 
 To examine this possibility, we performed an additional comparison using native E. coli tRNAs in the tfPURE system. When the NanoLuc templates encoded with 21, 32, or 46 codons were translated using native E. coli tRNAs, the observed luminescence values were 1.2 × 10^10, 0.78 × 10^10, and 0.60 × 10^10, respectively. Thus, the 46-codon NanoLuc template showed lower activity than the 21- and 32-codon templates even with native tRNAs, indicating that sequence-dependent effects indeed contribute to translation efficiency.
 
 However, the difference among these templates with native E. coli tRNAs was within approximately two-fold. This effect was much smaller than the marked decrease observed when the 46-codon template was translated using the in vitro prepared 46 tRNAs SGC system. Therefore, while sequence-dependent effects cannot be excluded, the inefficient translation in the reconstructed 46 tRNAs SGC is likely to be mainly attributable to the limited functionality of unmodified tRNAs decoding NNA codons.
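
As a quick arithmetic check of the "approximately two-fold" statement, the three luminescence values quoted above can be compared as 1.2, 0.78, and 0.60 in a shared unit (the common power-of-ten factor cancels in the ratios). The variable and function names below are illustrative.

```python
# Luminescence with native E. coli tRNAs, keyed by codon count of the
# NanoLuc template, expressed in a shared unit (common factor omitted).
luminescence = {21: 1.2, 32: 0.78, 46: 0.60}

def fold_difference(values):
    """Ratio of the highest to the lowest value."""
    values = list(values)
    return max(values) / min(values)

# Largest spread across the three templates (21-codon vs 46-codon).
fold = fold_difference(luminescence.values())
# 32-codon vs 46-codon template.
fold_32_vs_46 = luminescence[32] / luminescence[46]
```

The largest ratio, between the 21- and 46-codon templates, is 2.0, and the 32- versus 46-codon ratio is 1.3, consistent with the differences being within approximately two-fold.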
 
 We have revised the manuscript to clarify this interpretation and have added the new comparison using native E. coli tRNAs.
 
 “We also examined whether the lower translation efficiency of the 46-codon NanoLuc template could be explained by sequence-dependent effects, such as codon context or mRNA structure. When the 21-, 32-, and 46-codon NanoLuc templates were translated using native E. coli tRNAs in the tfPURE system (Figure 1–figure supplement 2), the 46-codon template showed lower activity than the 21- and 32-codon templates; however, this difference was within approximately two-fold. Accordingly, we decided to use only the 32 codons used in near-SGC (i.e., excluding NNA codons) in the subsequent construction of non-standard genetic codes.”
 
 (2) Discrepancy between expression level and activity (Figure S7 vs Figure S8).
 
 Although GAL expression levels appear similar across different genetic codes (Figure S7), their activities differ substantially (Figure S8), even in the low-mutation library. This discrepancy warrants further investigation. Possible explanations include differences in protein folding efficiency or translational error rates, as mentioned by the authors in the main text.
 
 To address this, the authors could analyze the protein products using mass spectrometry. If this is not feasible due to low expression levels, alternative approaches such as SDS-PAGE (e.g., with radiolabeling or Western blotting) would still provide valuable information. Additionally, comparing activity after in vitro refolding could help distinguish between folding defects and sequence-level errors. While I understand that the primary aim of this study is to compare mutational robustness across genetic codes, discussing these observations would significantly enhance the mechanistic insight of the work.
 
 We agree that the discrepancy between similar GAL expression levels and different GAL activities across genetic codes is important for interpreting the results.
 
 In our experiment, GAL protein amounts were quantified using a C-terminal HiBiT tag. Because the HiBiT tag was fused to the C-terminus of GAL, this assay indicates that the amount of C-terminally completed GAL products did not differ substantially among genetic codes. However, we agree that this assay does not evaluate the sequence fidelity, amino acid misincorporation patterns, or folding state of the translated products. Therefore, the observed differences in GAL activity despite similar HiBiT signals may reflect genetic code-dependent differences in translational error rates, amino acid misincorporation, protein folding efficiency, or other effects on the fraction of catalytically active protein.
 
 We have revised the Discussion to explicitly describe this interpretation and to clarify that detailed mechanistic dissection of these baseline activity differences, for example by mass spectrometry, SDS-PAGE/Western blotting, or refolding analysis, is an important future direction but beyond the scope of the present study. We also clarified that the main analysis in this study uses the ratio of activity from the high-mutation library to that from the corresponding low-mutation library within each genetic code.
 
 We have added this explanation to the revised manuscript as follows.
 
 “Although protein amounts quantified by the HiBiT tag were comparable among genetic codes, GAL activities differed substantially. This indicates that the activity differences among genetic codes were not primarily attributable to differences in the amount of C-terminally completed translation products. The HiBiT assay does not provide information on the fraction of catalytically active protein, including sequence fidelity or folding state, and therefore cannot distinguish among these possibilities. Detailed characterization of translated products by mass spectrometry would provide further mechanistic insight into how individual non-SGCs affect protein quality. However, the primary objective of the present study was to compare mutation-dependent activity loss across genetic codes. Therefore, we evaluated this effect by normalizing the activity of the high-mutation library to that of the corresponding low-mutation library within each genetic code.”
 
 (3) Protein expression analysis for additional reporters.
 
 Since protein expression levels are critical for interpreting reporter activity, similar analyses should also be performed for luciferase (Luc) and mSG in both high- and low-mutation libraries. This would ensure that differences in activity are not confounded by variations in protein abundance.
 
 We agree that protein abundance is an important factor for interpreting reporter activity. In this study, we performed HiBiT-based protein quantification for GAL because GAL showed the largest variation in absolute activity among genetic codes, even in the low-mutation library. This analysis showed that the amount of C-terminally completed GAL products was broadly comparable among genetic codes and between low- and high-mutation libraries, indicating that the observed GAL activity differences were not primarily attributable to differences in total protein abundance.
 
 For all three reporters, our main analysis was based on the ratio of activity from the high-mutation library to that from the corresponding low-mutation library within each genetic code. This normalization was intended to evaluate mutation-dependent activity loss while reducing the influence of code-specific baseline differences in expression level or protein quality. We believe that the data are sufficient to evaluate the effect of mutations on protein activities. Nevertheless, we agree that protein quantification for Luc and mSG would provide useful information regarding variation in the baseline levels of reporter activity, and this is an important direction for future work.
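
The normalization described here can be sketched as below. The activity values and code names are made-up illustrative numbers, not data from the study, and the function name is hypothetical.

```python
def relative_activity(high_mutation, low_mutation):
    """Activity of the high-mutation library relative to the low-mutation library."""
    if low_mutation <= 0:
        raise ValueError("low-mutation activity must be positive")
    return high_mutation / low_mutation

# Hypothetical activities (arbitrary units) for two made-up genetic codes
# whose baseline (low-mutation) activities differ five-fold.
example = {
    "code_A": {"low": 1000.0, "high": 250.0},
    "code_B": {"low": 200.0, "high": 50.0},
}
ratios = {name: relative_activity(v["high"], v["low"])
          for name, v in example.items()}
```

In this example both codes give a ratio of 0.25 despite their different baselines, which is the sense in which the within-code ratio reduces the influence of code-specific baseline differences.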
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The study addresses a long-standing question in molecular biology and genetics: why has nature selected the current genetic code (SGC, or standard genetic code)? The authors have tested 'error minimization theory', one of the prevailing hypotheses to explain this. Their approach is to create a minimum genetic code (MGC) and its variants (3^9 theoretically possible codes). Using three parameters to quantify the effect of mutations (polarity, volume, and hydropathy), they computationally test the cost of these genetic codes (3^9) by simulations. Finally, they test this cost experimentally using an in vitro translation system with 10 selected genetic code variants with a range of costs (low to high). They use three randomly mutated reporter genes for this purpose: beta-galactosidase, luciferase, and mSG. They find no correlation between the cost of the genetic code and the reporters' output. Based on these observations, they suggest that error-minimization theory may not explain the current genetic code.
 
 The question they are asking is very exciting, and their approach is solid. The authors are very careful in their analyses and conclusions.
 
 We sincerely thank the reviewer for the positive assessment of our study and for the helpful suggestions. We are encouraged that the reviewer found the question exciting and the approach solid. In the revised manuscript, we have clarified the rationale for using the MGC/near-SGC framework, added further analyses and explanations of the mutational cost calculations, and revised the wording of our conclusions to more explicitly define the scope and limitations of the present experimental system.
 
 (1) The rationale for using MGC instead of SGC: It is unclear why the authors rely on the MGC for this analysis when the central question concerns the SGC. If the goal is to evaluate whether the SGC minimizes mutational cost, a more direct approach would be to generate alternative variants of the SGC itself and compare their mutational cost distributions. At present, it is difficult to assess whether conclusions drawn from this comparison are fully relevant to the stated biological question.
 
 We thank the reviewer for this important comment. We agree that directly constructing alternative variants of the SGC by changing amino acid assignment from SGC would be the most straightforward approach to testing whether the SGC minimizes mutational cost. However, this approach is currently not feasible in our reconstituted translation system for two reasons.
 
 First, our attempt to construct a 46-tRNA SGC-like system revealed that translation using the 46-codon NanoLuc template was approximately 100-fold less efficient than translation using the MGC or near-SGC (Fig. 1). This low activity likely reflects inefficient decoding of NNA codons by in vitro-prepared tRNAs, which lack native post-transcriptional modifications. Because this system did not provide sufficient translational activity for systematic reporter assays, we restricted subsequent experiments to the 32-codon near-SGC framework, excluding NNA codons. We now describe this technical limitation more explicitly in the revised manuscript.
 
 Second, the MGC framework provides vacant codons that can be reassigned by adding anticodon-variant tRNAs. This feature is essential for constructing multiple genetic code variants in parallel under controlled in vitro conditions. We, therefore, constructed the near-SGC-based non-SGC by adding each tRNA variant to the MGC as an experimentally tractable model system to verify whether differences in genetic code arrangement affect mutation-induced decreases in reporter protein activity.
 
 We have added this explanation to the revised manuscript as follows.
 
 “We first established a minimal genetic code, composed of 21 tRNAs with vacant codons, which allows multiple alternative codon assignments to be introduced under otherwise comparable translation conditions.”
 
 Despite this technical limitation, we believe that the central conclusion of this study—that mutational robustness in individual reporter protein activity does not change significantly when the genetic code is altered within the range of mutational costs tested here—remains well-supported by the present results.
 
 (2) The mutational cost analysis appears biologically oversimplified because all amino acid substitutions are treated equivalently. The analysis assumes that all mutations contribute equally to fitness consequences, which does not reflect biological reality. In natural proteins, the impact of an amino acid substitution depends strongly on its structural and functional context. For example, substitutions affecting catalytic residues, ligand-binding interfaces, phosphorylation sites, or other regulatory motifs can severely impair protein function even when associated changes in polarity, hydropathy, or volume are minimal. Conversely, substitutions in structurally permissive or functionally dispensable regions may have little or no measurable effect despite larger physicochemical differences. Therefore, changes in polarity, hydropathy, and volume alone do not necessarily predict functional consequences.
 
 We agree that the mutational cost used in this study is a simplified measure and does not capture the full biological complexity of amino acid substitutions. As the reviewer pointed out, the functional consequence of a substitution depends strongly on its structural and functional context, including whether the affected residue is involved in catalysis, ligand binding, protein–protein interactions, regulatory motifs, folding, or structurally permissive regions.
 
 In this study, we used physicochemical-property-based mutational costs because this type of definition has been widely used in classical formulations of the error minimization theory. Our aim was therefore not to construct a comprehensive predictor of protein fitness effects, but to experimentally test whether the conventional theoretical cost metrics used to discuss genetic code optimality are reflected in the average mutation-induced decrease in reporter protein activity. We have now clarified this rationale in the revised manuscript.
 
 “It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question.”
 
 (3) It is not clear why they increased the concentration of the two tRNAs in near-SGC. Have they maintained the same tRNA concentrations in experiments explained in Fig 5 for all 10 genetic codes tested?
 
 We apologize that the rationale for increasing the concentrations of tRNAValCAC and tRNAArgCCU was not sufficiently clear in the original manuscript. As stated there, “To improve translation efficiency with near-SGC, we focused on two tRNA concentrations (tRNAValCAC and tRNAArgCCU), which were suggested to have low activities in a previous study (Iwane et al., 2016).” We therefore tested whether increasing their concentrations would improve translation efficiency. As shown in Figure 1–figure supplement 1, NanoLuc activity increased as the concentrations of these two tRNAs were raised. Accordingly, both tRNAs were used at 100 ng/µL in the optimized near-SGC, referred to as near-SGC (RV), and in all subsequent experiments. Additional anticodon-variant tRNAs required for each non-SGC were used at optimized concentrations determined from Figure 2–figure supplement 1. For each genetic code, the same tRNA composition and concentrations were used for the low- and high-mutation libraries (see Supplementary Table S7). To clarify this point, we added the sentence, “The increased concentrations of these two tRNAs were used in all the subsequent experiments,” in the corresponding part.
 
 Reviewer #3 (Public review):
 
 In this manuscript, Miyachi and Ichihashi investigate whether the arrangement of the genetic code affects mutational robustness. Using an in vitro minimal genetic code with vacant codons, they constructed 10 non-standard genetic codes by reassigning Ala, Ser, and Leu, generating codes with replacement costs that were generally higher than those of the standard genetic code across several amino acid property measures. They then tested how random mutations affected the activity of reporter proteins translated under these altered codes. Although error minimization theory predicts that higher-cost codes should make mutations more harmful, the authors report that protein function declined to a similar extent across all codes examined, suggesting that mutational robustness remains largely unchanged within the range of genetic code alterations tested here.
 
 Strengths:
 
 This is an interesting study that investigates one of the most fundamental and intriguing questions in molecular evolution: the emergence of the genetic code, which is nearly universal across nature. The in vitro approach is a powerful aspect of the work and provides an opportunity to examine this phenomenon experimentally at a depth that has previously been inaccessible.
 
 Weaknesses:
 
 However, the authors' use of random mutation libraries has certain limitations that prevent the study from realizing its full potential to uncover the mechanisms governing the molecular evolution of the genetic code.
 
 We sincerely thank the reviewer for the positive evaluation of our study and for recognizing the strength of the in vitro approach. We are encouraged that the reviewer considers this system a powerful way to experimentally address the emergence of the genetic code.
 
 We also appreciate the reviewer’s constructive comments regarding the limitations of random mutation libraries. We agree that pooled random libraries do not allow us to assign functional effects to individual mutations or to fully uncover the molecular mechanisms underlying mutational robustness. In the revised manuscript, we therefore clarify that our conclusions concern the library-averaged effects of random mutations on individual reporter protein activity, rather than the effects of specific mutations or cellular-level fitness. To address this limitation, we have added explanations of the scope and limitations of the present approach.
 
 (1) Statistical analyses are missing for several of the manuscript's main claims. This issue applies throughout the paper, including, but not limited to, Figures 1D, 2B, 4B-D, and 5B.
 
 We thank the reviewer for this important comment. We agree that statistical analyses are necessary to support the major claims of the manuscript. We have therefore added statistical analyses appropriate for the purpose and experimental design of each figure.
 
 For Fig. 1D, we performed one-way ANOVA followed by Tukey’s post hoc test on NanoLuc activity to compare translation efficiencies among the MGC, near-SGC, near-SGC (RV), and SGC conditions. This analysis showed a significant overall difference among conditions (one-way ANOVA, p < 0.0001). Tukey’s post hoc test showed that near-SGC was significantly lower than MGC, that near-SGC (RV) significantly improved near-SGC translation, and that near-SGC (RV) was not significantly different from MGC. In contrast, the 46-tRNA SGC remained significantly less efficient than near-SGC (RV). We have summarized the major comparisons in Supplementary Table S8.
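
The omnibus step of the analysis described above can be illustrated with a minimal sketch. It computes only the one-way ANOVA F statistic; Tukey's post hoc comparisons and the p-value are not implemented here. The function name `one_way_anova_f` and all activity values are hypothetical and are not data from the manuscript.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate lists."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: replicate scatter inside each group.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical NanoLuc readings (arbitrary units), three replicates per condition.
activity = {
    "MGC": [1.00, 1.05, 0.95],
    "near-SGC": [0.40, 0.45, 0.35],
    "near-SGC (RV)": [0.95, 1.00, 0.90],
    "SGC": [0.60, 0.65, 0.55],
}
f_stat, df_b, df_w = one_way_anova_f(list(activity.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F indicates that the condition means differ by more than the replicate scatter would explain; which pairs differ is the question the post hoc test answers.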
 
 For Fig. 2B, we compared NanoLuc activity between the 21-code control and the corresponding 21+1-code condition for each codon reassignment using Welch’s t-test on luminescence. This analysis was added to statistically support whether each anticodon-variant tRNA increased NanoLuc translation from the corresponding reassigned template. The statistical results are summarized in Supplementary Table S9.
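
For readers unfamiliar with Welch's test, the statistic and its degrees of freedom can be sketched as follows. This is a minimal illustration, assuming independent replicate measurements in each condition; the function name `welch_t` and the luminescence values are hypothetical, and the p-value lookup is omitted.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Squared standard errors of the two sample means.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df

# Hypothetical luminescence values for one reassigned template.
code_21 = [120.0, 135.0, 110.0]             # 21-code control
code_21_plus_1 = [5200.0, 4800.0, 5600.0]   # 21+1-code condition
t_stat, df = welch_t(code_21_plus_1, code_21)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

Unlike Student's t-test, this form does not assume equal variances, which matters when a near-background control is compared with a much brighter condition.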
 
 For Fig. 4B–D, we converted mutation rates per base to estimated numbers of mutations per gene and performed Spearman’s rank correlation analysis to evaluate whether reporter activity decreased monotonically with increasing mutational load. This analysis showed strong negative monotonic trends between mutation rate (estimated mutation number) and reporter activity for all three reporters (ρ = −0.90 to −1.00), supporting that the random mutation libraries reduced protein activity in a mutation-load-dependent manner.
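
The rank correlation used here can be sketched in a few lines. The example below is a minimal illustration, assuming Spearman's ρ is computed as the Pearson correlation of average ranks; the function names and both data series are hypothetical and are not the measured values.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(sxx * syy)

# Hypothetical series: estimated mutations per gene vs. relative activity.
mutations_per_gene = [0.5, 1.0, 2.0, 4.0, 8.0]
relative_activity = [1.00, 0.80, 0.55, 0.20, 0.25]  # last two points swap order
print(round(spearman_rho(mutations_per_gene, relative_activity), 2))
```

With five points, a single swap between neighboring ranks is enough to move ρ from −1.00 to −0.90, so the coefficient reports the ordering of the points rather than the size of the activity loss.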
 
 For Fig. 5B, replicate-level data were available for GAL, and we therefore performed two-way ANOVA using genetic code and mutation level as factors. This analysis detected significant main effects of genetic code and mutation level, indicating that GAL activity differed among genetic codes and decreased in the high-mutation library. However, no significant interaction between genetic code and mutation level was detected, indicating that the magnitude of mutation-induced activity reduction was not strongly code-dependent under the conditions examined.
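
The distinction between main effects and the interaction term can be made concrete with a small sketch. It assumes a balanced design (the same number of replicates in every cell) and returns F statistics only; the function name `two_way_anova_f` and the example values are hypothetical.

```python
from statistics import mean

def two_way_anova_f(cells):
    """cells[i][j] holds the replicates for genetic code i and mutation level j.

    Assumes a balanced design. Returns F statistics for the code effect,
    the mutation-level effect, and their interaction.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[mean(c) for c in row] for row in cells]
    row_mean = [mean(r) for r in cell_mean]
    col_mean = [mean([cell_mean[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(row_mean)
    ss_code = b * n * sum((m - grand) ** 2 for m in row_mean)
    ss_level = a * n * sum((m - grand) ** 2 for m in col_mean)
    # Interaction: what is left of each cell mean after removing both main effects.
    ss_inter = n * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_error = ss_error / (a * b * (n - 1))
    return {
        "code": (ss_code / (a - 1)) / ms_error,
        "level": (ss_level / (b - 1)) / ms_error,
        "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }

# Hypothetical activities: two codes x (low, high) mutation level x two replicates.
example = [
    [[10.0, 12.0], [5.0, 7.0]],
    [[20.0, 22.0], [15.0, 17.0]],
]
print(two_way_anova_f(example))
```

In this constructed example both codes lose the same amount of activity in the high-mutation library, so the interaction term is zero while both main effects are large, which is the pattern the response describes.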
 
 Finally, because the central claim of Fig. 5C, 5E, and 5G is that mutational cost does not systematically predict mutation-induced activity loss, we performed Spearman’s rank correlation analysis between each mutational cost metric and the high-/low-mutation activity ratio. No significant correlations were detected for any reporter or cost metric (Spearman’s ρ = −0.23 to 0.25), supporting the conclusion that mutational cost did not show a detectable monotonic relationship with mutation-induced activity loss within the tested range.
 
 We have added these statistical analyses to the revised manuscript. The following sentences were added to the figure legends:
 
 Fig. 1
 
 “Statistical comparisons in (D) were performed using one-way ANOVA followed by Tukey’s post hoc test on NanoLuc activity; major comparisons are summarized in Table S8.”
 
 Fig. 2
 
 “For each template, NanoLuc activity in the 21-code and corresponding 21+1-code conditions was compared using Welch’s t-test on luminescence. Statistical results are summarized in Table S9.”
 
 Fig. 4
 
 “Spearman’s rank correlation coefficients were ρ = −0.90 for GAL, ρ = −1.00 for Luc, and ρ = −1.00 for mSG.”
 
 Fig. 5
 
 “For GAL activity in (B), two-way ANOVA was performed using genetic code and mutation level as factors. Significant main effects of genetic code and mutation level were detected (both p < 0.0001), whereas their interaction was not significant. For (C), (E), and (G), Spearman’s rank correlation analysis was performed between each mutational cost metric and the high-/low-mutation activity ratio. Statistical details are summarized in Table S10.”
 
 (2) In Figure 2A, the authors modify the NanoLuc gene by reassigning Ala, Leu, or Ser to new codons and elegantly show that the in vitro availability of the corresponding tRNAs is important for protein function. However, the functional importance of the specific modified positions within NanoLuc is not clear. As a result, it is difficult to determine what the expected consequences of these codon changes should be, which in turn limits the interpretation of the observed changes in protein activity. To improve the interpretability of this experiment, the authors should report exactly how many codons were modified in each variant and, ideally, examine the effect of progressively increasing the number of reassigned codons.
 
 We agree that the exact positions and numbers of codon replacements should be clearly reported. In the revised manuscript, we have added a list of the modified amino acid positions. In brief, two Ala codons (Ala16 and Ala120), three Ser codons (Ser31, Ser49, and Ser150), or four Leu codons (Leu32, Leu67, Leu144, and Leu170) were replaced with the target vacant codon.
 
 We also agree that progressively increasing the number of reassigned codons would provide additional mechanistic insight. However, the purpose of Fig. 2 was to test whether each vacant codon could be decoded by the corresponding anticodon-variant tRNA to produce functional NanoLuc, rather than to analyze the positional contribution of each replacement. We previously performed such progressive codon replacement analysis for one reassigned codon, ACG, in a related study (Miyachi et al., 2025), and the results supported the same qualitative interpretation. Although we did not repeat this progressive analysis for all codons in the present study, we expect that the qualitative interpretation of Fig. 2 would not be substantially changed.
 
 We have revised the figure legend to clarify the scope of the experiment and added the detailed codon replacement information.
 
 “(A) Schematic illustration of reassignment experiments. Translation with the original MGC and NanoLuc template is shown at the top for comparison. An example of Ala reassignment to the UUG codon is shown at the bottom. In this example, two Ala codons in the NanoLuc sequence were replaced with one type of vacant codon (e.g., UUG), generating a 21 + 1 (UUG-Ala) codon set. Similar reassignment experiments were performed for three amino acids (Ala, Ser, and Leu) and nine vacant codons. Specifically, two Ala codons (Ala16 and Ala120), three Ser codons (Ser31, Ser49, and Ser150), or four Leu codons (Leu32, Leu67, Leu144, and Leu170) were replaced.”
 
 (3) The calculations presented in Figure 3 raise an interesting conceptual question: why does the near-standard genetic code not exhibit the lowest cost? One possible explanation is that the standard genetic code evolved under multiple competing constraints and is therefore not expected to be optimal for any single cost metric, while still achieving strong overall performance. In this context, it would be informative if the authors combined the three cost measures into a single integrated index and examined whether the near-SGC performs more favorably when all three dimensions are considered together. Such an analysis could add important depth to the study.
 
 We agree that the near-SGC is not necessarily expected to minimize each individual cost metric, because the standard genetic code may reflect multiple competing physicochemical, translational, biosynthetic, and evolutionary constraints rather than optimization of a single property.
 
 To address this point, we added an integrated cost analysis combining the three physicochemical cost metrics, Cost_PR, Cost_MV, and Cost_HI. Because these three metrics have different numerical scales, we normalized each metric before integration. We used two types of integrated indices.
 
 First, for each metric m ∈ {PR, MV, HI}, we calculated a min–max normalized cost,
 
 Where G denotes the set of 19,683 candidate non-SGCs generated by assigning Ala, Ser, or Leu to the nine vacant codon boxes. We then defined the integrated min–max cost as
 
 Second, we calculated a z-score-normalized cost for each metric,
 
 Where μ_{m,G} and σ_{m,G} are the mean and standard deviation of Cost_m^norm across the candidate non-SGCs. The integrated z-score cost was then defined as
 
 Using both integrated indices, the near-SGC ranked first when compared with all 19,683 candidate non-SGCs; in other words, no candidate non-SGC showed a lower integrated cost than the near-SGC. The integrated min–max cost of the near-SGC was 0.01525, whereas the lowest value among candidate non-SGCs was 0.12301. Similarly, the integrated z-score cost of the near-SGC was −2.47947, whereas the lowest candidate value was −1.90838.
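
The two normalizations described above follow standard forms, which can be sketched as below. The symbols \(\widehat{C}_m\), \(Z_m\), \(I_{\mathrm{minmax}}\), and \(I_{z}\) are labels chosen here for illustration, and combining the three normalized metrics as an unweighted mean is an assumption: the text states only that the metrics were normalized and then combined. \(G\) is the set of \(3^9 = 19{,}683\) candidate non-SGCs (three amino acids assigned independently to nine vacant codon boxes).

```latex
% Min-max normalization of each cost metric over the candidate set G
\[
\widehat{C}_m(g) =
\frac{\mathrm{Cost}_m(g) - \min_{g' \in G} \mathrm{Cost}_m(g')}
     {\max_{g' \in G} \mathrm{Cost}_m(g') - \min_{g' \in G} \mathrm{Cost}_m(g')},
\qquad m \in \{\mathrm{PR}, \mathrm{MV}, \mathrm{HI}\}
\]

% z-score normalization with mean and standard deviation taken over G
\[
Z_m(g) = \frac{\mathrm{Cost}_m(g) - \mu_{m,G}}{\sigma_{m,G}}
\]

% Integrated indices (ASSUMPTION: unweighted mean over the three metrics)
\[
I_{\mathrm{minmax}}(g) = \frac{1}{3} \sum_{m} \widehat{C}_m(g),
\qquad
I_{z}(g) = \frac{1}{3} \sum_{m} Z_m(g)
\]
```

Because a z-score is unchanged by any positive affine rescaling, \(Z_m\) takes the same value whether it is computed from the raw cost or from the min–max normalized cost. A code that lies outside the range spanned by \(G\) for some metric can receive a min–max value below 0 or above 1 for that metric.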
 
 We have added this integrated cost analysis as Figure 5–figure supplement 7. We have also revised the Discussion to note that the near-SGC does not necessarily minimize every individual physicochemical cost but performs most favorably when PR, MV, and HI are considered together. This result is consistent with the idea that the standard genetic code may represent a compromise among multiple constraints rather than optimization of a single physicochemical property.
 
 “We consider that the cost ranges examined in this study represent substantial fractions, especially for MV and HI. Although the near-SGC did not necessarily exhibit the lowest cost for each individual physicochemical metric, this does not mean that it is unfavorable in the multidimensional cost space. Because the SGC may reflect a balance among multiple physicochemical constraints rather than optimization of a single property, we also calculated integrated cost indices by combining Cost_PR, Cost_MV, and Cost_HI after min–max normalization or z-score normalization. In both integrated indices, the near-SGC showed the lowest overall cost when compared with all 19,683 candidate non-SGCs (Figure 5–figure supplement 7), indicating that no candidate non-SGC exhibited a lower combined cost than the near-SGC when the three physicochemical properties were considered comprehensively.”
 
 (4) It is difficult to assess the consequences of the random mutations presented in Figure 4 on reporter gene function based solely on the reported "error rate/base" parameter. In particular, the x-axis in Figure 4B should be converted into the estimated number of mutations per gene. This would make the results more intuitive and would allow the reader to better evaluate the expected degree of disruption to protein function.
 
 We agree that the mutation rate per base alone does not provide an intuitive sense of the expected mutational burden for each reporter gene. We therefore added a second x-axis to Fig. 4B–D showing the estimated number of mutations per gene. This value was calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene.
 
 We retained the original mutation rate per base axis to preserve the direct link to the sequencing-based mutation rate measurement, while adding the estimated mutations per gene axis to improve interpretability. We have revised Fig. 4 and its legend accordingly.
 
 “The lower x-axis indicates the estimated number of mutations per gene, calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene.”
 
 (5) A central limitation of the random mutagenesis libraries used in Figure 5, which also underlie one of the manuscript's main claims, is that the exact mutations and their distribution across the reporter genes are not reported. In addition, protein activity is measured only at the level of the entire library, without directly linking individual mutations to their functional consequences. This substantially limits mechanistic interpretation. In my view, this issue can only be addressed convincingly if the authors test a set of defined variants carrying specific mutations and directly evaluate their functional effects.
 
 (6) Related to the previous point, in Figures 5C, 5E, and 5G, the authors present the ratio between low-mutation-rate and high-mutation-rate libraries. However, because each library contains a different collection of mutations, it is unclear what can be inferred from these comparisons. To overcome this limitation, the authors should assess the effects of altered genetic codes on specific, defined mutations rather than on heterogeneous mutation pools alone.
 
 (7) Along the same lines, in Figures 5C, 5E, and 5G, it is unclear why the effects of random mutations would be expected to correlate with the three calculated cost metrics, given that the positions, identities, and functional relevance of the mutations within the genes are not known. Without this information, the biological meaning of these correlations remains difficult to evaluate.
 
 We agree that using pooled random mutation libraries does not allow us to directly link individual mutations to their functional consequences. We also agree that testing defined variants carrying specific mutations would provide a more direct and mechanistic understanding of how each genetic code affects the functional impact of particular amino acid substitutions. However, the purpose of the present study was different from such a defined-variant analysis. Our aim was to experimentally test whether the conventional mutational cost metrics used in error minimization theory predict the average effect of random mutational loads on protein activity. Because these theoretical costs are themselves defined as average expected physicochemical effects over many possible single-nucleotide substitutions, we reasoned that pooled random mutation libraries provide an appropriate first experimental framework to evaluate whether such average-cost metrics are reflected in the average functional output of translated proteins.
 
 We agree that low- and high-mutation libraries do not contain identical sets of mutations. Therefore, the high-/low-mutation activity ratio should not be interpreted as the effect of the same individual variants before and after additional mutations. Rather, it represents the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool under each genetic code. We have revised the text to clarify this interpretation.
 
 We also agree that the positions, identities, and functional relevance of individual mutations are not resolved in this pooled assay. This limitation prevents us from assigning mechanistic effects to specific substitutions. At the same time, using a small set of defined variants would introduce its own selection bias, because the conclusions could strongly depend on which mutations and which protein positions were chosen. Therefore, we consider the random-library approach to be a useful first step for testing library-averaged effects, whereas systematically defined variant analysis or genotype-resolved activity assays will be necessary to reveal mutation-specific mechanisms in future studies.
 
 In response to the reviewer’s concern, we have revised the Discussion to explicitly limit our conclusion to library-averaged effects on individual reporter protein activity. We now state that this approach does not identify the functional effects of individual mutations and that future studies using defined variants or high-throughput genotype–phenotype mapping will be required to determine how specific substitutions contribute to genetic code-dependent mutational robustness.
 
 Result
 
 “To estimate the average activity reduction associated with increased mutational burden under each genetic code, we calculated the ratio of activity obtained from the high-mutation library to that from the corresponding low-mutation library and plotted this ratio against each of the three mutational costs (Fig. 5C).”
 
 Discussion
 
 “A further limitation of this study is that the reporter activities were measured at the level of pooled random mutation libraries. Therefore, the high-/low-mutation activity ratio used in this study should be interpreted as the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool, rather than as the effect of identical variants before and after additional mutations. This library-averaged approach was chosen because the mutational costs considered here are also defined as average expected physicochemical effects over many possible single-nucleotide substitutions. In addition, because the non-SGCs constructed in this study were generated by reassigning only Ala, Ser, and Leu, the detectable effects may depend on how frequently mutations involving these amino acids occur in each reporter gene and whether the affected positions are functionally important. If genetic code-dependent effects are restricted to a small subset of deleterious variants, such effects may be masked in pooled activity measurements. Future studies using defined variants or high-throughput genotype–phenotype mapping assays will be required to determine the mutation-specific and position-specific mechanisms underlying genetic code-dependent effects on protein function (Rozhoňová et al., 2024).”
 
 (8) For each mutagenesis library, the number of variants, the average number of mutations per variant, and the distribution of mutation positions should be reported clearly and transparently. These details are important for evaluating the strength of the conclusions.
 
 We agree that a more transparent characterization of the random mutagenesis libraries is necessary for evaluating the strength and limitations of our conclusions.
 
 In the revised manuscript, we have added the estimated number of mutations per gene to the Results section. This value was calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene. For the high-mutation libraries used in Fig. 5, the estimated numbers of mutations per gene were approximately 8.0 for GAL, 4.5 for Luc, and 3.3 for mSG. We also added position-wise mutation profiles along each reporter gene (Figure 4–figure supplement 2), in addition to the heatmap shown in the original manuscript. These analyses clarify the mutational burden of each library and show that mutations were broadly distributed across the analyzed regions (approximately 300 nt in the middle of each gene) of the reporter genes.
 
 Regarding the number of variants, the translation reactions were performed using 5 nM DNA template in a 5 µL reaction, corresponding to approximately 1.5 × 10^10 DNA molecules. However, this value represents the total number of DNA molecules introduced into the reaction and does not directly indicate the number of unique full-length sequence variants, because multiple molecules can share the same genotype. In addition, our sequencing analysis was designed to quantify mutation frequencies and positional distributions rather than to reconstruct full-length genotypes of individual library members. Therefore, we do not infer the exact number of unique variants in each library. Instead, we report the average mutation burden and position-wise non-reference rate distributions.
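
The molecule count quoted above follows from concentration × volume × Avogadro's number, as this short check shows. The function name `molecules_in_reaction` is hypothetical; the concentration and volume are the values stated in the response.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def molecules_in_reaction(conc_molar, volume_liters):
    """Number of molecules in a solution of given molarity and volume."""
    return conc_molar * volume_liters * AVOGADRO

# 5 nM template in a 5 uL reaction, as stated in the response.
n_molecules = molecules_in_reaction(5e-9, 5e-6)
print(f"{n_molecules:.2e} DNA molecules")
```

This is an upper bound on library diversity, not an estimate of it, since identical genotypes are counted separately.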
 
 We have revised the Results and added Figure 4–figure supplement 2 accordingly.
 
 “For this experiment, two random mutation libraries were used: a low-mutation library prepared using the high-fidelity polymerase and a high-mutation library prepared using Taq DNA polymerase at a Mn2+ concentration that yields mutation rates of 0.002–0.005 per base (0.0026 for GAL, 0.0027 for Luc, and 0.0048 for mSG, corresponding to approximately 8.0, 4.5, and 3.3 mutations per gene, respectively). We also plotted position-wise non-reference rates along the analyzed regions of each reporter gene, confirming that mutations were broadly distributed across the amplicons (Figure 4–figure supplement 2).”
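
The conversion between the per-base rate and the per-gene estimate is a single multiplication, which the sketch below applies to the reported values. The coding sequence lengths are not given in this response, so the code back-calculates the lengths implied by the reported numbers; these are rounded consequences of the quoted figures, not measured lengths, and the function names are hypothetical.

```python
def mutations_per_gene(rate_per_base, cds_length_nt):
    """Expected mutations per gene = per-base mutation rate x coding length."""
    return rate_per_base * cds_length_nt

def implied_cds_length(rate_per_base, mutations):
    """Invert the relation to recover the coding length implied by a pair of values."""
    return mutations / rate_per_base

# (mutation rate per base, estimated mutations per gene) as reported above.
reported = {
    "GAL": (0.0026, 8.0),
    "Luc": (0.0027, 4.5),
    "mSG": (0.0048, 3.3),
}

for name, (rate, muts) in reported.items():
    length = implied_cds_length(rate, muts)
    print(f"{name}: implied coding length of about {length:.0f} nt")
```

The implied lengths come out at roughly 3,100, 1,700, and 700 nt, which explains why GAL carries the largest mutational burden even though its per-base rate is the lowest of the three.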
 
 (9) Because only three amino acids were manipulated in the non-standard genetic codes, it remains unclear whether these particular amino acids occupy positions in the reporter proteins that are especially important for function and therefore likely to generate strong phenotypic effects. More broadly, it is not clear whether the assay is sufficiently sensitive to detect the effects of only a subset of deleterious variants within a pooled library. This point should be addressed more explicitly.
 
 We agree that this is an important limitation of the present study. Because our non-SGCs were constructed by reassigning only Ala, Ser, and Leu, the mutation-dependent effects that can differ among genetic codes are limited to mutations involving these reassigned codons or amino acid substitutions affected by these assignments. Therefore, the sensitivity of the assay depends on how frequently such substitutions occur in the reporter genes and whether the affected Ala, Ser, and Leu-related positions are functionally important.
 
 We have revised the Discussion to address this point more explicitly. In the revised manuscript, we now state that the absence of a detectable cost-dependent effect may reflect not only the limited cost range examined, but also the limited set of reassigned amino acids, the position-dependent importance of Ala/Ser/Leu residues in the reporter proteins, and the sensitivity limit of pooled activity measurements. We further note that future studies using genotype-resolved activity assays (defined variants) will be required to determine whether specific amino acid substitutions or specific protein positions exhibit stronger genetic code-dependent effects.
 
 “A further limitation of this study is that the reporter activities were measured at the level of pooled random mutation libraries. Therefore, the high-/low-mutation activity ratio used in this study should be interpreted as the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool, rather than as the effect of identical variants before and after additional mutations. This library-averaged approach was chosen because the mutational costs considered here are also defined as average expected physicochemical effects over many possible single-nucleotide substitutions. In addition, because the non-SGCs constructed in this study were generated by reassigning only Ala, Ser, and Leu, the detectable effects may depend on how frequently mutations involving these amino acids occur in each reporter gene and whether the affected positions are functionally important. If genetic code-dependent effects are restricted to a small subset of deleterious variants, such effects may be masked in pooled activity measurements. Future studies using defined variants or high-throughput genotype–phenotype mapping assays will be required to determine the mutation-specific and position-specific mechanisms underlying genetic code-dependent effects on protein function (Rozhoňová et al., 2024).”
 
 Recommendations for the authors:
 
 Reviewing Editor Comments:
 
 While we suggest that you address all the technical points raised by the reviewers, you may specifically want to limit the conclusion of the study to mutational robustness at the level of individual protein activity, rather than making broader generalizations. Also, the statistical analysis needs to be strengthened, as indicated in the reviews.
 
 We thank the Reviewing Editor for these important suggestions. We agree that the conclusion of the original manuscript was broader than what can be directly supported by the present experiments. In the revised manuscript, we have therefore limited our conclusion to mutational robustness at the level of individual reporter protein activity measured in a reconstituted in vitro translation system. We now explicitly state that our results do not directly address robustness at the level of cellular fitness, protein interaction networks, or long-term evolution.
 
 We have also strengthened the statistical analyses throughout the manuscript. Specifically, we added one-way ANOVA followed by Tukey’s post hoc test for Fig. 1D, Welch’s t-tests for Fig. 2B, Spearman’s rank correlation analyses for Fig. 4B–D and Fig. 5C/E/G, and two-way ANOVA for GAL activity in Fig. 5B. These analyses have been incorporated into the revised Results, figure legends, and supplementary information.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) Discuss other alternative hypotheses if the error minimization theory is unlikely.
 
 We thank the reviewer for this helpful suggestion. We think that the absence of a detectable relationship between mutational cost and reporter protein activity in our assay should not be interpreted as excluding all possible roles of error minimization in the evolution of the genetic code. Our results specifically address one aspect of the error minimization theory: whether physicochemical-property-based mutational cost predicts the average effect of random point mutations on individual reporter protein activity within the experimentally accessible range of non-SGCs tested here.
 
 In the revised Discussion, we have clarified that the organization of the SGC may have been shaped by multiple factors, including robustness to translational errors, historical constraints associated with genetic code expansion, biosynthetic or coevolutionary processes, stereochemical interactions, and the evolvability of proteins. Our results suggest that the contribution of mutational robustness at the level of individual protein activity may be limited within the range examined here, but they do not exclude the possibility that the SGC provides advantages under other forms of error, at the level of translation fidelity, cellular fitness, or long-term evolution.
 
 We have added a short discussion to clarify this point without expanding the scope of the manuscript beyond the present experimental results.
 
 “It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question. Moreover, our results do not exclude other possible roles of SGC organization. The SGC may have been shaped by multiple factors, including robustness to translational errors, historical constraints during genetic code expansion, biosynthetic or coevolutionary relationships among amino acids, stereochemical interactions, and effects on protein evolvability (Katoh and Suga, 2023; Koonin and Novozhilov, 2017, 2009; Novozhilov et al., 2007; Wong, 2005).”
 
 (2) A brief description of the PURE translation system can be provided for people from outside the field.
 
 We have added a brief description of the PURE system in the Introduction to make the experimental platform more accessible to readers outside the field. Specifically, we now explain that the PURE system is a reconstituted cell-free translation system composed of purified translation factors, ribosomes, aminoacyl-tRNA synthetases, tRNAs, amino acids, and energy-regeneration components. We also clarify that, in this study, we used a tRNA-free version of the PURE system, in which defined synthetic tRNA sets were supplied externally to reconstruct each genetic code.
 
 Introduction
 
 “A representative platform for such reconstitution is the PURE system (Shimizu et al., 2001), a reconstituted cell-free translation system composed of purified translation components, including ribosomes, translation factors, aaRSs, amino acids, and energy-regeneration components. In particular, a tRNA-free PURE system (Miyachi et al., 2022), in which endogenous tRNA activity is minimized and defined tRNA sets are supplied externally, enables genetic codes to be reconstructed by controlling the supplied tRNAs.”
 
 (3) Figure 5D and F - Technical replicates are provided only for GAL. A similar approach should be taken for LUC and mSG.
 
 We agree that replicate-level measurements for Luc and mSG would further improve reliability. However, repeating the full translation experiments for these reporters was not feasible in the current revision, as each experiment requires large amounts of freshly prepared tRNA-free PURE system and multiple defined tRNA mixtures for every genetic code variant tested. Given these material and technical constraints, we were unable to perform additional biological replicates within the scope of this revision. We would like to emphasize, however, that the GAL replicates shown in Fig. 5D and F are fully consistent across independent experiments, providing direct evidence for the reproducibility of the assay itself. Furthermore, the key metric in our analysis, the activity ratio between high- and low-mutation groups within each genetic code, is an internally normalized measure that is inherently less sensitive to between-experiment variability than absolute activity values. The correlation analyses further showed no significant relationship between mutational cost and this ratio across all three reporters, and this conclusion is consistent regardless of which reporter is examined. Together, we believe these results provide a robust basis for the conclusions drawn, even in the absence of full replication for Luc and mSG.
 
 (4) Provide statistical analysis wherever it is relevant (e.g, to support a lack of correlation).
 
 We have strengthened the statistical analyses throughout the revised manuscript. In particular, to support the lack of detectable correlation between mutational cost and mutation-induced activity loss, we performed Spearman’s rank correlation analyses between each mutational cost metric and the high-/low-mutation activity ratio for all three reporters. No significant correlations were detected for any reporter or cost metric. In addition, we added statistical analyses for other relevant figures, including one-way ANOVA followed by Tukey’s post hoc test for Fig. 1D, Welch’s t-tests for Fig. 2B, Spearman’s rank correlation analyses for Fig. 4B–D, and two-way ANOVA for GAL activity in Fig. 5B.
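 As an illustration of the analysis described above, the following is a minimal sketch of how a Spearman rank correlation between a mutational cost metric and the high-/low-mutation activity ratio could be computed. It is not the authors' analysis script: the function names and all numerical values are hypothetical placeholders, and the sketch returns only the correlation coefficient (a significance test would be needed on top of it).

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values, one entry per genetic code (NOT data from the study).
cost = [1.0, 2.0, 3.0, 4.0, 5.0]                  # a mutational cost metric
low_activity = [100.0, 80.0, 120.0, 90.0, 110.0]  # low-mutation library
high_activity = [50.0, 44.0, 54.0, 49.5, 52.0]    # high-mutation library

# Internally normalized metric: high-/low-mutation activity ratio per code.
ratio = [h / l for h, l in zip(high_activity, low_activity)]
rho = spearman(cost, ratio)
print(f"Spearman rho = {rho:.3f}")
```

 Because the ratio is computed within each genetic code, a uniform scaling of both libraries in one code leaves that code's ratio unchanged, which is the sense in which the metric is internally normalized.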
 
 Reviewer #3 (Recommendations for the authors):
 
 (1) In line 122, the phrase "as evenly as possible" is ambiguous and should be explained more precisely.
 
 We thank the reviewer for pointing this out. We have revised the phrase “as evenly as possible” to describe the codon design more precisely. Specifically, we now state that the NanoLuc coding sequences were designed so that the codons available in each genetic code were used with minimal differences in codon counts, while preserving the amino acid sequence of NanoLuc.
 
 “For near-SGC and SGC, the NanoLuc coding sequences were designed so that the codons available in each genetic code were used with minimal differences in codon counts, while preserving the amino acid sequence (Fig. 1B, 32 codons and 46 codons).”
 
 (2) For Figure 1D, a Western blot or another protein gel-based assay would be helpful to exclude the possibility that the observed differences arise from variation in translation efficiency rather than differences in protein activity.
 
 We agree that a protein gel-based assay such as Western blotting would in principle allow us to distinguish differences in translated protein amount from differences in specific activity, and we understand why such data would be informative. However, we would like to clarify that the primary purpose of Fig. 1D was to evaluate the overall functional translation output of each reconstructed genetic code, rather than to determine the mechanistic basis of any observed differences. In this context, NanoLuc luminescence serves as an integrated readout of the entire translation process, encompassing both translational efficiency and protein folding/activity. Crucially, regardless of whether the observed differences in NanoLuc luminescence reflect lower protein yield, reduced specific activity, or a combination of both, the conclusion of Fig. 1D remains the same. Although we did not perform Western blotting in this study, we believe that such an analysis would not change this interpretation and that the current data are sufficient to support this conclusion.
 
 (3) The number 3^9 is not immediately intuitive. It would be helpful if the authors also stated that this corresponds to approximately 20,000 possible non-standard genetic codes.
 
 We have revised the text to state both the exact number and the approximate value: 3^9 = 19,683, approximately 20,000 possible non-standard genetic codes.
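 The count can be checked with a short enumeration. The sketch below assumes that the number arises from assigning one of three amino acids (Ala, Ser, or Leu, as mentioned elsewhere in this response) independently to each of nine vacant codon boxes; the figure of nine boxes is inferred from the exponent in 3^9 and is not stated explicitly here.

```python
from itertools import product

AMINO_ACIDS = ("Ala", "Ser", "Leu")  # candidate assignments per vacant box
N_VACANT_BOXES = 9                   # assumption inferred from the exponent in 3^9


def enumerate_codes(amino_acids=AMINO_ACIDS, n_boxes=N_VACANT_BOXES):
    """Yield every assignment of one amino acid to each vacant codon box."""
    return product(amino_acids, repeat=n_boxes)


n_codes = sum(1 for _ in enumerate_codes())
print(n_codes)
```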
 
 (4) The rationale for using the three cost parameters (PR, MV, and HI) should be explained in greater detail. Because these parameters are central to the manuscript, a citation alone is not sufficient. A concise explanation of their biological relevance would improve the clarity and accessibility of the study.
 
 We agree that the biological relevance of the three cost parameters should be explained more clearly. In the revised manuscript, we have added a concise explanation of why polar requirement (PR), molecular volume (MV), and hydropathy index (HI) were used.
 
 These parameters were selected because they have been widely used in theoretical studies of genetic code optimality and represent distinct physicochemical aspects of amino acid substitutions. PR reflects polarity-related interactions and has been a classical metric in error minimization analyses of the genetic code. MV represents side-chain size and steric volume, which could influence packing and structural stability in proteins. HI reflects hydrophobicity, which is closely related to protein folding and hydrophobic core formation. We have also clarified that these metrics are simplified descriptors and do not capture residue-specific structural or functional context, which we now discuss as a limitation of the study.
 
 “PR reflects polarity-related interactions of amino acids and has been used as a classical measure of amino acid similarity in error minimization analyses. MV represents side-chain size and steric volume, which could affect protein packing and structural stability, whereas HI reflects hydrophobicity, which could be closely related to protein folding or hydrophobic core formation.”
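 To make the notion of a mutational cost concrete, the sketch below computes one commonly used form of such a cost: the mean squared change in a physicochemical property over all single-nucleotide substitutions between sense codons. This is an assumption about the general approach taken in the error-minimization literature, not necessarily the exact formula used in the manuscript. The example uses the complete standard genetic code and the Kyte-Doolittle hydropathy scale; the function names are hypothetical, and the vacant codon boxes of the near-SGC are not modeled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
SGC_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Kyte-Doolittle hydropathy index, one-letter amino acid codes.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def build_code(aa_string):
    """Map each of the 64 codons to an amino acid (or '*' for stop)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, aa_string))


def mutational_cost(code, prop):
    """Mean squared property change over single-nucleotide substitutions.

    Substitutions to or from stop codons are skipped (a simplifying assumption).
    """
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                other = code[neighbor]
                if other == "*":
                    continue
                total += (prop[aa] - prop[other]) ** 2
                count += 1
    return total / count


sgc = build_code(SGC_AAS)
cost_hi = mutational_cost(sgc, HYDROPATHY)
print(f"Hydropathy-based cost of the standard code: {cost_hi:.2f}")
```

 Swapping the property table for polar requirement or molecular volume values would give the corresponding cost under the same assumed definition.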
 
 (5) In Figure 3, the experimental framework would be easier to follow if the authors included a schematic and data for one representative non-SGC, explicitly illustrating how it differs from the near-SGC with respect to each of the three cost measures.
 
 We agree that showing one representative non-SGC would make the experimental framework and cost calculation more intuitive.
 
 In the revised manuscript, we added a new panel to Fig. 3 comparing the near-SGC with a representative non-SGC. We selected the PRmax code as the representative example because it clearly illustrates how reassignment of vacant codon boxes can increase one mutational cost metric relative to the near-SGC. In this panel, we first show the codon assignment schemes of the near-SGC and PRmax code in the same genetic-code format used in Fig. 1. We then show the corresponding heatmap representations for the three physicochemical properties used in the cost calculation: polar requirement, molecular volume, and hydropathy index. The CostPR, CostMV, and CostHI values are shown for each code.
 
 This new panel illustrates how changes in codon assignment are translated into different physicochemical cost landscapes and clarifies how the representative non-SGC differs from the near-SGC with respect to each of the three cost measures.
 
 “To make the design of non-SGCs more explicit, we show one representative non-SGC together with the near-SGC in Fig. 3B. This comparison illustrates how assignment of Ala, Ser, or Leu to the vacant codon boxes changes the three mutational cost metrics, CostPR, CostMV, and CostHI.”
 
 (6) In line 329, the phrase "similar pattern" is ambiguous and should be explained more explicitly.
 
 We have revised the ambiguous phrase “similar pattern” to describe the observation more explicitly. Specifically, we now state that the relative differences in GAL activity among genetic codes observed in the low-mutation library were broadly retained in the high-mutation library, although overall activity decreased.
 
 “For the high-mutation library, GAL activity decreased overall, while the relative differences in activity among genetic codes observed in the low-mutation library were broadly retained.”
 
 (7) Figure S7 appears to be an important control for the experiments shown in Figure 5, and I recommend moving it to the main figures.
 
 We thank the reviewer for this helpful suggestion. We agree that the HiBiT-based quantification of GAL protein amount is an important control for interpreting the GAL activity measurements in Fig. 5, and we appreciate the recommendation to increase its visibility. This analysis shows that the amount of C-terminally completed GAL products was broadly comparable among genetic codes, indicating that the large differences in GAL activity were not primarily attributable to differences in total translated protein amount.
 
 After careful consideration, we have opted to retain this analysis in the supplementary figures because the main focus of Fig. 5 is the relationship between mutational cost and mutation-induced activity loss, quantified by the high-/low-mutation activity ratio. The HiBiT experiment addresses a related but distinct question: whether differences in absolute GAL activity among genetic codes can be explained by differences in protein abundance, and we felt that including it in the main figures might shift the emphasis away from the central message of Fig. 5. Nevertheless, we have added a clear reference to Figure 4–figure supplement 1 in the main text and the figure legend to ensure that readers are directed to this control when interpreting Fig. 5.
 
 AuthorResponse

URL

biorxiv.org/content/10.64898/2026.02.24.707864v2
www.biorxiv.org www.biorxiv.org

Enteropathogenic E. coli-mediated Fast and Coordinated Ca2+ responses regulate NF-κB activation

4
1. Public_Reviews 04 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This study reports important advances in our understanding of how enteropathogenic E. coli (EPEC) interacts at the intestinal interface. Compelling data describe a novel model of spatially coordinated calcium signaling to modulate NF-kB activation. These findings, which integrate imaging, genetics, and computational modeling, provide a new way to consider host-pathogen interactions in EPEC infections that may lead to improved therapies.
 
 Summary
2. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In their article, Guo and coworkers investigate the Ca²⁺ signaling responses induced by Enteropathogenic Escherichia coli (EPEC) in epithelial cells and how these responses regulate NF-κB activation. The authors show that EPEC induces rapid, spatially coordinated Ca²⁺ transients mediated by extracellular ATP released through the type III secretion system (T3SS). Using high-speed Ca²⁺ imaging and stochastic modeling, they propose that low ATP levels trigger "Coordinated Ca²⁺ Responses from IP₃R Clusters" (CCRICs) via fast Ca²⁺ diffusion and Ca²⁺-induced Ca²⁺ release. These responses may dampen TNF-α-induced NF-κB activation through Ca²⁺-dependent modulation of O-GlcNAcylation of p65. The interdisciplinary work suggests a new perspective on calcium-mediated immune response by combining quantitative imaging, bacterial genetics, and computational modeling.
 
 Strengths:
 
 The study provides a new concept for host responses to bacterial infections and introduces the concept of Coordinated Ca²⁺ Responses from IP₃R Clusters (CCRICs) as synchronized, whole-cell-scale Ca²⁺ transients with the fast kinetics typical of local events. This is elegantly done by an interdisciplinary approach using quantitative measurements and mechanistic modelling.
 
 Comments on revised version.
 
 The revised version of the manuscript has addressed all my raised points. I'd like to thank the authors for the work they have put into the revision to make this a very compelling publication.
 
 Review 1
3. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors of this study are trying to resolve how cellular infection by enteropathogenic E. coli (EPEC) subverts cellular signaling pathways to promote infection and dampen immune responses. Specifically, alteration in calcium dynamics has been evidenced in the prior literature as a potential initiator of these adaptations, and this study provides ideas and mechanistic detail as to how cellular calcium dynamics may be subverted by pathogens.
 
 Strengths:
 
 The clear strengths of this paper relate to the new ideas inherent in the proposed hypothesis and their support from the experimental approaches used. Overall, the proposed work provides new ideas in this area, which will benefit from further investigation. Certainly, this is an interesting and challenging paradigm to pick apart mechanistically, and is important for improving treatments for intestinal infections. The authors have provided additional data to clarify and expand on concerns raised during the original review, and these additions are helpful.
 
 Comments on revised version.
 
 Thorough response to original review. No further comments.
 
 Review 2
4. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In their article, Guo and coworkers investigate the Ca²⁺ signaling responses induced by Enteropathogenic Escherichia coli (EPEC) in epithelial cells and how these responses regulate NF-κB activation. The authors show that EPEC induces rapid, spatially coordinated Ca²⁺ transients mediated by extracellular ATP released through the type III secretion system (T3SS). Using high-speed Ca²⁺ imaging and stochastic modeling, they propose that low ATP levels trigger "Coordinated Ca²⁺ Responses from IP₃R Clusters" (CCRICs) via fast Ca²⁺ diffusion and Ca²⁺-induced Ca²⁺ release. These responses may dampen TNF-α-induced NF-κB activation through Ca²⁺-dependent modulation of O-GlcNAcylation of p65. The interdisciplinary work suggests a new perspective on calcium-mediated immune response by combining quantitative imaging, bacterial genetics, and computational modeling.
 
 Strengths:
 
 The study provides a new concept for host responses to bacterial infections and introduces the concept of Coordinated Ca²⁺ Responses from IP₃R Clusters (CCRICs) as synchronized, whole-cell-scale Ca²⁺ transients with the fast kinetics typical of local events. This is elegantly done by an interdisciplinary approach using quantitative measurements and mechanistic modelling.
 
 Weaknesses:
 
 (1) The effect of coordination by fast diffusion for small eATP concentrations is explained by the resulting low Ca2+ concentration that is not as strongly affected by calcium buffers compared to higher concentrations. While I agree with this statement on the relative level, CICR is based on the resulting absolute concentration at neighboring IP3Rs (to activate them). Thus, I do not fully agree with the explanation, or at least would expect to use the modelling approach to demonstrate this effect. Simulations for different activation and buffer concentrations could strengthen this point and exclude potential inhibition of channels at higher stimulation levels.
 
 We fully agree that CICR is determined by the local Ca2+ concentration at each IP3R cluster, not by a global cytosolic average. In our stochastic model, IP3R clusters are represented as phenomenological entities at discrete spatial sites. Each cluster senses the local Ca2+ concentration at its position, and its stochastic gating depends only on this local [Ca2+] and on [IP3]. Buffers are not included explicitly. Instead, we use an effective Ca2+ diffusion coefficient Deff, which accounts for the effect of endogenous Ca2+ buffers. To reproduce the coordinated low-amplitude Ca2+ responses observed experimentally, we found that we had to use Deff = 100 µm2/s. In the supplementary analysis, we show that an effective diffusion coefficient of this order is indeed plausible for a realistic mixture of mobile and immobile Ca2+ buffers (Supplementary Note 2, Figure 1).
 
 In the revised manuscript, we now provide a supplementary analysis (Supplementary Note 2) to justify this choice. Using an equation to compute the effective diffusion coefficient considering a plausible mixture of mobile and immobile buffers and an explicit reaction–diffusion model, we show that:
 
 - The effective diffusion coefficient of Ca2+ becomes Ca2+ dependent, and
 
 - There exists a regime in which low-amplitude Ca2+ elevations are characterized by an effective diffusion coefficient of Deff = 100 µm2/s and a larger spatial extent than higher-amplitude transients (Supplementary Note 2, Figure 1).
 
 Thus, the value of Deff used in the cluster model is quantitatively consistent with classical buffering theory and with plausible cytosolic buffer mixtures. This provides a mechanistic basis for the observation that small-amplitude, short-lived events can nevertheless produce coordinated signals with large spatial extent and, occasionally, almost immediate activation of IP3R clusters at distant locations in both simulations and experiments.
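 The Ca2+ dependence of the effective diffusion coefficient can be illustrated with the simplified rapid-buffering expression from classical buffering theory, Deff = (D_Ca + Σ D_b κ_b) / (1 + Σ κ_b), with buffering capacity κ_b = B_total K_d / (K_d + [Ca2+])^2. The sketch below is an assumption about the kind of equation referred to above, not the equation of Supplementary Note 2; all parameter values are illustrative placeholders and are not taken from the manuscript.

```python
def buffering_capacity(ca, total, kd):
    """Differential buffering capacity kappa = B_total * Kd / (Kd + [Ca])^2."""
    return total * kd / (kd + ca) ** 2


def effective_diffusion(ca, d_ca, buffers):
    """Effective Ca2+ diffusion coefficient under rapid buffering.

    ca      : free Ca2+ concentration (µM)
    d_ca    : diffusion coefficient of free Ca2+ (µm^2/s)
    buffers : iterable of (total concentration in µM, Kd in µM, D_buffer in µm^2/s);
              an immobile buffer has D_buffer = 0
    """
    numerator = d_ca
    denominator = 1.0
    for total, kd, d_buffer in buffers:
        kappa = buffering_capacity(ca, total, kd)
        numerator += d_buffer * kappa
        denominator += kappa
    return numerator / denominator


# Illustrative (hypothetical) buffer mixture: one immobile, one mobile buffer.
D_CA = 220.0  # µm^2/s, placeholder for free Ca2+ diffusion
BUFFERS = [
    (100.0, 10.0, 0.0),   # immobile, low affinity
    (50.0, 0.2, 100.0),   # mobile, high affinity
]

for ca in (0.05, 0.1, 0.5, 1.0, 5.0):
    d_eff = effective_diffusion(ca, D_CA, BUFFERS)
    print(f"[Ca2+] = {ca:5.2f} µM  ->  Deff = {d_eff:6.1f} µm^2/s")
```

 In this form, Deff depends on [Ca2+] through the buffering capacities, so the value that applies to a low-amplitude elevation can differ from the one that applies to a higher-amplitude transient, depending on the affinities and mobilities of the buffers present.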
 
 In this respect, I would also include the details of the modelling, such as implementation environment, parameters, and benchmarking. The description in the Supplementary Methods is very similar to the description in the main text. In terms of reproducibility, it would be important to at least provide simulation parameters, and providing the code would align with the emerging standards for reproducible science.
 
 We apologize for the lack of detail on the modelling in the previous submission. In this revised version, we provide a full description of the model in the Supplementary Information, Note 1.
 
 To address the reviewer’s request for simulations at different activation levels, we now show an additional simulation in which [IP3] is higher (0.1 µM, constant in time and space) and Deff is set to 40 µm2/s (Supplementary Note 3). This lower effective diffusion coefficient is consistent with the stronger buffering and reduced Ca2+ mobility expected for higher-amplitude signals. In this case, the same phenomenological cluster model generates a global Ca2+ response with larger amplitude and longer duration, rather than a loss of activity due to excessive inhibition (Supplementary Note 3, Figure 1, left panel). The right panel of Supplementary Note 3, Figure 1 shows the 2D cell geometry, where dots indicate the random positions of IP3R clusters whose behavior is described by our phenomenological cluster model.
 
 (2) Quantitative characterization of CCRICs:
 
 The paper would benefit from a clearer definition of the term CCRICs and quantitative descriptors like duration, amplitude distribution, frequency, and spatial extent (also in relation to the comment on the EGTA measurements below). Furthermore, it remains unclear to me whether CCRICs represent a population of rapidly propagating micro-waves or truly simultaneous events. Maybe kymographs or wave-front propagation analyses (at least from simulations if experimental resolution is too bad) would strengthen this point.
 
 We agree and have completed the description of the CCRICs by adding:
 
 In the Results section, p. 8, l. 27:
 
 “…with a duration of 2.1 ± 1.0 sec (mean ± SEM) (N = 4, 128 responses)”.

 p. 9, l. 13:

 “In rare instances (less than 3%), typical local “Puff” responses elicited by these ATP concentrations could also be detected, often occurring at the cell periphery (Figs. 4B, red region and 4C, red arrow; Fig. S6D, blue trace) (N > 20, cells > 500). As expected from the small concentrations of Ca2+ released at puff sites, no increase in cytosolic Ca2+ was detected in a distal cell region (Fig. S6D, top), indicating that isotropic Ca2+ diffusion from a puff release site cannot account for the Ca2+ increase over a large cell area. Puffs could also be detected concomitantly with CCRICs in different ROIs of the same cell (Fig. S6D, bottom). In contrast to puffs, CCRICs often showed responses of comparable amplitude in distal regions over the whole cell (Figs. 4C and S6A, B), suggesting the contribution from IP3R cluster activation by Ca2+-Induced Ca2+ Release (CICR). Within a given cell, the vast majority of CCRICs appeared quasi-synchronized at the fastest acquisition rate of 22 ms / frame that we could achieve. However, in a few instances a delay could be detected in the elicitation of a peak in a distant region of a cell (Fig. S6C). These observations suggest that the quasi-synchronization of CCRICs results from the fast diffusion of Ca2+ leading to the activation of IP3R clusters over a large cell area, which may be delayed in some instances. Scrutiny of CCRICs showed that while their profiles were comparable, the amplitude of these responses varied in different regions of the cell, often with a single 1 µm2 region, likely corresponding to the initial firing cluster, showing a prominent amplitude and other regions with smaller amplitude for a given response (Figs. 4B and 4C). For example, in Fig. 4C, the highest amplitude is observed in the red region for peaks 1 and 3, whereas it is observed in the purple region for peak 2. Thus, for a given CCRIC, the respective contributions of local IP3R cluster activation and isotropic diffusion of Ca2+ from other release sites to the Ca2+ increase may vary in different regions of the cell”.
 
 In the Discussion section, 2nd sentence p. 12:
 
 “CCRICs showed rapid kinetics with an average duration of ca. 2.1 seconds and an amplitude corresponding to an increase in cytosolic Ca2+ concentration of a few hundred nM, seemingly smaller than that of puffs (Fig. S6D), often occurring repeatedly with a frequency of up to 12 CCRICs / min over the whole cell.”
 
 We have tried to clarify the notion of coordination versus synchronization of CCRICs by showing the delay observed in some instances in the elicitation of CCRICs at distal regions of the cell, now shown in Fig. S6C.
 
 (3) Specificity of pharmacological tools:
 
 Suramin and U73122 are known to have off-target effects. Control experiments using alternative P2 receptor antagonists like PPADS or inactive U73343 analogs would strengthen the causal link.
 
 As suggested by the referee, we have performed complementary experiments showing the inhibitory effects of PPADS and the absence of effects of U73343 on EPEC-induced Ca2+ responses, including CCRICs, now shown in the amended Fig. S2.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors of this study are trying to resolve how cellular infection by enteropathogenic E. coli (EPEC) subverts cellular signaling pathways to promote infection and dampen immune responses. Specifically, alteration in calcium dynamics has been evidenced in the prior literature as a potential initiator of these adaptations, and this study provides ideas and mechanistic detail as to how cellular calcium dynamics may be subverted by pathogens.
 
 Strengths:
 
 The clear strengths of this paper relate to the new ideas inherent in the proposed hypothesis and their support from the experimental approaches used. Overall, the proposed work provides new ideas in this area, which will benefit from further investigation. Certainly, this is an interesting and challenging paradigm to pick apart mechanistically, and is important for improving treatments for intestinal infections.
 
 Weaknesses:
 
 Additional insight is needed in three specific areas to convincingly support the conclusions drawn by the authors. These three areas are: first, a better description of the infection-associated calcium signals. Second, a mechanistic definition of the relevant purinoceptors versus other pathways to increase cellular calcium. Third, an effort to show that the proposed pathways have relevance in a polarized epithelial cell.
 
 (1) first, a better description of the infection-associated calcium signals.
 
 We agree and have added a more detailed description of the CCRICs in the Results and Discussion sections, as detailed in our response to Reviewer #1, Weakness 2, by adding:
 
 In the Results section, p. 8, l. 27:
 
 “…with a duration of 2.1 ± 1.0 sec (mean ± SEM) (N = 4, 128 responses)”.

 p. 9, l. 13:

 “In rare instances (less than 3%), typical local “Puff” responses elicited by these ATP concentrations could also be detected, often occurring at the cell periphery (Figs. 4B, red region and 4C, red arrow; Fig. S6D, blue trace) (N > 20, cells > 500). As expected from the small concentrations of Ca2+ released at puff sites, no increase in cytosolic Ca2+ was detected in a distal cell region (Fig. S6D, top), indicating that isotropic Ca2+ diffusion from a puff release site cannot account for the Ca2+ increase over a large cell area. Puffs could also be detected concomitantly with CCRICs in different ROIs of the same cell (Fig. S6D, bottom). In contrast to puffs, CCRICs often showed responses of comparable amplitude in distal regions over the whole cell (Figs. 4C and S6A, B), suggesting the contribution from IP3R cluster activation by Ca2+-Induced Ca2+ Release (CICR). Within a given cell, the vast majority of CCRICs appeared quasi-synchronized at the fastest acquisition rate of 22 ms / frame that we could achieve. However, in a few instances a delay could be detected in the elicitation of a peak in a distant region of a cell (Fig. S6C). These observations suggest that the quasi-synchronization of CCRICs results from the fast diffusion of Ca2+ leading to the activation of IP3R clusters over a large cell area, which may be delayed in some instances. Scrutiny of CCRICs showed that while their profiles were comparable, the amplitude of these responses varied in different regions of the cell, often with a single 1 µm2 region, likely corresponding to the initial firing cluster, showing a prominent amplitude and other regions with smaller amplitude for a given response (Figs. 4B and 4C). For example, in Fig. 4C, the highest amplitude is observed in the red region for peaks 1 and 3, whereas it is observed in the purple region for peak 2. Thus, for a given CCRIC, the respective contributions of local IP3R cluster activation and isotropic diffusion of Ca2+ from other release sites to the Ca2+ increase may vary in different regions of the cell”.

 In the Discussion section, 2nd sentence p. 12:
 
 “CCRICs showed rapid kinetics with an average duration of ca. 2.1 seconds and an amplitude corresponding to an increase in cytosolic Ca2+ concentration of a few hundred nM, seemingly smaller than that of puffs (Fig. S6D), often occurring repeatedly with a frequency of up to 12 CCRICs / min over the whole cell.”
 
 We have tried to clarify the notion of coordination versus synchronization of CCRICs by showing the delay observed in some instances in the elicitation of CCRICs at distal regions of the cell, now shown in Fig. S6C.
 
 CCRICs are observed over the whole cell or a very large cell area. We agree that this point, as well as the comparison with previously described puffs, needed clarification. We have added the following sentences in the Discussion and inserted the seminal Thomas et al., 1999 citation in the references, p. 13, l. 18:
 
 “Consistently, while CCRICs were detected in the vast majority of cells at these very low agonist concentrations, in rare instances, local “puff-like” responses were also detected at the cell periphery. These observations are in contrast to previously described Ca2+ puffs preceding global responses, reported to occur preferentially in the perinuclear area (Thomas et al., 1999). These earlier studies, however, involved higher agonist concentrations (1-5 µM ATP) expected to lead to the release of higher IP3 concentrations, which may preferentially stimulate larger IP3R clusters at the perinuclear region because of the higher density of IP3Rs. In addition, larger IP3R clusters may release higher amounts of Ca2+ for which, as opposed to CCRICs, diffusion would be restrained by Ca2+ buffers, thereby favoring the spatial confinement of the response.”
 
 (2) Second, a mechanistic definition of the relevant purinoceptors versus other pathways to increase cellular calcium
 
 We do not believe that CCRICs are specific to EPEC, since they are also elicited by low agonist concentrations. The discrete action of Type III translocons leading to the release of small amounts of extracellular ATP at the onset of EPEC infection prompted us to perform fast Ca2+ imaging at low agonist concentrations (150 nM ATP, 100 nM histamine, now shown in Fig. S4), which, to our knowledge, differ from the higher agonist concentrations used in all previous studies describing puffs. Our modelling studies support the notion that CCRICs correspond to generic Ca2+ release-dependent responses triggered by low levels of IP3.
 
 We now show inhibition of CCRICs by PPADS, another purinergic receptor antagonist, and extracellular ATP depletion by addition of hexokinase in the extracellular medium in Figs. S4 and S7.
 
 Knocking down ATP receptors represents a challenging task since HeLa cells were shown to express transcripts for most of the described 8 P2X and 7 P2Y purinergic receptors (10.1016/j.bbamem.2009.03.006). More importantly, we do not believe that CCRICs are triggered by a specific ATP receptor and do not expect to see inhibition of CCRICs in single knock-down experiments. Our experimental and modelling studies suggest that CCRICs are not specific to EPEC or to a particular ATP receptor, but instead correspond to generic Ca2+ responses elicited at low concentrations of agonists such as ATP or histamine.
 
 Zhong et al., 2020 indeed previously showed a role for Ca2+ influx mediated by the TRPV2 receptor in EPEC-mediated cell death. However, this influx occurred following 8 hours of cell infection with EPEC. We do not detect significant cell death or Ca2+ influx at the onset of infection corresponding to the 12 hours infection kinetics that we used. Our experiments indicate that CCRICs do not involve Ca2+ influx.
 
 (3) Third, an effort to show that the proposed pathways have relevance in a polarized epithelial cell.
 
 We agree and have performed complementary experiments showing induction of CCRICs by EPEC and eATP in polarized intestinal epithelial cells, now shown in Figure S8.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) Statistical treatment and data presentation:
 
 Some figure legends lack clarity on replicates (n = cells vs N = independent experiments). Timecourse quantifications of p-IκB and p-p65 should include normalized fold-change plots with clear statistical tests.
 
 To clarify, we replaced “n” by “cells”. The number of determinations and independent experiments (N) has been added in the legends to all relevant Figures and Supplementary Figures.
 
 As requested, we now show the p-IκB and p-p65 data as plots normalized to basal p-IκB and p-p65 levels. We now mention in the legend to Fig. 6 that we used an ANCOVA test showing significance of the effects of eATP on TNF-α-induced IκB and p65 phosphorylation.
 
 (2) Clarification on the temperature used in imaging (why measured at 35°C)?
 
 We have added the following clarification in the Materials and Methods section p. 14, l. 21:
 
 “Imaging was then carried out at 35°C to allow for bacterial type III secretion, …”
 
 (3) Figure 4A:
 
 The image shows a lower image acquisition interval than every 2s that is stated in the caption.
 
 We apologize for the mistake. The legend to Fig. 4A now reads:
 
 “Image acquisition every 52 ms (A)…”
 
 (4) Figure 4B:
 
 The color of ROIs could be more intense for better identification.
 
 We have replaced the blue and green ROIs with light cyan and purple ROIs.
 
 (5) Figure 4C:
 
 I don't understand the meaning of the dashed lines described by "The dashed red and green lines point at the aggregation of responses throughout the cell" in the caption or in the text.
 
 We apologize for the lack of clarity and have re-written the corresponding text p. 9, l.25 as follows:
 
 “Scrutinization of CCRICs showed that while their profiles were comparable, the amplitude of these responses varied in different regions of the cell, with often a ca 3 µm2 single region, likely corresponding to a source point release, showing a prominent amplitude and other regions with smaller amplitude for a given response (Figs. 4B and 4C). For example, in Fig. 4C, the highest amplitude is observed in the red region for peaks 1 and 3, whereas it is observed in the purple region for peak 2. Thus, for a given CCRIC, the respective contribution of local IP3R cluster activation and isotropic diffusion of Ca2+ from other release sites in Ca2+ increase may vary in different regions of the cell.”
 
 (6) Figure S4A:
 
 The responses for EGTA are not really pointed out. Are the traces meant to show events?
 
 We have added arrowheads in traces corresponding to ATP + EGTA-AM treatment pointing at “flattened Ca2+ responses”. The Legend to Fig. S4A now includes the sentence: “ATP + EGTA-AM treatment led to an inhibition of Ca2+ responses, associated with small variations in the Ca2+ baseline, that were arbitrarily scored as flattened Ca2+ pseudo-responses (ATP+EGTA-AM, red arrows).”
 
 (7) Figure S5:
 
 Could not identify the purple arrow for the less mobile cluster.
 
 We agree that the former Figure lacked clarity and have remade Figure S5, now Figure S6, with higher magnification of panels with fast acquisition. The previously purple arrows pointing at larger and less mobile clusters are now shown in black in these enlarged panels. The legend has been changed accordingly.
 
 (8) There are some typos and suboptimal formulations throughout the manuscript, such as:
 
 P8: "minute amount" could be changed to low, minor or similar.
 
 “minute amounts of eATP” was replaced by “low amounts of eATP”.
 
 P8: put a "%" to the numbers 61.2 ± 5.8.
 
 “%” was added.
 
 P16: "manuscript".
 
 Thank you.
 
 Reviewer #2 (Recommendations for the authors):
 
 Suggestions relate to the following three topics.
 
 First, a better description of the infection-associated calcium signals. The authors emphasize throughout the paper that their imaging data challenge established concepts in the calcium signaling field (discussion). I do not see the calcium imaging data explained either with data or textually with sufficient clarity to evaluate this assertion. A start would be a clear description of the characteristics of the EPEC-evoked calcium signals relative to other local and global domains of calcium signaling previously described in HeLa cells. Prior work has shown that PI-coupled agonists evoke local calcium signals that are perinuclear in HeLa cells (PMID: 10660296), but the relationship of EPEC-evoked transients to these previously defined responses is not clear.
 
 We agree and have added a more detailed description of the CCRICs in the results and discussion section, as detailed in response to referee 1, Weakness 2.
 
 Most importantly, it is ambiguous where in the HeLa cell recordings are made. Are these recordings close to the plasma membrane and/or deeper within the cell? The only spatial information is provided in Figure 3A, and these responses are not well described in the text or presented in a way that comparisons can be made to responses from a PI-coupled agonist.
 
 CCRICs are observed over the whole cell or over a very large cell area. We agree that this point, as well as the comparison with previously described puffs, needed clarification. We have added the following sentences in the discussion and inserted the seminal Thomas et al. 1999 citation in the references, p. 13, l. 18:
 
 “Consistently, while CCRICs were detected in the vast majority of cells at these very low agonist concentrations, in rare instances, local “puff-like” responses were also detected at the cell periphery. These observations are in contrast to previously described Ca2+ puffs preceding global responses reported to occur preferentially in perinuclear area (Thomas et al., 1999). These earlier studies, however, involved higher agonist concentrations (1-5 µM ATP) expected to lead to the release of higher IP3 concentrations, which may preferentially stimulate larger IP3R clusters at the perinuclear region because of the higher density of IP3Rs. In addition, larger IP3R clusters may release higher amounts of Ca2+ for which, as opposed to CCRICs, diffusion would be restrained by Ca2+ buffers thereby favoring the spatial confinement of the response.”
 
 If I understand the described responses correctly, could not these rapid local responses result from a change in cellular calcium buffering capacity consequent to infection? Are the authors proposing that these responses occur in other cells also, or represent a pathogen-specific signaling mode?
 
 We do not believe that CCRICs are specific to EPEC, since they are also elicited by low agonist concentrations. The discrete action of Type III translocons leading to the release of small amounts of extracellular ATP at the onset of EPEC infection prompted us to perform fast Ca2+ imaging at low agonist concentrations (150 nM ATP, 100 nM histamine, now shown in Fig. S4), which, to our knowledge, differ from the higher agonist concentrations used in all previous studies describing puffs. Our modelling studies support the notion that CCRICs correspond to generic Ca2+ release-dependent responses triggered by low levels of IP3.
 
 Second, evidence supporting a mechanistic role of ATP comes from prior literature, together with the authors' presented data showing the effects of PLC (to inhibit IP3), pharmacological inhibition (suramin, a non-selective purinoceptor blocker), and the effects of T3SS-deficient mutants (to prevent ATP release). However, there are missing steps here to mechanistically identify how ATP is working. First, does degradation of extracellular ATP (e.g., apyrase) block these responses? Second, given HeLa cells are easily amenable to knockdown approaches, does knockdown of particular ATP receptors, or TRPV2 as suggested in the prior literature, impact the calcium signal dynamics?
 
 We now show inhibition of CCRICs by PPADS, another purinergic receptor antagonist, and extracellular ATP depletion by addition of hexokinase in the extracellular medium in Figs. S4 and S7.
 
 Knocking down ATP receptors is challenging, since HeLa cells have been shown to express transcripts for most of the 8 P2X and 7 P2Y purinergic receptors described (10.1016/j.bbamem.2009.03.006). Above all, we do not believe that CCRICs are triggered by a specific ATP receptor and do not expect to see inhibition of CCRICs in single knock-down experiments. Our experimental and modelling studies suggest that CCRICs are not specific to EPEC nor to a particular ATP receptor, but instead correspond to generic Ca2+ responses elicited at low concentrations of agonists such as ATP or histamine.
 
 Zhong et al., 2020 indeed previously showed a role for Ca2+ influx mediated by the TRPV2 receptor in EPEC-mediated cell death. However, this influx occurred following 8 hours of cell infection with EPEC.
 
 We do not detect significant cell death or Ca2+ influx at the onset of infection corresponding to the 12 hours infection kinetics that we used. Our experiments indicate that CCRICs do not involve Ca2+ influx.
 
 Third, while the use of HeLa cells provides advantages for imaging and mechanistic assays, the effort to replicate findings in an intestinal cell line would heighten relevance, given the likely importance of cell type and cell polarity on the pathogen-evoked responses.
 
 We agree and have performed complementary experiments showing induction of CCRICs by EPEC and eATP in polarized intestinal epithelial cells, now shown in Figure S8.
 
 AuthorResponse
URL

biorxiv.org/content/10.1101/2025.08.06.668902v2
www.biorxiv.org

Boosting Hyperalignment Performance with Age-Specific Templates

4
1. Public_Reviews 04 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This valuable study advances our understanding of best practices for analyzing population-level data using advanced functional alignment methods. It provides convincing evidence that demographic-specific functional templates improve functional neuroimaging studies that use hyperalignment. This study will be of interest to cognitive neuroscientists, neuroimaging methodologists, and computational researchers with an interest in the human brain.
 
 Summary
2. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 The authors present a compelling case for the necessity of age-specific templates in functional hyperalignment. Given that the brain undergoes substantial developmental, structural, and functional changes across the lifespan, a 'one-size-fits-all' canonical template is often insufficient. This study effectively demonstrates that incorporating age-congruent features significantly enhances the performance and sensitivity of hyperalignment models. By validating these findings across two independent datasets (Cam-CAN and DLBS), the paper provides robust evidence that accounting for age-related functional organization is a critical prerequisite for accurate functional alignment in lifespan research.
 
 Comments on revised version:
 
 The authors have been exceptionally thorough in addressing the concerns raised by the reviewers. In particular, the inclusion of the supplemental analysis on the middle-aged cohort is a valuable addition that strengthens the manuscript. Furthermore, the rationale for employing a congruent template is well-articulated; this approach clearly provides a more robust and accurate foundation for reconstructing individualized connectomes. I appreciate the authors' detailed responses and have no further comments.
 
 Review 1
3. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this study, Zhang and colleagues examine the role of participant selection in creating and using functional templates to improve analyses using hyperalignment. Hyperalignment aligns participants' functional MRI data to a shared functional template, analogous to the anatomical templates used to bring anatomical MRI data into a shared space (e.g., MNI152). The question of appropriate template creation is especially pressing for population-level analyses, where a large number of demographic groups (e.g., different age ranges, clinical statuses) may be included in the same analysis. These different demographic groups may have differences in their functional organization that complicate the creation of a single study-specific functional template.
 
 To provide an initial investigation of the potential effect of demographic-specific templates, the authors use the publicly available Cam-CAN dataset, which contains participants from 18 to 87 years of age. They define a young adult (< 45 years of age) and an older adult group (> 65 years of age) from this dataset with approximately the same number of participants. They investigate whether "age-congruent" templates (i.e., defined in the same age group in which they are used) improve three analyses where hyperalignment has been previously shown to boost performance: inter-subject correlation, predicting individual connectomes, and predicting individual functional responses. Using the Cam-CAN-derived older adult template, they then replicate the ISC analyses using the publicly available Dallas Lifespan Brain Study (DLBS).
 
 Overall, the presented results are highly suggestive that age-congruent templates consistently improve performance, though the absolute effects are small.
 
 Strengths:
 
 The use of a separate validation sample, reusing the same template calculated with Cam-CAN, highlights the potential of developing independent templates for individual demographic groups and then distributing these for wider use, analogous to the MNI templates that are widely used throughout the field of neuroimaging. This suggests that the potential impact of this framework is significant.
 
 Weaknesses:
 
 In their revision, the authors have addressed the previously raised "weaknesses" by providing guidance for researchers interested in using age-specific hyperalignment templates in practice.
 
 Impact:
 
 Overall, this work is likely to encourage future development of age-specific functional templates in the imaging community.
 
 Review 2
4. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors present a compelling case for the necessity of age-specific templates in functional hyperalignment. Given that the brain undergoes substantial developmental, structural, and functional changes across the lifespan, a 'one-size-fits-all' canonical template is often insufficient. This study effectively demonstrates that incorporating age-congruent features significantly enhances the performance and sensitivity of hyperalignment models. By validating these findings across two independent datasets (Cam-CAN and DLBS), the paper provides robust evidence that accounting for age-related functional organization is a critical prerequisite for accurate functional alignment in lifespan research.
 
 Strengths:
 
 (1) The authors used three metrics to evaluate performance. Across all metrics, they found that age-congruent templates outperformed age-incongruent templates, suggesting that age-specific templates can improve alignment.
 
 (2) These findings highlight the superiority of age-congruent templates for hyperalignment. This work underscores the importance of age-matching in cross-subject functional mapping and represents a vital step forward for the methodology.
 
 We thank the reviewer for the summary and the positive evaluation of our manuscript.
 
 Weaknesses:
 
 (1) Participant Demographics and Group Separation:
 
 The study defines the 'older' cohort as 65-90 years and the 'younger' cohort as 18-45 years. While this 20-year gap (ages 46-64) effectively maximizes the contrast between groups, the results in Figure 4a suggest that the predicted individualized connectomes follow a continuous distribution. Given this continuity, could the authors provide the average median trends for Figures 2a and 2b to illustrate how the model behaves across the missing age range?
 
 Thank you for raising this important point. We have calculated the results for the middle-aged cohort template and included them in Supplementary Figures 4 and 5. Similar to Figures 2a, 2b, 3a and 3b, we directly compare the intersubject correlation and prediction performance of the middle-aged participants when aligned to their congruent middle-aged template versus an incongruent template. We observed consistent results across validation analyses (ISC and prediction) and groups (young vs. middle-aged, middle-aged vs. old). Consistent with our main findings, the middle-aged cohort exhibits significantly higher intersubject correlation and prediction performance when using the age-congruent middle-aged template. These results confirm that the age-related shifts in functional brain organization captured by the hyperalignment templates follow a continuous trajectory across the lifespan.
 
 (2) Request for Implementation:
 
 I have been unable to locate the source code associated with this publication. Could the authors please provide a link to the repository or clarify if the implementation is available for reproduction?
 
 We have made our scripts publicly available on GitHub: https://github.com/yuqi98/Aging_templates_scripts
 
 (3) Analysis of Prediction Performance and Distribution:
 
 While Figures 3b and 5b clearly demonstrate that the congruent template improves correlation, Figure 4a shows a distinct shift in the scatter distribution. Could the authors provide a detailed explanation of the prediction performance metrics used? Specifically, I would like to understand how the underlying method accounts for the distribution differences observed when applying the congruent template.
 
 Our prediction performance metric is the average Pearson correlation. We calculated the correlation between the model-predicted data (the individualized connectome in Figure 3 and the movie response in Figure 5) and the participant's actual measured data for each cortical vertex and averaged the correlations across vertices. A higher correlation indicates that the group template, when combined with the participant’s individualized transformation matrix, more accurately reconstructs the individualized functional connectome and responses to stimuli.
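 The metric described above can be illustrated with a minimal sketch. This is not taken from the authors' repository: the function name `mean_vertexwise_correlation` and the array layout (rows are time points or connectivity targets, columns are cortical vertices) are assumptions made for illustration only.

```python
import numpy as np


def mean_vertexwise_correlation(predicted, measured):
    """Average Pearson correlation between predicted and measured data.

    Both inputs are assumed to be 2-D arrays of identical shape
    (n_samples, n_vertices). A correlation is computed for each vertex
    (column) and the correlations are then averaged across vertices.
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if predicted.ndim != 2 or predicted.shape != measured.shape:
        raise ValueError("expected two 2-D arrays of identical shape")

    # Center each vertex profile so the dot product becomes a covariance.
    p = predicted - predicted.mean(axis=0)
    m = measured - measured.mean(axis=0)

    # Pearson r per vertex; constant columns yield NaN and are skipped below.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (p * m).sum(axis=0) / np.sqrt(
            (p ** 2).sum(axis=0) * (m ** 2).sum(axis=0)
        )
    return float(np.nanmean(r))
```

 Under these assumptions, a higher return value means that the template-based prediction tracks the participant's measured data more closely across the cortex.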
 
 The distinct upward shift in prediction performance when using a congruent template occurs because brain functional organization shows age-specific features. A congruent template captures these age-specific connectivity and response features. Importantly, the template creation algorithm aims to reflect the central tendency of the training data, including representational/connectivity geometry and functional topographies. Therefore, the observed differences in templates reflect differences in functional organization across age groups. As a result, when projecting the common template back into an individual’s native cortical space using the transformation matrix derived from independent data, the congruent template provides a richer, more accurate basis for reconstructing the individualized connectome and movie-watching responses.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this study, Zhang and colleagues examine the role of participant selection in creating and using functional templates to improve analyses using hyperalignment. Hyperalignment aligns participants' functional MRI data to a shared functional template, analogous to the anatomical templates used to bring anatomical MRI data into a shared space (e.g., MNI152). The question of appropriate template creation is especially pressing for population-level analyses, where a large number of demographic groups (e.g., different age ranges, clinical statuses) may be included in the same analysis. These different demographic groups may have differences in their functional organization that complicate the creation of a single study-specific functional template.
 
 To provide an initial investigation of the potential effect of demographic-specific templates, the authors use the publicly available Cam-CAN dataset, which contains participants from 18 to 87 years of age. They define a young adult (< 45 years of age) and an older adult group (> 65 years of age) from this dataset with approximately the same number of participants. They investigate whether "age-congruent" templates (i.e., defined in the same age group in which they are used) improve three analyses where hyperalignment has been previously shown to boost performance: inter-subject correlation, predicting individual connectomes, and predicting individual functional responses. Using the Cam-CAN-derived older adult template, they then replicate the ISC analyses using the publicly available Dallas Lifespan Brain Study (DLBS).
 
 Overall, the presented results are highly suggestive that age-congruent templates consistently improve performance, though the absolute effects are small.
 
 Strengths:
 
 The use of a separate validation sample, reusing the same template calculated with Cam-CAN, highlights the potential of developing independent templates for individual demographic groups and then distributing these for wider use, analogous to the MNI templates that are widely used throughout the field of neuroimaging. This suggests that the potential impact of this framework is significant.
 
 We thank the reviewer for the summary and the positive evaluation of our manuscript.
 
 Weaknesses:
 
 While the authors appropriately highlight the potential applications of this result (e.g., to different clinical statuses), it is not apparent how to appropriately extend this methodology to many common experimental paradigms. For example, in case-control studies (where researchers are interested in comparing clinical and non-clinical participants) the use of two different functional templates may complicate rather than ease analyses. Providing this as a potential limitation of the current template construction method, or providing recommendations to researchers interested in comparing across groups, would help to increase the impact of this work.
 
 We appreciate the reviewer raising this important practical consideration. We have added additional explanation to the Discussion section to provide clear recommendations for researchers applying this methodology, which we summarize below:
 
 When the goal of a case-control study is to directly compare functional organization or brain responses between clinical and non-clinical participants, it is essential that all individuals are hyperaligned to the same common template. For these analyses, researchers should either construct a joint template containing a balanced, representative sample from both groups, or align all participants to a normative control template. This ensures that the resulting data share a single coordinate system, allowing for valid statistical comparisons between groups.
 
 However, disease-specific or age-specific templates are highly advantageous when the research objective is to maximize decoding accuracy or predictive performance within a specific population. In real-world clinical or lifespan research, if the goal is to build a reliable diagnostic biomarker for disease progression or map individualized connectomes for a specific patient cohort, researchers should use a template congruent with that specific group. The congruent template will preserve the group-specific representational geometry, providing a better individual-level prediction than a general cortical template.
 
 Recommendations for the authors:
 
 Reviewer #2 (Recommendations for the authors):
 
 In general, there appears to be significantly more spread in the values for older adults (e.g., Figure 4b). It would be useful to know whether subdividing this group improves its relative performance; however, this will likely require additional investigation into the number of participants needed to establish a minimal template.
 
 We thank the reviewer for this constructive comment. We agree that older adults exhibit greater inter-individual variability in functional organization, which likely drives the larger spread observed in Figure 4b. We also appreciate the suggestion to subdivide this group to see if narrower age bins improve relative performance.
 
 We have constructed templates using narrower, 10-year age intervals and evaluated their performance. Because model performance increases with the amount of training data, we used a fixed number of training participants for each age group (two thirds of the participants in the smallest group) to build the templates, so that the comparison is fair. We have added the results to Supplementary Figure 6. The results show a continuous gradient of age-related divergence. When predicting data for the 80–90 cohort, the 20–30 template performs the worst and the performance steadily improves as the template age gets closer to the target demographic. This systematic gradient further supports our main finding: the penalty for using an incongruent template increases with the discrepancy between the template age and participant age.
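 The size-matching step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `sample_matched_training_sets`, the dictionary layout, the use of random sampling, and rounding down to a whole number of participants are all hypothetical choices.

```python
import random


def sample_matched_training_sets(groups, seed=0):
    """Draw equally sized training sets from every age bin.

    `groups` maps an age-bin label to a list of participant IDs.
    The training-set size is two thirds of the smallest bin
    (rounded down, an assumption), so every template is built
    from the same number of participants.
    """
    smallest = min(len(ids) for ids in groups.values())
    n_train = (2 * smallest) // 3  # integer arithmetic avoids float rounding

    rng = random.Random(seed)  # seeded for reproducible splits
    return {
        label: sorted(rng.sample(list(ids), n_train))
        for label, ids in groups.items()
    }


# Hypothetical bins with different numbers of participants.
example_groups = {
    "20-30": list(range(0, 30)),
    "30-40": list(range(100, 145)),
    "80-90": list(range(200, 236)),
}
example_training = sample_matched_training_sets(example_groups, seed=1)
```

 With these made-up bin sizes, every training set contains 20 participants, because the smallest bin has 30 members.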
 
 Interestingly, we noticed that at the extreme ends of the age range (20–30 and 80–90), the strictly congruent template was slightly outperformed by the immediately adjacent age bin (i.e., the 30–40 template for young participants, and the 70–80 template for the oldest participants). Because we strictly matched the number of training subjects across all bins, this slight dip is likely driven by differences in raw data quality. It is common for fMRI data from the extreme ends of the lifespan to have slightly lower signal-to-noise ratios or higher head motion compared to the intermediate 30–40 or 70–80 cohorts. This suggests that while age congruency is a key driver of hyperalignment success, the intrinsic data quality of the cohort used to build the template also plays a practical role in its overall performance.
 
 This brings up the reviewer’s second point regarding the number of participants needed to establish a minimal template. Subdividing the age groups reduces the sample size available to construct each template. Previous research has demonstrated that while a hyperalignment template derived from a relatively small number of participants can achieve acceptable performance, increasing the amount of data and the number of subjects in the template space consistently and robustly improves alignment quality (See Supplementary Figure 7 in Feilong et al., 2023). Ultimately, our long-term goal is to build highly robust, standardized templates for fine-grained age cohorts across the entire lifespan. We are preparing to collect large-scale datasets from age 20 to 100 to build age-specific templates and provide them as open resources. This will allow future researchers to directly align their data to an age-appropriate template without needing to construct one from their own limited samples.
 
 Reference
 
 Feilong, M., Nastase, S. A., Jiahui, G., Halchenko, Y. O., Gobbini, M. I., & Haxby, J. V. (2023). The individualized neural tuning model: Precise and generalizable cartography of functional architecture in individual brains. Imaging Neuroscience, 1, 1–34. https://doi.org/10.1162/imag_a_00032
 
 AuthorResponse
URL

biorxiv.org/content/10.1101/2025.02.19.639148v3
www.biorxiv.org

Tunable Bessel beam two-photon fluorescence microscopy for high-speed volumetric imaging of brain dynamics

5
1. Public_Reviews 04 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This important study substantially advances the imaging toolbox available to neuroscientists by presenting a tunable Bessel (tBessel-TPFM) platform that enables high-speed volumetric two-photon imaging. The evidence supporting the novel methodology is convincing, with rigorous benchmarking and demonstrations of a wide range of neuroimaging applications covering vascular dynamics, neurovascular coupling, optogenetic perturbation, and microglial responses. The work will be of broad interest to neuroscientists and imaging system tool developers.
 
 Summary
2. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This manuscript presents a tunable Bessel-beam two-photon fluorescence microscopy (tBessel-TPFM) platform that enables high-speed volumetric imaging with stable axial focus. The work is technically strong and broadly significant, as it substantially improves the flexibility and practicality of Bessel-beam-based two-photon microscopy. The demonstrations are generally strong and bridge a wide range of neuroimaging applications, namely vascular dynamics, neurovascular coupling, optogenetic perturbation, and microglial responses. These convincingly show that the approach enables biological measurements that are difficult or impractical with existing methods.
 
 The evidence supporting the technical and biological claims is generally strong. The optical design is carefully motivated, clearly described, and validated through a combination of simulations and experimental characterization. The biological applications are diverse and well chosen to highlight the strengths of the proposed method, and the data are of high quality, with appropriate controls and comparative measurements where relevant.
 
 Strengths:
 
 (1) The optical innovation addresses a well-recognized limitation of existing Bessel-TPFM implementations, namely axial focus drift during tuning, and does so using a relatively simple, light-efficient, and cost-effective design.
 
 (2) The manuscript provides convincing experimental evidence for this being a versatile platform to map flow dynamics across diverse vessel sizes and orientations in both healthy and pathological states.
 
 (3) Biological demonstrations are comprehensive and span multiple domains such as hemodynamics, neurovascular coupling, and neuroimmune responses.
 
 (4) Quantitative analyses of blood flow across vessel sizes and orientations, including kilohertz line scanning, are particularly compelling and clearly beyond the reach of standard Gaussian TPFM.
 
 (5) Particular advantages are that higher blood flow speeds become measurable, up to 23 mm/s (20x more than conventional frame scanning), and that simultaneous (Bessel-)imaging and (Gaussian-)perturbation are possible because of the stable axial focus.
 
 Weaknesses:
 
 (1) At present, the paper does not properly position the new Bessel-beam method against previous work, and fails to compare it to alternative fast volumetric imaging methods without Bessel beams.
 
 (2) The cost-effectiveness of the proposed method is not well described or supported by evidence; it would be useful to include more detail or remove this claim.
 
 (3) Some biological conclusions, e.g., regarding novel features of microglial dynamics (i.e., the observed two-wave responses and coordinated extension-retraction), are based on relatively limited sample size and would benefit from clearer discussion of variability across animals and fields of view.
 
 (4) The use of neural network-based denoising for microglial imaging is reasonable but introduces potential concerns about trustworthiness; additional clarification of validation or failure modes would strengthen confidence in these results.
 
 To conclude, most of the authors' claims are well supported by the data. The central conclusion, namely that tBessel-TPFM provides tunable volumetric imaging enabling experiments not feasible with existing two-photon approaches, is justified. Some biological interpretations would benefit from a more cautious framing, but they do not undermine the main technical and methodological contributions of the study. This is a strong and technically rigorous manuscript that makes a substantial methodological advance with clear relevance to neuroscience and intravital imaging. Minor clarifications and a slightly more measured discussion of certain biological findings are recommended.
 
 Review 1
3. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors describe a tunable Bessel beam two-photon microscope (tBessel-TPFM) designed to overcome a common limitation of Bessel-based volumetric imaging: axial shifts of the effective focus during Bessel beam parameter tuning. Their optical design allows independent control of axial beam length and resolution while keeping the axial center fixed. This is extensively validated through simulations and experiments.
 
 Strengths:
 
 A major strength of the work is the breadth of validation combined with the level of technical detail provided. The authors carefully characterize the optical performance of the system and clearly explain the design choices and underlying derivations, which will make it easier for others to understand and implement. The authors demonstrate the utility of the method across several in vivo applications, including neurovascular imaging, blood flow measurements, optogenetic stimulation, and microglial dynamics.
 
 Weaknesses:
 
 In the in vivo demonstrations, the authors employ different Bessel beam configurations across experiments, but the beam parameters are not dynamically tuned during live imaging. A video example showing continuous or interactive tuning of the Bessel beam within a single in vivo imaging sequence would further highlight the practical advantages of this platform and strengthen the case for its potential applications. In addition, while excitation powers are reported, the manuscript does not place these values in the broader context of known photodamage thresholds for two-photon microscopy, which would be helpful to the readers. Denoising/image restoration are applied in one of the in vivo examples, but it is unclear why this step was used specifically for this dataset and whether it was necessary to achieve adequate SNR or primarily included as an additional demonstration.
 
 Review 2
4. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The manuscript presents an elegant and cost-effective approach for generating a tunable Bessel beam on a conventional two-photon microscope. The authors assemble a compact optical module comprising three axicons and a series of lenses that permits rapid adjustment of both lateral resolution and axial extent without modifying the focal plane. This flexibility enables the system to be readily adapted to a variety of biological preparations. As a proof of concept, the authors employ the device to record blood flow velocities in cortical microcapillaries, arterioles, and venules, thereby directly visualizing vasodilatation and vasoconstriction dynamics and permitting quantitative analysis of neurovascular coupling across cortical layers in awake mice.
 
 The authors demonstrate that the tunability of the Bessel beam can be exploited to match the numerical aperture to the vessel type: a high NA configuration, albeit slower scan, is optimal for resolving flow in capillaries, whereas a low NA setting provides faster acquisition suitable for arterioles and venules. By implementing a one-dimensional line scan with the Bessel beam, they achieve an imaging speed that is twentyfold faster than conventional frame-by-frame scanning, which proves sufficient to capture hemodynamic transients before and after an induced ischemic stroke.
 
 In addition to pure observation, the authors integrate a co-propagating Gaussian line into the system, allowing simultaneous imaging and photostimulation within the same focal plane. This capability addresses a common limitation of other Bessel beam implementations, in which the observation and perturbation planes often become misaligned when the Bessel beam is altered. The manuscript also emphasizes the advantage of Bessel beam excitation for calcium imaging after a perturbation, because it captures neuronal activity in planes both above and below the nominal focal plane, signals that would be missed with a standard Gaussian focus. Finally, the authors apply the technique to investigate the neuroimmune response following targeted microglial ablation; they report that adjacent microglia extend processes toward the injury site while retracting processes in the opposite direction.
 
 Overall, the work offers a technically straightforward yet powerful extension to existing two-photon platforms, providing high-speed volumetric imaging and stimulation capabilities that are well suited to a broad range of neurovascular and neuroimmune studies. The experimental validation is quite thorough, and the presented data convincingly illustrate the benefits of the approach.
 
 Strengths:
 
 The authors present a truly clever and inexpensive optical module that can be integrated into almost any two-photon microscope, providing a tunable Bessel beam with a minimal modification of the existing system. The experimental data and accompanying quantitative analysis convincingly demonstrate that the system can reveal physiological events, such as capillary flow, calcium transients across multiple axial planes, and microglial process dynamics, that are difficult or impossible to capture with a conventional Gaussian beam. The breadth of experiments chosen for the manuscript illustrates the practical utility of the device and supports the authors' conclusions that it extends the functional repertoire of standard two-photon microscopy.
 
 Weaknesses:
 
 The manuscript would benefit from a more detailed contextualisation of the claimed speed advantage. Although the authors mention other techniques in the introduction, they do not provide any direct comparison with other state-of-the-art high-speed two-photon approaches such as light beads microscopy (Demas et al., Nat. Methods 2021), temporal multiplexing schemes (Weisenburger et al., Cell 2019), or random access microscopy (Villette et al., Cell 2019). A brief comparison of imaging speed, spatial resolution, and instrumental complexity would enable readers to assess the relative merits of the present method.
 
 A second limitation that warrants discussion is the inherent trade-off between volumetric coverage and image specificity. Because the Bessel beam excites fluorescence throughout an extended axial range, the detector inevitably integrates signal from a three-dimensional volume into a two-dimensional image. In densely labelled tissue, this can lead to significant signal crosstalk, reducing contrast and complicating quantitative interpretation. A brief analysis of how labeling density affects the fidelity of flow or calcium measurements, or suggestions for mitigating crosstalk (e.g., computational deconvolution, adaptive excitation shaping, or combinatorial sparse labeling), would broaden the applicability of the technique.
 
 Review 3
5. Public_Reviews 04 Jun 2026
 
 in eLife
 
 Author response:
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 This manuscript presents a tunable Bessel-beam two-photon fluorescence microscopy (tBessel-TPFM) platform that enables high-speed volumetric imaging with stable axial focus. The work is technically strong and broadly significant, as it substantially improves the flexibility and practicality of Bessel-beam-based two-photon microscopy. The demonstrations are generally strong and bridge a wide range of neuroimaging applications, namely vascular dynamics, neurovascular coupling, optogenetic perturbation, and microglial responses. These convincingly show that the approach enables biological measurements that are difficult or impractical with existing methods.
 
 The evidence supporting the technical and biological claims is generally strong. The optical design is carefully motivated, clearly described, and validated through a combination of simulations and experimental characterization. The biological applications are diverse and well chosen to highlight the strengths of the proposed method, and the data are of high quality, with appropriate controls and comparative measurements where relevant.
 
 Strengths:
 
 (1) The optical innovation addresses a well-recognized limitation of existing Bessel-TPFM implementations, namely axial focus drift during tuning, and does so using a relatively simple, light-efficient, and cost-effective design.
 
 (2) The manuscript provides convincing experimental evidence for this being a versatile platform to map flow dynamics across diverse vessel sizes and orientations in both healthy and pathological states.
 
 (3) Biological demonstrations are comprehensive and span multiple domains such as hemodynamics, neurovascular coupling, and neuroimmune responses.
 
 (4) Quantitative analyses of blood flow across vessel sizes and orientations, including kilohertz line scanning, are particularly compelling and clearly beyond the reach of standard Gaussian TPFM.
 
 (5) Particular advantages are that higher blood flow speeds become measurable, up to 23 mm/s (20x more than conventional frame scanning), and that simultaneous (Bessel-)imaging and (Gaussian-)perturbation are possible because of the stable axial focus.
 
 We thank the reviewer for this thoughtful and encouraging evaluation of our work. We are particularly grateful for the recognition of both the technical rigor and the broad applicability of the tBessel-TPFM platform, as well as the assessment that our approach enables biological measurements that are difficult or impractical with existing methods. We appreciate the reviewer’s detailed summary of the strengths of the manuscript, including the identification of axial focus drift as a major limitation in prior Bessel-TPFM implementations, and the value of our center-stable, light-efficient, and accessible solution. We also thank the reviewer for the encouraging comment that our biological demonstrations are compelling and well supported by quantitative analysis.
 
 Weaknesses:
 
 (1) At present, the paper does not properly position the new Bessel-beam method against previous work, and fails to compare it to alternative fast volumetric imaging methods without Bessel beams.
 
 We thank the reviewer for this important point. We agree that a more explicit comparison with existing fast volumetric imaging methods helps clarify the unique advantages of our system. Alternative fast volumetric imaging methods without Bessel beams include remote focusing (Sofroniew et al., 2016), acousto-optic deflectors (AOD) (Villette et al., 2019), piezoelectric objective stages (Göbel and Helmchen, 2007), tunable acoustic gradient lenses (TAG lens) (Huang et al., 2019), electrically tunable lenses (ETLs) (Grewe et al., 2011; Yang et al., 2018), and light beads microscopy (Demas et al., 2021). These methods have each enabled important forms of rapid volumetric imaging, but they differ in their speed, resolution, axial range, and optical complexity. For example, remote focusing can provide rapid axial refocusing while preserving high-resolution imaging, but it has a limited defocus range and requires a carefully aligned relay system and aberration control to maintain image quality. AOD-based approaches enable fast random-access sampling, but they introduce optical and calibration complexity associated with dispersion and suffer light loss from limited diffraction efficiency. Piezoelectric objective scanning is comparatively simple and broadly accessible, but its mechanical inertia limits volume rate and can introduce artifacts during rapid or large axial motion. TAG lenses and ETLs provide compact non-mechanical axial scanning, but pose challenges for aberration control and synchronization. Light beads microscopy achieves high volumetric throughput by near-simultaneously sampling multiple axial positions, but faces an intrinsic compromise among axial coverage, number of sampling planes, and lateral sampling density, which limits lateral resolution when imaging over large depth ranges.
 
 Previous Bessel-beam TPFM approaches address some of these limitations by converting volumetric imaging into two-dimensional scanning with an axially extended focus. However, many existing implementations either rely on a fixed Bessel beam profile, which limits the ability to adapt spatial resolution and axial coverage to different biological applications, or use spatial light modulators, which provide tunability but introduce higher cost, increased optical complexity, reduced light efficiency, and sequential rather than simultaneous multi-wavelength operation. Other axicon or lens based tunable Bessel approaches have also been reported, but these designs generally introduce axial displacement of the Bessel focus during tuning.
 
 In contrast, our tBessel-TPFM design provides full tunability comparable to that of SLM-based methods while maintaining a stable axial beam center. At the same time, it is low cost, easy to implement, intrinsically light efficient, and supports simultaneous multi-color imaging. Therefore, tBessel-TPFM provides a unique solution for applications where axial projection is acceptable and where high-speed volumetric monitoring, tunable axial coverage, motion robustness, optical simplicity, and compatibility with simultaneous perturbation are valuable.
 
 (2) The cost-effectiveness of the proposed method is not well described or supported by evidence; it would be useful to include more detail or remove this claim.
 
 We thank the reviewer for requesting clarification and supporting evidence regarding the cost-effectiveness of our method. We now provide a detailed cost breakdown of the tBessel module. Briefly, the module consists of three axicons, three lenses, and one iris that together enable independent control of the NA and ΔNA of the generated Bessel beam. Based on the specified components, the three axicons (AX252B and AX255B, Thorlabs) cost $635 each, the three lenses (AC254-125-B×2 and AC254-150-B, Thorlabs) cost $110 each, and the iris (SM2D25D, Thorlabs) costs $105, resulting in a total system cost of approximately $2,340. For comparison, spatial light modulator (SLM)-based implementations that offer comparable tunability typically require an SLM module costing on the order of $20,000, in addition to more complex optical alignment and reduced optical efficiency.
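
 The cost breakdown above can be checked with a few lines of arithmetic. The sketch below uses only the quantities and unit prices quoted in the response; the dictionary layout and the function name `module_cost` are illustrative choices, not part of the authors' materials.

```python
# Quantities and unit prices (USD) as quoted in the response.
components = {
    "axicon (AX252B / AX255B)": (3, 635),
    "lens (AC254-125-B x2, AC254-150-B)": (3, 110),
    "iris (SM2D25D)": (1, 105),
}


def module_cost(parts):
    """Sum quantity * unit price over all listed parts."""
    return sum(qty * price for qty, price in parts.values())


total = module_cost(components)
slm_cost = 20_000  # order-of-magnitude SLM price quoted in the response
ratio = slm_cost / total

print(f"tBessel module total: ${total:,}")
print(f"SLM cost relative to module: {ratio:.1f}x")
```

 The total reproduces the quoted figure of about $2,340, and the SLM alone comes out at roughly 8.5 times the cost of the whole module.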
 
 (3) Some biological conclusions, e.g., regarding novel features of microglial dynamics (i.e., the observed two-wave responses and coordinated extension-retraction), are based on a relatively limited sample size and would benefit from a clearer discussion of variability across animals and fields of view.
 
 We thank the reviewer for this important comment regarding the limited sample size of the microglial dynamics study. We agree that a more comprehensive assessment across animals would be required to establish the generality of these biological findings. In the current study, our intent is not to draw broad biological conclusions, but rather to report observations enabled by the tBessel-TPFM platform. As noted in the manuscript, we have deliberately used descriptive language (e.g., “two distinct waves of process extension were observed”, “process dynamics revealed…”, and “advancing processes displayed…”) to avoid overclaiming the biological findings beyond the data presented.
 
 (4) The use of neural network-based denoising for microglial imaging is reasonable but introduces potential concerns about trustworthiness; additional clarification of validation or failure modes would strengthen confidence in these results.
 
 We thank the reviewer for raising this important point regarding the reliability of neural network-based denoising. We agree that additional validation and discussion of potential failure modes are essential to build confidence in these results. To assess the fidelity of the CARE-denoised data, we performed several additional analyses (Author response image 1). First, we compared normalized raw and denoised images averaged over 10 frames. The difference between the two images was spatially uniform and primarily reflected residual noise present in the raw data, rather than structured discrepancies (Author response image 1a). As expected, brighter features like microglial somata exhibited smaller differences due to their intrinsically higher signal-to-noise ratio, whereas weaker processes showed larger noise-related differences. Second, we extended this comparison across the full time-lapse sequence by applying consistent color mapping to both raw and denoised videos and computing frame-by-frame difference maps. These analyses show that the observed differences are consistent with noise suppression, without introducing coherent structural features or altering the apparent microglial dynamics (Author response image 1b).
 
 Author response image 1.
 
 Validation of CARE-based denoising for microglial imaging. (a) Comparison of 10-frame averaged normalized raw (left), CARE-denoised (middle), and their pixel-wise difference (right) images. The second row shows a zoomed-in view of the boxed region. (b) Color-coded time-lapse projections over a 10-minute imaging session for the raw (left) and CARE-denoised (middle) data, along with their pixel-wise difference (right).
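
 The comparison described above (average a set of frames, normalize, then subtract pixel by pixel) can be sketched as follows. This is a minimal illustration in plain Python and not the authors' analysis code: the min-max normalization, the function names, and the tiny 2 × 2 toy frames are all assumptions made for the example.

```python
def mean_image(frames):
    """Pixel-wise average of equally sized 2D frames (lists of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def normalize(img):
    """Min-max normalize a 2D image to [0, 1] (assumed normalization)."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [[(v - lo) / span for v in row] for row in img]


def difference_map(raw_frames, denoised_frames):
    """Normalized frame-averaged raw image minus its denoised counterpart."""
    raw = normalize(mean_image(raw_frames))
    den = normalize(mean_image(denoised_frames))
    return [[r - d for r, d in zip(raw_row, den_row)]
            for raw_row, den_row in zip(raw, den)]


# Toy data: two noisy raw frames and two denoised frames (hypothetical values).
raw_frames = [[[0, 8], [2, 10]], [[0, 12], [2, 10]]]
denoised_frames = [[[0, 10], [0, 10]], [[0, 10], [0, 10]]]

diff = difference_map(raw_frames, denoised_frames)
print(diff)
```

 In a real dataset one would inspect such a map for spatially coherent structure: residuals that look like noise are consistent with noise suppression, whereas residuals that trace out cell processes would suggest the network has added or removed features.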
 
 To conclude, most of the authors' claims are well supported by the data. The central conclusion, namely that tBessel-TPFM provides tunable volumetric imaging enabling experiments not feasible with existing two-photon approaches, is justified. Some biological interpretations would benefit from a more cautious framing, but they do not undermine the main technical and methodological contributions of the study. This is a strong and technically rigorous manuscript that makes a substantial methodological advance with clear relevance to neuroscience and intravital imaging. Minor clarifications and a slightly more measured discussion of certain biological findings are recommended.
 
 We thank the reviewer for this thoughtful and encouraging summary of our work. We greatly appreciate the recognition that tBessel-TPFM provides a meaningful methodological advance and enables volumetric imaging experiments that are difficult or impractical with existing two-photon approaches.
 
 Reviewer #2 (Public review):
 
 The authors describe a tunable Bessel beam two-photon microscope (tBessel-TPFM) designed to overcome a common limitation of Bessel-based volumetric imaging: axial shifts of the effective focus during Bessel beam parameter tuning. Their optical design allows independent control of axial beam length and resolution while keeping the axial center fixed. This is extensively validated through simulations and experiments.
 
 Strengths:
 
 A major strength of the work is the breadth of validation combined with the level of technical detail provided. The authors carefully characterize the optical performance of the system and clearly explain the design choices and underlying derivations, which will make it easier for others to understand and implement. The authors demonstrate the utility of the method across several in vivo applications, including neurovascular imaging, blood flow measurements, optogenetic stimulation, and microglial dynamics.
 
 We thank the reviewer for their thoughtful and encouraging comments. We greatly appreciate the recognition of the technical rigor, breadth of validation, and clarity of explanation presented in our work.
 
 Weaknesses:
 
 In the in vivo demonstrations, the authors employ different Bessel beam configurations across experiments, but the beam parameters are not dynamically tuned during live imaging. A video example showing continuous or interactive tuning of the Bessel beam within a single in vivo imaging sequence would further highlight the practical advantages of this platform and strengthen the case for its potential applications.
 
 We thank the reviewer for this suggestion. We agree that continuous or interactive tuning of the Bessel beam during imaging would further highlight the practical flexibility of the platform, and changing the Bessel beam parameters during an imaging session is feasible in our tBessel-TPFM implementation. For the in vivo applications presented in this manuscript, however, dynamic tuning during the actual recording is generally not required. In practice, the Bessel beam parameters are selected before data acquisition based on the biological target, desired axial coverage, spatial resolution, and acceptable level of projection overlap.
 
 In addition, while excitation powers are reported, the manuscript does not place these values in the broader context of known photodamage thresholds for two-photon microscopy, which would be helpful to the readers.
 
 We thank the reviewer for bringing up this important point. Multiphoton imaging relies on relatively high illumination power, which can heat the brain and cause photodamage. A previous study reported that continuous illumination with a 920-nm laser beam at 0.8 NA over 1000 s results in a peak temperature increase of ~1.73 °C/100 mW in the brain, with power above 300 mW observed to cause cellular damage, whereas power levels below 250 mW were considered safe for long-term imaging (Podgorski and Ranganathan, 2016). In our experiments, the measured post-objective powers range from 20 mW to 149 mW, well below this reported safe threshold.
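
 To put the quoted powers in context, the reported heating coefficient can be scaled to the power range used here. The sketch below assumes that peak temperature rise scales linearly with average power and that the coefficient measured for a 920-nm, 0.8-NA beam carries over unchanged; neither assumption is stated in the response, and heating under Bessel-beam illumination may differ. The function name is illustrative.

```python
HEATING_C_PER_100_MW = 1.73  # peak rise reported for 920 nm, 0.8 NA, 1000 s
SAFE_LIMIT_MW = 250          # long-term imaging threshold cited in the response


def estimated_peak_rise_c(power_mw):
    """Linear estimate of peak temperature rise (assumed scaling with power)."""
    return HEATING_C_PER_100_MW * power_mw / 100.0


# Lowest and highest post-objective powers reported in the response.
for power_mw in (20, 149):
    rise = estimated_peak_rise_c(power_mw)
    margin = SAFE_LIMIT_MW - power_mw
    print(f"{power_mw} mW: ~{rise:.2f} °C estimated rise, "
          f"{margin} mW below the {SAFE_LIMIT_MW} mW limit")
```

 Under these assumptions the highest reported power corresponds to an estimated rise of roughly 2.6 °C, and it sits about 100 mW below the 250 mW level cited as safe for long-term imaging.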
 
 Denoising/image restoration are applied in one of the in vivo examples, but it is unclear why this step was used specifically for this dataset and whether it was necessary to achieve adequate SNR or primarily included as an additional demonstration.
 
 We thank the reviewer for requesting clarification on the usage of the CARE denoising model. The CARE-based denoising was applied only in Figure 5, the microglial imaging example, and was primarily included as an additional demonstration of how neural network–based image restoration can be used to enhance low-SNR volumetric datasets acquired with tBessel-TPFM. All other images and analyses in the manuscript were based on raw data without any denoising. To assess the reliability of the CARE denoising method, we further compared raw and denoised data using 10-frame averages and color-mapped projections of the full 10-minute time-lapse video; both showed minimal differences (Author response image 1). These analyses confirm that the CARE denoising model did not introduce structural artifacts or affect the observed biological dynamics in our dataset.
 
 Reviewer #3 (Public review):
 
 The manuscript presents an elegant and cost-effective approach for generating a tunable Bessel beam on a conventional two-photon microscope. The authors assemble a compact optical module comprising three axicons and a series of lenses that permits rapid adjustment of both lateral resolution and axial extent without modifying the focal plane. This flexibility enables the system to be readily adapted to a variety of biological preparations. As a proof of concept, the authors employ the device to record blood flow velocities in cortical microcapillaries, arterioles, and venules, thereby directly visualizing vasodilatation and vasoconstriction dynamics and permitting quantitative analysis of neurovascular coupling across cortical layers in awake mice.
 
 The authors demonstrate that the tunability of the Bessel beam can be exploited to match the numerical aperture to the vessel type: a high NA configuration, albeit slower scan, is optimal for resolving flow in capillaries, whereas a low NA setting provides faster acquisition suitable for arterioles and venules. By implementing a one-dimensional line scan with the Bessel beam, they achieve an imaging speed that is twentyfold faster than conventional frame-by-frame scanning, which proves sufficient to capture hemodynamic transients before and after an induced ischemic stroke.
 
 In addition to pure observation, the authors integrate a co-propagating Gaussian line into the system, allowing simultaneous imaging and photostimulation within the same focal plane. This capability addresses a common limitation of other Bessel beam implementations, in which the observation and perturbation planes often become misaligned when the Bessel beam is altered. The manuscript also emphasizes the advantage of Bessel beam excitation for calcium imaging after a perturbation, because it captures neuronal activity in planes both above and below the nominal focal plane, signals that would be missed with a standard Gaussian focus. Finally, the authors apply the technique to investigate the neuroimmune response following targeted microglial ablation; they report that adjacent microglia extend processes toward the injury site while retracting processes in the opposite direction.
 
 Overall, the work offers a technically straightforward yet powerful extension to existing two-photon platforms, providing high-speed volumetric imaging and stimulation capabilities that are well suited to a broad range of neurovascular and neuroimmune studies. The experimental validation is quite thorough, and the presented data convincingly illustrate the benefits of the approach.
 
 Strengths:
 
 The authors present a truly clever and inexpensive optical module that can be integrated into almost any two-photon microscope, providing a tunable Bessel beam with a minimal modification of the existing system. The experimental data and accompanying quantitative analysis convincingly demonstrate that the system can reveal physiological events, such as capillary flow, calcium transients across multiple axial planes, and microglial process dynamics, that are difficult or impossible to capture with a conventional Gaussian beam. The breadth of experiments chosen for the manuscript illustrates the practical utility of the device and supports the authors' conclusions that it extends the functional repertoire of standard two-photon microscopy.
 
 We sincerely thank the reviewer for the thoughtful and encouraging feedback. We're glad that the technical design and broad applicability of the tBessel module came through clearly, and we appreciate the recognition of its ease of integration and ability to capture dynamic physiological processes.
 
 Weaknesses:
 
 The manuscript would benefit from a more detailed contextualisation of the claimed speed advantage. Although the authors mention other techniques in the introduction, they do not provide any direct comparison with other state-of-the-art high-speed two-photon approaches such as light beads microscopy (Demas et al., Nat. Methods 2021), temporal multiplexing schemes (Weisenburger et al., Cell 2019), or random access microscopy (Villette et al., Cell 2019). A brief comparison of imaging speed, spatial resolution, and instrumental complexity would enable readers to assess the relative merits of the present method.
 
 We thank the reviewer for this important suggestion. We agree that a more explicit comparison with other high-speed two-photon imaging methods helps clarify the speed advantages of our system. Several existing approaches, including light beads microscopy (LBM), temporal multiplexing, and AOD-based random-access microscopy, have demonstrated impressive high-speed volumetric imaging capabilities. Light beads microscopy (Demas et al., 2021) reported imaging over a large volume of 5.4 × 6 × 0.5 mm³ at 2 Hz. However, this large-volume acquisition used 5-μm lateral pixel sampling, corresponding to an effective lateral resolution of approximately 10 μm. In a more comparable mesoscopic volume, LBM imaged 0.6 × 0.6 × 0.5 mm³ at 9.6 Hz with 1-μm lateral pixel sampling. In addition, the LBM module uses off-axis reflective concave mirrors, which require careful alignment, and the axial sampling range is not readily tunable. Temporal multiplexing approaches (Weisenburger et al., 2019) reported imaging over approximately 1 × 1 × 0.6 mm³ at 17 Hz. However, this volume rate was achieved with a relatively coarse spatial resolution of approximately 5 μm, together with a more complex optical design involving multiplexed excitation, detection, and synchronization. AOD-based random-access microscopy (Nadella et al., 2016; Villette et al., 2019) provides very fast point or region sampling, and reported 250 × 250 μm² imaging with 512 × 512 pixels and a 50-ns pixel dwell time, corresponding to ~0.5-μm pixel sampling and ~76 frames/s for two-dimensional imaging. However, volumetric imaging requires additional axial sampling, which lowers the effective 3D acquisition rate. In addition, AOD-based systems rely on diffractive beam steering, which introduces light loss due to finite diffraction efficiency and increases optical and calibration complexity. In comparison, tBessel-TPFM imaged a 0.4 × 0.4 × 0.12 mm³ volume at 58 Hz with 0.2-μm lateral pixel sampling. Our largest demonstrated imaging volume reached 2.5 × 2.5 × 0.45 mm³ while maintaining diffraction-limited lateral resolution. Therefore, compared with these high-speed volumetric approaches, tBessel-TPFM provides a distinct balance of volume rate and spatial sampling, together with simpler implementation.
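
 The AOD figures quoted above (about 0.5-μm pixel sampling and about 76 frames/s) follow directly from the field of view, pixel count, and dwell time. The sketch below reproduces that arithmetic; it assumes the frame time is simply pixels × dwell time, with no flyback or settling overhead, and the function names are illustrative.

```python
def frame_rate_hz(pixels_x, pixels_y, dwell_s):
    """Frame rate if frame time = number of pixels * dwell (no overhead assumed)."""
    return 1.0 / (pixels_x * pixels_y * dwell_s)


def pixel_size_um(fov_um, pixels):
    """Lateral pixel sampling for a square field of view."""
    return fov_um / pixels


# Values quoted for AOD-based random-access microscopy in the response.
rate = frame_rate_hz(512, 512, 50e-9)
size = pixel_size_um(250, 512)

print(f"frame rate: {rate:.1f} frames/s")
print(f"pixel sampling: {size:.2f} um")
```

 Any scanner overhead would lower the achievable rate below this idealized value, which is one reason the comparison in the text is best read as approximate.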
 
 A second limitation that warrants discussion is the inherent trade-off between volumetric coverage and image specificity. Because the Bessel beam excites fluorescence throughout an extended axial range, the detector inevitably integrates signal from a three-dimensional volume into a two-dimensional image. In densely labelled tissue, this can lead to significant signal crosstalk, reducing contrast and complicating quantitative interpretation. A brief analysis of how labeling density affects the fidelity of flow or calcium measurements, or suggestions for mitigating crosstalk (e.g., computational deconvolution, adaptive excitation shaping, or combinatorial sparse labeling), would broaden the applicability of the technique.
 
 We thank the reviewer for highlighting this important trade-off between volumetric coverage and image specificity in Bessel beam imaging. As Bessel beams project fluorescence from multiple features along the z-axis onto the same x–y plane, longer beams expand depth coverage at the same acquisition speed but can confound signals from axially spaced structures (Lines 119-121 in the manuscript). For densely labeled samples, the probability that structures overlap in their x–y locations is high, and thus a shorter beam should be used. In sparsely labeled samples, structures have a lower probability of overlapping, and thus longer foci can be used (Lines 166-168 in the manuscript). Additionally, at the same NA, longer Bessel beams have more energy in the side rings surrounding the central peak, which may lead to higher background signal (Lines 121-123 in the manuscript) (Lu et al., 2017). For these reasons, independent length tuning (ΔNA tuning) is needed in addition to NA tuning, so that the Bessel beam length can be optimized for any given sample, based on its labeling density and the imaging goals, to balance the volumetric speedup against the structural overlap that obscures signal localization. Our tBessel design provides both.
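
 The dependence of projection overlap on labeling density and beam length can be illustrated with a simple model. The sketch below is not from the manuscript: it assumes labeled structures are scattered independently and uniformly (a Poisson model), so that the expected number of other structures falling within a given lateral footprint over the beam length is density × footprint × length. The function name and all numerical values are hypothetical.

```python
import math


def overlap_probability(density_per_um3, footprint_um2, beam_length_um):
    """P(at least one other structure projects onto the same x-y footprint).

    Poisson model (assumption): expected count = density * footprint * length.
    """
    expected = density_per_um3 * footprint_um2 * beam_length_um
    return 1.0 - math.exp(-expected)


# Hypothetical densities (structures per cubic micrometer) and footprint.
sparse, dense = 1e-5, 1e-3
footprint_um2 = 100.0

for label, density in (("sparse", sparse), ("dense", dense)):
    for length_um in (30, 120):
        p = overlap_probability(density, footprint_um2, length_um)
        print(f"{label:6s} labeling, {length_um:3d} um beam: P(overlap) = {p:.3f}")
```

 In this toy model the overlap probability rises with both density and beam length, which matches the qualitative guidance in the response: shorter beams for dense labeling, longer beams where labeling is sparse.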
 
 References:
 
 Demas, J., Manley, J., Tejera, F., Barber, K., Kim, H., Traub, F.M., Chen, B., Vaziri, A., 2021. High-speed, cortex-wide volumetric recording of neuroactivity at cellular resolution using light beads microscopy. Nat Methods 18, 1103–1111. https://doi.org/10.1038/s41592-021-01239-8
 
 Göbel, W., Helmchen, F., 2007. In Vivo Calcium Imaging of Neural Network Function. Physiology 22, 358–365. https://doi.org/10.1152/physiol.00032.2007
 
 Grewe, B.F., Voigt, F.F., van ’t Hoff, M., Helmchen, F., 2011. Fast two-layer two-photon imaging of neuronal cell populations using an electrically tunable lens. Biomed Opt Express 2, 2035–2046. https://doi.org/10.1364/BOE.2.002035
 
 Huang, C., Tai, C.-Y., Yang, K.-P., Chang, W.-K., Hsu, K.-J., Hsiao, C.-C., Wu, S.-C., Lin, Y.-Y., Chiang, A.-S., Chu, S.-W., 2019. All-Optical Volumetric Physiology for Connectomics in Dense Neuronal Structures. iScience 22, 133–146. https://doi.org/10.1016/j.isci.2019.11.011
 
 Lu, R., Sun, W., Liang, Y., Kerlin, A., Bierfeld, J., Seelig, J.D., Wilson, D.E., Scholl, B., Mohar, B., Tanimoto, M., Koyama, M., Fitzpatrick, D., Orger, M.B., Ji, N., 2017. Video-rate volumetric functional imaging of the brain at synaptic resolution. Nat Neurosci 20, 620–628. https://doi.org/10.1038/nn.4516
 
 Nadella, K.M.N.S., Roš, H., Baragli, C., Griffiths, V.A., Konstantinou, G., Koimtzis, T., Evans, G.J., Kirkby, P.A., Silver, R.A., 2016. Random-access scanning microscopy for 3D imaging in awake behaving animals. Nat Methods 13, 1001–1004. https://doi.org/10.1038/nmeth.4033
 
 Podgorski, K., Ranganathan, G., 2016. Brain heating induced by near-infrared lasers during multiphoton microscopy. Journal of Neurophysiology 116, 1012–1023. https://doi.org/10.1152/jn.00275.2016
 
 Sofroniew, N.J., Flickinger, D., King, J., Svoboda, K., 2016. A large field of view two-photon mesoscope with subcellular resolution for in vivo imaging. eLife 5, e14472. https://doi.org/10.7554/eLife.14472
 
 Villette, V., Chavarha, M., Dimov, I.K., Bradley, J., Pradhan, L., Mathieu, B., Evans, S.W., Chamberland, S., Shi, D., Yang, R., Kim, B.B., Ayon, A., Jalil, A., St-Pierre, F., Schnitzer, M.J., Bi, G., Toth, K., Ding, J., Dieudonné, S., Lin, M.Z., 2019. Ultrafast Two-Photon Imaging of a High-Gain Voltage Indicator in Awake Behaving Mice. Cell 179, 1590-1608.e23. https://doi.org/10.1016/j.cell.2019.11.004
 
 Weisenburger, S., Tejera, F., Demas, J., Chen, B., Manley, J., Sparks, F.T., Traub, F.M., Daigle, T., Zeng, H., Losonczy, A., Vaziri, A., 2019. Volumetric Ca2+ Imaging in the Mouse Brain Using Hybrid Multiplexed Sculpted Light Microscopy. Cell 177, 1050-1066.e14. https://doi.org/10.1016/j.cell.2019.03.011
 
 Yang, W., Carrillo-Reid, L., Bando, Y., Peterka, D.S., Yuste, R., 2018. Simultaneous two-photon imaging and two-photon optogenetics of cortical circuits in three dimensions. eLife 7, e32671. https://doi.org/10.7554/eLife.32671
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2025.12.31.697185v1
www.biorxiv.org www.biorxiv.org

Autosomal Allelic Inactivation: Variable Replication and Dosage Sensitivity

3
1. Public_Reviews 03 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  This important study links allelic expression imbalance with replication timing, suggesting a stochastic model for haploinsufficiency in dosage-sensitive disease. The integration of allele-specific RNA-seq and replication timing in clonal systems provides solid evidence for an association between asynchronous replication and allelic imbalance, although the scope and generality should be addressed in future work. This study will interest epigeneticists and genome regulation researchers studying replication timing and monoallelic expression, as well as developmental biologists and human geneticists concerned with clonal heterogeneity, haploinsufficiency, and variable disease penetrance.
  
  [Editors' note: this paper was reviewed by Review Commons.]
  
  Summary
2. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]
  
  The authors pair analysis of replication timing and allele-specific expression in clonal populations of primary human cells. They combine these data with previously published data on clones from transformed human cell lines. They identify a number of genomic regions that display asynchronous replication timing in at least one clone and correlate these regions with allele-specific expression of genes within them. They also observe that several interesting gene sets, including genes that are associated with human diseases, map to asynchronously replicating regions. This is a good experimental approach that builds on already published data demonstrating the connection between allelic imbalance and replication timing.
  
  Review 1
3. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors pair analysis of replication timing and allele-specific expression in clonal populations of primary human cells. They combine these data with previously published data on clones from transformed human cell lines. They identify a number of genomic regions that display asynchronous replication timing in at least one clone and correlate these regions with allele-specific expression of genes within them. They also observe that several interesting gene sets, including genes that are associated with human diseases, map to asynchronously replicating regions. This is a good experimental approach that builds on already published data demonstrating the connection between allelic imbalance and replication timing.
  
  - This is a research topic that touches on a few sub-fields of biology, and thus to make the paper more approachable we would recommend a careful edit of the text for clarity and precision of language.
  
  We thank the reviewers for their thoughtful and constructive comments, which substantially improved our manuscript. In response, we have revised the text and figures throughout to address the points raised.
  
  - Authors point out that this is a decades-old field; we would suggest using terminology established within the field if possible. Allelic imbalance has been referred to as AI, MAE (monoallelic expression), RMAE (random monoallelic expression) etc. The paper whose mouse data the authors make use of uses Asynchronous Stochastic Replication Timing (ASRT) instead of VERT to refer to the same phenomenon.
  
  While we agree that allelic expression imbalance has been described by different investigators using many different phrases, we believe that MAE, RMAE and AI do not represent accurate descriptions of the phenomenon. We point out that “Allelic Expression Imbalance” has been used to describe this variable allelic expression by other investigators >120 times in the PubMed database. In our study [and our previous study; Nat Commun. 2022; 13(1):6301] we used clonal analysis of allele-specific expression and found that while some clones display equivalent levels of expression between alleles of a given gene (i.e. bi-allelic expression), other clones express only one allele (i.e. mono-allelic expression), and yet other clones have undetectable expression (i.e. silent on both alleles). This pattern of allele-restricted expression indicates that each allele independently adopts either an expressed or silent state. Importantly, because these expression states are mitotically stable, allele-autonomous, and independent of parental origin, we refer to the choice of the expressed allele as stochastic. Given this variability, we believe that the phrase “Allelic Expression Imbalance” (AEI) represents a more accurate descriptor for this phenomenon.
  
  In addition, the replication asynchrony that exists at these loci is not consistent with purely ASynchronous Replication Timing (ASRT) between alleles. We found that each allele can independently adopt either earlier or later replication timing in different clones. This variability results in some clones exhibiting pronounced asynchrony between alleles, while in others, the two alleles replicate synchronously, with both adopting either the earlier or later timing state. As reported in our previous study (Nat. Commun. 2022; 13:6301), this behavior reflects a stochastic and allele-autonomous process, leading us to describe these loci as exhibiting Variable Epigenetic Replication Timing (VERT), which we believe is a more accurate descriptor of this phenomenon.
  
  - Methods do not provide sufficient detail to fully evaluate or reproduce these experiments.
  
  We now provide a more detailed description of how VERT regions were identified, annotated, and quantified, including thresholds for allelic imbalance, replication timing variability, and sampling depth. We also justify the ≥80% AEI cutoff, which is based on recently published studies showing that modest allelic biases can have biological and clinical significance (Nature 2025; 637, 1186-1197). We also refer the readers to our recent description of these methods (Nat. Commun. 2022; 13:6301).
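  The per-clone expression states described in this response (bi-allelic, one allele predominating, or undetectable) together with the ≥80% AEI cutoff can be sketched as a simple classification rule. This is a hedged illustration, not the authors' pipeline: it assumes the cutoff applies to the fraction of informative reads from the more highly expressed allele, and the function name, the labels, and the minimum read depth are hypothetical.

```python
def classify_clone_expression(allele_a_reads: int,
                              allele_b_reads: int,
                              min_informative_reads: int = 10,
                              aei_cutoff: float = 0.80) -> str:
    """Label one gene in one clone from its allele-specific read counts.

    Assumptions (illustrative only): `min_informative_reads` is a
    hypothetical depth threshold, and `aei_cutoff` is interpreted as the
    minimum fraction of reads from the predominant allele.
    """
    total = allele_a_reads + allele_b_reads
    if total < min_informative_reads:
        # Too few reads to call: silent on both alleles or not assessable.
        return "silent_or_uninformative"
    major_fraction = max(allele_a_reads, allele_b_reads) / total
    if major_fraction >= aei_cutoff:
        # A tie cannot reach this branch because its major fraction is 0.5.
        return "imbalanced_toward_a" if allele_a_reads > allele_b_reads else "imbalanced_toward_b"
    return "biallelic"


# Hypothetical counts for the same gene in three different clones.
for counts in [(90, 10), (48, 52), (2, 1)]:
    print(counts, classify_clone_expression(*counts))
```

  Applying such a rule clone by clone is what allows the same gene to be called imbalanced in one clone and bi-allelic in another.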
  
  - It is helpful to show representative loci as the authors do in Fig 1F and G and Fig 2 but these panels are very densely rendered and thus difficult to process visually - even the cartoon version (1D) is thick with overlapping lines. The point that allelic imbalance is enriched in VERTs would be enhanced if the authors could present the allelic ratio for all genes found in all VERTs, demonstrating how replication timing on either chromosome affects the allelic ratio.
  
  The stochastic nature of the allelic expression and replication timing observed at I/SCs is best visualized with each allele and each transcription unit displayed from multiple clones in the same panel. One of the goals of these figure panels is to emphasize that each I/SC has multiple transcription units that acquire expressed or silent states independently in each clone. Therefore, the expressed or silent status of one allele of a transcription unit does not predict expression status of the same or opposite allele of any other transcription unit within the same VERT region. In addition, the Early/Late pattern of replication timing that we detect is not correlated with which allele is transcriptionally active (see below). In these figure panels, we display each clone using different colors, each allele as solid or dotted lines, and each transcription unit based on chromosome position. While this arrangement makes for busy images, we believe that this format captures the full breadth of the variability in expression and replication timing that occurs at I/SCs.
  
  Regardless, because each transcription unit is independent, we now provide the expression ratios for all transcripts that are generated from the VERT regions for the coding and non-coding transcription units in Figures 1, 2, and 6; shown in Supplemental Table 9. This analysis indicated that 4,017 informative reads were derived from the earlier replicating allele and 3,161 informative reads were derived from the later replicating allele, generating an allelic ratio of 1.3 (early/late) and a binomial P value of 1.0.
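  The allelic ratio quoted above follows directly from the two read counts, and a binomial comparison against an equal split can be sketched as below. The null hypothesis, the sidedness of the test, and the use of a normal approximation are assumptions of this sketch; the response does not state how the reported P value was computed, so the values here are not expected to reproduce it.

```python
import math


def allelic_summary(early_reads: int, late_reads: int) -> dict:
    """Early/late allelic ratio and normal-approximation binomial tails.

    Assumption (illustrative only): under the null, each informative read
    is equally likely to come from the earlier or later replicating allele.
    """
    n = early_reads + late_reads
    ratio = early_reads / late_reads
    mean = 0.5 * n
    sd = math.sqrt(0.25 * n)
    z = (early_reads - mean) / sd
    return {
        "ratio": ratio,
        "z": z,
        # Two-sided: departure from an equal split in either direction.
        "p_two_sided": math.erfc(abs(z) / math.sqrt(2)),
        # One-sided tails: early count at least / at most as large as observed.
        "p_early_greater": 0.5 * math.erfc(z / math.sqrt(2)),
        "p_early_less": 0.5 * math.erfc(-z / math.sqrt(2)),
    }


# Read counts as reported in the response.
summary = allelic_summary(4017, 3161)
for key, value in summary.items():
    print(f"{key}: {value:.3g}")
```

  Reporting which tail and which null were used alongside the P value makes such a comparison easier for readers to interpret.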
  
  In addition, a similar analysis of imprinted loci revealed that even at genomic regions with parent-of-origin–specific expression, the replication timing of each allele does not align with transcriptional activity, i.e. both early- and late-replicating alleles can be transcriptionally active, depending on the gene. This observation is consistent with the complex organization of many imprinted domains, where genes on opposite alleles exhibit reciprocal expression patterns. To illustrate this point, we now include Supplemental Figure 1 demonstrating that imprinted loci harbor genes expressed from both the earlier- and later-replicating alleles. In addition, quantification of the total number of informative transcripts at the DLK1/MEG8 imprinted locus (Supplemental Figure 1a-1c) indicates that the ratio of transcripts derived from the early versus late replicating alleles is equivalent (i.e. an allelic expression ratio of 1.0; See Supplemental Table 9).
  
  - The authors make the important point that VERTs are unlikely to be shared among different cell types and tissues (Fig 1i), but then find an enrichment for neuronal and immune genes in VERT regions identified in ACPs. It follows that these same genes are unlikely to be in such regions in the tissues where they are relevant. Some of the GO terms presented are too broad to suggest any biological significance to the result, even if there is statistical significance (for example, the top term for LCL clones 'Cytoplasm' is associated with 12,000 genes, and the second term for mouse clones 'Membrane' is associated with 10,000). It would be helpful to focus on GO terms lower in the GO hierarchy.
  
  We now include our complete Gene Ontology analysis, with more specific biological categories, in Supplemental Table 5.
  
  - Figure 3 highlights the association of related gene clusters with VERTs but the VERTs are assigned based on variable replication timing in just 1 or 2 clones. This is an interesting observation, but to make the point that "VERT regions frequently coincide with gene clusters in the human genome" there needs to be a systematic assessment of replication timing at all gene clusters across all clones, and a statistical test for significance.
  
  Our intent in Figure 3 was not to suggest that all gene clusters are subject to VERT and AEI, but rather to highlight that several well-characterized multigene families that are known to exhibit AEI, such as olfactory receptor, protocadherin, and HLA gene clusters, coincide with VERT regions at their genomic locations. These examples serve as representative illustrations demonstrating that I/SC-associated regulation occurs at established AEI loci organized in gene clusters.
  
  To clarify this point, we have revised the text to explicitly state that Figure 3 presents illustrative examples of known AEI-associated gene clusters overlapping with VERT regions, rather than a comprehensive or statistically exhaustive analysis of all gene clusters across the genome.
  
  - It is an interesting hypothesis that VERTs are conserved between species at syntenic loci. If such regions are really conserved, one would expect that replication timing at these sites would be consistently asynchronous. However the data presented shows that in human clones these VERTs can be specific to an individual donor (as in 5A) or an individual clone (as in 5H).
  
  As discussed in our Limitations Section, our analysis was restricted to a limited number of cell types, individuals, and clones, which may not capture the full diversity of I/SC usage across tissues and populations. While our dataset was sufficient to identify robust patterns of AEI and VERT, it likely represents only a subset of the broader landscape of I/SC regulation in both humans and mice. We anticipate that future studies incorporating a wider range of tissues, individuals, and clones will uncover an even greater degree of conservation and diversity in I/SC usage across genomes.
  
  - The finding that VERTs coincide with neurodevelopmental disease genes in immune and cartilage cells is at odds with the previous statements and data about the tissue specificity of VERTs. In order to support the claim that neurodevelopmental disease associated genes reside in asynchronously replicating regions, and are thus more prone to allelic imbalance, it would be helpful if the authors demonstrated this phenomenon in neuronal cells.
  
  We make two points that address this critique: First, many of the neurodevelopmental disease genes associated with VERT regions are not exclusively expressed in neuronal cells and have previously been shown to exhibit AEI in non-neuronal contexts. For example, Gimelbrant and Chess (Science, 2007; 318:1136–1140) demonstrated AEI of the Parkinson disease genes SNCA and LRRK2 in lymphoblastoid cell lines (LCLs), and in our previous study, which also used LCLs, we detected AEI of DNAJC6, another Parkinson disease gene (Nat. Commun. 2022; 13:6301). In the present study, using cartilage progenitor cells, we identified VERT and AEI of several epilepsy-associated genes, including SCN1A, SCN2A (Fig. 6b), GABRA1 (Fig. 6e), and SAMD12 (Fig. 6j), as well as a gene implicated in autism and neurodevelopmental disorders, SEMA5A (Fig. 5c), indicating that expression of these genes is not exclusive to neuronal cell types.
  
  Second, independent studies from the laboratory of Dr. E. Heard have provided further evidence that AEI occurs in neuronal lineages. Using mouse neural progenitor cells (NPCs), they identified genes subject to AEI (Dev. Cell, 2014; 28:366–380) and they later evaluated AEI of syntenic human neurodevelopmental disease genes, including Snca, App, Eya4, and Grik2 (Nat. Commun. 2021; 12:5330). In our data, we find that these mouse genes are located within VERT regions. In addition, and consistent with our use of AEI, they used the phrase “Allelic Expression Imbalance” to describe the epigenetic expression biases at these genes.
  
  Together, these findings reinforce that AEI, and by extension I/SC regulation, is not restricted to specific cell types, but rather represents a generalizable mechanism of stochastic epigenetic regulation that includes genes relevant to neurodevelopment and disease.
  
  - The authors consistently lean on sparse samples (i.e. a single clone) within a modestly sized dataset (4 clones from 2 donors each) to propose a new model for haploinsufficiency in human disease. It may well be but the consistent focus on limited elements in the data and perhaps an overreach in the interpretation makes it difficult to appreciate the very good experiments presented.
  
  We agree that our analysis was conducted on a modest number of cell types, individuals, and clones, which we explicitly acknowledge as a limitation of the present study. However, several key points support the robustness and broader relevance of our conclusions:
  
  i) Clonal Design and Replication: The strength of our approach lies in its clonal resolution. Each clone represents a single-cell–derived population expanded to over a million cells, enabling direct detection of stable, mitotically heritable allele-specific epigenetic states that would not be apparent in population-averaged data. Importantly, many of the VERT regions we identified are shared between independent clones from different donors and across distinct cell types (ACP and LCL), demonstrating reproducibility and biological consistency.
  
  ii) Cross-Species Validation: We further identified syntenic VERT regions in mouse pre-B cell clones, including at loci known to exhibit AEI in prior studies, providing independent validation and evolutionary conservation of the phenomenon.
  
  iii) Integration with Published Evidence: Our findings extend prior observations of AEI and VERT (e.g. Gimelbrant et al. Science 2007; Heskett et al. Nat. Commun. 2022) and are fully consistent with known stochastic allelic expression imbalance of autosomal genes.
  
  iv) We also draw parallels with the absence of cellular selection mechanisms that dictate dominant inheritance patterns for loss of function alleles for X linked disease genes (reviewed in: J Clin Invest, 2008, 20-23; and Nat Rev Genet. 2025, 26, 571–580). Our proposed model linking I/SC regulation to haploinsufficiency is therefore a synthesis of our results with an extensive body of published data, not an inference drawn from isolated observations.
  
  v) Scope and Framing: We have revised the manuscript to clarify that our proposed model represents a mechanistic framework, not a definitive or exclusive explanation, for how stochastic allelic regulation could contribute to dosage-sensitive disease phenotypes. We also explicitly discuss the need for larger datasets and additional tissues to refine and test this model.
  
  - This section refers to the revised version of the paper. We would like to thank the authors for the changes and explanations offered. Although we don't fully agree with a few answers offered, overall the answers and changes in the manuscript have significantly improved the work presented. As such it should be of interest to many readers.
  
  We thank the reviewers for their thoughtful evaluation and constructive feedback. We appreciate their recognition that the revisions have strengthened the manuscript and are pleased that they find the work to be of broad interest.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.13.670061v4
www.biorxiv.org www.biorxiv.org

Orco regulates the circadian activity of pheromone-sensitive olfactory receptor neurons in hawkmoths

3
1. Public_Reviews 03 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This valuable study uses technically compelling long-term in vivo recordings and computational modeling to investigate whether hawkmoth olfactory receptor neurons show circadian modulation of spontaneous firing. The authors further propose the provocative model that post-translational mechanisms, rather than the transcriptional-translational processes, may contribute to circadian regulation of neuronal excitability. However, the evidence for circadian firing in these neurons, and for post-translational modification of Orco as the underlying mechanism, remains incomplete. In contrast, the study does provide strong evidence that the application of cyclic nucleotides can modulate Orco-dependent activity at a single time point, and reports that the temporal pattern of Orco transcript abundance is not circadian.
 
 Summary
2. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Joint Public Review:
 
 This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several assumptions that underlie their data analysis and model builds, as well as insufficient biological data including critical controls to validate and/or fully justify the model the authors are proposing.
 
 Strengths:
 
 The authors raise several intriguing model-based hypotheses regarding the mechanisms that underlie the generation of olfactory rhythms. The electrophysiological approach and the long-term recording paradigm are elegant and technically impressive. In the revised version, the authors have added additional qPCR data supporting the lack of rhythmic Orco transcript expression and included a new figure suggesting that cAMP can modulate Orco conductance.
 
 Major weaknesses:
 
 (1) The cAMP experiment was only conducted at one time-point, which is insufficient to support the central claim that "cAMP and cGMP may have ZT-dependent effects on Orco conductivity".
 
 (2) The revised manuscript continues to rely heavily on prior publications or defers key mechanistic questions (or important manipulations) to future studies. In its current form, the evidence presented remains insufficient to support the central claim that a PTFL constitutes the primary underlying circadian clock mechanism. The proposed model is intriguing, but the data provided do not yet directly demonstrate the novel mechanism.
 
 Review 1
3. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Joint Public Review
 
 This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.
 
 We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript is much stronger now after the revision which incorporates the requested changes. We added results of new experiments and additional analyses. Although these new insights did not change the previous conclusions, we significantly reworked the Discussion and added further references to clarify the conclusions we want to make.
 
 Please note that we used ORN as the acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs associate with the olfactory receptor co-receptor (Orco) to be trafficked to the membrane of the cilium of the ORN, where they can be contacted by pheromones and odorants. In Manduca sexta, evidence is accumulating for G-protein coupled metabotropic pheromone transduction and not for OR-Orco dependent ionotropic transduction, as shown for Drosophila melanogaster. In both insect species, besides its chaperone function, Orco can form leaky cation channels, which can regulate the spontaneous spiking activity of ORNs. In this study, we explored this role of Orco.
 
 Strengths:
 
 The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.
 
 Major weaknesses:
 
 At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.
 
 Please see our responses to the detailed comments.
 
 Detailed comments are provided below:
 
 (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.
 
 The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel open time probability. In M. sexta hawkmoths, we have already published that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2013). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco ligand candidates (OLCs) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). There, we also demonstrated that OLC15 dose-dependently antagonizes the VUAA1-dependent activation of Orco.
 
 Furthermore, we tested other published Orco antagonists, which were characterized in heterologous assays, in primary cell cultures of hawkmoth ORNs, as well as in in vivo assays in intact hawkmoths. We focused on amiloride-derived antagonists, because we previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific antagonists but instead affected different ion channel targets depending on the time of day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best in hawkmoth ORNs to target Orco pharmacologically. Based on those results and on comparative tests with other published Orco antagonists, we have since used 50 µM OLC15 in all further experiments as the most adequate dose to antagonize Orco function in Manduca. In the current study, we focus on Orco without excluding the possibility that other ion channels in the ORNs contribute to the control of membrane potential rhythms.
 
 We have clarified the Methods section accordingly.
 
 (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTFL-based model.
 
 We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We included these additional qPCR experiments and edited the manuscript to make clear that we measure transcript abundance, but we will not perform snRNAseq analysis due to time and financial constraints.
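 The concern about sparse sampling can be made concrete with a minimal sketch of a single-component 24-hour cosine fit. With three evenly spaced timepoints the three-parameter model (mean level, cosine and sine coefficients) passes through any three values exactly, so no residual degrees of freedom remain to test for rhythmicity, whereas six timepoints leave three. The fitting shortcut below assumes evenly spaced samples covering one full period; the function name and the abundance values are hypothetical and are not data from the study.

```python
import math


def cosinor_even(values, period_h: float = 24.0) -> dict:
    """Fit mesor + a*cos(wt) + b*sin(wt) to evenly spaced samples.

    Assumption (illustrative only): samples are evenly spaced over exactly
    one period, so the regressors are orthogonal and the least-squares
    coefficients reduce to simple sums. Requires at least three samples.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three evenly spaced timepoints")
    w = 2.0 * math.pi / period_h
    times = [i * period_h / n for i in range(n)]
    mesor = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(w * t) for y, t in zip(values, times))
    b = 2.0 / n * sum(y * math.sin(w * t) for y, t in zip(values, times))
    fitted = [mesor + a * math.cos(w * t) + b * math.sin(w * t) for t in times]
    rss = sum((y - f) ** 2 for y, f in zip(values, fitted))
    return {"mesor": mesor, "amplitude": math.hypot(a, b),
            "rss": rss, "residual_df": n - 3}


# Hypothetical relative transcript abundances.
three_points = cosinor_even([1.0, 2.5, 0.7])
six_points = cosinor_even([1.0, 1.4, 0.9, 1.1, 1.3, 0.8])
print(three_points)
print(six_points)
```

 In the three-point case the residual sum of squares is zero up to rounding regardless of the input, which is why a fitted amplitude from such a design cannot by itself distinguish a rhythm from noise.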
 
 (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.
 
 Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). For clarification, we added the 2015 citation to the Modeling chapter in the Methods section.
 
 We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs but we know that both are expressed in the pheromone-sensitive ORNs. Thus, as the referee suggests, we added text regarding the presence and localization of OR-Orco heteromers. Consistent data collected across different experiments (heterologous expression systems, primary cell cultures of hawkmoth ORNs, in vivo/in situ studies) support that Orco homomers are present in hawkmoth ORNs. In addition to co-expression of MsexOrco and MsexSNMP-1 with either MsexOr-1 or MsexOr-4 in a heterologous expression system, MsexOrco expression alone was already sufficient to increase intracellular Ca2+ levels spontaneously as a result of its property as leaky, non-specific cation channel, and in response to VUAA1 application (Nolte et al., 2013). Both in developing hawkmoth pupae and differentiating primary cell cultures of hawkmoth ORNs, Orco expression started during a developmental time window where ORNs did not yet express pheromone receptors but where Orco affected spontaneous activity and intracellular Ca2+ levels dependent on VUAA1 (Nolte et al., 2016). In vitro patch clamp studies of differentiating cultured hawkmoth ORNs during this time window of pupal development characterized ion channels/currents with properties of Orco as a leaky, non-specific cation channel/current that depends on protein kinase C and cyclic nucleotides (Dolzer et al., 2021, 2008; Krannich and Stengl, 2008; Stengl, 1993). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window where ORNs already express spontaneous activity but they do not heteromerize with pheromone receptors. 
However, we do not know whether and in what ratio homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths because none of the OR-specific antibodies tested worked in immunocytochemical studies of hawkmoth antennae (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990). Our hypothesis of a differential distribution of Orco homomers in the soma and dendrite compartments, and OR-Orco heteromers in the cilia, is based on the differential immunocytochemical localization of Drosophila ORs mainly in the cilia compartment (Benton et al., 2006).
 
 We clarified our manuscript accordingly.
 
 (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.
 
 There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms, corresponding with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can easily be masked by stress, such as the disturbances during an experimentally very challenging long-term recording over several days. In addition, we have observed over the years in our animal rearing facility that in 17:7 light-dark cycles the originally nocturnal hawkmoths M. sexta distribute their activity patterns over the course of the day, so that we find nocturnal as well as diurnal individuals. Thus, light-dark cycles were not enough to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths, in addition to stress signals, rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Because we focus on spontaneous activity and not on pheromone-dependent physiology in this study, we used isolated males that were never exposed to the female pheromones, taking phase dispersal into account. Therefore, it became necessary in free-running conditions to first determine the respective behavioral rhythm for each animal, and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a phase-dispersed free-running population. As requested by the referees in point (7), we added RAIN to test for rhythmicity in each of our recordings and revised the manuscript accordingly.
 
 Furthermore, in preliminary experiments we briefly exposed hawkmoths to pheromone the night before the start of the experiment. However, we failed to obtain phase-synchronized spiking rhythms. Most likely, a circadian pattern of pheromone exposure would have been necessary as a zeitgeber, which could not be used here due to long-term pheromone-dependent effects on spiking activity. These results are added as a supplementary figure to Fig 3.
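 The averaging argument in the response to point (4) can be sketched numerically. The example below is a toy model and not the authors' analysis: the traces are idealized 24-hour cosines, the phases, amplitude, and baseline are invented, and alignment is done by a simple circular shift to each trace's own maximum.

```python
import math

HOURS = 24


def trace(phase_h, amplitude=1.0, baseline=2.0):
    """Idealized hourly activity with a 24-h rhythm peaking at phase_h."""
    return [
        baseline + amplitude * math.cos(2 * math.pi * (t - phase_h) / HOURS)
        for t in range(HOURS)
    ]


def mean_trace(traces):
    """Hour-by-hour average across animals."""
    return [sum(column) / len(traces) for column in zip(*traces)]


def align_to_peak(tr):
    """Circularly shift a trace so that its own maximum sits at index 0."""
    k = tr.index(max(tr))
    return tr[k:] + tr[:k]


def peak_to_trough(tr):
    return max(tr) - min(tr)


# Hypothetical population whose peaks are spread evenly over the day.
phases_h = [0, 4, 8, 12, 16, 20]
animals = [trace(p) for p in phases_h]

raw = peak_to_trough(mean_trace(animals))
aligned = peak_to_trough(mean_trace([align_to_peak(a) for a in animals]))

print(f"peak-to-trough of raw average:     {raw:.3f}")
print(f"peak-to-trough of aligned average: {aligned:.3f}")
```

 In this toy case the raw average is flat although every individual trace is rhythmic, and alignment restores the full amplitude. The same procedure applied to arrhythmic noise would also place a peak at the alignment point, which is why a per-recording rhythmicity test is needed alongside phase alignment.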
 
 (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.
 
 We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording sites are located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We clarified this in the Methods section.
 
 In addition, our lab showed with antibody staining against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Lee and Strausfeld (1990) mapped all types of antennal sensilla, and together with the pheromone-dependent tip-recordings of Kaissling et al. (1989) it was shown that most of the male antennal sensilla are pheromone-sensitive long trichoid sensilla, with one of the two innervating ORNs always responding to bombykal, ensuring high sensitivity of pheromone detection. Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs (review: Stengl, 2010). This would indicate that all ORNs, whether they express ORs sensitive to pheromone or general odorants, could potentially share the same Orco-dependent spontaneous activity rhythms. Furthermore, in our lab, different experimenters in different years who recorded from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum and is very likely shared by all types of OR-Orco expressing ORNs.
 
 (6.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…
 
 There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after previous phosphorylation via protein kinase C (PKC; review: (Wicher and Miazzi, 2021)). Since Orco is very conserved among insect species, it is likely that PKC- and cGMP/cAMP-dependent regulations are present for Orco in other insect species. To test this, we are currently characterizing second messenger-dependence of spontaneous spiking activity, which is the focus of a follow-up manuscript. Nevertheless, to provide more evidence for our hypothesis of the current manuscript, we added a new set of tip-recording experiments that demonstrate cAMP-dependent gating of Orco. Because of the addition of this figure, we merged figures 8-10 into Figure 8 and added the cAMP data as Figure 9.
 
 (6.2) … and the PTTF model proposed is somewhat disappointing.
 
 For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper that we refer to in the manuscript (Stengl and Schneider, 2024). We added clarification of how Orco activation can influence cAMP levels. A more elaborate PTFL clock model including many more of the identified ion channels in hawkmoth ORNs is the focus of another manuscript to come.
 
 (6.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.
 
 Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by HEK cell heterologous expression studies of Orco but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on hawkmoth primary cell cultures of olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (Stengl, 2010; Stengl and Funk, 2013; Takagi et al., 2025; Wicher and Miazzi, 2021). In contrast, the hypothesis suggesting a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species rests on very sparse experimental evidence, derived primarily from heterologous expression systems such as HEK cells that lack the insect’s WT molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis is heavily based upon the argument that an inverse 7TM receptor cannot couple to G-proteins, which lacks careful backup via biochemical and structural studies. In addition, the ionotropic hypothesis lacks support via carefully performed physiological in vivo studies in different insect species that paid attention to analysis of the distinct kinetic components of ORNs' odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication by Schneider et al. (2025)). We added references to the possible odor transduction mechanisms to the introduction.
 
 (6.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.
 
 While structural studies did not find evidence for conserved known cyclic nucleotide binding sites on Orco, this does not exclude the presence of indirect cAMP effects via, e.g., Orco subunits complexing with other molecules under direct cAMP control, such as other ion channel subunits. Furthermore, it does not exclude so far unknown binding sites, or sites that fold out only after a specific sequence of previous phosphorylations of the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after previous PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a zeitgeber time-dependent PKC- and cyclic nucleotide-dependent modulation of Orco. These detailed studies will be published in a follow-up publication. In the revised version of this manuscript, we added tip-recording experiments that indicate cAMP involvement in Orco gating (new Figure 9).
 
 (7) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable enough to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.
 
 The long-term tip-recordings of intact hawkmoths are very challenging and take a very long time to accomplish; thus, we are very happy that we succeeded in obtaining so many of them (N=40). We are thankful for the reviewers’ suggestion to use RAIN since this analysis revealed circadian rhythms in 7 of 11 LD recordings, 8 of 12 DD recordings, and 2 of 12 OLC15 recordings. Please see also our response to (4) above, commenting on the phase dispersal of activity rhythms observed in our experiments, as well as in the behavior of hawkmoth males in the mating cage.
 
 (8) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 uM OLC15).
 
 According to the valuable suggestions of the referees, we used RAIN to detect circadian rhythms in the spiking attributes in each individual animal. Since only 2 of 12 animals displayed a circadian rhythm in OLC15, statistical comparison of circadian amplitudes is not possible. We revised the results section accordingly and added to the figure legend to make it clearer that the heat maps in Fig 5 are representative from one animal each and not averages across animals.
 
 As the reviewer states correctly in (7), wavelet results of circadian rhythmicity must be interpreted carefully because of the low number of circadian cycles in ~3-4 day recordings. Since the heatmaps in Figure 5 visually revealed the presence of ultradian rhythms, the main focus of the wavelet analysis in Figure 6 is on the detection and quantification of ultradian periods up to 20 h.
 
 We revised the Methods section to include references to previous experiments that characterized the effect of different doses of OLC15 and other Orco antagonists and agonists in M. sexta antennae (Nolte et al., 2016). Please see also our response to (1).
 
 (9) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.
 
 We revised the manuscript accordingly and clarified which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).
 
 (10.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).
 
 We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, involving also all the molecules suggested by the referees. These studies are ongoing, are very time- and money-consuming, and are beyond the scope of this manuscript. However, we added a set of experiments to this manuscript in which we demonstrate that the effect of increased cAMP on the spontaneous spiking activity is mediated by Orco (new Figure 9).
 
 (10.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.
 
 Our experiments that test circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and are part of another publication. In section 6.2 we already stated that our experiments do not exclude that Orco is under indirect control of the TTFL. We revised our discussion accordingly.
 
 The experiments published on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrated that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells that build the transepithelial potential gradient via vATPases that generates the high K+ concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of afferent neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.
 
 (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:
 
 (a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).
 
 We revised the discussion accordingly.
 
 (b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).
 
 We added those experiments to the revised version of the manuscript (see our response to (2)).
 
 (c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.
 
 We added possible negative feedback elements to the Discussion to explain how our proposed PTFL could in principle work independently of the TTFL clock.
 
 Minor weaknesses:
 
 (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?
 
 We have revised the discussion accordingly.
 
 (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.
 
 These large fluctuations stem from doing experiments at different seasons (higher temperature and humidity in the summer months, lower temperature and humidity in winter). Throughout each individual experiment, conditions were stable. We clarified the Methods section accordingly.
 
 Recommendations for the authors:
 
 The authors should post the code for their computational model to a repository like GitHub.
 
 The code for the computational model is now available at https://github.com/a-c-schneider/VijayanForlinoEtAl2025_Model.git
 
 References
 
 Benton R, Sachse S, Michnick SW, Vosshall LB. 2006. Atypical Membrane Topology and Heteromeric Function of Drosophila Odorant Receptors In Vivo. PLOS Biology 4:e20. DOI: https://doi.org/10.1371/journal.pbio.0040020
 
 Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. DOI: https://doi.org/10.1371/journal.pone.0036784
 
 Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. Journal of Experimental Biology 206:1575–1588. DOI: https://doi.org/10.1242/jeb.00302
 
 Dolzer J, Krannich S, Stengl M. 2008. Pharmacological Investigation of Protein Kinase C- and cGMP-Dependent Ion Channels in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Chemical Senses 33:803–813. DOI: https://doi.org/10.1093/chemse/bjn043
 
 Dolzer J, Schröder K, Stengl M. 2021. Cyclic nucleotide-dependent ionic currents in olfactory receptor neurons of the hawkmoth Manduca sexta suggest pull–push sensitivity modulation. European Journal of Neuroscience 54:4804–4826. DOI: https://doi.org/10.1111/ejn.15346
 
 Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Frontiers in Cellular Neuroscience 12:218. DOI: https://doi.org/10.3389/fncel.2018.00218
 
 Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. DOI: https://doi.org/10.1371/journal.pone.0058889
 
 Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Current biology 34:1414-1425.e5. DOI: https://doi.org/10.1016/j.cub.2024.02.042, PMID: 38479388
 
 Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. DOI: https://doi.org/10.3390/insects15121016
 
 Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proceedings of the National Academy of Sciences 108:8821–8825. DOI: https://doi.org/10.1073/pnas.1102425108
 
 Kaissling KE, Hildebrand JG, Tumlinson JH. 1989. Pheromone receptor cells in the male moth Manduca sexta. Archives of Insect Biochemistry and Physiology 10:273–279. DOI: https://doi.org/10.1002/arch.940100403
 
 Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. Journal of Experimental Biology 172:345–354. DOI: https://doi.org/10.1242/jeb.172.1.345
 
 Krannich S, Stengl M. 2008. Cyclic Nucleotide-Activated Currents in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Journal of Neurophysiology 100:2866–2877. DOI: https://doi.org/10.1152/jn.01400.2007
 
 Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. DOI: https://doi.org/10.1038/22566
 
 Lee JK, Strausfeld NJ. 1990. Structure, distribution and number of surface sensilla and their receptor cells on the olfactory appendage of the male moth Manduca sexta. Journal of Neurocytology 19:519–538. DOI: https://doi.org/10.1007/BF01257241
 
 Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. Journal of Biological Rhythms 22:502–514. DOI: https://doi.org/10.1177/0748730407307737
 
 Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. DOI: https://doi.org/10.1371/journal.pone.0062648
 
 Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. DOI: https://doi.org/10.1371/journal.pone.0166060
 
 Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. Journal of Biological Rhythms 22:43–57. DOI: https://doi.org/10.1177/0748730406295462, PMID: 17229924
 
 Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. Journal of Biological Rhythms 29:318–331. DOI: https://doi.org/10.1177/0748730414546133
 
 Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. Journal of Biological Rhythms 27:388–397. DOI: https://doi.org/10.1177/0748730412456265
 
 Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. DOI: https://doi.org/10.1371/journal.pone.0121230
 
 Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. DOI: https://doi.org/10.1523/ENEURO.0376-24.2024, PMID: 39880675
 
 Stengl M. 2010. Pheromone Transduction in Moths. Frontiers in Cellular Neuroscience 4:133. DOI: https://doi.org/10.3389/fncel.2010.00133
 
 Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. Journal of Comparative Physiology A 174:187–194. DOI: https://doi.org/10.1007/BF00193785
 
 Stengl M. 1993. Intracellular-Messenger-Mediated Cation Channels in Cultured Olfactory Receptor Neurons. Journal of Experimental Biology 178:125–147. DOI: https://doi.org/10.1242/jeb.178.1.125
 
 Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. Journal of Comparative Physiology A 199:897–909. DOI: https://doi.org/10.1007/s00359-013-0837-3
 
 Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. Journal of Neuroscience 10:837–847. DOI: https://doi.org/10.1523/JNEUROSCI.10-03-00837.1990, PMID: 2319305
 
 Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Frontiers in Physiology 14:1243455. DOI: https://doi.org/10.3389/fphys.2023.1243455
 
 Takagi S, Abuin L, Mermet J, Lee D, Benton R. 2025. A GPCR signaling pathway in insect odor detection. DOI: https://doi.org/10.1101/2025.10.03.680299
 
 Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Current Biology 14:638–649. DOI: https://doi.org/10.1016/j.cub.2004.04.009, PMID: 15084278
 
 Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS (Eds). Insect Biology in the Future. Academic Press. p. 735–763. DOI: https://doi.org/10.1016/B978-0-12-454340-9.50039-2
 
 Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell and Tissue Research 383:7–19. DOI: https://doi.org/10.1007/s00441-020-03363-x
 
 AuthorResponse
URL

biorxiv.org/content/10.1101/2025.06.17.659282v3
www.biorxiv.org www.biorxiv.org

Mycobacterium tuberculosis partitions the Krebs cycle under iron starvation

4
1. Public_Reviews 03 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This well-designed, valuable study uses isotope tracing to analyse how iron limitation alters TCA cycle metabolism in Mycobacterium tuberculosis, revealing potential antibiotic targets for non-replicating bacteria in the host. The evidence is solid, providing insights into metabolic remodelling under iron-limited conditions.
 
 Summary
2. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 M. tuberculosis exhibits metabolic flexibility, enabling it to adapt to various environmental stresses, including antibiotic treatment. In this manuscript, Serafini et al. investigate the metabolic remodeling of M. tuberculosis used to survive iron-limited conditions by employing LC-MS metabolomics and 13C isotope tracing experiments. The results demonstrate that metabolic activity in the oxidative branch of the TCA cycle slows down, while the reductive branch is reverted to facilitate the biosynthesis of malate, which is subsequently secreted.
 
 Overall, this study is experimentally well-designed, particularly the use of 13C isotope tracing to monitor TCA cycle remodeling under iron-limited conditions. The findings are valuable as they offer potential new targets for antibiotics aimed at non-replicating M. tuberculosis occurring in the hosts.
 
 Comments on revised version:
 
 All concerns are well addressed.
 
 I have one minor concern: Page 3 line 16 - Fig. 1G & H: The kinetics of ATP levels between H37Rv and Erdman seem different; Erdman induces greater ATP at days 2 and 3 after DFO treatment, which was not clear in H37Rv. Fig. 1I shows NAD/NADH ratio not NADH/NAD ratio. Please change it to NADH/NAD+ to be consistent with Supplement Fig. 1 result. Include the 17-day result of NADH/NAD+ in the discussion section to explain the different viability between the two strains.
 
 Review 1
3. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors investigated how prolonged iron limitation (which stops growth but does not lead to cell death) alters central metabolism in M. tuberculosis. The major tool they used is metabolomics combined with stable isotope tracing. They show that the Krebs cycle is still active, despite the fact that it is dependent on some iron-dependent enzymes. They show that carbon flux through the oxidative branch of the Krebs cycle is stalled, resulting in the accumulation of metabolites, such as malate and alpha-ketoglutarate, that are partially secreted. Apparently, the carbon flux from glycolysis is partially diverted to the reductive branch of the Krebs cycle. This is not achieved by using the glyoxylate shunt but probably through the GABA shunt. This unprecedented split of the Krebs cycle and malate secretion allows a continuous flow of carbon through the core of carbon metabolism, overcoming the metabolic stalling triggered by iron starvation.
 
 Strengths:
 
 Novel insight into the central metabolism of a major pathogen and its adaptation to iron starvation. Carefully conducted experimentation. Paper ends with a clear and helpful model.
 
 Weaknesses:
 
 The authors show some surprising and important findings, but would need a little more effort to really substantiate them. In particular, the role of the GABA shunt should be genetically tested, as they did for ICL and the glyoxylate shunt.
 
 Also, dataset 1 is not very convincing: it is based only on transcriptomics and shown as up or down, hardly a strong basis for major conclusions. The very least you want is actual differences, preferably at the protein level, where it really counts.
 
 Comments on the revised version:
 
 In the revised version all these points were appropriately dealt with and discussed, although some of them textually and not experimentally, but for reasons that are logical.
 
 Review 2
4. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 eLife Assessment
 
 This well-designed, valuable study uses isotope tracing to analyse how iron limitation alters TCA cycle metabolism in Mycobacterium tuberculosis, revealing potential antibiotic targets for non-replicating bacteria in the host. The findings provide insights into metabolic remodelling under iron-limited conditions. Whilst some of the evidence is solid, the data around the GABA shunt is incomplete, requiring genetic validation, as was done for the glyoxylate shunt. Questions remain about the underlying mechanisms and their specific role in M. tuberculosis pathogenesis.
 
 We thank the Editor and the reviewers for the positive evaluation of our work and for the constructive comments, which helped us improve the manuscript. We have carefully considered all the points raised and addressed them to the best of our ability. Regarding the GABA shunt, we acknowledge that genetic validation would significantly strengthen our conclusions; as this was not feasible within the revision timeframe, we have revised the relevant section by adopting more cautious language and have included genetic validation among the future perspectives. Additionally, we have expanded the discussion to address the relevance of our findings in the context of Mtb pathogenesis and host-pathogen interaction. A point-by-point response to each comment is provided below.
 
 We also made minor adjustments to the main text and figures:
 
 We removed “normalised” from the Y-axis of Figure 1 (the data are normalised and the procedure is described in the Materials and Methods).
 
 We rearranged the order of a paragraph in the Introduction: the first paragraph “During infection pathogenic bacteria […] extensively investigated” has been moved down (page 2, lines 8-12).
 
 We edited two sentences in the Introduction (page 2, lines 4-7).
 
 Supplementary Information: we added the following sentence at page 4, lines 23-24: “The probability of the Figure 3 and 4–figure supplement 1E scenario should be equivalent to that of the Figure 3 and 4–figure supplement 1F scenario.”
 
 We made minor typing adjustments: page 3, lines 30 and 31; page 4, lines 11-12, lines 22-24; page 5, lines 23-24; page 7, line 6; page 12, lines 28 and 32.
 
 We added details to the Materials and Methods section at page 17, lines 1 and 19-21.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 M. tuberculosis exhibits metabolic flexibility, enabling it to adapt to various environmental stresses, including antibiotic treatment. In this manuscript, Serafini et al. investigate the metabolic remodeling of M. tuberculosis used to survive iron-limited conditions by employing LC-MS metabolomics and 13C isotope tracing experiments. The results demonstrate that metabolic activity in the oxidative branch of the TCA cycle slows down, while the reductive branch is reverted to facilitate the biosynthesis of malate, which is subsequently secreted.
 
 Overall, this study is experimentally well-designed, particularly the use of 13C isotope tracing to monitor TCA cycle remodeling under iron-limited conditions. The findings are valuable as they offer potential new targets for antibiotics aimed at non-replicating M. tuberculosis occurring in the hosts. However, despite these strengths, the reviewer has concerns regarding the mechanistic basis underlying the observed metabolic remodeling and its role in M. tuberculosis pathogenesis.
 
 We thank the reviewer for the positive evaluation of our work and for the constructive comments. Regarding the role of the observed metabolic remodelling in Mtb pathogenesis, we have expanded the discussion to address this aspect, contextualising our findings within the framework of Mtb infection and host-pathogen interaction (page 13, lines 28-37; page 14, lines 1-23). Detailed responses to each specific comment are provided below.
 
 Major comments
 
 The authors argue that iron starvation is a physiologically relevant stressor encountered by M. tuberculosis post-infection. Using Erdman and H37Rv strains under DFO conditions, Erdman loses viability, whereas H37Rv maintains it. Nonetheless, both strains exhibit similar metabolic remodeling in the TCA cycle based upon metabolomics and isotope tracing data. The authors should clarify the specific metabolic adaptations in H37Rv that enable it to sustain viability under DFO conditions.
 
 We thank the reviewer for this observation. Following additional experiments performed in response to subsequent comments, we re-analysed the secreted metabolite data and monitored ATP, NADH, and NAD+ levels over 17 days in both the Erdman and H37Rv strains. The results were concordant between the two strains, supporting the hypothesis that the decrease in CFU/mL over time does not reflect a loss of viability, but rather entry into a non-culturable state or, alternatively, an increased tendency to aggregate in liquid culture. Comments have been added at page 3, lines 16-24 and page 5, lines 30-36.
 
 A mechanistic explanation of how Mtb sustains viability under iron starvation is provided at page 13, lines 28-37.
 
 The authors report no significant changes in NAD/NADH and ATP levels in H37Rv and Erdman exposed to DFO conditions. They observe TCA cycle remodeling, particularly the reversal of the reaction between OAA and MAL, catalysed by malate dehydrogenase, an enzyme that uses NAD+ and NADH as cofactors. The directionality of this reaction likely depends on the relative levels of NAD+ and NADH. Additionally, other dehydrogenases, such as pyruvate DH and aKG DH, also require NAD+/NADH cofactors.
 
 We thank the reviewer for this important observation. We agree that the directionality of the malate dehydrogenase reaction, as well as the activity of other NAD+/NADH-dependent dehydrogenases, is likely influenced by the redox state of the cell. We therefore measured the NADH/NAD+ ratio over 17 days in both strains under DFO conditions. We also note that the Y-axis title in Figure 1 was incorrectly reported and has been corrected accordingly. Results and interpretation of these new data are provided at:
 
 page 3 lines 16-21
 
 page 11 lines 16-36
 
 page 12 lines 1-9
 
 page 13 lines 3-5
 
 In Figure 1I, NAD+ and NADH levels are monitored only at day 3 post-exposure to DFO conditions. Since Erdman loses viability after 2-3 weeks, the authors should include measurements of NAD+, NADH, and ATP levels at weekly intervals up to 3 weeks.
 
 We thank the reviewer for this suggestion. As recommended, we extended the monitoring of NAD+, NADH, and ATP levels over 17 days in both strains. Results and interpretation have been discussed together and are reported in the manuscript. Please refer to the response above for the relevant page and line references.
 
 Furthermore, glycine levels - which are linked to NAD+ recycling via the conversion of glyoxylate - should be measured under both HI and DFO conditions as an indirect indicator of the NAD+/NADH ratio.
 
 We thank the reviewer for this comment. However, we believe that glycine levels cannot be considered a reliable indirect indicator of the NAD+/NADH ratio, as glycine is involved in multiple metabolic pathways. It can originate from serine, threonine, glyoxylate, or protein degradation, and can be incorporated into proteins, degraded to CO2 and NH4+, converted to glyoxylate, or transformed into other amino acids. Because of this metabolic versatility, glycine levels lack the specificity required to reliably reflect the cellular NAD+/NADH ratio. In addition, we could not find a single study that claims that glycine levels can be used as an indicator of the NAD+/NADH ratio.
 
 Nevertheless, this comment prompted us to examine glycine levels and isotopologue distribution under iron deprivation. Glycine levels showed no consistent trend under DFO conditions, remaining unchanged or increasing in both the Erdman and H37Rv backgrounds.
 
 Importantly, the isotopologue distribution analysis led us to conclude that glyoxylate is not a key precursor of glycine under iron starvation. This new analysis is described at page 10 (lines 1-20), and a new supplementary figure has been added, Figure 3 and 4 – figure supplement 3.
 
 In Figure 2A, it is unclear why a 100-fold accumulation of aKG does not correspond proportionally to the accumulation of (iso)citrate.
 
 We thank the reviewer for this observation. We agree that this point required clarification and have added a comment addressing this apparent discrepancy in the main text at page 4, lines 12–17.
 
 The authors state that fumarate, aKG, (iso)citrate, malate, and pyruvate are secreted under DFO conditions. While the secretion of aKG and pyruvate makes sense, given their marked intracellular accumulation, it is puzzling why (iso)citrate, malate, and fumarate are secreted even though there are no changes in their intracellular abundance.
 
 To rule out the possibility that these metabolites are released due to bacterial lysis rather than active secretion, the authors should analyze the 13C-labeled fractions of these metabolites in the culture filtrate using the M. tuberculosis culture in media containing 13C glycerol.
 
 We thank the reviewer for this important observation.
 
 Regarding the possibility of cell lysis, although it cannot be completely ruled out, several observations indicate that the increase in extracellular malate was not due to lysis. If substantial cell lysis had occurred, we would expect a general increase in all extracellular metabolites. However, the extracellular fumarate and succinate levels remained unchanged in both strains under DFO (similarly to the control conditions, HI and LI). Glutamate was detected in the culture filtrate, but its abundance increased only under HI conditions, not under DFO, in either H37Rv or Erdman. The lack of increase in extracellular glutamate, fumarate and succinate therefore suggests that, even if some cell lysis occurred, it was minimal and did not significantly affect our observations.
 
 Regarding the 13C-labelled fractions, we note that it is unclear how the labelling profile would differ if the extracellular metabolites derived from cell lysis. Nevertheless, as suggested by the reviewer, we compared the labelled fractions of extracellular isocitrate, malate, fumarate and glutamate. The comparison revealed variations consistent with two blocks in the carbon flow occurring at the levels of pyruvate and alpha-ketoglutarate, resulting in a slowdown in the downstream flux.
 
 A description of these new considerations has been added at page 5 (lines 27-36) including the Figure 2 – figure supplement 2 and a new section of SI-Appendix. Therefore, we are confident that the selective appearance of some but not all metabolites in culture filtrates is consistent with secretion but not cell lysis.
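 For readers unfamiliar with the quantities compared in this response, the sketch below shows one common way to summarise an isotopologue distribution as a labelled fraction and a mean 13C enrichment. It is an illustration only and not the authors' analysis pipeline: the function names and the example abundances are hypothetical, and correction for natural isotope abundance is deliberately omitted.

```python
def labelled_fraction(isotopologues):
    """Fraction of the metabolite pool carrying at least one 13C atom.

    `isotopologues` lists abundances for M+0, M+1, ..., M+n (any units).
    """
    total = sum(isotopologues)
    if total <= 0:
        raise ValueError("isotopologue abundances must sum to a positive value")
    return 1.0 - isotopologues[0] / total


def mean_enrichment(isotopologues):
    """Average fraction of carbon positions that are 13C-labelled."""
    total = sum(isotopologues)
    n_carbons = len(isotopologues) - 1
    if total <= 0 or n_carbons < 1:
        raise ValueError("need a positive pool and at least one carbon")
    # Weight each isotopologue M+i by its number of labelled carbons i.
    weighted = sum(i * abundance for i, abundance in enumerate(isotopologues))
    return weighted / (n_carbons * total)


# Hypothetical distribution for a four-carbon metabolite (M+0 .. M+4).
example_pool = [40.0, 10.0, 20.0, 20.0, 10.0]
print(labelled_fraction(example_pool))
print(mean_enrichment(example_pool))
```

 Comparing such summaries between the intracellular pool and the culture filtrate is one way to reason about where labelled carbon accumulates; it does not by itself distinguish secretion from lysis.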
 
 To validate the role of the PCK-mediated reductive TCA cycle in malate biosynthesis and secretion under DFO conditions, the authors should generate a malate dehydrogenase (MDH) knockdown strain, considering that MDH is essential, and examine the 13C labeling patterns and NAD/NADH under DFO conditions.
 
 The authors also observe decreased GABA abundance and overall 13C labeling in DFO conditions, suggesting that the GABA shunt is the primary route for succinate biosynthesis under DFO conditions. Thus, it is strongly recommended that the authors perform a 13C glutamate tracing experiment to directly track labeling in aKG and GABA shunt metabolites, providing more definitive evidence for the involvement of the GABA shunt.
 
 We thank the reviewer for these valuable suggestions. We fully agree that both experiments would significantly strengthen the conclusions of our work.
 
 Regarding the MDH knockdown strain, we acknowledge that this experiment would provide direct validation of the PCK/PCA-mediated reductive TCA cycle in malate biosynthesis. However, generating a knockdown strain in Mtb is a technically demanding and time-consuming process, requiring several months even under optimal conditions, which makes it unfeasible within the revision timeframe. We have therefore incorporated this experiment as a future perspective in the conclusions, highlighting its importance for further validating the proposed model.
 
 Regarding the GABA shunt, we took the reviewer's comment as an opportunity to critically re-evaluate the strength of our data. As a result, we have revised the manuscript by merging the GABA shunt discussion with the glyoxylate shunt section, while adopting more cautious language in the concluding statement to reflect its hypothetical nature. The related figures have been moved to the Supplementary Materials. These aspects have been included among the future perspectives in the conclusions. Page 11, lines 10-13; page 14, lines 3-7.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors investigated how prolonged iron limitation (which stops growth but does not lead to cell death) alters central metabolism in M. tuberculosis. The major tool they used is metabolomics combined with stable isotope tracing. They show that the Krebs cycle is still active, despite the fact that it is dependent on some iron-dependent enzymes. They show that carbon flux through the oxidative branch of the Krebs cycle is stalled, resulting in the accumulation of metabolites, such as malate and alpha-ketoglutarate, that are partially secreted. Apparently, the carbon flux from glycolysis is partially diverted to the reductive branch of the Krebs cycle. This is not achieved by using the glyoxylate shunt but probably through the GABA shunt. This unprecedented split of the Krebs cycle and malate secretion allows a continuous flow of carbon through the core of carbon metabolism, overcoming the metabolic stalling triggered by iron starvation.
 
 Strengths:
 
 Novel insight into the central metabolism of a major pathogen and its adaptation to iron starvation. Carefully conducted experimentation. The paper ends with a clear and helpful model.
 
 Weaknesses:
 
 The authors show some surprising and important findings, but they would need a little more effort to really substantiate these. Especially the role of the GABA shunt should be genetically tested, as they did for ICL and the glyoxylate shunt.
 
 We thank the reviewer for the positive evaluation of our work. We agree that genetic validation of the GABA shunt would significantly strengthen our conclusions. However, generating the required mutant strains in Mtb is a technically demanding and time-consuming process that is unfeasible within the revision timeframe. In light of this, we have revised the manuscript by merging the GABA shunt discussion with the glyoxylate shunt section. This reorganization contextualizes the GABA shunt within a broader discussion, while adopting more cautious language in the concluding statement to reflect its hypothetical nature. Future genetic validation, including the generation of appropriate mutant strains, has been included among the future perspectives in the conclusions.
 
 Page 11, lines 10-13; page 14, lines 3-7.
 
 Also, dataset 1 is not very convincing, it is only based on transcriptomics and shown with up or down; this is not a strong base for major conclusions. As a minimum, one would want actual differences, preferably on the protein level, where it really counts.
 
 We thank the reviewer for this comment. We would like to clarify that Dataset S1 compiles transcriptomic and proteomic data from previously published studies, which represent the rational basis of our investigation. These data are consistently cited throughout the manuscript. The dataset was included solely as a convenience tool for the reader, to provide easy access to the relevant published information. To avoid any misunderstanding regarding its scope, we have renamed the file to 'Dataset S1 - Publicly available transcriptomic datasets referenced in this study'. Our conclusions derive from the integration of these published data with the novel biochemical and metabolomic evidence generated in this study. Further, to assist the reader, we added a clarifying description at the top of the “DE” column.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) Clarify the definitions of "growth defect" and "growth arrest" under LI and DFO conditions, respectively.
 
 (2) In Figure 2A, specify the unit of the y-axis. Is it on a log scale?
 
 (3) Raw data of metabolomics and 13C isotope tracing experiments should be either deposited in public websites or provided as a separate file.
 
 We thank the reviewer for these comments.
 
 Regarding the definition of 'growth defect' and 'growth arrest': we replaced 'defect' with 'slowdown' to better reflect the observed phenotype under LI conditions.
 
 Regarding Figure 2A: we have specified the unit of the Y-axis and clarified whether the scale is logarithmic in the figure legend. We have done that for all the figures containing charts with Y/X axis in logarithmic scale. We added secondary tick marks in the charts of Figure 5G.
 
 Regarding raw data availability: the metabolomics data have been deposited in the Zenodo database. The reference number has been added to the manuscript.
 
 Reviewer #2 (Recommendations for the authors):
 
 It is mentioned that measurement of the activity of these two enzymes in cell-free extracts revealed the presence of PCA activity in the DFO condition (Figure 5E), but not of MEZ activity (data not shown). Activity measurements are a great added value, but then activities should be shown, also for MEZ.
 
 We thank the reviewer for this suggestion. We agree that showing enzyme activity data adds value to the manuscript. As recommended, activity measurements have been included in the supplementary materials (Figure 5 – figure supplement 1).
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.05.12.653400v2
www.medrxiv.org www.medrxiv.org

Heterogeneity of use, access and retention of insecticide-treated nets: implications for subnational tailoring to maximise malaria control

4
1. Public_Reviews 03 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This paper provides a novel and valuable method to improve the accuracy of predictions of the impact of insecticide-treated net (ITN)-based strategies for malaria control and elimination by using sub-national estimates of the duration of ITN access and use over time from cross-sectional survey data and annual country ITNs received. The authors propose a sophisticated methodological framework that accounts for many sources of uncertainty, providing compelling evidence.
 
 Summary
2. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 This paper aims to improve the accuracy of predictions of the impact of ITN strategies by developing a method to estimate duration of ITN access and use over time on a subnational scale from cross-sectional survey data and the numbers of ITNs received annually. The subnational estimates are then input into a mathematical model to predict clinical cases under different ITN distribution strategies.
 
 Strengths:
 
 The approach is novel and addresses a useful and timely topic. It makes use of available routine data, and has considered all of the relevant components of ITN distributions.
 
 The authors have made revisions, particularly to the methods, appendices and title - leaving the paper easier to follow, and with a clear, consistent aim. The assumptions are clearly stated.
 
 Weaknesses:
 
 The weaknesses are shared with other models of a similar complexity - it is not easy for a casual reader to fully understand the model or the implications of the assumptions which were required to be made. That routine data is used is good for availability, but data quality may be an issue in some places.
 
 Review 1
3. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage formerly targeted by WHO) for any of the regions even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.
 
 Strengths:
 
 The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of mass campaigns from publicly available data instead of assuming fixed dates. The probability of use given access allows determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes with a methodological framework that can likely be extended to other countries.
 
 Weaknesses:
 
 Since the models employed are rather complex, the methodology description may be hard to follow for some readers. In addition, the models assume many hypotheses, including exponential decay of ITN use/access and narrow prior distributions. It is worth noting that, in the revised version of the manuscript, the authors justified the choice of exponential decay and narrow prior distributions, and made a significant effort to clarify the methodology and the model equations.
 
 Comments on revised version:
 
 I appreciate the improvements made to the text. The methodology description is much clearer now. I have no further suggestions.
 
 Review 2
4. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This paper provides a novel method to improve the accuracy of predictions of the impact of ITN strategies, by using sub-national estimates of the duration of ITN access and use over time from cross-sectional survey data and annual country ITNs received.
 
 Strengths:
 
 The approach is novel, makes use of available data, and has considered all of the relevant components of ITN distributions.
 
 Weaknesses:
 
 (W1.1) The main message of the paper was not very clear, and did not seem to fit the title. The title focuses on sub-national tailoring of ITN, but the abstract did not feature results directly about SNT. It was not very clear what the main result of the paper was - there are several ITN observations in the results and discussion. Most did not seem to be directly about SNT, but rather sub-national differences in use and access were accounted for in the analyses. It was not clear if the same conclusions would be reached without accounting for sub-national differences, but the estimates and predictions could be expected to be more accurate.
 
 Thank you for highlighting this. We agree the title could be improved to better reflect the main messages of the paper and have now updated it to “Heterogeneity of use, access and retention of insecticide-treated nets: implications for subnational tailoring to maximise malaria control”. All parameters are estimated at a subnational level; this is not always the case at a national level. We therefore do not have national-level models without subnational differences that our results could be compared to.
 
 (W1.2) Some of the results seemed to me to be apparent even without a modelling exercise (eg high coverage could not be maintained between campaigns, use would be higher with 2-yearly distributions rather than 3-yearly) or were not in themselves new insights (eg estimates of the duration of use). It would be helpful to clearly state what the novel results are in the abstract, the first paragraph of the discussion and the conclusions, and to make sure that the title is consistent.
 
 It is our understanding that assessments of ITN coverage are often made from infrequent surveys, for example from MIS. These are typically conducted six months post-campaign and may miss notable reductions in use and access beyond this. Comparisons of ITN use and access are also frequently made directly between DHS surveys, which can be misleading in isolation if the time between campaigns and surveys is not considered. We have tried to highlight this more clearly in relation to Burkina Faso with the following text:
 
 “The observed decrease in use and access across many regions in Burkina Faso may therefore be a by-product of DHS surveys being conducted at progressively later dates relative to the most recent campaign; this does not necessarily indicate an underlying trend in decreasing use or access over longer timescales.”
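 The timing effect described in the quoted text can be illustrated with a minimal sketch. It assumes the exponential decay of ITN use noted elsewhere in these reviews; it is not the authors' Bayesian model, and the function name and all parameter values are hypothetical, chosen only to show the direction of the bias.

```python
import math


def use_at_survey(use_at_campaign, mean_duration_years, years_since_campaign):
    """ITN use at survey time, assuming exponential decay after a campaign."""
    return use_at_campaign * math.exp(-years_since_campaign / mean_duration_years)


# Hypothetical values: identical campaigns, surveys taken progressively later.
USE_AT_CAMPAIGN = 0.75           # proportion using an ITN just after distribution
MEAN_DURATION_YEARS = 2.0        # assumed mean duration of use
survey_delays = [0.5, 1.5, 2.5]  # years between campaign and each survey

observed = [use_at_survey(USE_AT_CAMPAIGN, MEAN_DURATION_YEARS, delay)
            for delay in survey_delays]
for delay, use in zip(survey_delays, observed):
    print(f"survey {delay:.1f} y after campaign: use = {use:.2f}")
```

 In this toy setting the underlying dynamics are identical for every campaign, yet the surveyed values fall from one survey to the next purely because each survey is taken later relative to its campaign.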
 
 We do believe modelling exercises, such as the methodology presented here, can help generate better estimates of ITN use and access over time than estimates from surveys alone, which can be biased by the relative timings of campaigns. It is also our understanding that previous studies have generated national estimates of ITN retention. We are not aware of any previous studies that have estimated the duration for which ITNs continue to be used, which is arguably of greater epidemiological importance than retention time. To the best of our knowledge, these have also not been estimated at subnational scales previously.
 
 We acknowledge that the novelty of some results was not clearly presented previously and are grateful to the reviewer for highlighting this. We have now highlighted some of the novel findings more clearly in the abstract, with the following text:
 
 “However, subnational variation in ITN retention and the duration that ITNs remain in use have not previously been quantified.”
 
 “Our results highlight that although transmission intensity remains an important factor for subnational tailoring of malaria control interventions, other factors, such as ITN use given access, meaningfully influence optimal deployment strategies.”.
 
 We have also highlighted the novelty and relevance of our findings more clearly in the first paragraph of the discussion, with the following text:
 
 “Funding constraints have also increased the need for consideration of subnational tailoring, with many recommendations being made on the basis of transmission intensity in the World Health Organisation (2025) Subnational Tailoring Reference Manual. However, a key uncertainty in assessing the potential impact of different ITN interventions has been how long nets remain in use rather than how long they are retained, and how this varies between regions. Here, to our best knowledge, we present the first estimates of subnational variation in ITN retention and the duration that ITNs remain in use, and also quantify for the first time how ITN use, access and retention vary between subnational regions across multiple African countries. Our work supports the change in guidance to optimal coverage as it highlights ITN interventions have notable differences in impact between settings, and that distributing fewer but more effective ITNs, particularly pyrethroid-chlorfenapyr products, is likely to be more impactful than maximising long-term coverage through increased campaign frequencies with pyrethroid-only ITNs. Our work also broadly supports World Health Organisation (2025) recommendations for subnational tailoring, particularly the consideration of deprioritisation of ITN distribution in very low transmission settings. However, our results provide new indications that deprioritisation of areas with higher ITN use given access may lead to greater resurgences in cases, highlighting that subnational tailoring decisions could be optimised further by considering additional factors to transmission intensity alone.”
 
 The novelty and relevance of our results are also now highlighted in the following text, which has been incorporated into the concluding paragraph:
 
 “In conclusion, the work indicates that universal coverage targets of 80% are unlikely to be consistently met due to waning overall ITN use in the intervening years between triennial mass campaigns. Improved coverage can be achieved through more frequent biennial distributions, though this is unlikely to be feasible at scale given the current funding landscape. Indeed, when resources are constrained, deprioritisation of ITN mass campaigns in certain settings is being increasingly considered through subnational tailoring of malaria control interventions. Our work highlights that the relationship between transmission intensity (whether measured in terms of prevalence or clinical cases) and intervention impact is non-linear, and notable resurgences in cases may follow when campaigns are deprioritised in all but very low transmission settings. This broadly supports WHO subnational tailoring guidance, which suggests consideration of deprioritising distribution of ITNs in regions with PfPR2-10 < 1% (World Health Organization, 2025). However, while the World Health Organization (2025) Subnational Tailoring Reference Manual proposes that the withdrawal of ITNs in favour of indoor residual spraying should be considered in areas with low ITN use, here we estimate that ITN use alone appears to be a notably poorer predictor of the impact of ceasing mass campaigns than use given access. Our findings suggest that regions with higher use given access may experience disproportionately greater resurgences in cases following deprioritisation. This implies that regions with low use given access may warrant consideration for cessation of ITN distribution, rather than decisions being based solely on low overall ITN use irrespective of whether communities have sufficient ITN access. 
However, subnational differences in ITN use, access and retention are key knowledge gaps in many settings, and when estimated from infrequent surveys they are highly sensitive to bias arising from the timing of surveys relative to when campaigns were conducted. To our knowledge, this study is the first to estimate subnational variation in ITN retention and the first to estimate the duration that ITNs remain in use, which is of greater epidemiological relevance than retention time. It also provides a novel framework to correct for biases in estimates of ITN use and access arising from when campaigns were conducted. Although campaigns have historically aided increasing ITN use and access over time, we estimate the mean duration of ITN use is consistently shorter than mean retention times in all regions. This raises questions about whether punctuated distribution of ITNs through campaigns is the optimal mechanism for maximising their effectiveness and cost-effectiveness. Maximising the cost-effectiveness of interventions has become increasingly pertinent in the current funding context, and consideration of alternative distribution strategies, such as increased distribution through continuous distribution channels, including school- or community-based distribution, may be warranted. Frameworks such as the one presented here, which take into account the potential for impact from different net types and the high variability of ITN duration and use, could support NMP decision making on how best to maximise impact from available funds. Whilst such frameworks may be a useful tool, local knowledge of factors impacting ITN access and use as well as operational decision making will be paramount for NMP-led tailoring of subnational strategies.”
 
 (W1.3) On L236, the link to SNT is stated: "the models indicate trends that can support subnational tailoring of ITNs". They could indeed, but SNT itself is not done in this paper. It seems to be about improving sub-national predictions of the impact of single ITN strategies, by taking into account sub-national variation in access and use duration. This is useful, and the model developed has novel aspects.
 
 Thank you for highlighting this. We hope our updated title and response to W1.12 below help address this. Where relevant, we have also framed our findings in relation to the World Health Organization’s Subnational tailoring of malaria strategies and interventions: reference manual, which was published following our original submission; examples of this are highlighted in our response above to W1.2.
 
 (W1.4) Individual countries may have records on when nets were distributed to the regions rather than needing to use the annual country number of nets together with the DHS data. It could be helpful to say what the analysis steps would be in that case.
 
 We have now added the following text to appendix 3.2 to clarify how the methodology could be adapted:
 
 “In contexts where national malaria programmes or other stakeholders have knowledge of the timings of mass campaigns (i.e. when there is no uncertainty in ɸij), the methodology can be adapted by deterministically evaluating the time since the last campaign (equation S18) for each time point.”
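As a minimal sketch of the adaptation described above, assuming campaign dates are known exactly and expressed as month indices: the time since the last campaign can be looked up directly at each time point instead of being inferred. The function name and the example dates are hypothetical, and equation S18 itself is not reproduced here.

```python
from bisect import bisect_right


def months_since_last_campaign(campaign_months, t):
    """Months elapsed at time t since the most recent campaign at or before t.

    campaign_months must be sorted in ascending order (hypothetical input format).
    Returns None when no campaign has yet taken place at time t.
    """
    idx = bisect_right(campaign_months, t)
    if idx == 0:
        return None
    return t - campaign_months[idx - 1]


# Hypothetical known campaign timings (month index from the start of the study period).
campaigns = [12, 48, 84]
elapsed = [months_since_last_campaign(campaigns, t) for t in range(0, 96, 12)]
```

With known timings there is no distribution over campaign dates to integrate over, so each time point maps to a single elapsed time.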
 
 (W1.5) There were several assumptions that needed to be made in building the model. There is some validation of the timing of the distributions (L633 "verified where possible through discussion with interested parties nationally and internationally") and the fit of estimated access and use to survey data, and agreement between predictions of prevalence and MAP estimates. It would be helpful to say which assumptions are important for the results (and would be key knowledge gaps) and which would not make a difference. It might be possible to validate the net timing model using a country where net distributions are known reasonably well.
 
 Thank you for raising this. We acknowledge that to investigate which assumptions are less likely to make a meaningful difference, we would ideally have conducted a full sensitivity analysis on these. This, however, would be challenging, since many of these are structural assumptions rather than numerical ones (for example, the assumption of an exponential decay in use and access), which would require the entire methodology to be adapted to conduct a sensitivity analysis. We did validate our estimated campaign timings against some known subnational campaign timings for Senegal. However, we could not source data on when all campaigns were conducted for all regions of Senegal to the nearest month to be able to conduct validation against this. We were also not able to source use and access data from sources other than the DHS to validate our discrete-time models of historical use and access. PfPR2-10 estimates are however fitted to equivalent MAP estimates. These were validated against DHS estimates of PfPR6-59mo, which were not used at any stage to fit our models. We have made slight changes to the original wording in relation to this at the end of appendix 5.2.
 
 (W1.6) What was assumed about what happens to old nets after a mass campaign was not clear. This assumption is likely to affect the predictions of access for the biennial distributions.
 
 To generate our initial estimates of the mean duration of use and retention time with our hierarchical model, we assume nets are only distributed to individuals who do not already have ITNs (appendix 2). This initial step is necessary for our methodology, but is relaxed later under our discrete-time model where we assume ITNs are distributed at random such that individuals with an ITN are equally likely to receive a new ITN (and replace their existing one) following a mass campaign (appendix 4). Much of the aforementioned sections has been rewritten and we hope this is now clearer.
 
 (W1.7) L312 and elsewhere: That use given access declines with net age is plausible. However, I wondered if this could be partly a consequence of the assumptions in the model (eg the two exponential decays for access and use, the possible assumption that new nets displace the current ones when there is a mass campaign).
 
 Declining use given access as nets age is not affected by model assumptions. Because use and access are fitted independently of each other, there are no constraints that would prevent a faster decay in access than in use. Had the data supported this, use given access would have increased over time since the last campaign. The data did not support this. Further clarification that use and access are fitted independently of each other has now been provided in the following text:
 
 “All subsequent analyses described are conducted independently for use and access”
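The argument above can be illustrated with a short sketch, assuming (for illustration only) that use and access each decay exponentially from an initial level with their own mean duration. All function names and numerical values are hypothetical and are not taken from the fitted models.

```python
import math


def coverage(initial, mean_duration, t):
    """Proportion remaining at time t under exponential decay with the given mean duration."""
    return initial * math.exp(-t / mean_duration)


def use_given_access(t, use0, mean_use, access0, mean_access):
    """Ratio of use to access at time t when the two curves are specified independently."""
    return coverage(use0, mean_use, t) / coverage(access0, mean_access, t)


# Use decays faster than access: use given access falls with time since the campaign.
declining = use_given_access(3.0, use0=0.6, mean_use=2.0, access0=0.8, mean_access=3.0)

# Access decays faster than use: use given access would rise instead.
rising = use_given_access(1.0, use0=0.6, mean_use=3.0, access0=0.8, mean_access=2.0)
```

Nothing in this structure forces the ratio to fall; the direction depends only on which of the two independently specified durations is shorter.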
 
 (W1.8) The Methods section on Estimating historical use and access seemed to be aimed at readers familiar with formulae, but I think it could lose other interested readers. It could be useful to explain a little more about what is happening at each step and also why.
 
 Thank you for highlighting this. We have re-written this section in the main manuscript, now named ‘Historical use, access and retention times’, where we now only highlight key equations and provide a high-level overview of the methodological steps. We have sought to explain the rationale for each step more clearly to ensure maximum accessibility for interested readers. The original wording was used as a basis for the newly provided series of appendices, which provide further technical detail; this wording has also been heavily re-drafted to improve the clarity of each step.
 
 (W1.9) The model was fitted to MAP estimates of PfPR2-10, which themselves come from a model. It may be that there is different uncertainty in the MAP estimates for different regions. I couldn't see this on the graph, but maybe the uncertainty is small. Was this taken into account in the fitting?
 
 We only used median MAP estimates of PfPR2-10 to calibrate the baseline EIR for each region in our model. We have clarified our rationale in appendix 5.2:
 
 “Since the relationship between baseline EIR and PfPR2-10 here is specific to malaria simulation, MAP uncertainty estimates were not propagated through to our estimates in baseline EIR since these would not faithfully represent its true uncertainty.”
 
 (W1.10) Was uncertainty from each estimated component integrated into the other components?
 
 Thank you for highlighting this; it indicates we had not made this sufficiently clear. To confirm, we propagate uncertainty in each component through to our estimates of cases averted. New text has been added to clarify this:
 
 “Region-specific uncertainty in ITN efficacy, use, retention, and the relative contributions of continuous and campaign channels is therefore propagated through to our estimates of cases averted.”
 
 Further details are also provided in the preceding text of the same paragraph. The central 95% credible intervals of cases averted shown in figures 5.C and 6 and associated figure supplements are reflective of this uncertainty.
 
 (W1.11) Eyeballing Figure 2 (Burkina Faso), there is a general pattern of decline in all the regions, some differences between the regions and some differences in how well the model fits between the regions. If possible, it could be helpful to say how much better the fit was when using region-specific compared to countrywide parameter values for access and use, and how different the results would be.
 
 In the “Universal coverage: was it achievable under triennial mass campaigns” results section, we have now provided further emphasis that the observed decrease from DHS data may be driven by surveys being conducted progressively later in relation to the last campaign:
 
 “The observed decrease in use and access across many regions in Burkina Faso may therefore be a by-product of DHS surveys being conducted at progressively later dates relative to the most recent campaign; this does not necessarily indicate an underlying trend in decreasing use or access over longer timescales.”
 
 In the case of Burkina Faso (figure 2.A), aside from months when very small numbers of individuals were surveyed and either 0% or 100% use or access was reported, no other data lie outside our 95% credible interval for any region.
 
 We are unable to generate comparisons with countrywide parameters as these are not generated when fitting our discrete-time model, even though they are a by-product of the initial hierarchical model used to generate initial estimates of region-specific ITN retention, which was a necessary methodological step. We hope the extensive revision of the text in the methods and appendices helps to improve the clarity on this. Where national estimates are provided, these are population-weighted means of the subnational median posterior estimates. New text is included in appendix 1 to clarify this:
 
 “National and continental values are reported as population-weighted summaries of the median subnational estimates generated from the discrete-time models”
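A minimal sketch of the aggregation described above, assuming each subnational region contributes one median posterior estimate and one population count. The function name and the example values are hypothetical.

```python
def population_weighted_mean(medians, populations):
    """Population-weighted mean of subnational median estimates."""
    if len(medians) != len(populations):
        raise ValueError("medians and populations must have the same length")
    total = sum(populations)
    return sum(m * p for m, p in zip(medians, populations)) / total


# Hypothetical subnational median estimates of ITN use and regional populations.
regional_medians = [0.5, 0.7, 0.4]
regional_populations = [100, 300, 600]
national_estimate = population_weighted_mean(regional_medians, regional_populations)
```

Because the result is a summary of medians rather than a directly estimated parameter, it does not come with its own posterior, which is consistent with credible intervals not being reported for these aggregated values.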
 
 (W1.12) The question of moving from a campaign every three to every two years may not be the most pertinent question in the current funding landscape. I realise that a paper is in development for a long time, but it would be helpful to comment on what else the model could be used for when fewer rather than more nets are likely to be available.
 
 We acknowledge the funding landscape has changed substantially, but we still believe this work has important implications in the current context. We have emphasised this further in the following text:
 
 “If budget constraints necessitate the deprioritisation of campaigns, our results highlight that this should be avoided, if possible, in regions with moderate to high transmission intensity, particularly those with mean annual incidence exceeding 100–150 clinical cases per 1,000 people. Shortening campaign intervals from three to two years in moderate- and high-transmission regions is projected to avert more cases than the additional cases that may arise from ceasing campaigns in some lower-transmission settings. Additionally, although pyrethroid–chlorfenapyr ITNs are more costly, the additional cases projected to be averted by them relative to pyrethroid-only and pyrethroid–PBO ITNs are substantial. In certain national contexts it may be more cost-effective for biennial pyrethroid-chlorfenapyr campaigns to be conducted in fewer subnational regions even under reduced budgets. However, more thorough economic analyses will be needed to understand this fully. Moreover, as ITNs remain one of the most cost-effective malaria control interventions, improving the impact of them could still be more cost-effective than the introduction of new untested interventions (Topazian et al., 2023; Schmit et al., 2024).”
 
 We have also related some of our findings to the WHO Subnational Tailoring Reference Manual (as highlighted in W1.2), which we hope better relates our findings to the current context.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage formerly targeted by WHO) for any of the regions, even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.
 
 Strengths:
 
 The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of a mass campaign from publicly available data instead of assuming fixed dates. The probability of use given access allows for determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.
 
 Weaknesses:
 
 Since the models employed are rather complex, the description of the methodology may be hard to follow for most readers. In addition, the models assume many hypotheses, including:
 
 (W2.1) Exponential decay of ITN use/access.
 
 We do acknowledge that different modelling studies have typically assumed either an exponential decay or an “S-shaped” smooth-compact loss function, with many of these studies having been validated against cluster-randomised trial data for both functional forms. We believe the ITN age distribution data across the DHS surveys inspected provide reasonable evidence to support the use of an exponential decay function here. We have now included a proof (appendix 2.1) demonstrating that an exponentially distributed ITN age distribution will be yielded for an exponential decay function with the same rate parameter; this is true under periodic ITN distribution and becomes an approximation for a finite number of surveys. We have now also included additional text (appendix 2.2) highlighting that the empirical ITN age distributions appear to support our exponential decay assumption.
 
 (W2.2) The decay rates for the probability of the ITN repelling and killing a mosquito are the same.
 
 Although the same decay rate parameter (\gamma_N) is present in our expressions for the probability of repellency and mortality (equations (53) and (54)), the half-life of the latter is shorter, since repellency is assumed to decay towards a constant value. These structural forms are not unique to this paper but are shared among all malaria simulation-based studies with ITN interventions. This decay rate parameter has been estimated in previous studies (Sherrard-Smith et al., 2022; Churcher et al., 2024), and we carry through uncertainty estimates from those previous studies into the work presented here; additional text has been added to clarify this:
 
 “Uncertainty in ITN repellency and mortality parameters (equation (53) and (54)) is also propagated forward to this study by simulating random draws from previous posterior distributions (Sherrard-Smith et al., 2022; Churcher et al., 2024) across each distribution event and realisation.”
 
 (W2.3) Given a time instant, all individuals in the same administrative unit have the same probability of using a net.
 
 Our discrete-time model estimates the proportion of the population with use and access at each time instant. We purposefully do not conflate this with the probability of use and access, which can vary between individuals within the same subnational unit of analysis (urban and rural regions of each administrative-one area). We are grateful this point has been raised as it indicates we had not communicated this sufficiently clearly before. We hope the extensive re-draft of the ‘Historical use, access and retention times’ methods section has helped address this, in particular in the following text preceding equation (7):
 
 “We do not assume the probability of access is the same for all individuals in a region at a given point in time. Instead, we assume the probability any given individual has access to an ITN at time tj can be described by a Beta distribution”
 
 (W2.4) ITN use/access decay models do not depend on the distribution strategy (e.g. biennial vs triennial distribution).
 
 We may not have fully understood this point, but in terms of our historical models of use and access, assumptions are not imposed on the frequency of previous campaigns. Instead, historical campaign timings are estimated from data from DHS surveys and the AMP Net Mapping Project (now detailed in appendix 3.1); historical estimated intervals could be either two or three years (or indeed any interval) as informed by these data. In terms of the duration of use and retention time, these are estimates of how long a net would continue to be used, or provide access, if an individual were not to replace it at an earlier date; these estimates are therefore independent of campaign intervals, and we have now added text to provide additional clarity:
 
 “However, throughout this study, the durations of use and retention time are always estimates of how long an individual continues to use or have access to a net in the absence of future replacement; estimates of these are therefore reflective of behaviour or ITN durability and not distribution patterns themselves.”
 
 We do acknowledge under our approach, use immediately following a campaign is agnostic of campaign frequency; however, given an absence of data on how use changes following a switch from triennial to biennial campaigns, we believe this was a reasonably conservative assumption. Further confirmation is now provided in the following text, with additional preceding context:
 
 “Future campaigns, whether conducted every two or three years, are therefore assumed to achieve a consistent initial level of use.”
 
 (W2.5) The Bayesian model assumes some narrow prior distributions.
 
 Thank you for highlighting this. We acknowledge the need for further justification for the choice of priors. We have provided this in depth for the hierarchical model of the mean duration of use and access (in appendix 2.2). Further justification for the choice of priors for the discrete-time model is also now provided (in appendix 4.2).
 
 The impact of these hypotheses on the estimated parameters is not explored in the paper, and no sensitivity analyses are performed, although some limitations are discussed.
 
 We fully acknowledge we had not conducted sensitivity analyses for many of our assumptions, and we have now tried to provide better justification for our assumptions. The assumptions most likely to influence inference are structural components of the modelling framework rather than scalar parameters that can be varied independently in a conventional sensitivity analysis. Many of the assumptions highlighted above are structural, such as the assumption of an exponential decay (W2.1). In the case of our assumption of exponential decay, multiple elements of the methodology are restricted by this (for example, when correcting for biases that arise from nets being lost between campaigns and survey times when estimating the timing of campaigns in appendix 3.1). Investigating the sensitivity of our results to this assumption, relative to an assumed smooth-compact function, would require extensive adaptation of the methodology that would be beyond the scope of this paper. Some other assumptions, such as the assumption of the same decay rate parameter for repellency and mortality (W2.2), have been estimated in the previous studies referenced and have been validated against cluster-randomised controlled trials. We nevertheless recognise our justification of some assumptions could have been expanded upon previously, and we hope the changes highlighted above go towards addressing this.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 (R1.1) I looked for the reference WHO 2024b for the recent optimal allocation guideline, but there were just three WHO 2024 references in the bibliography. In addition, what exactly the 80% rule applies to is not clear - this could be explained so it is clearer what result to compare to it (or explain that the rule itself is not clear).
 
 We have used the eLife LaTeX/BibTeX template for citations throughout and acknowledge this doesn’t show letter suffixes in the reference list for multiple author-year entries. We are unsure how to address this given this is generated by the official template, though we note that when citations are clicked on in the document, the relevant citation is then shown at the top of the page on the web version.
 
 (R1.2) L24 'estimated', but this seems more like a prediction. The words 'estimated' and 'predicted' should be carefully used throughout when combining statistical and mechanistic modelling.
 
 This has now been changed.
 
 (R1.3) The point estimates should always have measures of uncertainty.
 
 The rationale for the omission of credible intervals for some point estimates has now been clarified in the manuscript (appendix 1). The following text has been added:
 
 “Additionally, in relation to uncertainty estimates, credible intervals are shown for all subnational quantities that are directly estimated in our models. National and continental values are reported as population-weighted summaries of the median subnational estimates generated from the discrete-time models (appendix 4) and therefore do not correspond to explicitly estimated model parameters, so credible intervals are not shown for these aggregated estimates.”
 
 (R1.4) It would be helpful to justify the choice of ADM1 as the geographical unit.
 
 We have clarified the rationale for this in the following text:
 
 “Here, (subnational) regions are defined as the first administrative unit below the country level and are further divided into rural and urban areas to align with DHS stratification”
 
 (R1.5) The terminology was slightly confusing: in some places, it sounded as if regions were the sub-national regions, in others as if they were different things (eg L74, L105). L45 'and' seems odd here.
 
 ‘Region’ is used interchangeably with ‘subnational region’ at points in the paper to aid the flow of the text. We hope the use of parentheses around (subnational) in the updated text quoted above (and in the following text) helps provide clarity:
 
 “here, the units of analysis are consistently referred to as (subnational) regions”
 
 (R1.6) Spurious accuracy in some estimates, e.g. L52.
 
 This was a result cited from Bertozzi-Villa et al. (2021) for which uncertainty estimates were not available. We hope the response to R1.3 above helps clarify the rationale for omitting credible intervals for some estimates generated here.
 
 (R1.7) L68 'lose' instead of 'loose'.
 
 Now corrected.
 
 (R1.8) L534. I suspect that the model was actually fitted in Stan via the R interface rstan.
 
 Language adjusted accordingly.
 
 (R1.9) L633 'through' rather than 'though'.
 
 This section has been heavily redrafted and we have checked for typos.
 
 Reviewer #2 (Recommendations for the authors):
 
 The paper is well-written and presents an important contribution to better aid interventions. The proposed models are reasonable, but because of their complexity, even readers who work with epidemic modelling might have issues understanding the methodology.
 
 We thank the reviewer for highlighting that the methodology may be difficult to follow. The methods section has now been substantially rewritten to provide a clearer conceptual description of the modelling framework, with detailed model specification and derivations moved to the appendices. We hope this restructuring will allow readers to follow the modelling approach at a high level in the main text with technical details contained in the appendices.
 
 To improve the clarity of the methods section, I suggest:
 
 (R2.1) Include a list of symbols with the meaning of each variable defined in the text.
 
 Definitions for symbols are now also shown in appendix 1 – tables 1-5.
 
 (R2.2) Include a centralized full description of each model, clearly stating the priors and likelihood (similarly to a Stan code).
 
 There are two models that are fitted with Stan (the hierarchical retention model and discrete-time use/access model). To improve clarity for the hierarchical model, priors are now presented in a single block (equations 11 – 17) in appendix 2.2, with the likelihood (equation 18). For the discrete-time model, we have split the presentation of the priors (equations 37 – 42) and the likelihood expressions (equations 43 – 45) into different subsections (respectively appendices 4.2 and 4.3).
 
 (R2.3) If needed, include additional data preprocessing in the form of an algorithm.
 
 Although we have not included an algorithm outlining the preprocessing steps, we have ensured sufficient detail has been provided to facilitate replicability. For example, in appendix 1, we now outline how use and access are inferred from DHS data:
 
 “ITN use is inferred from DHS data (ICF, 2025) on whether individuals slept under an ITN the previous night, while all individuals who used an ITN are assumed to have access; when fewer than two individuals used an ITN, the ITN is assumed to be able to provide access at random to up to two individuals in a household.”
 
 (R2.4) Mention the main hypotheses and limitations of the model in the main text.
 
 We have ensured key assumptions of the model are stated in the re-written ‘Historical use, access and retention times’ methods subsection; for example, in the following text:
 
 “Due to the sparsity and irregularity of DHS and MIS surveys, we were unable to investigate seasonal fluctuations in either access or use; we therefore assume that nets provide access or are used continuously over some period of time.”
 
 (R2.5) Including a flowchart or diagram that provides an overview of the proposed framework could be helpful.
 
 We have now included a flowchart of methodological steps in appendix 1 – figure 1.
 
 (R2.6) Line 89: Define NMP before presenting the acronym.
 
 We have ensured this is defined in the first instance on line 39.
 
 (R2.7) Equation (1): Explain why you chose the Exponential distribution (e.g. constant hazard), as this is one of the main hypotheses of the model.
 
 As highlighted in our response to W2.1, we have now included justification of this assumption in the final paragraph of appendix 2.2.
 
 (R2.8) Equation (2): Although Equation (2) passes a clear message of how alpha_i^x is distributed, I wonder if it is mathematically correct to express the limit this way, since the argument of the limit is a random variable. Maybe the limit should be applied to gamma_i^x instead.
 
 Thank you for highlighting this. We acknowledge the limit behaviour was expressed in a short-hand manner that is not strictly mathematically correct. Indeed, the limit should be applied to the decay rate parameter gamma (now shown in equation 10). In appendix 2.1, we have now provided a proof demonstrating that the rate parameter of the pooled ITN age distribution should tend to the same decay rate as the assumed exponential loss function.
 
 (R2.9) I think the difference between rho_i^x (Equation (1)) and alpha_i^x (Equation (2)) is not very clear in the text.
 
 In the context of access, rho_{i(l)} and alpha_{i(l)} are respectively the duration an ITN l is retained for and its age at the time of a survey. We hope the redrafted appendices make this clearer, in addition to the inclusion of the new parameter tables in appendix 1.
 
 (R2.10) Line 479: Typo (and or).
 
 Updated wording is now contained in appendix 2.
 
 (R2.11) Line 711: Typo (The limit is equal to infinity).
 
 This has now been corrected.
 
 (R2.12) Equation (15): I could not understand this equation. What is rho(s) and rho(s \in I), where I is one of the intervals mentioned in this equation?
 
 Rho(tau_ik) was introduced as simplified notation for the probability density of the timing of campaign k in region i (tau_ik), but we acknowledge this was not explained clearly. We also acknowledge this equation presented a lot of concepts at once. The equation attempted to describe the probability density of the last campaign in region i relative to time t_j, denoted phi_ij. We no longer make use of this previous notation (rho) for the probability density. This equation has been updated to equation (30), with incremental explanation of its construction now provided in appendix 3.2.
 
 (R2.13) Line 642: What is t?
 
 The use of $t_j \ni t$ was previously used to indicate that the discrete time point t_j lies within continuous time t. We acknowledge this was a non-standard use of notation and was not clearly explained. This section (now in appendix 4) has been rewritten without this notation. The use of t and t_j to denote continuous time and discrete time points respectively is now defined in the core notation table (appendix 1 – table 1).
 
 (R2.14) The proposed model has narrow hyperhyperpriors because of convergence issues. Are the estimated parameters sensitive to the choice of hyperhyperpriors?
 
 We acknowledge limited justification was previously provided for the choice of hyperhyperpriors. We have now provided additional justification within appendix 2.2.
 
 (R2.15) Since the proposed Bayesian models are relatively complex, it might be useful to provide convergence diagnostic plots in the supplement.
 
 Convergence diagnostics were inspected using the ShinyStan package. Chains showed satisfactory convergence based on standard diagnostics. We have not included diagnostic plots due to the large number of parameters in the fitted models. Under the hierarchical model (appendix 2) for ITN use, 146 region-specific parameters (one for each region), 12 country-level hyperparameters (two for each country), and four hyperhyperparameters were estimated. Under the discrete-time model (appendix 4), a further 876 parameters (six for each region) were estimated. In total, 1,038 parameters were fitted for the ITN use models. The same number of parameters were estimated for the ITN access models, giving a total of 2,076 estimated parameters.
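The parameter counts quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the counts per region, per country and overall are those stated in the response.

```python
n_regions = 146    # region-specific units, as stated in the response
n_countries = 6    # six countries, two hyperparameters each
n_hyperhyper = 4   # hyperhyperparameters

# Hierarchical model: one parameter per region plus country-level and top-level terms.
hierarchical = n_regions * 1 + n_countries * 2 + n_hyperhyper

# Discrete-time model: six parameters per region.
discrete_time = n_regions * 6

per_outcome = hierarchical + discrete_time  # ITN use models only
total = 2 * per_outcome                     # use and access models combined

print(hierarchical, discrete_time, per_outcome, total)
```

The printed values are 162, 876, 1038 and 2076, matching the totals reported for the use models and for use and access combined.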
 
URL

medrxiv.org/content/10.1101/2025.08.27.25334550v2
www.biorxiv.org

Somatic Programmed DNA Elimination is widespread in free-living Rhabditidae nematodes

5
1. Public_Reviews 03 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  In this manuscript, the authors investigate programmed DNA elimination (PDE) across nematodes using a large-scale cytological approach. This work is potentially significant because it expands PDE beyond a few known nematodes to a much broader set of Rhabditidae species, providing an important resource for investigating PDE's evolutionary origins and functions. The strength of evidence, however, is incomplete; the technique used to evaluate PDE is insufficient to provide unambiguous support for the phenomenon, so additional methods, such as genomic sequencing from a few species spanning the range of elimination levels, would be required to confirm these findings. This research would be of interest to geneticists, evolutionary biologists, and those working on the regulation of genome integrity.
  
2. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Launay et al. conducted a cytological screen for PDE in 25 new Rhabditidae species and detected PDE in 17 of the 25 species, representing 12 of 17 genera within the family. This work is significant because it expands PDE from a few known nematodes to a much broader set of Rhabditidae species.
  
  Strengths:
  
  By demonstrating PDE across many genera with the exception of C. elegans and some other Caenorhabditis species, the study provides an important resource for investigating PDE's evolutionary origins, mechanisms of genome reorganization and DNA repair, and its functional consequences.
  
  Most of the observed PDEs were supported by solid evidence through a survey-style cytological screen (PDE detected in 17/25 species and in 12/17 genera), which supports the main claim of widespread occurrence.
  
  Weaknesses:
  
  Although most PDE claims are supported by solid evidence, some of the existing data do not describe the depth of characterization, e.g., how many replicates were conducted for each species? How reproducible are the claimed PDEs between embryos in terms of timing and cell identities destined for PDE? Is it possible to validate a subset of PDE with independent evidence, especially for those with marginal PDE? This is important because some dying embryos may fail to maintain their chromosome integrity and release some broken DNA, while others may suffer from noise such as intracellular parasites (for example, microsporidia) or even highly condensed mitochondrial DNA.
  
3. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Programmed DNA elimination is increasingly recognised as an important phenomenon across many species, including in animals. Exactly how widespread is still unclear, and the function of PDE is even more mysterious in most species where it has been described. PDE has been discovered in several nematode species, and in this manuscript, the authors carry out a more extensive search for PDE. They find PDE in many species, indicating that it is widespread across the phylum.
  
  Strengths:
  
  The large number of species across many different clades provides good evidence that the phenomenon has evolved many times independently. The work will therefore prompt many further studies characterising individual species, and potentially linking the evolution of the phenomenon to other features of these species' ecological characteristics.
  
  Weaknesses:
  
  The major technical weakness of this project is the assay that is used to evaluate PDE. First, this assay is clearly insensitive: as the authors acknowledge, O. tipulae, which has PDE, does not appear in their screen. Second, the assay gives no information about breakpoints and only limited, non-quantitative information about how much DNA is eliminated. Thus, their data are really only a preliminary screen, which would need to be confirmed by genomic assays.
  
4. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Somatic programmed DNA elimination (PDE), also known as chromatin diminution, has primarily been studied in parasitic nematodes, such as Ascaris species, in which it was discovered almost 140 years ago. Recently, PDE has also been reported in three non-parasitic nematode species. In this manuscript, Launay et al. present the results of a large-scale cytological and evolutionary study of PDE across 29 free-living nematode species belonging to the Rhabditidae family, for which they established a phylogeny based on 18S and 28S ribosomal RNA sequences. By combining DNA staining and telomere DNA FISH labeling in developing embryos, they convincingly document the formation of lagging fragments and/or the loss of long germline telomeres in 17 species, during one particular division of somatic precursor cells.
  
  Strengths:
  
  (1) The whole study is well executed, and the results are convincing.
  
  (2) The authors present compelling evidence that PDE is an ancestral feature of Rhabditidae nematodes.
  
  (3) This study provides a valuable resource of lab-tractable species for future PDE studies.
  
  Weaknesses:
  
  (1) Some clarifications are necessary to make the figures more reader-friendly.
  
  (2) Important references to ciliates are missing.
  
5. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Author response:
  
  We thank you and the three reviewers for their careful examination and critical assessment of our work.
  
  All acknowledge the significance of revealing the widespread occurrence of programmed DNA elimination (PDE) in nematodes, a phenomenon long considered a parasitic specificity. The reviewers, particularly Reviewer #2 and the Editors, have raised important concerns regarding confirming PDE with more sensitive methods, in particular using genomic data to characterize breaksite motifs across the phylogeny and to better understand the amount and nature of eliminated sequences across species. While we fully agree that such confirmation would ideally complement our discovery, this approach extends beyond the scope of the current manuscript. Our primary aim was to inform the scientific community of the widespread occurrence of PDE in the short term.
  
  In the longer term, an ambitious collaborative effort is currently underway to produce high-quality genome assemblies of several hundred nematode species (ENA: PRJEB36817), covering the diversity of Rhabditina and beyond. These will enable precise characterisation of PDE, ultimately addressing these concerns. However, given the scale of this project, which aims at telomere-to-telomere assemblies (particularly challenging for species that perform PDE), it will take considerable time. We believe the community should be informed of the widespread nature of PDE now, rather than waiting for these genomic data.
  
  Nevertheless, we would like to emphasize that PDE has already been confirmed using genomics in the three clades where we have identified it cytologically: through our own work in Mesorhabditis (1) and Letcher et al., in prep, and also in Caenorhabditis (2) and Oscheius (3, 4). We will state this explicitly in our revision.
  
  For these reasons, and to avoid overstepping extensive genomic studies that are underway, we will maintain our focus on the cytological description in this manuscript.
  
  In addition to the above-mentioned concern, we will also address the other points:
  
  Reviewer #1:
  
  “Although most PDE claims are supported by solid evidence, some of the existing data do not describe the depth of characterization, e.g., how many replicates were conducted for each species? How reproducible are the claimed PDEs between embryos in terms of timing and cell identities destined for PDE? Is it possible to validate a subset of PDE with independent evidence, especially for those with marginal PDE? This is important because some dying embryos may fail to maintain their chromosome integrity and release some of the broken DNA, some others may suffer from noise such as intracellular parasites, for example, microsporidia, or even highly condensed mitochondrial DNA.”
  
  We will provide the missing information concerning the number of observed embryos (using DNA staining or DNA-FISH), and better explain and illustrate why the observed fragments cannot be attributed to intracellular parasites or to dying embryos.
  
  Reviewer #3:
  
  Some clarifications are necessary to make the figures more reader-friendly.
  
  This will be improved. Thank you for pointing this out.
  
  Important references to ciliates are missing.
  
  Thank you for pointing this out. We will improve the comparisons that can be made with the mechanism of PDE found in ciliates.
  
  References
  
  (1) C. Rey, C. Launay, E. Wenger, M. Delattre, Programmed DNA elimination in Mesorhabditis nematodes. Curr Biol 33, 3711-3721.e5 (2023).
  
  (2) L. Stevens, S. Sun, N. Haruta, L. Xiao, N. Uwatoko, M. Kieninger, K. Sato, A. Yoshida, D. Absolon, J. Collins, A. Sugimoto, T. Kikuchi, M. Blaxter, Programmed DNA elimination was present in the last common ancestor of Caenorhabditis nematodes. bioRxiv [Preprint] (2025). https://doi.org/10.1101/2025.10.23.681605.
  
  (3) T. C. Dockendorff, B. Estrem, J. Reed, J. R. Simmons, S. B. Zadegan, M. V. Zagoskin, V. Terta, E. Villalobos, E. M. Seaberry, J. Wang, The nematode Oscheius tipulae as a genetic model for programmed DNA elimination. Curr Biol 32, 5083-5098.e6 (2022).
  
  (4) P. M. Gonzalez de la Rosa, M. Thomson, U. Trivedi, A. Tracey, S. Tandonnet, M. Blaxter, A telomere-to-telomere assembly of Oscheius tipulae and the evolution of rhabditid nematode chromosomes. G3 (Bethesda) 11, jkaa020 (2021).
  
URL

biorxiv.org/content/10.1101/2025.08.21.671558v2
www.biorxiv.org

Learning is a fundamental source of behavioral individuality

4
1. Public_Reviews 03 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This is a fundamental study of individual variation and the contribution of learning to behavioural individuality. The experimental design of massively parallel behavioural phenotypes is outstanding and the conclusions are supported by a compelling and rigorous analysis across a large number of experiments in thousands of individuals across genotypes and conditions. The dataset further represents an advance in studying visual associative learning thanks to the ability to make longitudinal measurements of many behavioural decisions within the same animals. These results are a major contribution to the understanding of the sources of behavioural individuality.
 
2. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 "Learning is a fundamental source of individuality," by Manna and colleagues, interrogates different sources of variation in individual behavior. The authors place individual flies in a Y-shaped arena, which is a common design in the field, and illuminate the arms of the Y with blue versus green light. They track the color preference of individual animals and also perform operant conditioning, meaning that they teach the fly to avoid a particular color/arm by generating a foot shock when the fly enters that arm. There are a number of things that are impressive about this setup: The authors are able to collect data on thousands of individual flies of many different strain backgrounds, and they demonstrate a strong change in color preference after conditioning. This is nice, because in past papers, visual learning ability has been modest and difficult to study. To put a number on it, in this paper, animals on average don't show a color preference at the start of the assay, spending around 30% of their time in the one arm illuminated green, and the remaining time in the two arms illuminated blue. After conditioning, the average animal spends only 23% of its time in the green arm.
 
 The authors run 64 animals through the assay for each of 88 wild-type strains (maybe? see Major Point 1 below) and see considerable strain-specific (genetic) variation in the change in time spent in the shocked color after conditioning. Some strains show no learning, while others spend <10% of their time in the shocked color after conditioning. They also, I believe, see that some strains have more variability across individuals, which would suggest that some strains have stronger canalization at the development or circuit function level than others, i.e., some genotypes produce more consistent copies of the individual, others less consistent copies. (Or, some genotypes produce robust circuits, and others produce noisy circuits.)
 
 Finally, the authors argue statistically that learning itself increases variability in individual performance. This makes a lot of sense to me intuitively. Learning changes the physical/chemical properties of circuits in the brain, and because it evolves over time and interacts with environmental variables, it seems like it should send different animals down different channels. Or, at a conceptual level, if I learn to play the piano and my sister doesn't (because of some genetic difference between us or something stochastic), this learning experience will cause all sorts of other differences in our behavior as time passes. I also think the authors do have enough data to be able to make this finding. However, the presentation of the argument in this portion of the paper is hard for me to understand, and I am not an expert in statistics, so the strength of the result is difficult for me to evaluate.
 
 Major points
 
 (1) It's difficult to track through the paper the number of animals tested for different assays. At the beginning, it says N=5632, which works out to 64 flies for each of the 88 DGRP strains. 64 happens to be the number of parallel Y arenas they have. Later in the methods, there's a description of more variation within the set of 64 for each strain, two different parent sets per strain, different sexes, conditioned and unconditioned. And, while the results text focuses on the color learning, the methods discuss additional assays (place learning, multi-day learning).
 
 Given the numbers, does each run of the 64 mazes include all the tested flies of one strain, or are flies of many strains included in each batch? Do different flies do different assays (color, place, multi-day), or do they all do all the assays? Perhaps there is a table including this information already in the supplement, but I recommend making it much clearer in the main results text and methods. While the dataset is large, if it is split over many conditions and/or if batch and genotype confound each other, this will affect the robustness of the results and how strong the conclusions can be.
 
 (2) The data presentation in Figure 1 is elegant and easy to follow, but getting into Figure 2 and subsequently, I get lost in the statistics and have trouble understanding what is being measured. My understanding of the big picture is that while genetics and individual randomness contribute a lot to behavior, the evidence for learning as an amplifier of individuality is that variance in behavior among animals of the same strain increases over time in the conditioned group (i.e., the group that is doing the most learning, or a specific kind of learning), but not in the control group. This idea is illustrated in the flattening distributions in the cartoons in Figure 1A. The authors should include graphs of the real data that use the same format as in that cartoon. Instead, the graphs present "residuals," and I don't know what those are. I suspect it's "variation left over after accounting for effects of strain and individual stochasticity." I see the residuals being tracked per strain over time in Figure 2H, but I don't see the change over time in other graphs. I'm looking for something simple like, "variation within the strain at the beginning of learning and at later time points in learning." (But I'm not sure exactly what instantaneous measurement would be the focus in longitudinal analyses of learning behavior.)
 
 (3) Figure 3 is a cool stab at tracking down the precise mechanism by which a stochastic environment interacts with learning to send individuals along different behavioral routes. But again, like in Figure 2, I don't have the sophisticated understanding of statistics to understand exactly what the graphs are telling me, or how they relate to the underlying measurements. I'm relying on the results text alone to reach a conceptual understanding, and just taking the graphs on trust.
 
 So, overall, the authors have a very nice body of work here, and with the potential to add a new facet to our understanding of the origins of diversity in animal behavior. In addition to the interpretations they focus on here, this dataset also represents an advance in studying visual associative learning in general, and quite an amazing ability to make longitudinal measurements of many behavioral decisions within the same animals. Improving the data presentation to make it easier to follow for a larger swathe of researchers, especially in figures 2 and 3, will increase its potential impact.
 
3. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors set out to test the extent to which differences in learning capacity and experience contribute to behavioural variation in a genetically identical population under identical environmental conditions.
 
 Strengths:
 
 The authors developed and used a scaled-up version of a simple two-choice behavioural paradigm, allowing them to test thousands of individuals across multiple genotypes. They then deployed clever and powerful statistical analysis methods and provided compelling evidence for a role of variability in learning in the expression of behavioural variation.
 
 Weaknesses:
 
 There are no major weaknesses, although some level of longitudinal analysis to strengthen the evidence for a strict definition of individuality would be a welcome extension of a future study. In addition, it would have been very interesting, although understandably beyond the current scope, to delineate a potential source of learning variability in the brain.
 
4. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Author response:
 
 Reviewer 1:
 
 Clarification of sample sizes, assay structure, and experimental design.
 
 Reviewer 1 noted that the number of animals tested across strains, assays, sexes, parent sets, conditioned and unconditioned groups, and longitudinal conditions is difficult to track through the manuscript. Given the extent of the experimental and data processing procedures such as filtering for inactive or injured flies, we agree that a summary table and/or a visual schematic of the full experimental setup would be helpful.
 
 Importantly, the vast majority of individuals were used for the main experiment, in which we conditioned the flies to avoid the green arm and the colors of the arms were fixed throughout the assay. A smaller number of flies were tested in the validation experiments (such as different types of conditioning). In each experiment, 64 flies were always set up per genotype and their behaviour was measured in parallel. Usually, around 60 flies passed the filtering step before analysis (flies were filtered out for inactivity or injury). Among those roughly 60 flies per genotype, the distribution of flies of different sex or flies raised in different replicate vials was balanced. Different individual flies were tested across different assays, except in the multiday experiment, where each individual was tested across four different assays.
 
 We will add a supplementary summary that includes how many flies were tested across assays, how individuals, males, females, replicates and genotype were distributed across batches (and in the multiday experiments how they were distributed across experiments), and how many flies were filtered out from the final analysis.
 
 Clearer presentation of the statistical argument that learning amplifies individuality.
 
 Reviewer 1 also noted that the presentation of the statistical analyses, particularly in Figure 2, was difficult to follow (e.g. what is residual individuality, how is it tracked over time, and why not replace it with something simpler like variance?).
 
 Our experimental design combines multiple, replicated environments and genotypes. For example, genetically identical flies from genotype A are raised under identical developmental environments that are replicated in two vials. The same is true for genotype B. Individuals from both genotypes are then tested under different conditions, i.e. control and conditioned.
 
 As we saw, combinations of these factors can change both the means and variances of the distributions of individual behaviours in a genotype- or environment-specific manner. Normally, variance would be a good estimate of expressed individuality within a genotype, and a comparison of variances would be a good estimate of the change in individuality due to some factor (genotype, replicate, or type of conditioning).
 
 However, we saw that the resulting shape of the data in these experiments (the shape of the distributions) was incompatible with the classical definition of the extent of individuality as measured by variance. While it would be more intuitive to track variance over time, we found that this measure obfuscates some obvious changes in the normal shape of the distributions of individual behaviours, as can be visually observed, for example, between conditioned and control experiments. This is why we moved to develop the measure of residual individuality. Residual individuality aims to measure exactly this dimension of individuality that is missed by measures of variance. We will add a schematic presentation of residual individuality in Figure 2 to explain more explicitly and visually what is being measured here, and what residual individuality represents. This should shed more light on how these analyses support the conclusion that learning increases behavioural variability among individuals in both Figure 2 and Figure 3. The schematic should provide more intuition on how to interpret the data for those unfamiliar with some of the statistics. Besides the schematics, we will also add more intuitive visualizations of the behaviour data in the supplement, including representations of how within-strain distributions of behaviour change before and during learning or in the control condition for all strains, so that the reader may inspect them in more detail.
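The general point that variance can miss a change in distribution shape can be illustrated with a toy example. The sketch below is not the authors' residual individuality measure, which is not defined in this response; it uses made-up scores and a generic two-sample Kolmogorov-Smirnov distance (the helper `ks_distance` and all values are hypothetical) to show two groups with equal variance but clearly different shapes.

```python
import statistics

def ks_distance(a, b):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    # The gap between two step functions can only change at observed values.
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Hypothetical behaviour scores for two groups of 100 individuals.
bimodal = [-1.0] * 50 + [1.0] * 50       # two tight clusters, mean 0
raw = [i - 49.5 for i in range(100)]     # evenly spread values, mean 0
scale = statistics.pstdev(raw)
flat = [x / scale for x in raw]          # rescaled to population variance 1

var_bimodal = statistics.pvariance(bimodal)
var_flat = statistics.pvariance(flat)
shape_gap = ks_distance(bimodal, flat)

print(var_bimodal, var_flat, shape_gap)
```

Both groups have a population variance of 1 (up to floating-point error), yet the distance between their empirical distributions is well above zero, so a variance comparison alone would report no difference between them.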
 
 Improved explanation of Figure 3 and the link between statistical outputs and behavioural measurements.
 
 Reviewer 1 also noted that the analyses in Figure 3 are difficult to interpret without relying heavily on the Results text. We hope that the added schematic in Figure 2, which explains what Divergence represents, will address this note and make the interpretation of Figure 3 easier. Indeed, upon reflection, we agree that the label “Divergence” is quite vague. The “Divergence” in fact again shows residual individuality, and how it changes with every decision made, in the case where we compare distributions of flies that start at the green versus the blue arm. We further subset the distributions by clustering flies that share the same individual initial color bias or similar learning score and measure residual individuality for them as well. Here, a value of 0 means the two distributions have the same shape, and higher values mean the shapes are more different. We will rename Divergence to “Residual individuality Start” to make it clear that this is conceptually the same type of measurement, and revise the figure legends accordingly so that they match the new schematic in Figure 2. This should clarify what the figures show. We will also add a schematic to depict how a change in the shape of the distribution with each decision can affect residual individuality.
 
 Reviewer 2:
 
 Clarification of the term “deterministic” when referring to genetic sources of variation.
 
 Reviewer 2 noted that describing genotype as a deterministic source of variation could be confusing, since gene expression and downstream cellular phenotypes are themselves noisy and stochastic. Indeed, gene expression as a phenotype is noisy, but also at the core it is a result of G x E (albeit the environment at the molecular scale). What we meant to emphasize here is that an individual’s genotype can be considered a fixed variable that determines phenotype expression across environments. The environment also determines the phenotype, again, in concert with genotype, but it will always vary over time. We agree with the reviewer that the wording should be made stricter to avoid confusion.
 
 We changed this sentence from “In every individual, behaviour is shaped by deterministic, genetic factors and by environmental events throughout lifetime, which may be stochastic and can occur at the molecular, cellular, organismal and even population scales.” to “In every individual, behaviour is shaped by fixed genetic factors and by variable environmental events throughout lifetime, which may be stochastic and can occur at the molecular, cellular, organismal and even population scales.”
 
 Longitudinal analysis and neural sources of learning variability.
 
 Reviewer 2 suggested that additional longitudinal analysis could further strengthen the evidence for individuality, and that identifying neural sources of learning variability would be an interesting future direction. We appreciate these suggestions and very much agree with them. However, as the reviewer pointed out, this was beyond the scope of this study. Nonetheless, we note that we have already started this (ongoing and quite extensive) experimental endeavour to identify neural sources of individuality, which we hope will soon be available as a follow-up study.
 
 Within the current study we were able to track behaviour longitudinally within a 20-minute experiment, and in one case over multiple days, though only for a smaller subset of flies. Broader conclusions on how behaviour would change over longer timeframes (except those already included in the manuscript) could not be made with the current dataset. We have added a figure in the supplement where the reader can visually explore the temporal changes to the distributions of behaviour. A more extensive study of how individuality evolves over longer time frames is indeed planned for the future.
 
 We thank the reviewers again for their thoughtful and constructive comments. We believe that addressing these points improved the manuscript.
 
URL

biorxiv.org/content/10.1101/2024.08.30.610528v4
www.biorxiv.org

A genetic toolkit for stable episomal transgenesis in the anaerobic gut parasite Blastocystis ST7-B

4
1. Public_Reviews 03 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This paper presents a valuable methodology for genetic manipulation of Blastocystis. Although some imaging data are compelling, higher-quality figures together with more rigorous biochemical assays would strengthen support for the authors' claims. With the experimental evidence and graphics improved, the study would be of interest both to researchers investigating mitochondrial evolution under anaerobic conditions and to medical biologists studying human pathogens.
 
2. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This paper presents a toolkit for the transformation of Blastocystis. The authors have screened a number of selectable agents, promoters and reporter genes and present their findings. This resource will be of immense use to those in the Blastocystis field, as well as those seeking to establish transformation tools in other species where such tools do not yet exist. Establishing new transformation tools is extremely challenging, and the authors have done an excellent job.
 
 Strengths:
 
 The authors have carried out a systematic screen of promoters, reporter genes and selectable agents. They have screened numerous candidates for each, and all the data are presented. It is good to see when things did not work as well as when they did, so this data set is extremely useful indeed.
 
 Weaknesses:
 
 The findings are reported by reporter gene assay (microscopy). No evidence is given using genetics. The authors claim that the DNA is maintained episomally. However, could it be possible that there is integration? No PCRs/RT-PCRs are shown (although it can safely be assumed that the DNA/RNA is present where the transformation was successful), nor are any Western blots. These would have been useful to show that the P2A ribosomal skipping had occurred, and that proteins were expressed individually rather than as a polyprotein.
 
3. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 This manuscript presents a substantial technical advance for the genetic manipulation of Blastocystis by establishing an integrated workflow for stable episomal transgenesis, antibiotic selection, clonal recovery, and reporter-based imaging in the ST7-B subtype. The study is particularly valuable because it combines multiple previously fragmented approaches into a coherent and practically applicable toolkit, including endogenous regulatory elements, optimized electroporation conditions, selectable markers, and anaerobic-compatible fluorescent reporters. This methodological work greatly expands the molecular toolbox, and future studies focused on both basic and infection biology can now build on the ability to express and localize proteins in fixed as well as live cells.
 
 The microscopy data are convincing and clearly demonstrate functional reporter expression and successful recovery of stable transgenic lines. Nevertheless, because this is primarily a methodological paper, the study would be further strengthened by the inclusion of Western blot validation of reporter expression and bicistronic constructs. In particular, biochemical analysis of the P2A-containing constructs would help assess the efficiency of ribosomal skipping and exclude the possible presence of uncleaved fusion proteins, thereby providing stronger support for the interpretation of the imaging data and the functionality of the expression system.
 
4. Public_Reviews 03 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The primary objective of this study was to establish a practical and functional framework for the propagation of stable transgenic cell lines of Blastocystis, a common animal gut microeukaryote. Although the work focused on Blastocystis ST7-B, a subtype with relatively low prevalence in humans, this choice is justified by its association with more frequent negative health effects. Beyond their relevance to the medical field, the methodological advances described here have the potential to also expand cell biology studies of this anaerobic organism, including its unusual mitochondria and redox metabolism.
 
 Strengths:
 
 Prior to this work, genetic tools for Blastocystis were very limited, relying on a single strong promoter-terminator combination. The authors successfully expanded the available promoter set across a range of expression strengths by testing two dozen variants in luciferase-based assays. Critically, they developed an integrated workflow from a modular transgenic construct design, to an expanded inventory of molecular components (promoters, reporters), optimized DNA delivery, stepwise antibiotic resistance-mediated clonal selection and propagation, and to reporter validation. The evaluation of several anaerobiosis-compatible labeling strategies for live (and fixed) cell optical imaging will be particularly useful, with the SNAP-tag system appearing especially promising for Blastocystis.
 
 Weaknesses:
 
 The presented data generally provide solid support for the conclusions that the work reached, but clarification of reasoning and several inconsistencies, as well as amendments to the visual presentation of the data, would be highly beneficial, as detailed below.
 
 (1) Episomal persistence of the construct: The manuscript repeatedly assumes, including in its title, that constructs persist in Blastocystis in their episomal form, but no direct evidence is provided. Although this interpretation is plausible, it should be identified more clearly as provisional. Nuclear genomic integration (e.g., via NHEJ) remains a possible explanation unless supporting evidence or rationale is provided to exclude it. Testing whether the phenotype persists without drug-mediated selection in the generated transgenic cell lines would help strengthen the case for episomal maintenance.
 
 (2) Promoters and terminators: 2.1) There is a discrepancy between the claimed number of loci (14), from which promoters used to drive luciferase expression were derived, and those detailed as having been actually generated in Table 1 (11). This inconsistency should be corrected or explained, as it creates uncertainty around the accuracy of the dataset. 2.2) Based on the presented evidence, constructs benchmarked in bioluminescence assays differed only in their promoter composition. Although terminator selection is mentioned in the Methods section, no additional details are provided; for instance, Table 1 and Figure 2 only list 23 promoters in total. Figure 2A likewise shows only promoter-dependent variation. If the terminator was held constant (LeguP1?), this should be stated explicitly. The authors may then consider revising the wording of having tested "23 promoter-terminator pairs" to better reflect that only promoters varied. 2.3) Promoter benchmarking was done with a plasmid lacking a selection marker, so it is unclear how the maintenance of the luciferase construct was ensured. Without selection, the observed reporter intensity could reflect differential or stochastic plasmid retention rather than promoter strength alone. The luminescence assay was performed 16-18 hours after transfection, but the rationale for this particular timeframe should be explained. In this context, the authors should explicitly state whether the experiments shown in Fig.2A represent biological triplicates or technical triplicates from a single transfection.
 
 (3) Figure 2: 3.1) Several aspects of the current design may lead to ambiguity for the reader. The boxplots are colour-coded, but it is unclear whether the colours carry meaning or are purely decorative. Because the data are already spatially separated into bins, additional random colouring is redundant and may suggest distinctions that are not intended. In addition, part A of Figure 2 is split into two panels, with the scale for the left panel shown in the right panel and some of the boxplot colours falling in the range of the scale, but not in line with their counterparts in the left panel. Because the colour use is not consistent, it is difficult to tell whether the same scale should be applied to both panels or how it should be interpreted. 3.2) The left panel of part A uses a diverging blue-white-red colour scheme, which is most appropriate when the midpoint represents a meaningful central value such as zero. Because the values shown in this graph are only positive, a non-diverging 2-colour scale or a colour palette such as 'viridis' would make the plot easier to interpret. 3.3) A black background should be avoided: 'B' and 'C' labels are invisible, and it draws attention to a distracting design feature rather than the data themselves.
 
 (4) Figure 3: 4.1) Individual snapshots should be separated more clearly, either by using a white background or by adding visible borders to make the overall composition clearer. As currently displayed, some boundaries between fluorescent channels resemble image artifacts rather than intentional panel divisions. 4.2) In parts B-D, the legend should explain more clearly what each image shows, and the figure itself would benefit from annotations. There seem to be three sub-panels in each 'condition' of part B (as well as C and D): while the middle and rightmost panels can be easily inferred to represent the fluorescent protein and bright-field image, what the leftmost panels represent is not specified. If DAPI was used to stain DNA, an explanation of why mostly multiple labelled regions are visible should be provided. 4.3) Cell morphology and appearance differ markedly between UnaG/smURFP and SNAP-tag images, which should be explained. A microscope issue is mentioned in the main text, but if that was the cause, the authors should consider replacing the images, as the current distortions complicate interpretation.
 
 Review 3
URL

biorxiv.org/content/10.64898/2026.04.28.721505v1
www.biorxiv.org www.biorxiv.org

Systematic Analysis of Network-driven Adaptive Resistance to CDK4/6 and Estrogen Receptor Inhibition using Meta-Dynamic Network Modelling

3
1. Public_Reviews 02 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This manuscript presents a useful computational framework for systematically characterising how heterogeneity in initial conditions or biophysical parameters shapes the dynamic behaviour of protein signalling networks, with potential relevance to understanding adaptive drug resistance. While the approach represents a significant methodological contribution, the extent to which its conclusions are biologically informative remains debated, as the model is only qualitatively compared with experimental data and lacks quantitative validation. As a result, the strength of evidence supporting the mechanistic claims is viewed as incomplete.
 
 Summary
2. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Joint Public Review:
 
 In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.
 
 The authors study the Early Cell Cycle (ECC) network as a proof of concept, focusing on pathways involving PI3K, EGFR, and CDK4/6 with the aim of identifying mechanisms that may underlie resistance to CDK4/6 inhibition in cancer. The biochemical reaction model comprises 50 state variables and 94 kinetic parameters, implemented in SBML and simulated in Matlab. A central component of the study is the generation of large ensembles of model instances, including 100,000 randomly sampled parameter sets intended to represent intra-tumour heterogeneity. On the basis of these simulations, the authors conclude that heterogeneity in kinetic rate parameters plays a stronger role in driving adaptive resistance than variation in baseline protein expression levels, and that resistance emerges as a network-level property rather than from individual components alone. The revised manuscript provides additional clarification regarding aspects of the simulation and filtering procedures and frames the comparison with experimental data as qualitative. Nonetheless, the study is best interpreted as a theoretical and exploratory analysis of the model's behaviour under heterogeneous conditions. Consequently, questions remain regarding the biological grounding of the sampled parameter regimes and the extent to which the reported frequencies of resistance-associated behaviours can be directly interpreted in physiological terms.
 
 While the authors propose a potentially useful computational framework to explore how heterogeneity shapes dynamic responses to drug perturbation, a number of important conceptual and methodological concerns remain to be addressed:
 
 (1) The sampling of kinetic parameters constitutes the backbone of the manuscript, yet important concerns remain regarding its biological grounding and transparency. Although the revised version provides additional clarification on the exploration of "model instances", it is still not sufficiently clear how parameter values and initial conditions are generated, nor how the chosen ranges relate to biological measurements. The kinetic rates are sampled over broad intervals without explicit justification in terms of experimentally measured bounds or inferred distributions. As a consequence, it remains uncertain whether the ensemble of simulated behaviours reflects physiologically plausible cellular regimes or primarily the properties of the assumed parameter space. In this context, the large-scale sampling (100,000 parameter sets) resembles a Monte Carlo exploration of the model rather than a biologically calibrated representation of tumour heterogeneity.
 
 Furthermore, the adequacy of the sampling strategy in such a high-dimensional space (94 free parameters) remains open to question. In the absence of biologically informed constraints, the combinatorial space of possible parameter configurations is vast, and it is unclear to what extent the sampled ensembles can be considered representative. This issue is particularly relevant because the manuscript interprets the frequency of resistance-associated behaviours as indicative of their likelihood.
 
 The validation presented in Figure 7 does not fully resolve these concerns. The comparison with experimental data is qualitative, and the simulations are performed in arbitrary time units, which complicates direct interpretation alongside time-resolved experimental measurements. Moreover, certain qualitative discrepancies between simulated and experimental trends (e.g., persistent versus decreasing CDK4/6 activity) are not thoroughly discussed. As this figure represents the primary empirical reference point in the manuscript, the extent to which the model captures experimentally observed dynamics remains uncertain.
 
 Finally, aspects of presentation continue to limit transparency. Parameter ranges are described at different points in the manuscript but are not consolidated clearly in the Methods, and the definition of initial conditions remains ambiguous - particularly whether these correspond to conserved quantities or to the dynamic variables used to initialise simulations. In addition, the exact number of model instances underlying specific analyses and figures is not always explicit. Greater clarity on these issues is essential for assessing reproducibility and for interpreting the quantitative claims of the study.
 
 (2) A central conclusion of the manuscript is that heterogeneity in protein-protein interaction kinetics is a stronger driver of adaptive resistance than heterogeneity in protein expression levels. To assess the latter, the authors fix a nominal set of kinetic parameters and generate 100,000 random initial concentrations for the 50 model species. However, according to the simulation protocol described in the manuscript, each trajectory includes three phases: (i) simulation under starvation conditions to equilibrium, (ii) mitogenic stimulation to a second ("fed") equilibrium, and (iii) application of drug treatment. The equilibrium concentrations reached in phases (i) and (ii) are determined by the kinetic parameters of the model and are independent of the initial concentrations, provided the system converges to a stable steady state. In dynamical systems terms, stable equilibria are defined by the parameter set and attract all initial conditions within their basin of attraction. Since the kinetic parameters are fixed in this experiment, the pre-treatment equilibrium that serves as the starting point for drug application should likewise be fixed. Under these conditions, it is therefore not unexpected that sampling a large number of initial concentrations has limited influence on the treated dynamics.
 
 This raises conceptual questions about the interpretation of the comparison between kinetic and expression heterogeneity. If the system converges to a unique stable steady state prior to treatment, then variability in initial concentrations does not propagate into variability in drug response, and the observed dominance of kinetic heterogeneity may partly reflect this structural property of the model rather than a biological principle. Clarification is needed regarding whether multiple steady states exist under the nominal parameter set, and if so, how basins of attraction are explored.
 
 More broadly, it remains unclear why initial protein concentrations can be sampled independently of the kinetic parameters. In biological systems, steady-state expression levels are typically determined by the underlying kinetic rates. A more consistent approach might require constraining initial concentrations to correspond to equilibrium states of the chosen parameter set, thereby introducing relationships between at least some of the 50 initial conditions and the 94 kinetic parameters. Finally, the manuscript employs a non-standard terminology regarding "initial conditions," which may further obscure interpretation of these results and would benefit from clarification.
 
 (3) The technical implementation of the modelling and simulation framework remains difficult to evaluate due to insufficient methodological detail. Although the authors state that kinetic parameters are randomly sampled, the manuscript does not specify the distributions from which parameters are drawn, nor whether potential correlations between parameters are considered or explicitly ignored. Without this information, it is not possible to assess how implicit modelling assumptions shape the ensemble of simulated behaviours. Given that the conclusions rely on frequency-based interpretations across sampled parameter sets, greater transparency regarding the sampling procedure is essential.
 
 A further concern relates to the parameter filtering step. The authors report that the "vast majority" of sampled parameter sets produced systems that were "too stiff," and that these were excluded on the grounds that stiff dynamics are not biologically plausible. However, the manuscript does not clearly define how stiffness is assessed, nor why stiffness is interpreted as biologically unrealistic rather than as a numerical property of the formulation. In standard practice, stiff systems are typically handled using appropriate implicit solvers rather than being discarded. Similarly, parameter sets that produce negative state values are excluded, yet such behaviour may arise from numerical artefacts rather than from intrinsic model inconsistency. The rationale for excluding these parameter sets, rather than adapting the numerical scheme, is not sufficiently justified.
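 
 The distinction between stiffness as a numerical property and biological implausibility can be illustrated with a minimal Python sketch. It is not taken from the manuscript (which uses Matlab); the function names, rate constant and step size are assumptions chosen for illustration. A single fast decay, dy/dt = -k*y, is integrated with a step size that would suit a much slower process, once with an explicit scheme and once with an implicit one.

```python
def forward_euler(y0, k, dt, steps):
    """Explicit Euler for dy/dt = -k*y (unstable when k*dt > 2)."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y


def backward_euler(y0, k, dt, steps):
    """Implicit Euler for dy/dt = -k*y (stable for any positive dt)."""
    y = y0
    for _ in range(steps):
        # Solves y_new = y + dt * (-k * y_new) for y_new.
        y = y / (1.0 + k * dt)
    return y


# Hypothetical values: a fast rate constant and a comparatively coarse step.
explicit_result = forward_euler(1.0, k=1000.0, dt=0.01, steps=5)
implicit_result = backward_euler(1.0, k=1000.0, dt=0.01, steps=5)

print("explicit:", explicit_result)  # large in magnitude and negative
print("implicit:", implicit_result)  # small and positive, as the true solution is
```

 In this toy case the explicit scheme produces a negative, growing value for a quantity that is strictly positive in the exact solution, while the implicit scheme decays smoothly. Negative states and apparent instability can therefore be artefacts of the integrator rather than properties of the parameter set.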
 
 The reported rejection rate - approximately 90% of sampled parameter sets - is substantial and raises questions regarding the interplay between model structure, parameter ranges, and numerical methods. As currently described, the filtering step appears to select parameter sets based primarily on computational tractability rather than on experimentally motivated biological criteria. The manuscript would be strengthened by clarifying whether the retained parameter sets are representative of biologically meaningful regimes, and by distinguishing clearly between exclusions based on biological plausibility and those arising from numerical considerations.
 
 Finally, important aspects of the simulation protocol require clarification. The model is simulated under "fasted" and "fed" conditions until equilibrium is reached, yet the criterion used to determine convergence is not specified. It would be important to describe how equilibrium is assessed (e.g., based on the norm of the time derivatives). Additionally, it remains unclear whether the mitogenic stimulus applied in the "fed" phase is assumed to be constant over time and, if so, how this assumption relates to biological experimental conditions. Greater detail on these implementation choices is necessary to ensure interpretability and reproducibility.
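 
 As an illustration of the kind of convergence criterion requested here, the following Python sketch stops integration once the largest absolute time derivative falls below a tolerance, and reports failure if that does not happen within a step budget. The reversible two-species system, the tolerance and all names are assumptions for illustration; they are not the manuscript's criterion.

```python
def run_to_steady_state(rhs, state, dt=0.01, tol=1e-8, max_steps=1_000_000):
    """Integrate with forward Euler until max |dx/dt| < tol.

    Returns (final_state, converged), where converged is False if the
    step budget runs out before the criterion is met.
    """
    for _ in range(max_steps):
        derivs = rhs(state)
        if max(abs(d) for d in derivs) < tol:
            return state, True
        state = [x + dt * d for x, d in zip(state, derivs)]
    return state, False


def toy_rhs(state, kf=1.0, kr=0.5):
    """Hypothetical reversible conversion A <-> B with mass-action kinetics."""
    a, b = state
    flux = kf * a - kr * b  # net conversion of A into B
    return [-flux, flux]


final_state, converged = run_to_steady_state(toy_rhs, [10.0, 0.0])
_, converged_short = run_to_steady_state(toy_rhs, [10.0, 0.0], max_steps=10)

print(final_state, converged)  # close to [10/3, 20/3], criterion met
print(converged_short)         # step budget too small, criterion not met
```

 Reporting the tolerance, the norm and the maximum simulated time alongside the results would let readers distinguish parameter sets that genuinely fail to equilibrate from those that simply need longer simulations.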
 
 (4) The manuscript states that the modelling conclusions are strongly supported by existing literature; however, the validation presented does not fully substantiate this claim. As noted above, the comparison with CDK2 and CDK4/6 experimental data remains qualitative, and the use of arbitrary simulation time units complicates interpretation of temporal agreement. The extent to which the model quantitatively or mechanistically recapitulates experimentally observed dynamics therefore remains uncertain.
 
 The claim that the model reproduces known resistance mechanisms is also difficult to assess in light of Figure S10, where a large fraction of network nodes (~80%) appear implicated in resistance under some conditions. If most components of the network can, in at least some parameter regimes, be associated with resistance phenotypes, the resulting lack of selectivity weakens the strength of model-based validation. It becomes challenging to distinguish specific mechanistic insights from generic consequences of network connectivity. In addition, the Supplementary Information notes that certain components of the mitogenic and cell-cycle pathways were abstracted or excluded in order to maintain computational tractability. While such abstraction is understandable in a large ODE framework, it raises interpretative questions. Proteins identified as potential resistance drivers within the model may, in some cases, represent aggregated or simplified pathway effects. Clarifying in the main text how such abstractions may influence the attribution of resistance mechanisms would strengthen the biological interpretation of the results.
 
 Drug inhibition is central to the manuscript's conclusions. The revised version clarifies that inhibition is implemented as a fixed fractional modification of specific kinetic rate laws. This abstraction is appropriate for exploring network-level responses, but it represents a stylised perturbation rather than a pharmacologically calibrated model of drug action. For full interpretability and reproducibility, the mathematical form of the modified rate laws, as well as the timing of inhibition relative to network equilibration, should be specified unambiguously. The biological implications of the findings depend critically on understanding this modelling choice.
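 
 One possible reading of "a fixed fractional modification of specific kinetic rate laws" is sketched below in Python. The mass-action form, the function name and the inhibition fraction are assumptions made purely for illustration; the manuscript's actual rate laws are not reproduced here.

```python
def inhibited_rate(k, enzyme, substrate, inhibition=0.0):
    """Mass-action rate k*E*S scaled by (1 - inhibition).

    inhibition is a fixed fraction in [0, 1]: 0 leaves the rate unchanged,
    1 switches the reaction off entirely.
    """
    if not 0.0 <= inhibition <= 1.0:
        raise ValueError("inhibition must lie between 0 and 1")
    return (1.0 - inhibition) * k * enzyme * substrate


# Hypothetical values: the same reaction before and after a 90% inhibition.
untreated = inhibited_rate(2.0, 1.0, 3.0)
treated = inhibited_rate(2.0, 1.0, 3.0, inhibition=0.9)

print(untreated, treated)
```

 Stating the equivalent expression for each drug-targeted reaction, together with the time at which the inhibition fraction is switched on relative to the "fed" equilibrium, would remove the ambiguity noted above.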
 
 The one-at-a-time perturbation analysis presented in Figure 5 provides an interpretable ranking of first-order control points across the ensemble and offers mechanistic insight into primary sensitivities of the network. However, many targeted therapies act on multiple components, and resistance frequently arises through combinatorial mechanisms. The reported rankings should therefore be interpreted as identifying primary influences under isolated perturbations, rather than as a comprehensive account of multi-target drug behaviour.
 
 Overall, the manuscript succeeds in presenting a conceptual and exploratory framework for analysing how signalling network topology can shape the qualitative landscape of adaptive responses under heterogeneous kinetic conditions. Its principal contribution lies in establishing a systematic platform for large-scale in silico exploration. At the same time, the current limitations in biological calibration, parameter grounding, and validation constrain the extent to which the conclusions can be interpreted as predictive or quantitatively representative of specific tumour contexts. Addressing these issues would further strengthen the connection between the theoretical landscape described here and experimentally observed resistance dynamics.
 
 Review 1
3. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the current reviews.
 
 eLife Assessment
 
 This manuscript presents a useful computational framework for systematically characterising how heterogeneity in initial conditions or biophysical parameters shapes the dynamic behaviour of protein signalling networks, with potential relevance to understanding adaptive drug resistance. While the approach represents a significant methodological contribution, the extent to which its conclusions are biologically informative remains debated, as the model is not qualitatively or quantitatively validated against experimental data. As a result, the strength of evidence supporting the mechanistic claims is viewed as incomplete.
 
 We thank the editors and reviewers for their further assessment of the manuscript. The revised public review raises several issues that overlap with points addressed in our previous response, particularly around the intended scope of MDN modelling, the interpretation of parameter sampling, and the qualitative nature of the experimental comparison. In this final revision, we have made targeted clarifications in the main text, Methods, figure legends, and Supplementary Information to make these points more explicit for readers. We emphasise that the present work is intended as a theoretical and exploratory framework for mapping the qualitative dynamic behaviours accessible to a fixed network topology, rather than as a quantitatively calibrated model of a specific tumour or cell line.
 
 Joint Public Review:
 
 In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.
 
 The authors study the Early Cell Cycle (ECC) network as a proof of concept, focusing on pathways involving PI3K, EGFR, and CDK4/6 with the aim of identifying mechanisms that may underlie resistance to CDK4/6 inhibition in cancer. The biochemical reaction model comprises 50 state variables and 94 kinetic parameters, implemented in SBML and simulated in Matlab. A central component of the study is the generation of large ensembles of model instances, including 100,000 randomly sampled parameter sets intended to represent intra-tumour heterogeneity. On the basis of these simulations, the authors conclude that heterogeneity in kinetic rate parameters plays a stronger role in driving adaptive resistance than variation in baseline protein expression levels, and that resistance emerges as a network-level property rather than from individual components alone. The revised manuscript provides additional clarification regarding aspects of the simulation and filtering procedures and frames the comparison with experimental data as qualitative. Nonetheless, the study is best interpreted as a theoretical and exploratory analysis of the model's behaviour under heterogeneous conditions. Consequently, questions remain regarding the biological grounding of the sampled parameter regimes and the extent to which the reported frequencies of resistance-associated behaviours can be directly interpreted in physiological terms.
 
 While the authors propose a potentially useful computational framework to explore how heterogeneity shapes dynamic responses to drug perturbation, a number of important conceptual and methodological concerns remain to be addressed:
 
 (1) The sampling of kinetic parameters constitutes the backbone of the manuscript, yet important concerns remain regarding its biological grounding and transparency. Although the revised version provides additional clarification on the exploration of "model instances", it is still not sufficiently clear how parameter values and initial conditions are generated, nor how the chosen ranges relate to biological measurements. The kinetic rates are sampled over broad intervals without explicit justification in terms of experimentally measured bounds or inferred distributions. As a consequence, it remains uncertain whether the ensemble of simulated behaviours reflects physiologically plausible cellular regimes or primarily the properties of the assumed parameter space. In this context, the large-scale sampling (100,000 parameter sets) resembles a Monte Carlo exploration of the model rather than a biologically calibrated representation of tumour heterogeneity.
 
 Parameters were sampled from a uniform distribution spanning values 10^-5 to 10^4. Conserved totals were sampled from the range 10^0 to 10^4. Each of these is roughly in line with measured spans of orders of magnitude for parameter values and protein expression (REF). Again, we would like to point out that we intentionally kept our ranges broad, and sampled from uniform distributions, to assess upper bounds of heterogeneity, not biologically informed heterogeneity. We also comment on the likely effects of expanding these ranges in our response to (26) in our original rebuttal.
 
 Main text has been updated to include this information. LINES: 175-179
 
 Furthermore, the adequacy of the sampling strategy in such a high-dimensional space (94 free parameters) remains open to question. In the absence of biologically informed constraints, the combinatorial space of possible parameter configurations is vast, and it is unclear to what extent the sampled ensembles can be considered representative. This issue is particularly relevant because the manuscript interprets the frequency of resistance-associated behaviours as indicative of their likelihood.
 
 This was addressed extensively in our original rebuttal, response to point (3). A new section was added to the supplementary text, along with new figures demonstrating the validity of the claims.
 
 The validation presented in Figure 7 does not fully resolve these concerns. The comparison with experimental data is qualitative, and the simulations are performed in arbitrary time units, which complicates direct interpretation alongside time-resolved experimental measurements. Moreover, certain qualitative discrepancies between simulated and experimental trends (e.g., persistent versus decreasing CDK4/6 activity) are not thoroughly discussed. As this figure represents the primary empirical reference point in the manuscript, the extent to which the model captures experimentally observed dynamics remains uncertain.
 
 This was addressed in the original rebuttal, response to point (12). The actual time units are arbitrary in the sense that they are determined by the units of the parameters in our model. It is important to understand that the meta-dynamic analysis is not calibrated to data and so the meaning of time units is far less important than the distribution of behaviours. We have updated the figure to reflect the arbitrary units of time in our simulations.
 
 Finally, aspects of presentation continue to limit transparency. Parameter ranges are described at different points in the manuscript but are not consolidated clearly in the Methods, and the definition of initial conditions remains ambiguous - particularly whether these correspond to conserved quantities or to the dynamic variables used to initialise simulations. In addition, the exact number of model instances underlying specific analyses and figures is not always explicit. Greater clarity on these issues is essential for assessing reproducibility and for interpreting the quantitative claims of the study.
 
 (2) A central conclusion of the manuscript is that heterogeneity in protein-protein interaction kinetics is a stronger driver of adaptive resistance than heterogeneity in protein expression levels. To assess the latter, the authors fix a nominal set of kinetic parameters and generate 100,000 random initial concentrations for the 50 model species. However, according to the simulation protocol described in the manuscript, each trajectory includes three phases: (i) simulation under starvation conditions to equilibrium, (ii) mitogenic stimulation to a second ("fed") equilibrium, and (iii) application of drug treatment. The equilibrium concentrations reached in phases (i) and (ii) are determined by the kinetic parameters of the model and are independent of the initial concentrations, provided the system converges to a stable steady state. In dynamical systems terms, stable equilibria are defined by the parameter set and attract all initial conditions within their basin of attraction. Since the kinetic parameters are fixed in this experiment, the pre-treatment equilibrium that serves as the starting point for drug application should likewise be fixed. Under these conditions, it is therefore not unexpected that sampling a large number of initial concentrations has limited influence on the treated dynamics.
 
 This raises conceptual questions about the interpretation of the comparison between kinetic and expression heterogeneity. If the system converges to a unique stable steady state prior to treatment, then variability in initial concentrations does not propagate into variability in drug response, and the observed dominance of kinetic heterogeneity may partly reflect this structural property of the model rather than a biological principle. Clarification is needed regarding whether multiple steady states exist under the nominal parameter set, and if so, how basins of attraction are explored.
 
 More broadly, it remains unclear why initial protein concentrations can be sampled independently of the kinetic parameters. In biological systems, steady-state expression levels are typically determined by the underlying kinetic rates. A more consistent approach might require constraining initial concentrations to correspond to equilibrium states of the chosen parameter set, thereby introducing relationships between at least some of the 50 initial conditions and the 94 kinetic parameters. Finally, the manuscript employs a non-standard terminology regarding "initial conditions," which may further obscure interpretation of these results and would benefit from clarification.
 
 This was addressed in the original rebuttal, response to point (4). The text was modified to clarify that "initial conditions" refers to the conserved total for each protein species. A supplementary figure (supp. fig. 4) was added to demonstrate that changes to the conserved totals of protein species do, in fact, shift the dynamics and steady-state equilibria of protein species. The text was updated throughout the paper to ensure that our definition of ‘initial conditions’ is used consistently.
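 
 The difference between the two readings of "initial conditions" can be seen in a toy example, sketched here in Python. It is a hypothetical reversible conversion A <-> B, not the ECC model, and the rate constants and names are assumptions. With fixed kinetics, the steady state is A* = kr*T/(kf + kr), where T = A + B is the conserved total: it does not depend on how T is initially split between A and B, but it does shift when T itself changes.

```python
def simulate_reversible(a0, b0, kf=1.0, kr=0.5, dt=0.01, steps=5000):
    """Forward-Euler integration of A <-> B; returns the final (A, B)."""
    a, b = a0, b0
    for _ in range(steps):
        flux = kf * a - kr * b  # net conversion of A into B
        a -= dt * flux
        b += dt * flux
    return a, b


# Same conserved total (A + B = 10), different initial split.
same_total_1 = simulate_reversible(10.0, 0.0)
same_total_2 = simulate_reversible(2.0, 8.0)

# Different conserved total (A + B = 20), same kinetic parameters.
larger_total = simulate_reversible(20.0, 0.0)

print(same_total_1, same_total_2)  # both approach (10/3, 20/3)
print(larger_total)                # approaches (20/3, 40/3)
```

 In this sketch, varying the initial split leaves the pre-treatment equilibrium unchanged, whereas varying the conserved total moves it. Whether the same holds for every species in a larger network depends on its conservation laws and on whether multiple steady states exist.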
 
 (3) The technical implementation of the modelling and simulation framework remains difficult to evaluate due to insufficient methodological detail. Although the authors state that kinetic parameters are randomly sampled, the manuscript does not specify the distributions from which parameters are drawn, nor whether potential correlations between parameters are considered or explicitly ignored. Without this information, it is not possible to assess how implicit modelling assumptions shape the ensemble of simulated behaviours. Given that the conclusions rely on frequency-based interpretations across sampled parameter sets, greater transparency regarding the sampling procedure is essential.
 
 The main text was updated to clarify that parameters are randomly sampled from a log-transformed uniform distribution. LINES: 175-179
 
 A further concern relates to the parameter filtering step. The authors report that the "vast majority" of sampled parameter sets produced systems that were "too stiff," and that these were excluded on the grounds that stiff dynamics are not biologically plausible. However, the manuscript does not clearly define how stiffness is assessed, nor why stiffness is interpreted as biologically unrealistic rather than as a numerical property of the formulation. In standard practice, stiff systems are typically handled using appropriate implicit solvers rather than being discarded. Similarly, parameter sets that produce negative state values are excluded, yet such behaviour may arise from numerical artefacts rather than from intrinsic model inconsistency. The rationale for excluding these parameter sets, rather than adapting the numerical scheme, is not sufficiently justified.
 
 The reported rejection rate - approximately 90% of sampled parameter sets - is substantial and raises questions regarding the interplay between model structure, parameter ranges, and numerical methods. As currently described, the filtering step appears to select parameter sets based primarily on computational tractability rather than on experimentally motivated biological criteria. The manuscript would be strengthened by clarifying whether the retained parameter sets are representative of biologically meaningful regimes, and by distinguishing clearly between exclusions based on biological plausibility and those arising from numerical considerations.
 
 This was extensively addressed in the original rebuttal, response to points (6) and (7). The main text was updated to clarify that a solver designed for stiff systems was used. Furthermore, after we resolved the stiffness issue, subsequent analysis revealed that a lack of drug response and a failure to reach steady state within the simulated time period now account for the majority of filtering. This had no effect on the distributions of behaviours identified in our analyses. The main text was updated to reflect these changes, and the rejection rate is now explicitly discussed there.
 
 Finally, important aspects of the simulation protocol require clarification. The model is simulated under "fasted" and "fed" conditions until equilibrium is reached, yet the criterion used to determine convergence is not specified. It would be important to describe how equilibrium is assessed (e.g., based on the norm of the time derivatives). Additionally, it remains unclear whether the mitogenic stimulus applied in the "fed" phase is assumed to be constant over time and, if so, how this assumption relates to biological experimental conditions. Greater detail on these implementation choices is necessary to ensure interpretability and reproducibility.
 
 This was addressed in the original rebuttal, response to point (8). Clarifications about the simulations were added to the main text, including an explicit statement that mitogenic and drug inputs were constant stepwise functions and a description of how steady-state equilibrium was defined and calculated.
 
 (4) The manuscript states that the modelling conclusions are strongly supported by existing literature; however, the validation presented does not fully substantiate this claim. As noted above, the comparison with CDK2 and CDK4/6 experimental data remains qualitative, and the use of arbitrary simulation time units complicates interpretation of temporal agreement. The extent to which the model quantitatively or mechanistically recapitulates experimentally observed dynamics therefore remains uncertain.
 
 This was addressed in the original rebuttal, response to points (13) and (14). The wording was changed to remove the suggestion of strong evidence, and the tone now reflects reasonable qualitative support for our analysis.
 
 The claim that the model reproduces known resistance mechanisms is also difficult to assess in light of Figure S10, where a large fraction of network nodes (~80%) appear implicated in resistance under some conditions. If most components of the network can, in at least some parameter regimes, be associated with resistance phenotypes, the resulting lack of selectivity weakens the strength of model-based validation. It becomes challenging to distinguish specific mechanistic insights from generic consequences of network connectivity.
 
 In addition, the Supplementary Information notes that certain components of the mitogenic and cell-cycle pathways were abstracted or excluded in order to maintain computational tractability. While such abstraction is understandable in a large ODE framework, it raises interpretative questions. Proteins identified as potential resistance drivers within the model may, in some cases, represent aggregated or simplified pathway effects. Clarifying in the main text how such abstractions may influence the attribution of resistance mechanisms would strengthen the biological interpretation of the results.
 
 This was addressed in the original rebuttal, response to point (15). The discussion was significantly revised to reflect the reasoning behind our conclusions. We completely understand that more work could be done to verify our claims; however, our intention is to demonstrate the generalised relationship between network heterogeneity and drug resistance, not to predict patient-specific resistance mechanisms.
 
 Drug inhibition is central to the manuscript's conclusions. The revised version clarifies that inhibition is implemented as a fixed fractional modification of specific kinetic rate laws. This abstraction is appropriate for exploring network-level responses, but it represents a stylised perturbation rather than a pharmacologically calibrated model of drug action. For full interpretability and reproducibility, the mathematical form of the modified rate laws, as well as the timing of inhibition relative to network equilibration, should be specified unambiguously. The biological implications of the findings depend critically on understanding this modelling choice.
 
 All equations were included in the supplementary model files, including typeset ODEs, as requested by the reviewers. R15 and R27 contain the relevant equations, which specify the exact implementation of the drug inhibition. The number of time units per simulation phase is now included in the main text. LINES: 166 – 168
 
 The one-at-a-time perturbation analysis presented in Figure 5 provides an interpretable ranking of first-order control points across the ensemble and offers mechanistic insight into primary sensitivities of the network. However, many targeted therapies act on multiple components, and resistance frequently arises through combinatorial mechanisms. The reported rankings should therefore be interpreted as identifying primary influences under isolated perturbations, rather than as a comprehensive account of multi-target drug behaviour.
 
 Overall, the manuscript succeeds in presenting a conceptual and exploratory framework for analysing how signalling network topology can shape the qualitative landscape of adaptive responses under heterogeneous kinetic conditions. Its principal contribution lies in establishing a systematic platform for large-scale in silico exploration. At the same time, the current limitations in biological calibration, parameter grounding, and validation constrain the extent to which the conclusions can be interpreted as predictive or quantitatively representative of specific tumour contexts. Addressing these issues would further strengthen the connection between the theoretical landscape described here and experimentally observed resistance dynamics.
 
 Joint Recommendations for the authors:
 
 (1) Supplementary Figure S4 is not sufficiently explained in its current form. The structure of the figure, the meaning of its colour coding, and the intended interpretation are not clearly described, making it difficult for readers to extract the key message without substantial inference. Given that the manuscript relies heavily on large-scale ensemble analyses, clear visual communication is essential. A more detailed legend, explicit definition of axes and colour scales, and improved visual labelling would substantially enhance clarity, accessibility, and reproducibility.
 
 Supp. Fig. 4 legend updated with additional detail. LINES: Supp. Text. 256 - 263
 
 (2) The approximately 90% rejection rate of sampled parameter sets should be reported explicitly in the main text of the manuscript rather than only in the Supplementary Information. Given the central role of large-scale parameter sampling in the study, this level of exclusion is a critical aspect of the modelling workflow and directly affects the interpretation of robustness and representativeness. Clear disclosure in the main text would allow readers to properly evaluate the effective size of the analysed ensemble and the implications of the filtering procedure for the generality of the conclusions.
 
 This was explicitly addressed in the original rebuttal.
 
 (3) The model would benefit from quantitative validation against experimental data. In Figure 7C, the authors note in the response letter that the simulations are performed in arbitrary time units. However, the figure itself labels the time axis in hours, which may lead readers to infer a direct quantitative correspondence between simulated and experimental time courses. If the simulations are not calibrated to real time, this labelling is potentially misleading and should be corrected. Either the model should be explicitly time-calibrated and quantitatively compared to experimental measurements, or the figure should clearly indicate that the time axis is dimensionless. Clarifying this point is essential to avoid overinterpretation of the agreement between model and data.
 
 Label updated.
 
 The following is the authors’ response to the original reviews.
 
 Joint Public Reviews:
 
 In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.
 
 The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.
 
 While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:
 
 (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. Particularly, it is not evident how the kinetic rates are related to any biological data, nor if the parameter distributions used in this study have any biological relevance. (2) Related to this problem, it is not clear whether the considered 100,000 random parameter samples sufficiently explore parameter space due to the combinatorial explosion that arises from having 94 free parameters, nor 100,000 random initial conditions for a system with 50 species (variables). (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience, rather than biological plausibility. (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state. (5) On the same line, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated on the simulation dynamics are critical to interpret the results. (6) The presented validation of the model results (Fig. 7) is only qualitative, and the interpretation is not carefully discussed in the manuscript, particularly considering the comparison between fold-change responses without specifying the baseline states.
 
 We thank the reviewers for their thoughtful and constructive comments. In response, we have undertaken a substantial revision to address all the comments and to improve clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.
 
 Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.
 
 We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised - consistent with our aims - that comparisons are qualitative.
 
 Finally, we have rewritten the Discussion to centre on interpretation and implications and to connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.
 
 We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.
 
 Joint Recommendations for the Authors:
 
 (1) It is unclear exactly which sets are evaluated in each case, e.g. "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or just a clearer explanation is needed.
 
 In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.
 
 We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.
 
 (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.
 
 We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.
 
 We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).
 
 (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.
 
 We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.
 
 To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
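
 The doubling comparison described above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the behaviour labels, their probabilities, and the function names are hypothetical stand-ins for the classification of simulated model instances, and the N and 2N ensembles are drawn independently here as an assumption.

```python
import random
from collections import Counter

# Hypothetical behaviour classes and probabilities; stand-ins for the
# classification of simulated model instances, not values from the manuscript.
BEHAVIOURS = ["decreasing", "rebound", "increasing", "no_change"]
WEIGHTS = [0.5, 0.3, 0.15, 0.05]

def behaviour_distribution(n, rng):
    """Fraction of n sampled instances falling in each behaviour class."""
    counts = Counter(rng.choices(BEHAVIOURS, weights=WEIGHTS, k=n))
    return [counts[b] / n for b in BEHAVIOURS]

def mse(p, q):
    """Mean squared error between two distributions over the same classes."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def doubling_errors(sizes, seed=0):
    """MSE between the distributions obtained from N and from 2N instances."""
    rng = random.Random(seed)
    return {n: mse(behaviour_distribution(n, rng),
                   behaviour_distribution(2 * n, rng)) for n in sizes}

errors = doubling_errors([100, 1_000, 10_000, 100_000])
for n, err in errors.items():
    print(f"N={n:>7}: MSE(N vs 2N) = {err:.2e}")
```

 In this toy setting the error shrinks as N grows simply because sampling noise in the class frequencies shrinks; it says nothing about how quickly the real model's distributions stabilise.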
 
 We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.
 
 Regarding the parameter range, we intentionally chose a broad, unbiased range (10^(-5) to 10^(4)) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.
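
 For readers unfamiliar with sampling on a logarithmic scale, the sketch below draws one parameter set whose values are uniform in log10 between 10^(-5) and 10^(4). It assumes that the 94 kinetic parameters are drawn independently, which the text does not state; the function names are hypothetical.

```python
import math
import random

N_PARAMS = 94          # number of kinetic parameters stated in the review
LOW, HIGH = 1e-5, 1e4  # range quoted in the response

def sample_log_uniform(rng, low=LOW, high=HIGH):
    """Draw one value whose log10 is uniform on [log10(low), log10(high)]."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent

def sample_parameter_set(rng, n_params=N_PARAMS):
    """One hypothetical model instance: an independent draw per parameter."""
    return [sample_log_uniform(rng) for _ in range(n_params)]

rng = random.Random(42)
params = sample_parameter_set(rng)
print(len(params), min(params), max(params))
```

 Sampling the exponent rather than the value gives every order of magnitude in the range the same probability, which a plain uniform draw on [10^(-5), 10^(4)] would not.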
 
 (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.
 
 We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
 
 We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416)
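
 The distinction between the initial split and the conserved total can be illustrated with a toy two-state cycle X <-> pX. This is a sketch under stated assumptions, not the ECC model: the rate constants, the linear rate laws, and the forward-Euler integration are all hypothetical choices made for brevity.

```python
def simulate_px(x0, px0, k_f=2.0, k_r=1.0, dt=0.001, steps=20_000):
    """Forward-Euler integration of X <-> pX; returns the final pX level.

    k_f and k_r are hypothetical phosphorylation and dephosphorylation rates.
    X + pX is conserved because the same flux leaves X and enters pX.
    """
    x, px = x0, px0
    for _ in range(steps):
        flux = k_f * x - k_r * px  # net conversion of X into pX
        x -= flux * dt
        px += flux * dt
    return px

# Same conserved total (10), different initial split between X and pX.
same_total_a = simulate_px(x0=10.0, px0=0.0)
same_total_b = simulate_px(x0=2.0, px0=8.0)
# Different conserved total (20), same kinetic parameters.
larger_total = simulate_px(x0=20.0, px0=0.0)
print(same_total_a, same_total_b, larger_total)
```

 For this linear cycle the steady state is pX = k_f / (k_f + k_r) * X_tot, so redistributing a fixed total leaves the end point unchanged while changing the total moves it.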
 
 (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.
 
 Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.
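
 The orders of magnitude quoted in the reviewers' two thought experiments can be checked with a few lines of arithmetic. The sketch only reproduces the binary high/low count; it is not a statement about how the ensembles were actually sampled, and the function name is hypothetical.

```python
import math

N_SAMPLES = 100_000

def sampled_fraction(n_binary_dims, n_samples=N_SAMPLES):
    """Largest fraction of the 2**d high/low corners that n_samples can cover."""
    return n_samples / 2 ** n_binary_dims

for label, dims in [("94 kinetic parameters", 94), ("50 initial conditions", 50)]:
    corners = 2 ** dims
    frac = sampled_fraction(dims)
    print(f"{label}: 2^{dims} ~ 10^{math.log10(corners):.1f} corners, "
          f"sampled fraction ~ 10^{math.log10(frac):.1f}")
```

 Since 2^94 is about 1.98 x 10^28 and 2^50 is about 1.13 x 10^15, 100,000 samples correspond to fractions of roughly 10^(-23) and 10^(-10), in line with the figures in the comments.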
 
 (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.
 
 We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.
 
 Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered it was not an intrinsic property of the parameter sets themselves, but rather an artefact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This large discrepancy created the conditions for overly stiff solutions, i.e., ones that were unsolvable with the implemented ODE solver settings. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first couple of time points, enabling the solver to overcome the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.
 
 We have revised the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving ODEs is reliant on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.
 
 Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.
 
 (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.
 
 We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
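
 Why an implicit method matters for stiff problems can be seen on the scalar test equation y' = -rate * y. The sketch below compares forward Euler with backward Euler, which is the first-order BDF formula; it is not CVODE's variable-order, variable-step implementation, and the rate and step size are hypothetical values chosen to make the contrast visible.

```python
def explicit_euler(rate, y0, dt, steps):
    """Forward Euler for y' = -rate * y."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-rate * y)
    return y

def backward_euler(rate, y0, dt, steps):
    """Backward Euler (first-order BDF) for y' = -rate * y.

    The implicit update y_new = y + dt * (-rate * y_new) is linear here,
    so it can be solved in closed form at every step.
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + dt * rate)
    return y

RATE, DT, STEPS = 1000.0, 0.01, 50  # hypothetical fast rate and coarse step
explicit_result = explicit_euler(RATE, 1.0, DT, STEPS)
implicit_result = backward_euler(RATE, 1.0, DT, STEPS)
print("explicit:", explicit_result)
print("implicit:", implicit_result)
```

 With dt * rate = 10, the explicit update multiplies the state by -9 at each step and blows up, whereas the implicit update divides it by 11 and decays towards the true solution, which is zero.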
 
 Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
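
The rejection-sampling loop described above can be sketched as follows. This is a generic illustration, not the study's code: `draw` and `accept` are hypothetical callables standing in for parameter sampling and the combined filters, and the toy filter is set to pass about 10% of draws only to mirror the acceptance rate quoted in the response.

```python
import random


def rejection_sample(draw, accept, n_target, max_draws=10_000_000):
    """Keep drawing candidates until n_target of them pass the filter.

    Returns the accepted candidates and the total number of draws, so the
    fraction of rejected instances can be reported afterwards.
    """
    accepted = []
    n_drawn = 0
    while len(accepted) < n_target and n_drawn < max_draws:
        candidate = draw()
        n_drawn += 1
        if accept(candidate):
            accepted.append(candidate)
    return accepted, n_drawn


rng = random.Random(0)
# Hypothetical stand-ins: one uniform "parameter" and a filter passing ~10%.
accepted, n_drawn = rejection_sample(
    draw=lambda: rng.random(),
    accept=lambda p: p < 0.1,
    n_target=1000,
)
acceptance_rate = len(accepted) / n_drawn
```

Recording `n_drawn` alongside the accepted set is what makes it possible to report the extent of filtering that the reviewer asked about.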
 
 (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.
 
 We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
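
One possible reading of this criterion is sketched below. It assumes that "changed by < 1%" means the spread (maximum minus minimum) over the final 100 stored time points, relative to the final value; the function name, the dictionary layout, and the small `eps` guard against division by zero are all assumptions made for illustration.

```python
def reached_equilibrium(trajectories, window=100, tol=0.01, eps=1e-12):
    """Return True if every species is flat over the last `window` points.

    trajectories maps a species name to its list of concentrations over time.
    A species counts as flat when (max - min) over the window, divided by the
    magnitude of the final value, is below `tol`.
    """
    for values in trajectories.values():
        if len(values) < window:
            return False  # not enough points to assess the criterion
        tail = values[-window:]
        reference = max(abs(tail[-1]), eps)
        if (max(tail) - min(tail)) / reference >= tol:
            return False
    return True


# Toy trajectories: one flat species and one that is still drifting upwards.
flat = [1.0] * 200
drifting = [1.0 + 0.01 * i for i in range(200)]
```

Here `reached_equilibrium({"A": flat})` passes, while adding the drifting species makes the same check fail.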
 
 Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.
 
 (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).
 
 The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
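
The rate law quoted by the reviewer can be written as a small function to make the role of the denominator concrete. The expression follows the quoted form; the numerical values used below are hypothetical and serve only to show the attenuation.

```python
def pi3k_phosphorylation_rate(pIR, pFGFR, PI3K, pS6K, kc2f1, kc2f2, Ki2):
    """PI3K phosphorylation rate in the form quoted in comment (9).

    The pIR-driven term is divided by (1 + pS6K/Ki2), so rising pS6K
    attenuates it; the pFGFR-driven term is not affected by pS6K.
    """
    ir_term = (kc2f1 * pIR * PI3K) / (1.0 + pS6K / Ki2)
    fgfr_term = kc2f2 * pFGFR * PI3K
    return ir_term + fgfr_term


# Hypothetical values: without pS6K, and with pS6K equal to Ki2.
rate_no_feedback = pi3k_phosphorylation_rate(1.0, 1.0, 1.0, 0.0, 2.0, 3.0, 0.5)
rate_with_feedback = pi3k_phosphorylation_rate(1.0, 1.0, 1.0, 0.5, 2.0, 3.0, 0.5)
```

When pS6K equals Ki2, the pIR-dependent contribution is halved while the pFGFR-dependent contribution is unchanged, which is the negative-feedback behaviour described above.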
 
 (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.
 
 The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The intended reference was to Supplementary Figure 2A, which shows a heatmap of the frequency of each dynamic for all active protein forms. CDK4/6cycD shows a sustained decreasing dynamic in 59.93% of model instances, which is where this number was derived. We have also now explicitly referenced this number in the Supplementary Figure 2A legend.
 
 We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.
 
 (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.
 
 We appreciate the suggestion and the opportunity to clarify. Figure 5A shows that multiple pathways were interrogated; in the analysis, however, parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” in the Methods section to make this explicit. We have also lightly edited the main text to make this clearer (page 11: lines 462-463, page 24: lines 856-857).
 
 We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
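
A one-at-a-time scan of this kind can be sketched as follows. The 90% inhibition strength, the toy response function, and the parameter names are hypothetical; in practice the response would come from simulating the ODE model for each perturbed parameter set.

```python
def one_at_a_time_scan(params, response, inhibition=0.9):
    """Inhibit each parameter in turn and rank parameters by effect size.

    params maps parameter names to values; response maps a parameter dict to
    a scalar readout. Only one parameter is changed per evaluation, so each
    effect is attributed to a single parameter.
    """
    baseline = response(params)
    effects = {}
    for name, value in params.items():
        perturbed = dict(params)  # copy so the other parameters stay fixed
        perturbed[name] = value * (1.0 - inhibition)
        effects[name] = response(perturbed) - baseline
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)


def toy_response(p):
    """Hypothetical readout that depends strongly on k_on and weakly on k_off."""
    return 10.0 * p["k_on"] + 0.1 * p["k_off"]


ranking = one_at_a_time_scan({"k_on": 1.0, "k_off": 1.0, "k_deg": 1.0}, toy_response)
```

The scan needs one evaluation per parameter plus a baseline, which is why it scales linearly with the number of parameters, whereas a combinatorial screen would not.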
 
 (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.
 
 While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.
 
 The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.
 
 Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the units assigned to the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig 7C) directly parallels the rapid suppression seen in the experimental data (Fig 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.
 
 We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.
 
 (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.
 
 Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.
 
 As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.
 
 The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.
 
 The most likely explanation for the discrepancy between experimental dynamics and our simulation dynamics is that the experimental data come from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high correlation between the experimental data and simulations, indicating that perhaps the heterogeneity of even isogenic cell lines is significantly greater than might be intuited; a point we now mention in the revised Discussion (page 17: lines 716-727).
 
 It is worth noting again, that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.
 
 The y-axis label has also been updated accordingly. As mentioned in (12) this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.
 
 With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.
 
 (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.
 
 To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.
 
 CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.
 
 Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.
 
 It is however important to reiterate that our goal in Figure 7 is qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.
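
A fixed fractional inhibition applied through a constant step input could be sketched as below. The exact rate law for reaction R27 is given in the Supplementary Information and is not reproduced here; the mass-action form, the function names, and all numerical values are assumptions used only to show the idea.

```python
def drug_on(t, t_drug):
    """Step input: 0 before the drug is added, 1 from t_drug onwards."""
    return 1.0 if t >= t_drug else 0.0


def complex_formation_rate(t, cdk46, cycD, k_form, fraction_inhibited, t_drug):
    """Hypothetical mass-action formation of the active CDK4/6-cyclin D complex.

    While the drug is present, the rate is scaled by (1 - fraction_inhibited);
    the scaling stays constant for the rest of the simulation.
    """
    scale = 1.0 - fraction_inhibited * drug_on(t, t_drug)
    return k_form * cdk46 * cycD * scale


# Hypothetical values: 75% inhibition switched on at t = 10.
rate_before = complex_formation_rate(5.0, 1.0, 1.0, 2.0, 0.75, 10.0)
rate_after = complex_formation_rate(15.0, 1.0, 1.0, 2.0, 0.75, 10.0)
```

Because the scaling factor never relaxes after the drug is added, any later recovery in downstream activity in such a setup has to come from the network itself rather than from decay of the inhibitor.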
 
 (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.
 
 The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.
 
 Figure S10 is not a binary map in which every implicated node is equal. Instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e. small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely occurs due to resistance hubs, not individual proteins; and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency of that hub emerging as a resistance mechanism in a population of cells and patients.
 
 Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.
 
 To address these concerns, we have significantly revised the Discussion (page 16: lines 681 – 694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given model abstraction, our findings implicate critical processes (modules), not just single proteins, as the drivers.
 
 Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.
 
 (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.
 
 We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.
 
 (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.
 
 Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.
 
 (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.
 
 We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenized the language throughout the manuscript. We now use the following three terms in their specific contexts:
 
 We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.
 
 We use “initial conditions” to refer to the initial values of the state variables in a model simulation. This term is related to protein abundance, as setting the initial conditions for conserved species sets the protein abundance. This is explicitly stated in the text (page 3: lines 87 - 91).
 
 We use “state variables” to refer to the time-dependent model species.
 
 We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.
 
 We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.
 
 (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?
 
 The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.
 
 This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.), and mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10⁻⁵ to 10⁴), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
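
Sampling rate constants across nine orders of magnitude could be sketched as below. The response does not state the sampling distribution here, so the log-uniform draw is an assumption, as are the function and parameter names.

```python
import random


def sample_rate_constants(names, rng, log10_min=-5.0, log10_max=4.0):
    """Draw one rate constant per name, uniformly in log10 space.

    Sampling the exponent (rather than the value) gives every order of
    magnitude between 10**log10_min and 10**log10_max equal weight.
    """
    return {name: 10.0 ** rng.uniform(log10_min, log10_max) for name in names}


rng = random.Random(42)
# Hypothetical parameter names for a single model instance.
instance = sample_rate_constants(["k1", "k2", "k3"], rng)
```

A uniform draw on the linear scale would instead place almost all samples in the top order of magnitude, which is why the exponent is sampled in this sketch.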
 
 For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).
 
 (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.
 
 Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using the chemical Langevin equation or the stochastic simulation algorithm (SSA), is a possible extension in future work, but it is expected to be very computationally expensive. We have added a brief discussion of this as a future direction in the revised Discussion.
 
 (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.
 
 We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.
 
 The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.
 
 We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.
 
 (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.
 
 We have now included a typeset list of state variable equations and ODEs, along with the original model files.
 
 (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.
 
 The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.
 
 Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.
 
 This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
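
A filter of this kind can be sketched in a few lines. The data layout (a dictionary of species trajectories per model instance), the function names, and the optional tolerance are assumptions for illustration only.

```python
def has_negative_undershoot(trajectories, tolerance=0.0):
    """Return True if any species value falls below -tolerance at any time."""
    return any(
        value < -tolerance
        for values in trajectories.values()
        for value in values
    )


def discard_unstable(instances, tolerance=0.0):
    """Keep only the simulated instances with no negative concentrations."""
    return [inst for inst in instances if not has_negative_undershoot(inst, tolerance)]


# Toy instances: the second one undershoots zero by a tiny amount.
clean = {"A": [1.0, 0.5, 0.1], "B": [0.0, 0.5, 0.9]}
undershoot = {"A": [1.0, 1e-9, -1e-12], "B": [0.0, 1.0, 1.0]}
kept = discard_unstable([clean, undershoot])
```

Setting a small positive `tolerance` would keep trajectories whose undershoot is within solver noise; the default of zero discards any trajectory with a negative value.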
 
 (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.
 
 The text has been updated to match the citation.
 
 (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.
 
 Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.
 
 (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.
 
 We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).
 
 The reason for this finite limit is mechanistic, as the reviewer's comment prompted us to investigate further. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. As we pushed parameter values to even greater extremes in exploratory simulations, we found they do not generate novel, complex dynamic shapes. Instead, they tend to drive network nodes into saturated states: either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.
 
 Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.01.24.525460v2
www.biorxiv.org www.biorxiv.org

Specific Sensitivity to Rare and Extreme Events: Quasi-Complete Black Swan Avoidance vs Partial Jackpot Seeking in Rat Decision-Making

3
1. Public_Reviews 02 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This study represents an important contribution to the study of decision-making under risk, bringing an interdisciplinary approach spanning economic theory, behavioral neuroscience, and computational modeling to test how choice preference is influenced by rare and extreme events. The authors aim to test whether rats are indeed sensitive to these rare and extreme events despite their infrequent occurrence, and to isolate behavioral evidence for avoidance of "Black Swans" - rare and extreme losses. The evidence for specific sensitivity to rare and extreme events however remains incomplete, owing in part to the difficulty of isolating the effect of these events beyond that arising from risk preferences more generally in both task design and in the computational modeling of the choice behavior. Despite this, and given the approach here brings a relatively novel and highly interdisciplinary perspective, this paper will be of broad interest to those seeking to understand animal behavior through the lens of economic choice and decision theory.
 
 Summary
2. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This paper attempts to examine how rare, extreme events impact decision-making in rats. The paper used an extensive behavioural study with rats to evaluate how the probability and magnitude of outcomes impact preference. The paper, however, provides limited evidence for the conclusions because the design did not allow for the isolation of the rare, extreme events in choice. There are many confounding factors, including the outcome variance and presence of less-rare and less-extreme outcomes in the same conditions.
 
 Strengths:
 
 (1) The major strength of the paper is the significant volume of behavioural data with a reasonable sample size of 20 rats.
 
 (2) The paper attempts to examine losses with rats (a notoriously tricky problem with non-human animals) by substituting time-outs as a proxy for losses. This allows for mixed gambles that have both gain and loss possible outcomes.
 
 (3) The paper integrates both a behavioural and a modelling approach to get at the factors that drive decision-making.
 
 (4) The paper takes seriously the question of what it means for an event to be rare, pushing to less frequent outcomes than usually used with non-human animals.
 
 Weaknesses:
 
 (1) The primary issue with this work is that the primary experimental manipulation fails to isolate the rare, extreme events in choice. As I understand the task, in all the conditions with a rare extreme event (e.g., 80 pellets with probability epsilon), there is also a less-rare, less-extreme event (e.g., 12 pellets with probability 5). In addition, the variance differs between the two conditions. So, any impact attributable to the rare, extreme event could be due to the less rare event or due to the difference in the variance (or other statistical moments, like skew or kurtosis). That the distributions can be shown to differ for value-maximizing agents under specific assumptions (e.g., with Jensen Gaps and Table 2) is not really relevant to what rats are sensitive to and what drives their behaviour. The design here does not support the conclusions. Finally, by deliberately confounding rarity and extremity, the design does not allow for assessing the impact of either aspect on rat behaviour.
 
 (2) The RL modelling work also fails to show a specific impact of the rare extreme event. As best as I can understand Eq 2, the model provides a free parameter that adds a bonus to the value of either the two options with high-variance gains (A and V in the paper) or to the two options with high-variance losses (F and V in the paper). Or equivalently to the ones with "Jackpots" vs the ones with "Black Swans" (see Point 1 above as to how these different aspects are all confounded in this design). This parameter seems to depend only on whether this option could have possibly yielded the rare, extreme outcome (i.e., based on the generative probability) and was not connected to its actual appearance. [This point is unclear as the text says this, but the rebuttal states otherwise; plus some options never received the REE, see Table S11]. That makes it a free parameter that just bumps up (or down) the probability of selecting a pair of options. That may be due to the presence of the REE or the other rare event or just the variance difference. Moreover, in the case of the "black swan" or high-variance loss conditions, this seems very much like a loss aversion parameter, but an additive one instead of a multiplicative one. Is there a theoretical claim here that "extreme losses" need an additive loss-aversion parameter?
 
 (3) The paper presented the methods and results with lots of neologisms and fairly obscure jargon (e.g., fragility, total REE sensitivity). That made it very hard to decipher exactly what was done and what was found. For example, on p. 4, the use of concave and convex was very hard to decipher; the text even has to repeat itself 3 times (i.e., "to repeat" and "in other words") and is still not clear. It would be much clearer (and probably accurate) to say that the options varied along the variance dimension, separately for gains and losses. Option A was low-variance gains and losses. Option B was low-variance losses and high-variance gains. Option C was high-variance losses and low-variance gains, and Option D was high-variance losses and gains. That tells much more clearly what the animals experienced without the reader having to master a set of new terminologies around fragility and robustness, which brings a set of theoretical assumptions unnecessarily into the description of the experimental design. Alternatively, if the authors are wary of using the term "variance" because other moments of the distribution also differ, they could use "high-value gains" or "high-value losses" or something else which does not obscure the experimental design with jargon. Again, this goes back to point 1 above, whereby the different options differ on so many dimensions (as is made even more apparent in the rebuttal) that the design cannot isolate the impact of the variables of interest.
 
 (4) Were the probabilities shuffled or truly random (seem to be fixed sequences, so neither)? What were the experienced probabilities? Given the fixed sequences, these experienced ("ex-post") probabilities could differ tremendously from the scheduled ("ex ante") probabilities. It's quite possible that an animal never experienced the rare, extreme event for a specific option. From Table S11, that is guaranteed to have happened in that 4 animals only ever experienced the "black swan" outcome once. It's even possible (if they only picked a specific option on the 10th/60th choices by chance), that they only ever experienced that rare extreme event. This point still cannot be known given the information provided, which does not break down outcomes by options. The Supplemental in Table S11 only gives overall numbers but does not indicate what the rats experienced for each choice/option, which is what matters here. A simple table that indicates, for each of the 4 options, how often they were selected and how often the animals experienced each of the 6-8 possible outcomes would make it much clearer how closely the experience matched the planned outcomes. In addition, by restricting the rare outcome to either the 10th or 60th activations in a session, these are not random. Did the animals learn this association? The text states that they did not, but no evidence is provided.
 
 (5) The choice data are generally presented in an overprocessed fashion with a sum and a difference (in both figures and tables). The basic datum (probability/frequency of selecting each of the 4 options) is not provided directly in the main text, even if it can theoretically be inferred from the sum and the difference. The new right side of Table S4 is probably the most valuable piece in terms of explaining what rats did and should be highlighted a lot more. Inspection of that table reveals some interesting (and potentially worrying) results. Most notably, the vast majority of responding happens on the "anti-fragile" and "robust" options, often totalling around 90% of all selections, especially amongst the most common blue rats. Alas, those were the two options that were deliberately assigned to the two most preferred holes in the training phase (see p. 26). Does this reflect genuine preference for reward distributions or does this reflect a spatial hole bias? The assignment strategy makes this impossible to tell apart.
 
 (6) There is insufficient detail provided on the inferential statistical tests (e.g., no degrees of freedom or effect sizes), and only limited information on exactly what tests were run and how (bootstrapping, but little detail). Without code or data (only summary information is provided in the supplement), this is difficult to evaluate. In addition, the studies seem not to be pre-registered in any way, leaving many research degrees of freedom. Not all studies need to be pre-registered and sometimes discovery of new things requires exploratory work, but preregistration does provide additional safeguards against overemphasizing post-hoc detected patterns, a serious issue in behavioural science. Moreover, this promotes transparency in reporting results and analyses, allowing for a better assessment of the strength of evidence for a claim. For example, here, were any alternative analysis pipelines attempted? Also, there were many sub-groupings of the animals and subsequent comparisons between them which all seemed post-hoc. On what grounds were these divisions made, and were other divisions examined as well?
 
 (7) On p. 12 (Fig 4), there is an attempt to look at the impact of a rare, extreme event by plotting a measure of preference for the 10 trials before/after the rare, extreme event. In the human literature, the main impact of experiencing a rare, extreme event is what is known as the wavy recency effect (See Plonsky et al. 2015 in Psych Review for example, now cited). What this means is that there tends to be some immediate negative recency (e.g., avoiding a rare gain) followed by positive recency (e.g., chasing the rare gain). Typically, this refers to the specific option that yielded that outcome. First, as the other analyses do, the current analysis combines choice of the option that yielded the rare outcome with choice of other options, so that cannot directly assess the impact of the rare, extreme event on choice. Also, using a 10-trial window would thus obscure any impact of this rare, extreme event. There is mention of the very next trial, but an analysis that looks at the 10-trial time course trial-by-trial could reveal any impact that might be predicted from the human literature.
 
 (8) As I understood the method (p. 31), the assignment of options to physical locations was not random or counterbalanced, but deliberately biased to have one of the options in the preferred location. This would seem to create a bias towards a particular option and a bias away from the other options, which confounds the preference data in subsequent analyses. Table S4 reinforces this concern where the vast majority of responses are clustered in the two most preferred options from training.
 
 (9) Are delays really losses? This is a big assumption. Magnitude and delay are different aspects of experience, which are not necessarily commensurable and can be manipulated independently. And, for the model, how were these delays transformed into outcomes for the model? Eq 1 skips over that. Is there an assumption of linearity? In addition, I was not wholly clear if the delays meant fewer trials in a session or if the delays merely extended the session and meant longer delays until the next choice period.
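The per-option breakdown requested in point (4) could be produced along the following lines. This is only a sketch under stated assumptions: the trial log format (a list of `(option, outcome)` pairs), the function name `experienced_frequencies`, and the toy data are all hypothetical and are not taken from the manuscript or its supplement.

```python
from collections import Counter, defaultdict


def experienced_frequencies(trials):
    """Return, per option, the selection count and the experienced
    (ex-post) frequency of each outcome.

    trials: iterable of (option, outcome) pairs for one animal.
    """
    counts = defaultdict(Counter)
    for option, outcome in trials:
        counts[option][outcome] += 1

    table = {}
    for option, outcome_counts in counts.items():
        n = sum(outcome_counts.values())
        table[option] = {
            "n_selected": n,
            "frequencies": {o: c / n for o, c in outcome_counts.items()},
        }
    return table


# Made-up log: option "A" chosen four times, option "B" twice.
log = [("A", 1), ("A", 1), ("A", 3), ("A", 1), ("B", 2), ("B", 80)]
table = experienced_frequencies(log)
```

Comparing each experienced frequency with the scheduled probability for the same option would show directly how far the ex-post experience departed from the ex-ante design.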
 
 Other points:
 
 (1) I think the authors still misunderstand the concept of "hot-stove effects". The idea is that the experience of a very bad outcome can lead to avoiding the situation again (i.e., not sampling that option) and can provide the appearance of oversensitivity to that bad outcome. Here, that might be better thought of as "black-swan avoidance". Imagine that, to the rat, all options are equal in value; then some initial bad luck in encountering the black swan might make the animal avoid that option, even though, with enough experience, it would have turned out to be equal in value.
 
 (2) I am still not convinced that the Jensen inequalities add to this paper in terms of understanding the rat behaviour. That may be more suited for a different paper about the statistical and mathematical properties of certain generative distributions, but not here given what rats actually choose and experience.
 
 (3) Providing the data open access is very good. The code, however, should be equally available and not just upon request. Code needs to be available for assessment during peer review and for reproducibility checks. There are substantial enough problems with reproducibility in the field that code availability should be a minimum criterion for publication (see Miske et al., 2026 in Nature for the most recent large-scale evaluation of this problem).
 
 (4) The paper still somewhat mischaracterizes the literature on rare events, posing it as a series of "exceptions", rather than recognizing that a huge chunk of the literature uses rare events rarer than 10%. Also, there is even existing terminology in that literature for exactly the situation that is being created here: rare treasures (aka jackpots here) and rare disasters (aka Black Swans here).
 
 (5) Defining the observed behaviour in terms of convexity, instead of stating choices more plainly, obscures what is done/found. This is especially the case here because convex and concave mean different things when applied to gains/losses in terms of whether or not that option can lead to the REE. The use of the terms obscures rather than clarifies and probably is best left for the discussion (and maybe the intro) when mapping from theoretical distributions to the experiment at hand. In the paper, even the bottom of p.5 seems to incorrectly define "Total Sensitivity" as the combined proportion of selecting convex options in either domain, which does not map onto how convex is defined in Fig 1B or elsewhere in the text.
 
 (6) Fig 1C is baffling. Why are probabilities drawn moving away from the origin? The standard scientific plotting convention is for numbers to grow when moving away from the origin. That would be vastly clearer. Also, the color coding is confusing. Green-red maps onto convex-concave, but that would naturally seem to indicate gains vs losses, not convex vs concave. And why are probabilities growing larger in both directions from the origin? A standard plot of magnitude vs probability would likely communicate the procedure much more sensibly.
 
 (7) Discussion: I think the main difference between the human situations discussed and this experiment is that humans have not experienced those rare "black swan" outcomes. Rather, they hear about the disasters that are possible and do not incorporate that information, as discussed in the description-experience literature already cited in this paper (though not in that context).
 
 Review 1
3. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public Review):
 
 Summary:
 
 In this manuscript, the authors investigate the impact of rare and extreme events on rodents' decision-making under risk, in gain and loss contexts. They describe the behavior of 20 rats performing a four-armed bandit task, where probabilistic gains (sugar pellets) and losses (time-out punishments) can - in some arms - incorporate extremely large - but rare - outcomes. They report that most rats are sensitive to rare and extreme outcomes despite their infrequent occurrence, and that this sensitivity is primarily driven by extreme loss events which they try to avoid, rather than extreme gains that they seek to obtain.
 
 They finally propose a modification of standard reinforcement-learning, which features a specific sensitivity to rare and extreme outcomes and can account for the observed behavior.
 
 Strengths:
 
 The manuscript really taps into a surprisingly neglected but very relevant aspect of decision-making: the effect of rare and extreme events (REE). The authors have developed an experimental setup that seemingly allows investigation of this aspect, which is not trivial given the idiosyncratic properties of rare and extreme events.
 
 The parameters of the experimental setup also seem to be well thought out: basically, in the absence of REE, some options are objectively better than others (because, in expectation, they overall deliver more food, or minimize time-out punishments), but this ordering reverses if REE are taken into account. This allows for a clean test of the integration of REE in the rodent's decision-making model.
 
 The data is presented and analyzed in a very descriptive but exhaustive and transparent way, down to the description of individual rodent's behavior.
 
 Weaknesses:
 
 While the description and analyses of the behavioral patterns are rigorously done under the economic lens of risky decision-making, the authors' interpretation heavily relies on the assumption that rodents have built the correct model of the task during the training. Extensive details are provided about the training procedure, and the observed behavior at the end of the training, but it remains virtually impossible to disambiguate choices due to imperfect learning from choices made due to intrinsic preferences for risk or REE.
 
 As detailed in Material and Methods, the animals were progressively overtrained following standard behavioral procedures. During this process, they experienced all available options, including both positive and negative REE. We assume that repeated exposure to these REE supported learning, as would be expected for any event occurring throughout such an extended training phase. The rats ultimately displayed an asymmetric pattern of choices: they consistently avoided the Black Swan, indicating that they had learned its negative consequences, yet they did not systematically seek the Jackpot. If their behavior were driven solely by incomplete learning or by an inherent preference for risk or REE, we would expect to see the opposite pattern: systematic Jackpot seeking or inconsistent avoidance of the Black Swan.
 
 By nature, gains (food pellets) and losses (time-out punishments) are somewhat incommensurable, so the interpretation of the asymmetry due to outcome valence is also subject to interpretation. There might be some additional subtleties due, e.g., to satiety that could come from gaining REE (i.e. the delivery of 80 pellets from the Jackpot).
 
 As described in Material and Methods, we used mouse pellets (20 mg) instead of rat pellets (45 mg) to prevent satiety during Jackpot delivery (80 pellets). We also selected gains (sweet pellets) and losses (delays) that we have successfully used in previous rat decision-making paradigms, such as the rat gambling task (Adams et al., 2017; doi: 10.1523/ENEURO.0094-17) and the loss-chasing task (Breysse et al., 2021; doi: 10.1111/ejn.14895). Notably, if the Jackpot induced satiety, one would expect animals to stop seeking it, yet this was not systematically observed. Nonetheless, we added a sentence to the Discussion on page 18 of the manuscript to acknowledge that we cannot fully exclude the possibility that satiety contributed to the lack of systematic Jackpot Seeking.
 
 In its current form, the paper is quite hard to digest. This is naturally the case with interdisciplinary work (here mixing economists and neurobiologists). But I am afraid that with the current frame, the paper is going to miss its target, in terms of audience.
 
 We have entirely rewritten the paper, and the English was corrected with the help of ChatGPT. We hope that the paper is now easier to digest.
 
 The proposed model seems somewhat disconnected from the behavioral patterns: while the model suggests an effect of REE at the decision stage (i.e. with specific decision weights for those rare events), this formalism seems at odds with the observation that REE (notably in the loss domain) have an impact on subsequent behavior (Black Swans tend to reinforce Total Sensitivity to REE), which rather suggests an effect at the learning stage.
 
 We agree with the referee that this may appear surprising at first glance. However, we would first like to emphasize that the general model allows REE to influence learning—that is, to contribute to the updating of the Q-subvalues. Moreover, even when REE are incorporated only as decision weights, as is the case for most rats, this does not imply that REE are unimportant during learning. In fact, the model assumes that REE are learned once and for all when they first occur during a trial of the corresponding option. Unreported simulation exercises indicate that a more gradual learning of maximal and minimal values would likely yield similar results.
 
 Second, the Before/After analysis shows that the behavioral response to Black Swans is locally small in terms of both total and one-sided sensitivities. This suggests that such effects are likely too subtle to be captured by this class of models for most rats. We have added this clarification to the revised version (page 17).
 
 Discussion:
 
 This study convincingly demonstrates that REEs are processed rather uniquely, which makes sense given their evolutionary relevance. REE has indeed been somewhat neglected in previous research, and this study therefore opens an interesting new front on the fundamental aspects of decision under risk. The authors have devised an original theoretical and empirical framework that will be useful for the community, and the combination of economics analysis and rodent behavior constitutes a thought-provoking ground to think about the nature of risk preferences. The interpretation and mechanistic account of these aspects, as well as their generalizability outside the specific context of this study, remain to be strengthened.
 
 We have modified the discussion to further insist on the translational aspect of the study and its interest for various populations (page 22). We hope that the generalizability is now strengthened.
 
 Reviewer #2 (Public Review):
 
 Summary:
 
 This paper attempts to examine how rare, extreme events impact decision-making in rats. The paper used an extensive behavioural study with rats to evaluate how the probability and magnitude of outcomes impact preference. The paper, however, provides limited evidence for the conclusions because the design did not allow for the isolation of the rare, extreme events in choice. There are many confounding factors, including the outcome variance and presence of less-rare, and less-extreme outcomes in the same conditions.
 
 Strengths:
 
 (1) The major strength of the paper is the significant volume of behavioural data with a reasonable sample size of 20 rats.
 
 (2) The paper attempts to examine losses with rats (a notoriously tricky problem with non-human animals) by substituting time-outs as a proxy for losses. This allows for mixed gambles that have both gain and loss possible outcomes.
 
 (3) The paper integrates both a behavioural and a modelling approach to get at the factors that drive decision-making.
 
 (4) The paper takes seriously the question of what it means for an event to be rare, pushing to less frequent outcomes than usually used with non-human animals.
 
 Weaknesses:
 
 (1) The primary issue with this work is that the primary experimental manipulation fails to isolate the rare, extreme events in choice. As I understand the task, in all the conditions with a rare extreme event (e.g., 80 pellets with probability epsilon), there is also a less-rare, less-extreme event (e.g., 12 pellets with probability 5). In addition, the variance differs between the two conditions. So, any impact attributable to the rare, extreme event could be due to the less rare event or due to the difference in the variance. The design does not support the conclusions. Finally, by deliberately confounding rarity and extremity, the design does not allow for assessing the impact of either aspect.
 
 We agree with the referee that both the REE and the rare (≈10% frequency) but non-extreme outcomes are present in the relevant options. However, the rare but non-extreme reward is not large enough to make the convex option attractive and to shift choice away from the concave option. In other words, unlike REE, these outcomes do not reverse stochastic dominance in our design (as noted in Material and Methods). We have explored modified designs for human subjects in which the rare but non-extreme outcomes are removed. Preliminary results indicate that the behavioral phenotypes observed in rats also emerge in humans under these modified conditions, suggesting that REE are the primary drivers. We have added a statement to the Discussion (page 22) to clarify this point.
 
 We elaborate further in our response to point (3) below on why analyses based solely on variance are insufficient when dealing with REE. To clarify the role of rare and extreme outcomes in distinguishing convex from concave options, we provide two new columns to Table 2 in the Materials and Methods, in our reply to point (3).
 
 Finally, although a detailed analysis of rare but non-extreme outcomes lies outside the scope of this paper, the symmetric treatment of extreme and frequent outcomes can be addressed straightforwardly using strong First-Order Stochastic Dominance. Classical decision-theoretic approaches indeed satisfy this property.
 
 (2) The RL-modelling work also fails to show a specific impact of the rare extreme event. As best as I can understand Eq 2, the model provides a free parameter that adds a bonus to the value of either the two options with high-variance gains (A and V in the paper) or to the two options with high-variance losses (F and V in the paper). This parameter only depends on whether this option could have possibly yielded the rare, extreme outcome (i.e., based on the generative probability) and was not connected to its actual appearance. That makes it a free parameter that just bumps up (or down) the probability of selecting a pair of options. In the case of the "black swan" or high-variance loss conditions, this seems very much like a loss aversion parameter, but an additive one instead of a multiplicative one.
 
 We agree with the referee that the additional parameters, compared to more standard Q-learning models, specifically capture the fact that some options deliver REE while others do not. In our estimation procedure, these parameters become nonzero as soon as REE are observed for the first time for a given option. Therefore, the first step is to estimate a baseline nested model in which REEs contribute only at the learning stage (i.e., they affect the updating of Q-subvalues), while the additional parameters are constrained to zero. The next step is to compare alternative models against this baseline, allowing REEs to enter through the additional parameters. In this respect, our specification is parsimonious, especially given that very little is known about REEs in computational neuroscience. More structural modeling is certainly a promising direction for future research, and this paper constitutes a first step toward that goal.
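To make the distinction between learned values and additive decision weights concrete, here is a minimal sketch. It is not the authors' Eq 2, which is not reproduced in this response: the softmax choice rule, the inverse temperature `beta`, the function name `choice_probabilities`, and all numerical values are assumptions made purely for illustration.

```python
import math


def choice_probabilities(q_values, decision_weights, beta=1.0):
    """Softmax over learned values plus an additive, option-specific weight.

    With all weights at zero this reduces to a standard softmax over
    Q-values, i.e. the nested baseline model.
    """
    scores = [beta * (q + w) for q, w in zip(q_values, decision_weights)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Four options with identical learned values (hypothetical numbers).
q = [1.0, 1.0, 1.0, 1.0]

# Baseline: additional parameters constrained to zero.
baseline = choice_probabilities(q, [0.0, 0.0, 0.0, 0.0])

# Extended model: a negative weight on the two options that can deliver
# an extreme loss lowers their choice probability without changing Q.
with_weight = choice_probabilities(q, [0.0, 0.0, -2.0, -2.0])
```

Because the weight enters additively and independently of the learned values, it shifts choice probabilities for the flagged options in the same way regardless of what the Q-values are, which is the property the referee's comment focuses on.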
 
 We provide the BIC, in addition to the AIC, to account for the presence of additional parameters in model selection and to ensure that the observed improvement in fit is not merely driven by their inclusion.
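For readers less familiar with the two criteria, the following sketch shows the standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of observations and ln L the maximized log-likelihood. The log-likelihoods, parameter counts and sample size below are made-up values, not results from the manuscript.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: the extended model has two extra parameters and a
# slightly better log-likelihood than the baseline.
n_obs = 1000
baseline = {"log_likelihood": -520.0, "n_params": 2}
extended = {"log_likelihood": -515.0, "n_params": 4}

aic_baseline = aic(baseline["log_likelihood"], baseline["n_params"])
aic_extended = aic(extended["log_likelihood"], extended["n_params"])
bic_baseline = bic(baseline["log_likelihood"], baseline["n_params"], n_obs)
bic_extended = bic(extended["log_likelihood"], extended["n_params"], n_obs)
```

With these illustrative numbers the AIC favours the extended model while the BIC favours the baseline, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7. Reporting both therefore guards against an improvement in fit that comes only from the added parameters.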
 
 Unlike most of the existing literature, our results extend the notion of loss aversion to extreme losses. The negative decision weight on options yielding the Black Swan can be interpreted as a differential treatment of negative REE, an issue we discuss extensively in the Discussion (page 20).
 
 (3) The paper presented the methods and results with lots of neologisms and fairly obscure jargon (e.g., fragility, total REE sensitivity). That made it very hard to decipher exactly what was done and what was found. For example, on p. 4, the use of concave and convex was very hard to decipher; the text even has to repeat itself 3 times (i.e., "to repeat" and "in other words") and is still not clear. It would be much clearer (and probably accurate) to say that the options varied along the variance dimension, separately for gains and losses. Option A was low-variance gains and losses. Option B was low-variance losses and high-variance gains. Option C was high-variance losses and low-variance gains, and Option D was high-variance losses and gains. That tells much more clearly what the animals experienced without the reader having to master a set of new terminologies around fragility and robustness, which brings a set of theoretical assumptions unnecessarily into the description of the experimental design. In terms of results, "Black Swan" avoidance is more simply known as risk aversion for losses.
 
 Because our experimental design focuses on REE, outcomes cannot be summarized only by their variance. This is well known from the large literature on so-called fat-tailed statistical distributions. Unlike the Normal distribution that is entirely characterized by its expected value and variance, fat-tailed distributions have nonzero excess kurtosis. This implies that a fat-tailed distribution (e.g. exponential) with the same expected value and variance as the Normal differs importantly by possessing extreme values that are much more likely in terms of frequency. To illustrate, if the distribution of pellets was assumed to be Normal with expected value set at 3.89 and variance set at 9.37 as for the convex option, the probability of getting 80 pellets would be about 2 × 10⁻¹⁶, practically zero. In contrast, this probability is smaller than, but close to 1% in our design.
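The order-of-magnitude argument can be checked with a short calculation. The sketch below is an assumption about how such a figure might be obtained, not the authors' computation: it evaluates the upper-tail probability P(X ≥ 80) of a Normal distribution with the mean and variance quoted above, using the complementary error function. The exact figure depends on how the tail is evaluated numerically, so no specific value is asserted here beyond its being vanishingly small next to the roughly 1% in the design.

```python
import math


def normal_upper_tail(x, mean, variance):
    """P(X >= x) for X ~ Normal(mean, variance), via erfc."""
    z = (x - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Mean and variance as quoted for the convex option.
p_extreme = normal_upper_tail(80, mean=3.89, variance=9.37)

# Sanity check: half of the probability mass lies above the mean.
p_above_mean = normal_upper_tail(3.89, mean=3.89, variance=9.37)
```

Using `erfc` rather than `1 - cdf` avoids the loss of precision that occurs when a cumulative probability rounds to exactly 1 in floating point.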
 
 In Material and Methods, we clearly explain how our novel approach in terms of convexity relates to the moments of the reward distributions, including but not limited to the variance. To clarify further, we provide two new tables (Author response table 2 and Author response table 3) to be compared to Table 2 of the manuscript in which we report the first four moments (mean, standard deviation, skewness and kurtosis) of the full concave and convex gain distributions, reproduced for convenience in Author response table 1.
 
 Author response table 1.
 
 In Author response table 2 we report the first four moments when REE are truncated. Comparing convex and concave gains shows that the convex option has a smaller but still close mean compared to the concave option. In contrast, the former has larger variance, skewness and kurtosis compared to the latter. Therefore, interpreting choosing the convex option as reflecting “preference” for variance is at best incomplete.
 
 Author response table 2.
 
 First four moments of concave and convex gains when REE are removed
 
 Author response table 2 further shows that REE alone goes a long way towards explaining the differences between convex and concave options in terms of the first four moments: removing the rare and extreme value results in the concave option having now a larger mean, while the convex option still has larger variance, skewness, and kurtosis but by a smaller margin.
 
 In Author response table 3 we report the first four moments when both RE and REE are truncated, which shows that the convex and concave options differ only with respect to their mean (which is here also larger for concave).
 
 Author response table 3.
 
 First four moments of concave and convex gains when both RE and REE are removed
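 The moment comparisons summarized in these tables can be reproduced for any discrete reward distribution. The following is only a minimal sketch: the values and probabilities are hypothetical placeholders, not the distributions used in the study, and the function name is an assumption introduced for illustration. It computes the mean, standard deviation, skewness and (non-excess) kurtosis, then repeats the computation after truncating the rare and extreme outcome and renormalizing.

```python
import math

def first_four_moments(values, probs):
    """Mean, standard deviation, skewness and kurtosis of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    sd = math.sqrt(var)
    # Standardized third and fourth central moments
    skew = sum(p * ((v - mean) / sd) ** 3 for v, p in zip(values, probs))
    kurt = sum(p * ((v - mean) / sd) ** 4 for v, p in zip(values, probs))
    return mean, sd, skew, kurt

# Hypothetical gain distribution (NOT the study's values): the last outcome
# plays the role of a rare and extreme event.
values = [1, 3, 12, 80]
probs = [0.50, 0.44, 0.05, 0.01]
full = first_four_moments(values, probs)

# Truncate the rare and extreme outcome and renormalize the remaining mass.
kept_mass = sum(probs[:-1])
truncated = first_four_moments(values[:-1], [p / kept_mass for p in probs[:-1]])

print("full:     ", full)
print("truncated:", truncated)
```

 In this toy case a single rare outcome shifts all four moments, which is the kind of comparison the tables are meant to support.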
 
 In addition, our focus on REE implies that we go beyond mean-variance preferences that apply mostly to Gaussian distributions. It is not clear theoretically what type of utility functions would reflect preferences that combine a taste for variance, skewness and kurtosis, even though all those moments affect expected utility. See for example Phelps, C.E. “A user’s guide to economic utility functions”. J Risk Uncertain 69, 235–280 (2024) for a recent overview (on page 242, Phelps states that “In situations where risk is not normally distributed, it is ill-advised to ignore statistical parameters beyond variance, unless the deviations from normality are relatively small”).
 
 More importantly, our proposed measure of the convexity of the reward distributions, the Jensen gap, further reveals how even restricting the analysis to the first four moments is incomplete in the sense that it fails to characterize the difference between options: the fifth moment of the concave option contributes more to the Jensen gap than even kurtosis, while one needs to look at much higher moments to find significant contributions to the Jensen gap for the convex option. In that sense, there is no reason to restrict the analysis to variance, and even to skewness and kurtosis, to compare options, in general and in our particular setup as well. Note that introducing REE would result in convex distributions even in simplified designs, e.g. with 3-value support. Studying REE implies the need to look beyond variance, and our proposal is to use the Jensen gap as a measure of convexity. In the Material and Methods section of the paper, we did not develop an in-depth analysis of the Jensen gap so as to spare the reader confronted with an already rather technical paper.
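 For readers unfamiliar with the notion, a Jensen gap can be illustrated with a few lines of code. This sketch assumes the textbook definition, E[f(X)] − f(E[X]) for a function f and a random outcome X; the exact construction used in the paper may differ, and the function names and numbers below are hypothetical. For a convex f the gap is non-negative, and for f(x) = x² it reduces to the variance, which shows why other choices of f bring higher moments into play.

```python
def expectation(values, probs, f=lambda x: x):
    """Expected value of f(X) for a discrete random variable X."""
    return sum(p * f(v) for v, p in zip(values, probs))

def jensen_gap(f, values, probs):
    """Textbook Jensen gap E[f(X)] - f(E[X]); assumed definition, see lead-in."""
    return expectation(values, probs, f) - f(expectation(values, probs))

# Hypothetical outcomes (NOT the study's values)
values = [1, 3, 12, 80]
probs = [0.50, 0.44, 0.05, 0.01]

# With f(x) = x**2 the gap equals the variance of X.
gap_square = jensen_gap(lambda x: x ** 2, values, probs)

# With f(x) = x**4 the gap also depends on higher-order moments.
gap_quartic = jensen_gap(lambda x: x ** 4, values, probs)

print(gap_square, gap_quartic)
```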
 
 We thank the referee for raising the issue of whether variance is a simpler explanation of our results. To keep the main text as short as possible, we chose to refrain from adding technical complexity. We hope we made clear in our reply that the analysis cannot be restricted to variance when studying REE. We believe that the Jensen gap is a useful notion in this regard. As our replies will be made publicly available, we chose not to integrate the above discussion in the main text.
 
 (4) Were the probabilities shuffled or truly random (seem to be fixed sequences, so neither)? What were the experienced probabilities? Given the fixed sequences, these experienced ("ex-post") probabilities could differ tremendously from the scheduled ("ex ante") probabilities. It's quite possible that an animal never experienced the rare, extreme event for a specific option. It's even possible (if they only picked it on the 10th/60th choices by chance), that they only ever experienced that rare extreme event. This cannot be known given the information provided. The Supplemental info on p.55 only gives gross overall numbers but does not indicate what the rats experienced for each choice/option, which is what matters here. A simple table that indicates for each of the 4 options, how often they were selected, and how often the animals experienced each of the 6-8 possible outcomes would make it much clearer how closely the experience matched the planned outcomes. In addition, by restricting the rare outcome to either the 10th or 60th activations in a session, these are not random. Did the animals learn this association?
 
 Probabilities are not random and a limited number of fixed sequences has been used, as stated in Material and Methods. We have chosen sequences that satisfy our assumptions about ex-post stochastic dominance reversal of convex over concave options when REE are added. We have added in Table S4 the choice frequencies for all four options. If the animals had learnt the 10th and 60th activation, they would exhibit a strategy in their choice that would tend to be more optimized than what is observed. For example, the options offering the possibility to obtain the Jackpot are not optimal in terms of gains for the frequent events, therefore the animals should tend to select these options only around the 10th and 60th choice. Most of their other choices should favor the options delivering the larger gains in the frequent domain. This is not what is observed. We have added this important point in the discussion (page 18).
 
 (5) The choice data are only presented in an overprocessed fashion with a sum and a difference (in both figures and tables). The basic datum (probability/frequency of selecting each of the 4 options) is not provided directly, even if it can theoretically be inferred from the sum and the difference. To understand what the rats actually do, we first need to see how often they select each option, without these transformations.
 
 As described in Material and Methods, the 4 options are combinations of 2 convex and concave sub-options for gains and losses, which is why our analysis of the behavioral data focuses on convexity-related total and one-sided sensitivities to REE. The third dimension needed to fully characterize rats’ behavior is simply 1 − f_F, the fraction of non-Fragile choices. In addition, we also provide in Table S4 of the Supplementary Material an alternative interpretation in terms of Black Swan Avoidance and Jackpot Seeking. We have added in Table S4 the choice frequencies for all four options. Finally, all the raw data will be made available with open access and no access codes.
 
 (6) There is insufficient detail provided on the inferential statistical tests (e.g., no degrees of freedom or effect sizes), and only limited information on exactly what tests were run and how (bootstrapping, but little detail). Without code or data (only summary information is provided in the supplement), this is difficult to evaluate. In addition, the studies seem not to be pre-registered in any way, leaving many researcher degrees of freedom. Were any alternative analysis pipelines attempted? Similarly, there were many sub-groupings of the animals, and then comparisons between them - were these post-hoc?
 
 We understand the referee's concern about pre-registration as an epistemic safeguard to make empirical claims more falsifiable, more transparent, and less dependent on post hoc rationalization. However, although the contemporary push for preregistration is often presented as an “epistemic improvement,” in practice it functions largely as a norm of moral regulation, not a scientific necessity. The rhetoric is moralistic: preregistered research is “clean,” “transparent,” “credible,” while non-preregistered work is viewed with suspicion, even when the methodology is sound. This language is not epistemologically neutral; it enforces what ought to be done, irrespective of the diversity of legitimate scientific practices.
 
 From a philosophy of science perspective, this is historically and conceptually problematic. Scientific progress has never followed a uniform, rule-based method. As e.g. Feyerabend has argued, major discoveries have emerged precisely because researchers were not bound by predetermined plans: they followed anomalies, improvised, reinterpreted data, and revised methods and hypotheses in light of new evidence — practices that a rigid preregistration ethos can suppress and that are not aligned with how genuine discovery often occurs.
 
 Even from a statistical standpoint, preregistration is far from a panacea. It reduces some degrees of freedom (mainly in confirmatory statistics), but it does not eliminate flexibility; researchers can still choose models, transformations, exclusion rules, stopping rules, etc. And more importantly: reducing flexibility is not inherently epistemically virtuous. Flexibility is often necessary to understand data properly—especially in new paradigms or first-of-their-kind experiments, which is the case for this study. Science needs exploration, opportunism, and theoretical plasticity. Preregistration is compatible with these only if it is treated as one optional tool among many—not as a universal evaluative standard.
 
 As the referee pointed out, this study “taps into a surprisingly neglected but very relevant aspect of decision-making.” Our work is therefore mainly exploratory: the experimental paradigm reveals new behavioral patterns in how rats cope with rare and extreme events, and much of our analysis is necessarily descriptive. We conduct formal inference only where it is methodologically appropriate: the short-term behavioral response to rare events (for which we now provide more details in the Material and Methods section, p. 35) and the estimation of augmented Q-learning models, which follow a standard econometric approach (documented in the Material and Methods section; see also our response to recommendation 4). These inferential results support the descriptive patterns that motivate this new line of research.
 
 (7) On p. 17, there is an attempt to look at the impact of a rare, extreme event by plotting a measure of preference for the 10 trials before/after the rare, extreme event. In the human literature, the main impact of experiencing a rare, extreme event is what is known as the wavy recency effect (See Plonsky et al. 2015 in Psych Review for example). What this means is that there tends to be some immediate negative recency (e.g., avoiding a rare gain) followed by positive recency (e.g., chasing the rare gain). Using a 10-trial window would thus obscure any impact of this rare, extreme event. An analysis that looks at a time course trial-by-trial could reveal any impact.
 
 We thank the referee for drawing our attention to the wavy recency effect documented in human experiments. We have added the corresponding reference in the Discussion (page 20). Regarding rats, the Before/After analysis reported in the paper suggests that there is no sizeable immediate recency effect for Jackpots. Even for Black Swans, the immediate recency effect we report remains modest when using a 10-trial window, and the analysis of the choice immediately following a REE does not show evidence of immediate negative recency. This casts doubt on the presence of such an effect in rats.
 
 (8) As I understood the method (p. 31), the assignment of options to physical locations was not random or counterbalanced, but deliberately biased to have one of the options in the preferred location. This would seem to create a bias towards a particular option and a bias away from the other options, which confounds the preference data in subsequent analyses.
 
 We agree that the design incorporated an intentional bias toward the anti-fragile option as a proof of concept. Nevertheless, Figure 8 demonstrates that animals substantially altered their choices between training and final testing, with a median change of approximately 35% across sessions. This indicates that behavior was driven by the structure of possible outcomes rather than by a stereotyped location-based preference.
 
 (9) Are delays really losses? This is a big assumption. Magnitude and delay are different aspects of experience, which are not necessarily commensurable and can be manipulated independently. And, for the model, how were these delays transformed into outcomes for the model? Eq 1 skips over that. Is there an assumption of linearity? In addition, I was not wholly clear if the delays meant fewer trials in a session or if the delays merely extended the session and meant longer delays until the next choice period.
 
 Consistent with established rodent decision-making paradigms (Adams et al., 2017 doi: 10.1523/ENEURO.0094-17; Breysse et al., 2021 doi: 10.1111/ejn.14895), we employed sweet pellets as gains and imposed delays as losses. Delays are operationalized as losses because they preclude the animal from engaging in reward-generating behavior; thus, increasing the delay duration proportionally increases the magnitude of the opportunity cost.
 
 (10) The paper does not sufficiently accurately represent the existing literature on human risky decision-making (with and without rare events). Here are a few examples of misrepresented and/or missing literature:
 
 Most studies on decision-making do not only rely on p > 10% (as per p. 2). Maybe that is true with animals, but not a fair statement generally. Some do, and some don't. There is substantial literature looking at rarer events in both descriptions (most famously with Kahneman & Tversky's work), but also in experience (which is alluded to in reference 19). That reference is not only about the situation when choices are not repeated (e.g. the sampling paradigm), but also partial feedback and full-feedback situations.
 
 We have corrected that statement in the main text (page 3) and we thank the referee for pointing this out.
 
 The literature on learning from rewarding experiences in humans is obliquely referenced but not really incorporated. In short, there are two main findings - firstly people underweight rare events in experience; second, people overweight extreme outcomes in experience (both contrary to description). Some related papers are cited, but their content is not used or incorporated into the logic of the manuscript.
 
 One recent study systematically examined rarity and extremity in human risky decision-making, which seems very relevant here: Mason et al. (2024). Rare and extreme outcomes in risky choice. Psychonomic Bulletin & Review, 31, 1301-1308.
 
 There is a fair bit of research on the human perception of the risk of rare events (including from experience) and important events like climate. One notable paper is Newell et al (2015) in Nature Climate Change.
 
 We agree with the referee that the related literature on REE in animal Decision Making is scant and that it is more developed in humans. We thank the referee for pointing at Mason et al. (2024), who clarify where the literature on humans stands and why combining rarity and extremity, as we also do, is important and highly relevant. We have added a new statement and references in the Introduction and Discussion (pages 3, 20, 22).
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations For The Authors):
 
 (1) As said above, I think the manuscript would really benefit from a rewriting, to replace some technical terms with more readable ones, and maybe rebalance the focus from the current focus on the framework (heavily loaded with economics concepts, which will be hard to digest for the eLife readership) to a higher weight on information that is critical to understand and interpret the behavior (e.g. information about training & training behavior, etc.).
 
 We have revised the entire manuscript to improve readability and have clarified in the main text: (1) why convexity of exposures to REE could, beyond variance, be useful for experiments in other settings than our own; (2) why the associated notion of antifragility may be applicable to other settings and therefore of broader interest; (3) what was done in the training sessions compared to the final sessions.
 
 (2) From Figure 8, it seems that rodent behavior is more clustered after the training (i.e. before the sessions) than after the sessions. Could that be a sign of imperfect learning?
 
 Figure 8 mostly suggests that there is some flexibility in the choices made and that the intended initial bias towards the antifragile choice in the design of the task could be overridden by the rats.
 
 (3) The modelling section seems incomplete. I think the authors want to tease apart where REE enters the model and should propose an alternative where REE affects the learning rather than the decision.
 
 In fact, the general model allows REE to have an effect at the learning stage only (i.e. to contribute to the updating of the Q subvalues), when the specific decision weights attached to options delivering REE are both zero. However, our analysis shows that such a model is rejected by the behavioral data for all rats. We have clarified this point in the revised version.
 
 (4) Also, parameter and model recovery exercises seem mandatory (Wilson & Collins, 2019).
 
 We thank the referee for highlighting this valuable reference in computational modeling, particularly in the context of model identification and estimation in computational biology. In the present research, we adopted an econometric perspective on model identification—especially with regard to the integration of Q-values for gains and losses. The softmax choice function is formally equivalent to a multinomial logit model, and as is well known in econometrics, identification in such models presents non-trivial challenges. The standard approach in classical Q-learning is to multiply the Q-value by an inverse temperature parameter (also known as a precision parameter in random utility models). When extending the model to include separate Q-values for gains and losses, specifying the model in an identifiable way becomes more complex.
 
 To address this issue, we considered several alternative model specifications and conducted grid-based estimation of starting parameter values. This approach allowed us to examine the shape of the log-likelihood function and assess whether the parameters are globally identified, rather than only identifiable up to a linear combination. We found that the most parsimonious and empirically identified specification in our experimental paradigm is one in which Q-values for gains and losses are summed, each weighted by distinct decision weights (see our Equation 2 in the paper).
 
 The inclusion of decision weights for REE for each option (Equation 2) is then structurally equivalent to introducing constant terms in a logit model. The identification of these parameters follows standard econometric results on discrete choice models (e.g., Davidson & MacKinnon, 2003): since we model choices among four options, three free parameters can be estimated, leaving one degree of freedom in the specification. As mentioned in the "Modelling and Statistical Analysis" section, we further guarded against the presence of local maxima by applying a two-step estimation procedure, combining two optimization algorithms with multiple sets of starting values for the baseline model (i.e., the model without decision weights for REE). We also tested the addition of a global optimization method, simulated annealing, but found that it did not significantly improve upon our two-step procedure. This is not surprising, as our preliminary investigation of model identification, based on grid searches over starting parameter values, confirmed that all parameters were identified in our simple specification. Our intuition is that simulated annealing may yield different estimates than gradient-based methods primarily in cases where the model is not theoretically identified, suggesting that the need for such global optimization techniques can be indicative of underlying identification issues in Q-learning models.
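 The identification argument above can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' estimation code: the utility of each option is taken to be a weighted sum of a gain Q-value and a loss Q-value plus an option-specific constant, which is one reading of the description of Equation 2, and all names and numbers are hypothetical. It shows that adding the same amount to all four constants leaves the softmax (multinomial logit) choice probabilities unchanged, so only three of the four constants can be identified.

```python
import math

def choice_probabilities(q_gain, q_loss, w_gain, w_loss, constants):
    """Softmax over utilities w_gain*Q_gain + w_loss*Q_loss + constant (assumed form)."""
    utilities = [w_gain * g + w_loss * l + c
                 for g, l, c in zip(q_gain, q_loss, constants)]
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical Q-values and decision weights for four options
q_gain = [1.0, 2.0, 1.5, 2.5]
q_loss = [-0.5, -0.5, -2.0, -2.0]
w_gain, w_loss = 1.2, 0.8

# Two sets of option-specific constants that differ by a common shift of 5
constants = [0.0, 0.3, -0.6, -0.2]
shifted_constants = [c + 5.0 for c in constants]

base = choice_probabilities(q_gain, q_loss, w_gain, w_loss, constants)
shifted = choice_probabilities(q_gain, q_loss, w_gain, w_loss, shifted_constants)

# The two parameter sets are observationally equivalent.
print(base)
print(shifted)
```

 Fixing one constant (for instance to zero) removes this indeterminacy, which is the usual normalization in discrete choice models.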
 
 Regarding model comparison, we have used penalized information criteria to account for additional parameters. Although we do not report confusion or inversion matrices for our nested models, we verified that the estimated models replicate observed behaviors across all phenotypes, as shown in the main text (see bottom left panel of Figure 5 for the Total and One-Sided sensitivities). Most importantly, we conducted 100 additional simulations of 40 artificial sessions for each phenotype using the “winning” models and the median fitted parameters. These simulated rats—playing the task 100 times over 40 sessions—offer strong evidence that the selected models are valid: they quantitatively capture the behavior of all phenotypes in terms of our key metrics, Total and One-Sided sensitivities (see bottom right panel of Figure 5).
 
 Taken together, this methodical econometric approach to model specification and estimation gives us strong confidence in the identification and robustness of our model. Overall, while Wilson & Collins (2019) provide an interesting framework for model estimation in computational biology, we believe that a more formal theoretical analysis of model identification in Q-learning models would be a valuable addition to the field—though it lies beyond the scope of the present work. In our view, computational biologists should complement simulation-based validation and empirical fit with formal methods for assessing theoretical identifiability, particularly when estimating complex choice models.
 
 Davidson, R. and J.G. MacKinnon (2003) Econometric Theory and Methods. Oxford University Press (New York).
 
 Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547
 
 Reviewer #2 (Recommendations For The Authors):
 
 (1) The paper confuses risk sensitivity and exploration in the opening lines. These are not the same.
 
 What we have in mind here is that uncertainty about outcomes is one of the main drivers of exploration, in the sense that there would be no need to explore in a counterfactual world with deterministic gains and losses. We have modified the opening lines of the paper to better reflect this dimension (page 2).
 
 (2) p. 9. "awfully long" is an unnecessary descriptor. Descriptions of methods should be more factual.
 
 The manuscript has been entirely rewritten.
 
 (3) p. 13. Most points lie on the left of the square (not right?).
 
 We thank the referee for pointing out this typo, which is now corrected in the text (page 8).
 
 (4) p. 13. Last line. "obviously" is patronizing to the readers.
 
 The manuscript has been entirely modified to address related points.
 
 (5) p. 23. The avoidance of black swans by not choosing that option sounds like a hot-stove effect (see Denrell & March, 2001). Is this evidenced here?
 
 To the best of our knowledge, the statement that “people tend to avoid activities they have had a negative experience of, resulting in a negativity bias” (from Jerker Denrell’s website) does not explicitly concern REE. Instead, it appears to refer broadly to reinforcement learning mechanisms driven by negative outcomes, irrespective of their magnitude or frequency. In our task, animals encounter both negative rare events (RE) and negative rare and extreme events (REE; Black Swans). Notably, the task design does not allow rats to completely avoid negative RE unless they cease performing the task altogether—a pattern typically seen in paradigms involving aversive stimuli such as electric foot shocks. The fact that all 20 rats maintained stable performance across the 41 sessions provides evidence against a pronounced hot-stove effect. This point has been incorporated into the revised discussion (page 20).
 
 (6) "menus" is an odd term. Better described as reward schedules?
 
 “Menu” has been replaced by “option” in the main text.
 
 (7) Why are they 20-minute sessions? I thought it was 120 trials per session? And 41 sessions? Or was this only in training?
 
 Each session ended after 20 minutes had elapsed, which led to approximately 120 trials (but not systematically). The choice of 20 minutes was made in order to limit the number of trials to prevent satiety. The total number of sessions run with all 20 animals for the final testing was 41, an odd number but there was no justification to remove one session from the analysis. The training was much longer and is not included in the 41 sessions.
 
 (8) Really not clear why these Jensen inequalities were relevant or even calculated for these options? How is it relevant to what animals chose or experienced? They seem to be based on the generative probabilities for different options, which is not what happened in reality.
 
 We propose the Jensen gap as a general measure of convexity that relates to all moments of the probability distribution, as described in more detail in our answer to point (3) above. As such, we think it is a characterization of options with stochastic outcomes that could prove useful to other experimenters in alternative settings beyond our own.
 
 (9) Only some summary data in supplemental materials. No open data or code for recreating the experiment or analyzing the data.
 
 The data is available on GitHub (see page 38) and the code will be available upon request.
 
 AuthorResponse
Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2021.11.01.466806v4
www.biorxiv.org www.biorxiv.org

Dopamine and its receptor DcDop2 are involved in the coevolution between ‘Candidatus Liberibacter asiaticus’ and Diaphorina citri

4
1. Public_Reviews 02 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  Insects can act as vectors of plant diseases, hence the study of insect-pathogen interactions is relevant for agriculture. This important study identifies in Diaphorina citri a dopamine receptor responsive to 'Candidatus Liberibacter asiaticus' infection, demonstrates direct regulation of this receptor by a microRNA, and integrates dopamine signaling into an established insect reproductive hormone framework. Multiple complementary experimental approaches convincingly support the findings, although key conclusions rely on correlative data and the mechanistic evidence for the proposed linear signaling cascade is limited. This work will be of interest for insect physiology and vector-pathogen biology, and more broadly for citrus agriculture.
  
  Summary
2. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  I read this paper with great interest based on my experience in insect sciences. Previous concerns:
  
  (1) The paper has an original biological question that is overly broad and mechanistically ambitious. The central biological question, namely how CLas infection enhances fecundity of Diaphorina citri via dopamine signaling, is clearly stated and well motivated by previous literature. However, my advice to the authors is that, while the general question is clear, the manuscript attempts to answer multiple mechanistic layers simultaneously. As a result, I feel that the biological narrative becomes diffuse, especially in later sections where DA, miRNA regulation, AKH signaling, and JH signaling are all proposed as parts of a single linear cascade. In summary, my key concern is that the paper often moves from correlation to causal hierarchy without fully disentangling whether these pathways act sequentially, in parallel, or redundantly. A more explicitly framed primary hypothesis (e.g., "DA-DcDop2 is necessary and sufficient for CLas-induced fecundity") may improve conceptual clarity.
  
  (2) On the novelty of the data, I feel they are moderately novel, with substantial confirmatory components. If I am correct, the novel contributions include the identification of DcDop2 as the DA receptor responsive to CLas infection in D. citri, the discovery that miR-31a directly targets DcDop2, which is supported by luciferase assays and RIP, and thirdly, the integration of dopamine signaling into the already-described CLas-AKH-JH-fecundity framework. My advice to the authors is to focus more on the manuscript's novelty, which lies more in pathway integration than in discovering fundamentally new biological phenomena. This is appropriate for a mechanistic paper, but should be framed as an extension of existing models rather than a paradigm shift.
  
  (3) On the conclusions, I recommend that the authors modify their statements a little. I feel that there are some overstated or insufficiently supported claims. For instance, the assertion that CLas "hijacks" the DA-DcDop2-miR-31a-AKH-JH cascade implies direct pathogen manipulation, but no CLas-derived effector or mechanism is identified. Also, the model suggests a linear signaling hierarchy, but the data largely show correlation and partial dependency rather than strict epistasis. Third, the term "mutualistic interaction" may be too strong, as host fitness costs outside fecundity (e.g., longevity, immunity) are not evaluated. In conclusion, I confirm that the data support a functional association, but mechanistic causality and evolutionary interpretation are somewhat overstated.
  
  Comments on revised version:
  
  The authors provided a satisfactory revision.
  
  Review 1
3. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Nian and colleagues comprehensively apply metabolomics, molecular, and genetic approaches to demonstrate that CLas hijacks the DA/DcDop2-miR-31a-AKH-JH signaling cascade to enhance lipid metabolism and fecundity in D. citri, while concurrently promoting its own replication.
  
  Strengths:
  
  These findings provide solid evidence of a mutualistic interaction between CLas proliferation and ovarian development in the insect host. This insight significantly advances our understanding of the molecular interplay between plant pathogens and vector insects and offers novel targets and strategies for HLB field management.
  
  Weaknesses:
  
  While the article investigates the involvement of dopamine signaling and specific microRNAs in enhancing fecundity and pathogen proliferation, it still needs to provide a detailed mechanistic understanding of these interactions. The precise molecular pathways and feedback mechanisms by which CLas manipulates dopamine signaling in Diaphorina citri remain unclear.
  
  Review 2
4. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  I read this paper with great interest based on my experience in insect sciences. I have some minor comments (and recommendations) that I believe the authors should address.
  
  (1) The paper has an original biological question that is overly broad and mechanistically ambitious. The central biological question, namely how CLas infection enhances fecundity of Diaphorina citri via dopamine signaling, is clearly stated and well motivated by previous literature. However, my advice to the authors is that, while the general question is clear, the manuscript attempts to answer multiple mechanistic layers simultaneously. As a result, I feel that the biological narrative becomes diffuse, especially in later sections where DA, miRNA regulation, AKH signaling, and JH signaling are all proposed as parts of a single linear cascade. In summary, my key concern is that the paper often moves from correlation to causal hierarchy without fully disentangling whether these pathways act sequentially, in parallel, or redundantly. A more explicitly framed primary hypothesis (e.g., "DA-DcDop2 is necessary and sufficient for CLas-induced fecundity") may improve conceptual clarity.
  
  We sincerely thank the reviewer for these constructive comments and agree that the initial version of our manuscript attempted to integrate multiple signaling layers, which may have blurred the logical distinction between sequential, parallel, and redundant pathways. To address this concern, we have restructured the narrative to center on a clearly defined hypothesis by changing “DA/DcDop2-miR-31a-AKH-JH signaling cascade” to “DA-DcDop2 signaling axis” in the Abstract (Line 33) of the revised manuscript.
  
  (2) On the novelty of the data, I feel they are moderately novel, with substantial confirmatory components. If I am correct, the novel contributions include the identification of DcDop2 as the DA receptor responsive to CLas infection in D. citri, the discovery that miR-31a directly targets DcDop2, which is supported by luciferase assays and RIP, and thirdly, the integration of dopamine signaling into the already-described CLas-AKH-JH-fecundity framework. My advice to the authors is to focus more on the manuscript's novelty, which lies more in pathway integration than in discovering fundamentally new biological phenomena. This is appropriate for a mechanistic paper, but should be framed as an extension of existing models rather than a paradigm shift.
  
  We sincerely thank the reviewer for this thoughtful and highly constructive assessment. We greatly appreciate the clear articulation of what constitutes the novel contributions of our work, and we fully agree with the characterization that the primary novelty lies in pathway integration rather than the discovery of entirely unprecedented biological phenomena. We also accept the valuable advice that our manuscript should be framed as an extension of existing models rather than a paradigm shift. In response to this insightful comment, we have carefully revised the Results section (Lines 275-278) of the revised manuscript.
  
  (3) On the conclusions, I recommend that the authors modify their statements a little. I feel that there are some overstated or insufficiently supported claims. First, the assertion that CLas "hijacks" the DA-DcDop2-miR-31a-AKH-JH cascade implies direct pathogen manipulation, but no CLas-derived effector or mechanism is identified. Second, the model suggests a linear signaling hierarchy, but the data largely show correlation and partial dependency rather than strict epistasis. Third, the term "mutualistic interaction" may be too strong, as host fitness costs outside fecundity (e.g., longevity, immunity) are not evaluated. In conclusion, I confirm that the data support a functional association, but mechanistic causality and evolutionary interpretation are somewhat overstated.
  
  We sincerely thank the reviewer for these insightful comments and agree that some claims were overstated or insufficiently supported. In response, we have changed "hijacks" to "regulates" (Lines 32 and 124), and "mutualistic interaction" to “coevolution” (Lines 2, 34, 127, 257, 763, 806, and 842) in our revised manuscript.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Nian and colleagues comprehensively apply metabolomics, molecular, and genetic approaches to demonstrate that CLas hijacks the DA/DcDop2-miR-31a-AKH-JH signaling cascade to enhance lipid metabolism and fecundity in D. citri, while concurrently promoting its own replication.
  
  Strengths:
  
  These findings provide solid evidence of a mutualistic interaction between CLas proliferation and ovarian development in the insect host. This insight significantly advances our understanding of the molecular interplay between plant pathogens and vector insects, and offers novel targets and strategies for HLB field management.
  
  Weaknesses:
  
  While the article investigates the involvement of dopamine signaling and specific microRNAs in enhancing fecundity and pathogen proliferation, it still needs to provide a detailed mechanistic understanding of these interactions. The precise molecular pathways and feedback mechanisms by which CLas manipulates dopamine signaling in Diaphorina citri remain unclear.
  
  These comments are extremely helpful for revising and improving our manuscript.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) In Figures 1C and 1D, please maintain consistent gene nomenclature: change "henna" to "Henna", "TH" to "Th", and "DDC" to "Ddc".
  
  Thanks for your great suggestion. We have changed "henna" to "Henna", "TH" to "Th", and "DDC" to "Ddc" in Figure 1C and 1D of our revised manuscript.
  
  (2) In Figure 7, correct "Emergy metabolism" to "Energy metabolism".
  
  Thanks for your valuable suggestion. We have corrected "Emergy metabolism" to "Energy metabolism" in Figure 7 of our revised manuscript.
  
  (3) Please specify the number of biological replicates in the figure captions.
  
  Thanks for your perfect suggestion. We have specified the number of biological replicates in the figure captions of Figure 1 (Line 737-738), Figure 2 (Line 757-759), Figure 3 (Line 780-782), Figure 4 (Line 799-800), Figure 5 (Line 816-819), and Figure 6 (Line 833-836).
  
  (4) For Figures 2I, 3J, and 5H, clarify that CLas 16S rRNA was detected by FISH. The age of the dissected females should also be described in the captions.
  
  Thanks for your insightful suggestion. We have added the female age (at 7 DAE) in the captions for Figure 2I (Line 752), 3J (Line 773), and 5H (Line 813) of our revised manuscript.
  
  (5) A blot is shown in Figure 3B but not discussed in the text. Since the manuscript describes mRNA levels, please specify whether these blots are from Northern or Western blotting and provide relevant methodological details.
  
  Thanks for your great suggestion. The blot in Figure 3B is a Western blot result. We have added the related descriptions in the Results (Line 202), Materials and Methods (Lines 521-536), and figure legend (Line 766) of our revised manuscript.
  
  (6) In Figure 3G-3K, an "inhibitor" was used, but its name and functional role are not described. Please give more details.
  
  Thanks for your valuable suggestion. We have added detailed information on the “Dop2 inhibitor” to the Figure 3G-3K legend (Lines 772-776) of our revised manuscript.
  
  (7) In Lines 23-24 of the Abstract, consider revising "their neuroendocrine regulation remains unclear" to "their neuroendocrine regulation mechanisms remain unclear" for grammatical accuracy.
  
  Thanks for your perfect suggestion. We have revised "their neuroendocrine regulation remains unclear" to "their neuroendocrine regulation mechanisms remain unclear" for grammatical accuracy in Line 24 of our revised manuscript.
  
  (8) The last sentence of the Abstract is overly long. It is recommended to split it as follows: "These findings reveal a mutualistic interaction between CLas proliferation and ovarian development in the insect host. This discovery enhances our understanding of the molecular interplay between plant pathogens and vector insects and offers novel targets and strategies for HLB field management."
  
  Thanks for your excellent suggestion. We have split the last sentence of the Abstract as follows: "These findings reveal a coevolution between CLas proliferation and ovarian development in the insect host. This discovery enhances our understanding of the molecular interplay between plant pathogens and vector insects and offers novel targets and strategies for HLB field management." (Lines 34-37 of our revised manuscript).
  
  (9) In Line 139, remove the comma between "female" and "adult".
  
  Thanks for your great suggestion. We have removed the comma between "female" and "adult" in Line 139 of our revised manuscript.
  
  (10) In Line 149, replace "d" with "day".
  
  Thanks for your perfect suggestion. We have replaced "d" with "day" in Line 149 of our revised manuscript.
  
  (11) The JH determination method references a previous study but lacks a detailed description of the extraction procedure. Please include this information in the methodology section.
  
  Thanks for your valuable suggestion. We have added the detailed description of the JH extraction procedure in Line 511-514 of our revised manuscript.
  
  (12) In Figure S2, since the panel shows interference efficiencies for four genes, "treated with dsDcAKHR" should be revised to "treated with dsRNA" for accuracy.
  
  Thanks for your insightful suggestion. We have revised "treated with dsDcAKHR" to "treated with dsRNA" for accuracy in the Figure S2 legend.
  
  (13) In line 354-355, change "DcVg1-like, DcVgA1-like and DcVgR" to "DcVg1-like, DcVgA1-like, and DcVgR".
  
  Thanks for your great suggestion. We have changed "DcVg1-like, DcVgA1-like and DcVgR" to "DcVg1-like, DcVgA1-like, and DcVgR" in Line 350 of our revised manuscript.
  
  (14) The study primarily investigates the role of agomir-31a. Would antagomir-31a promote ovarian development in CLas- females? In addition, did the authors perform a rescue experiment using antagomir-31a in CLas+ females after dsDcDop2 treatment?
  
  Thanks for your valuable suggestion. The proposed experiments will be instrumental in further elucidating the functional role of miR-31a and represent a key direction for our future research. We will carefully consider and incorporate these approaches in our subsequent study.
  
  (15) The method used to determine CLas-negative and CLas-positive individuals should be described in more detail in the Materials and Methods section.
  
  Thanks for your great suggestion. We have added more details about CLas detection in the Materials and Methods section (Line 378) of our revised manuscript.
  
  AuthorResponse
Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.10.10.681724v2
www.medrxiv.org www.medrxiv.org

In vivo mapping of striatal neurodegeneration in Huntington's disease with Soma and Neurite Density Imaging

4
1. Public_Reviews 02 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This fundamental manuscript presents a novel application of the SANDI (Soma and Neurite Density Imaging) model to study microstructural alterations in the basal ganglia of individuals with Huntington's disease (HD). The compelling methods are, to our understanding, the first application of SANDI to neurodegenerative diseases, provide strong evidence for HD-related neurodegeneration in the striatum, account significantly for striatal atrophy, and correlate with motor impairments. The integration of novel diffusion acquisition and modelling methods with multimodal behavioural data are both of high value in their own right, and create a framework for future studies.
 
 Summary
2. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 (1) In this study, the authors aimed at characterizing Huntington's Disease (HD) - related microstructural abnormalities in the basal ganglia and thalami as revealed using Soma and Neurite Density Imaging (SANDI) indices (apparent soma density, apparent soma size, extracellular water signal fraction, extracellular diffusivity, apparent neurite density, fractional anisotropy and mean diffusivity).
 
 (2) The study implements a novel biophysical diffusion model that extends up-to-date methodologies and presents a significant potential for quantifying neurodegenerative processes of the grey matter of the human brain in vivo. The authors comment on the usefulness of this technique in other pathologies, but they exemplify it only with multiple sclerosis. Further development of this point, with supporting evidence, should be provided.
 
 (3) The study found that HD-related neurodegeneration in the striatum accounted significantly for striatal atrophy and correlated with motor impairments. HD was associated with reduced soma density, increased apparent soma size, and increased extracellular signal fraction in the basal ganglia, but not in the thalami. Additionally, these effects were larger at the manifest stage.
 
 (4) The results of this work demonstrate the impact of HD on basal ganglia and thalami which can be further explored as a non-invasive biomarker of disease progression. Additionally, the study shows that SANDI can be used to explore grey matter microstructure in a variety of neurological conditions.
 
 Comments on revised version.
 
 I have no further comments. Thank you
 
 Review 1
3. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Ioakeimidis and colleagues studied microstructural abnormalities in N=56 Huntington's disease (HD) patients compared to N=57 normative controls. The authors used a powerful MRI Connectom scanner and applied the SANDI model to estimate the soma size, neurite size, soma density, and extracellular fraction in key subcortical nuclei related to HD. In the striatum, they found decreased soma density and increased soma size, which also seemed to become more pronounced in advanced HD individuals in the final exploratory analyses. The authors conducted important analyses to find whether the SANDI measures correlate with clinical scores (i.e., QMotor) and whether the variance of the striatal volume is explained by the SANDI measures. They found a relationship of SANDI measures to both.
 
 Strengths:
 
 The study is both innovative and of high interest for the HD community. The authors provide a rich pool of statistical analyses and results which anticipate the questions that may emerge in the HD research community. Statistics are carefully chosen and image processing is done with state-of-the-art methods and tools. The sample size gives sufficient credibility to the findings. Altogether, I think this study sets a milestone in the attempts of the HD community to understand neuropathological processes with non-invasive methods, and extends the current knowledge of microstructural anomalies identified in HD with diffusion MRI. More importantly, the newly identified anomalies in soma size and soma density open new avenues for studying these biological effects further, and perhaps develop these biomarkers for use in clinical trials.
 
 Weaknesses:
 
 (1) An important question is whether the SANDI measures, which require an expensive scanner and elaborate processing, are better biomarkers than the more traditional DTI measures. Can the authors compare the effect size of FA/MD with that of the SANDI measures? In some of the plots and tables, FA/MD seem to have comparable, if not higher, correlations with QMotor or CAP scores. In the same vein, it is unclear whether DTI measures were included in the hierarchical stepwise regression. I wonder whether the stepwise models would have picked up FA/MD instead of SANDI measures if given the chance. Overall, I hope the authors can also discuss their findings in light of the cost vs. benefit of adopting SANDI in future studies, which is an important topic for clinical trials.
 
 (2) Similar to the above point, it is very important to consider how strong the biomarker signal from SANDI measures is compared to the good old striatal volume. Some plots seem to indicate that volumes still have the highest correlation with QMotor and the highest effect size in group comparisons. It would be helpful for the community to know where the new SANDI measures stand compared to the most typically used volumes in terms of effect size.
 
 (3) The diffusion measures are inevitably correlated to some degree. Please provide a correlation matrix in supplementary material including all DWI measures to enable readers to understand better how similar SANDI measures are between each other or vs. other DTI measures. Perhaps adding volumes to this correlation matrix may also be a good future reference.
 
 (4) ISS stages:
 
 (a) The online ISS calculator requires cut-offs derived from the longitudinal Freesurfer pipeline, while the authors do not have longitudinal data. Thus, the ISS classification might be inaccurate to some degree if the authors used the FS cross-sectional pipeline. Please review this issue and see if updated cut-offs should be used to classify participants.
 
 (b) Were there really no participants with ISS 0 among 56 HD individuals? Please clarify in the manuscript.
 
 (c) A note on terminology that might be confusing to some readers. According to the creators of ISS, the ISS stages are created for research only; they are not used or applied in the clinic. On the other hand, the terms "premanifest" and "manifest" have a clinical meaning, typically based on the diagnostic confidence level. The assignment of ISS0-1 to premanifest and ISS2-3 to manifest may create some non-trivial confusion, if not opposition, in some segments of the HD community. The authors can keep their current terminology but will need to at least clarify to the reader that this assignment is speculative, does not fully match the clinically-based categories, and should not be confused with similarly named groups in the previous literature.
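 Weaknesses (1) and (2) ask for a side-by-side comparison of group effect sizes across SANDI, DTI, and volume measures. A minimal sketch of such a comparison is given below, assuming Cohen's d with a pooled standard deviation as the effect-size measure (the manuscript may use a different one). The metric names and all numbers are synthetic placeholders, not data from the study.

```python
import numpy as np


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.size, b.size
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Synthetic placeholder values (hypothetical); NOT measurements from the study.
rng = np.random.default_rng(0)
metrics = {
    "soma_density": (rng.normal(0.40, 0.05, 56), rng.normal(0.45, 0.05, 57)),
    "mean_diffusivity": (rng.normal(0.85, 0.08, 56), rng.normal(0.80, 0.08, 57)),
    "striatal_volume": (rng.normal(8.0, 1.0, 56), rng.normal(9.5, 1.0, 57)),
}

# One effect size per metric (HD minus control), ranked by absolute magnitude.
effect_sizes = {name: cohens_d(hd, hc) for name, (hd, hc) in metrics.items()}
ranked = sorted(effect_sizes, key=lambda name: abs(effect_sizes[name]), reverse=True)

for name in ranked:
    print(f"{name}: d = {effect_sizes[name]:.2f}")
```

 Putting every metric on the same standardized scale is what lets a reader judge whether the SANDI measures add sensitivity beyond FA/MD or volume.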
 
 Comments on revised version.
 
 The authors have moved to address many points from reviewers. The manuscript has indeed become more objective, transparent, and to the point. The amount of information and analyses is large, which perhaps is inevitable when new methods are being tested for the first time in a neurodegenerative disease.
 
 Review 2
4. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1:
 
 (1) The biological and/or mathematical meaning of the Soma and Neurite Density Imaging (SANDI) indices (apparent soma density, apparent soma size, extracellular water signal fraction, extracellular diffusivity, apparent neurite density, fractional anisotropy, and mean diffusivity) should be briefly introduced for those less familiar with this novel technique.
 
 Further explanations about the biological and mathematical meaning of the SANDI indices were added to the introduction on page 6.
 
 (2) The study implements a novel biophysical diffusion model that extends up-to-date methodologies and presents a significant potential for quantifying neurodegenerative processes of the grey matter of the human brain in vivo. The authors comment on the usefulness of this technique in other pathologies, but they exemplify it only with multiple sclerosis. Further development of this, building evidence, should be provided.
 
 Clinical applications of SANDI have primarily focused on MS. However, since preparation of the manuscript, one study has been published reporting reductions in apparent soma density and white and grey matter specific differences in apparent soma size in amyotrophic lateral sclerosis (ALS) (Zeng et al., Eur J Radiol 2025, 10.1016/j.ejrad.2025.111981). These findings accord with the loss of motor neurons and glial responses in ALS. We have added this study to the introduction of SANDI on page 7.
 
 (3) Why are the basal ganglia compared against thalami? The rationale of this decision is missing.
 
 The thalami were selected as control regions based on the established trajectory of neurodegeneration in HD, which begins with early loss of medium spiny neurons in the striatum and later extends to surrounding structures, including the putamen and thalamus. Given that most participants in our study were at early disease stages, we assumed the thalami would remain relatively unaffected in this sample. This explanation has been added to the introduction on page 7.
 
 (4) The use of bullet points is unusual for a scientific paper format.
 
 Bullet points have been removed throughout the manuscript.
 
 (5) The authors mention that they eroded the boundaries of the subcortical masks. Providing the details and parameters of this erosion would be beneficial.
 
 Details of the default parameters of the FSL erode function that was used have been added to the method section on page 13.
 
 (6) In the conclusion, the authors state that their results will bridge the gap between histopathological findings and in vivo imaging, but it would be helpful if they could briefly explain how they imagine such a bridge (e.g., which kind of comparisons or correlations) and whether there exists any literature in this regard so far.
 
 We have added the following brief explanation to the conclusion on page 26: “Although conventional MRI lacks the resolution to directly capture histopathology, advanced biophysical models such as SANDI may help bridge this gap by providing biologically interpretable parameters that reflect tissue composition and capture histopathological changes in vivo.”
 
 (7) The scale is missing in Figure 3.
 
 The scale has been added to Figure 3.
 
 (8) In general, the work would benefit from a better organization and potentially a smaller number of figures and tables.
 
 The manuscript has been re-edited to improve readability and organization throughout, and the number of figures and tables was reduced by moving some of them to the Supplementary Material (old Tables 2 and 5 are now Supplementary Tables 2 and 3; old Figure 3 is now Supplementary Figure 1).
 
 Reviewer #2:
 
 Certain aspects of the study would benefit from clarification:
 
 (1) Scanner and acquisition consistency: While HD data are from the WAND study, it is not clear whether controls were scanned on the same scanner or protocol. Given the use of model-derived metrics (especially SANDI), differences in scanner or acquisition could introduce confounds. From the text, the HD participants are explicitly said to come from the WAND study (a longitudinal HD cohort). On the other hand, while the HC participants are described as age-matched controls, the paper does not clearly state whether they were scanned in the same study (i.e., WAND), on the same scanner, or with the same acquisition protocol. This ambiguity is potentially problematic, especially since they use model-derived diffusion metrics that can be very sensitive to scanner hardware, gradient strengths, and protocol settings. If the WAND HD data were acquired on a specific scanner (e.g., 3T Connectom) and the HCs were not, then differences in SANDI/DTI metrics might reflect scanner bias, not disease pathology. This is particularly critical in SANDI, which is sensitive to high b-values and SNR. It would strengthen the manuscript to explicitly state whether the HD and control data were acquired using the same scanner model, sequence, and protocol, and ideally at the same site. If this were not the case, the authors should include this as a limitation and discuss any harmonization strategies applied (e.g., ComBat, covariate modeling, etc).
 
 For harmonization and comparison purposes, HD and control data were acquired using the same strong gradient (300mT/m) 3T Connectom MRI system at CUBRIC with the same acquisition protocols and sequences. It should also be noted that the Connectom scanner has not had any software upgrades that could introduce scanner biases in data acquired at different time points. This is now made explicit on page 8 by stating that all MRI data for all participants were acquired on the same MRI system using the same acquisition protocols, and on page 10 by stating that all HD and HC MRI data included in our analyses were acquired on the same 3T Siemens Connectom scanner at CUBRIC using the same acquisition protocols described in this section.
 
 Also, although it offers novel and biologically informative markers, widespread clinical translation still faces hurdles. For instance, the study used a 3T Connectom scanner (300mT/m gradients), which is not widely available. Reproduction of these results in standard 3T clinical scanners would be a great addition, in scenarios with lower resolution, less precise parameter recovery, and longer scans if SNR needs to be maintained.
 
 We agree that for clinical adoption it is important to demonstrate that HD-related SANDI differences can also be detected on clinical MRI systems and do not require ultra-strong gradient imaging. While we have not collected such data in people with HD, we have demonstrated the feasibility of modelling SANDI metrics from multi-shell diffusion-weighted imaging acquired on a clinical 3T MRI (maximum b-value of 6,000 s/mm²) in healthy adults and people with MS (Schiavi et al., 2023, https://doi.org/10.1002/hbm.26416). Furthermore, Zeng et al. (2025) reported significant differences in SANDI metrics acquired on a 3T MRI Prisma system between individuals with ALS and healthy controls (maximum b-value of 3,000 s/mm²).
 
 Two additional studies demonstrated that SANDI could be implemented and microstructural differences could be detected in MS using 3T scanners with standard gradient strength (Barakovic et al., 2024; Margoni et al., 2023). Collectively, these findings indicate that SANDI can be applied on clinical scanners, particularly as clinical systems move toward stronger gradient capabilities such as Siemens Magnetom Cima.X. These explanations can be found under the clinical implication section in the Discussion on page 25.
 
 (2) Limitations of HD-ISS staging resolution and group separation:
 
 The use of HD-ISS staging to anchor progression analyses is conceptually appropriate, but, in practice, the sample is quite limited.
 
 (a) Only 26-27 out of 56 gene-positive participants could be assigned HD-ISS stages, and none were classified into stages 0 or 4. This restricts the interpretation of progression to a narrow clinical window (mostly stages 1-3) and excludes over 50% of the cohort.
 
 (b) Furthermore, visual inspection of the scatter plots (e.g., Figures 3 and 4) reveals substantial overlap between stages 1 and 2, particularly in CAP100 and Q-Motor measures. This suggests that the separation between early disease stages may not be robust in this dataset, potentially due to limited power or phenotypic variability.
 
 (c) As a result, claims based on progression across HD-ISS stages may be overinterpreted or underpowered.
 
 Despite this, the paper treats the staging as a reliable stratification for group comparisons. To improve clarity and transparency, I would recommend that the authors:
 
 (a) Acknowledge that over 50% of the HD cohort could not be classified.
 
 (b) Discuss whether those excluded differed from those included in key metrics.
 
 (c) Explicitly comment on the substantial overlap between stages 1 and 2, and limit claims about progression unless such separation is statistically supported.
 
 (d) Avoid overinterpreting staging-related effects without statistical support for group separability.
 
 Re a-d) We have added to the study limitations on pages 23 ff that only 54% (30 out of 56) of HD participants could be HD-ISS classified due to missing data, and we provide an overview of demographic and clinical information for HD-ISS stages and unclassified individuals in Supplementary Table 1. We acknowledge that the combined groups (HD-ISS 0-1 versus HD-ISS 2-3) for exploratory group analyses did not represent discrete disease stages and that there was some overlap in imaging and behavioural features between them, as illustrated in Figures 3, 4, and 7. We state explicitly that these exploratory findings should be interpreted with caution and require replication in larger, prospective cohorts before SANDI metrics can be considered as potential markers of disease progression.
 
 (3) Clarify regression strategy and interpretational limits of SANDI-derived regressors: While the hierarchical regression strategy is broadly appropriate, several aspects would benefit from clarification to improve both interpretability and robustness of the findings. For example:
 
 (a) Why were only a subset of SANDI parameters (fis and De) considered in the HC models (Figure 6), while additional metrics (fec and rs) were tested in HD models (Figures 7-8)? Including the same variables across groups could aid comparability?
 
 The same SANDI indices were included in the regression models for the HD and HC groups; Figures 7-8 report only significant predictors. This has been clarified in the figure legend and on page 14 of the manuscript.
 
 (b) Were any checks for multicollinearity (e.g., variance inflation factors) conducted? Given known interdependencies among some SANDI parameters, I wonder whether some of the reported regression coefficients may be unstable or difficult to interpret.
 
 Cross-correlation matrices between all imaging metrics for the HD, HC, and total samples have been added to the Supplementary Materials (Figure 3).
 
 To improve transparency and interpretability, I suggest actions such as:
 
 (a) SANDI metrics included in the models differ between HC and HD groups, reducing comparability. Consider using consistent full models across ROIs for comparison purposes, even if some predictors are not significant.
 
 (b) Report the correlation structure between SANDI metrics within each group to assess multicollinearity (The potential impact of multicollinearity (e.g., between fis and rs) is not discussed)
 
 (c) Explicitly acknowledge the limitations imposed by parameter degeneracy in the SANDI model and clarify how the authors ensured the biological interpretability of regression outputs in this context - Beta coefficients could reflect model instability or parameter degeneracy rather than true biological effects.
 
 (a) The same SANDI metrics and age were included in the first regression models for HD and HC data. The first models only differed by the inclusion of TFC as estimate of disease burden for the HD data. HD and HC participants were not included in a single regression model, as our aim was not to perform formal between-group inference on regression coefficients. Instead, models were fitted separately to explore within-group associations and to descriptively compare patterns of relationships across groups. This approach avoids imposing identical model structures across groups that may differ in variance structure, disease burden, and biological coupling between SANDI metrics. We have clarified these points on page 13/14.
 
 (b) We agree that multicollinearity is an important consideration when interpreting regression coefficients derived from microstructural models. To address this, we examined pairwise Spearman correlations between all imaging (SANDI, DTI, volume) metrics (averaged across ROIs), shown in the revised Supplementary Figure 2. As can be seen in the healthy control data, SANDI indices of apparent soma and neurite fractions showed a strong inverse correlation (rho = -0.92) and did not correlate with soma radius (rho = 0.1). All SANDI indices correlated only weakly with FA and volume and moderately with MD. This correlation pattern suggests that apparent soma density and radius capture distinct information about grey matter microstructure that differs from neurite fraction and is not captured by FA or volume. We note in HD participants a negative correlation between soma radius and fraction, and stronger correlations between SANDI metrics and volume measures. We would argue that these reflect disease-related reorganization of micro- and macro-structural relationships rather than uniform collinearity across groups. This information has been added to the Methods, Results and Discussion sections on pages 13, 19, and 21, 23ff.
 
 (c) We agree that regression coefficients derived from interdependent microstructural parameters should be interpreted with caution, as they may reflect shared variance or partial parameter degeneracy rather than fully independent biological effects. For this reason, we do not interpret individual beta coefficients in isolation. Instead, our conclusions focus on the consistency and directionality of associations across regions and metrics, and on the overall feasibility and sensitivity of SANDI to detect biologically meaningful variation in HD. The observed correlation structure (Supplementary Figure 2) provides important context for these interpretations and supports a multivariate, pattern-based rather than univariate reading of the results. These points have been added to the Discussion on pages 23 ff. Please also refer to our response to point (5) below.
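 The multicollinearity checks discussed in points (b) and (c) can be sketched as follows. This is a minimal illustration, assuming a pairwise Spearman correlation matrix plus variance inflation factors (VIFs, the diagnostic the reviewer suggested; the response itself reports only correlations). The column names and values are synthetic placeholders chosen to mimic a strong inverse soma/neurite relationship, not data from the study.

```python
import numpy as np


def spearman_matrix(X):
    """Pairwise Spearman correlations between columns of X.

    Ranks are taken with a double argsort, which assumes no tied values
    (reasonable for continuous imaging metrics).
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def variance_inflation_factors(X):
    """VIF per column, computed as the diagonal of the inverse Pearson correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic placeholder predictors (hypothetical), one row per participant.
rng = np.random.default_rng(1)
n = 57
f_soma = rng.normal(size=n)
f_neurite = -f_soma + 0.3 * rng.normal(size=n)  # strongly inversely related to f_soma
r_soma = rng.normal(size=n)                     # independent of the other two
X = np.column_stack([f_soma, f_neurite, r_soma])
names = ["f_soma", "f_neurite", "r_soma"]

rho = spearman_matrix(X)
vif = variance_inflation_factors(X)

for name, value in zip(names, vif):
    print(f"VIF({name}) = {value:.2f}")
```

 A correlation matrix shows which pairs of metrics overlap, while a VIF summarizes how much each regression coefficient's variance is inflated by all the other predictors together, so the two checks are complementary.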
 
 (4) Preprocessing order:
 
 Gibbs ringing correction was applied after TOPUP and EDDY, which deviates from the commonly recommended order in diffusion MRI preprocessing. Since Gibbs artifacts are introduced by k-space truncation and affect the spatial domain, it is typically advised to perform Gibbs correction prior to geometric corrections like TOPUP and EDDY. This avoids potential blurring or propagation of ringing artifacts during resampling. Could the authors clarify the rationale for this ordering, and whether an early application of Gibbs correction was tested?
 
 We agree that the application of Gibbs ringing correction after TOPUP and EDDY correction deviates from the commonly recommended order in diffusion MRI preprocessing. However, as some of the data included in this paper were preprocessed before this consensus was established in the literature, we kept the preprocessing order consistent for all datasets for harmonization and comparison purposes. We have since changed the order for subsequent preprocessing of the HD-DRUM data and have found comparable FA maps for data processed with Gibbs ringing correction before and after TOPUP and EDDY correction.
 
 (5) Expand on SANDI model assumptions:
 
 SANDI is presented as being used for the very first time in this problem. However, a vague explanation is given: "using all the default settings". Given the novelty of applying SANDI in a clinical HD context, the manuscript would benefit from a discussion of the model's key assumptions and limitations. For instance:
 
 (a) The potential degeneracy between fis and rs in the absence of protocol features (e.g., long Δ or high b) that can disambiguate them.
 
 (b) Whether a dot compartment was included, and the implications of excluding it for the interpretation of rs or fis.
 
 (c) The lack of exchange modeling or fixed stick diffusivity, and how these may bias compartment estimates (particularly in diseased or aging tissue).
 
 (d) Any steps taken to verify robustness or identifiability (e.g., simulations, synthetic fitting). These issues are not flaws in the method, but they do affect how confident we can be in interpreting fis/rs as markers of neuron loss or glial hypertrophy, especially given the subtle group differences and the potential for biological heterogeneity in HD. Even a brief acknowledgment would strengthen the manuscript and provide useful context to readers less familiar with multicompartment modeling.
 
 We thank the reviewer for this constructive suggestion and fully agree that, because this is the first application of SANDI in our clinical HD cohort, the manuscript should more explicitly describe the model assumptions, potential identifiability limitations under our protocol, and the implications for biological interpretation.
 
 We have revised the Methods (pages 11-12) and Discussion (page 24) to (i) specify the exact SANDI implementation used (the SANDI MATLAB toolbox, available at: https://github.com/palombom/SANDI-Matlab-Toolbox-Latest-Release), (ii) describe which components are included in the default formulation and the key modelling assumptions, and (iii) add a dedicated “Limitations and interpretability” paragraph addressing points (a–d) below. We also avoid the previous shorthand “default settings” and provide a clear description of the fitting setup.
 
 “The SANDI model [Palombo M. et al, NeuroImage 2020] assumes three compartments, namely intra-neurite signal modelled as diffusion inside impermeable randomly oriented sticks, intra-soma signal modelled as restricted diffusion inside spheres, and extra-cellular signal modelled as Gaussian isotropic diffusion. The direction-averaged (or spherical mean) normalized diffusion signal has thus the following expression:
 
 S(b) = fis · Asphere(b, rs, Dis) + fin · Astick(b, Din) + fec · Aball(b, De)
 
 where fin + fis + fec = 1; Astick and Asphere are the normalized, direction-averaged (or spherical mean) signals for restricted diffusion within neurites and soma, respectively, and Aball is the normalized, direction-averaged (or spherical mean) signal of the extra-cellular space. The specific expressions are given in [Palombo M. et al. NeuroImage 2020]. The parameters estimated from the direction-averaged (or spherical mean) data are Din, a proxy of the intra-neurite effective axial diffusivity; De, a proxy of the extracellular effective mean diffusivity; rs, a proxy of apparent soma radius; and the signal fractions fin, fis and fec, subject to the constraint fin + fis + fec = 1, which are proxies of the relaxation-weighted neurite, soma and extracellular volume fractions, respectively. The bulk diffusivity inside the sphere Dis is fixed to 3 μm²/ms. The parameters were fitted using a Random Forest regression algorithm (TreeBagger, Matlab®) with 200 trees, trained on simulated data, using the code publicly available at https://github.com/palombom/SANDI-Matlab-Toolbox-Latest-Release. The training data consisted of simulated signals for 10⁵ parameter combinations, uniformly sampled: fin and fis ∈ [0, 1], Din ∈ [0.5, 3] μm²/ms, De ∈ [0.5, 3] μm²/ms and rs ∈ [1, 12.5] μm. Rician noise with a distribution of standard deviations randomly sampled from the voxels within the brain mask of the noise map obtained using MPPCA denoising was added to account for realistic SNR levels and rectified noise floor. The loss function of the training was the mean squared error between predicted parameters and ground truth values. Model fitting provided maps of fin, fis, fec, Din, De and rs.”
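 For readers less familiar with multicompartment models, the three-compartment direction-averaged signal can be sketched numerically. This is an illustrative sketch and not the SANDI Matlab toolbox code. It assumes the standard closed forms for the stick and ball spherical means, a Gaussian phase distribution (GPD) approximation for the sphere truncated to five tabulated roots, and units of ms/μm² for b, μm²/ms for diffusivities, μm for radius and ms for timings. The function names and default timings (δ = 7 ms, Δ = 24 ms, Dis = 3 μm²/ms, taken from the text) are choices made for this example.

```python
import math
import numpy as np

# First roots x_m = alpha_m * r of j1'(x) = 0 (derivative of the spherical Bessel function j1).
SPHERE_ROOTS = np.array([2.0816, 5.9404, 9.2058, 12.4044, 15.5792])

def stick_signal(b, d_in):
    """Spherical mean of randomly oriented sticks: sqrt(pi / (4 b D)) * erf(sqrt(b D))."""
    x = b * d_in
    if x == 0.0:
        return 1.0
    return math.sqrt(math.pi / (4.0 * x)) * math.erf(math.sqrt(x))

def ball_signal(b, d_e):
    """Isotropic Gaussian diffusion."""
    return math.exp(-b * d_e)

def sphere_signal(b, r_s, d_is, delta, big_delta):
    """Restricted diffusion in a sphere under the GPD approximation (truncated series)."""
    gamma2_g2 = b / (delta ** 2 * (big_delta - delta / 3.0))   # from b = gamma^2 G^2 delta^2 (Delta - delta/3)
    alpha = SPHERE_ROOTS / r_s
    a2d = alpha ** 2 * d_is
    bracket = 2.0 * delta - (
        2.0
        + np.exp(-a2d * (big_delta - delta))
        - 2.0 * np.exp(-a2d * delta)
        - 2.0 * np.exp(-a2d * big_delta)
        + np.exp(-a2d * (big_delta + delta))
    ) / a2d
    series = np.sum(bracket / (alpha ** 4 * (SPHERE_ROOTS ** 2 - 2.0)))
    return math.exp(-2.0 * gamma2_g2 / d_is * series)

def sandi_signal(b, f_in, f_is, d_in, d_e, r_s, d_is=3.0, delta=7.0, big_delta=24.0):
    """S(b) = fis * Asphere + fin * Astick + fec * Aball, with fec = 1 - fin - fis."""
    f_ec = 1.0 - f_in - f_is
    if f_ec < 0.0:
        raise ValueError("f_in + f_is must not exceed 1")
    return (
        f_is * sphere_signal(b, r_s, d_is, delta, big_delta)
        + f_in * stick_signal(b, d_in)
        + f_ec * ball_signal(b, d_e)
    )

# Example: signal decay for one hypothetical parameter set (b = 6 ms/um^2 equals 6000 s/mm^2).
for b in (0.0, 1.0, 3.0, 6.0):
    print(b, round(sandi_signal(b, f_in=0.4, f_is=0.3, d_in=2.0, d_e=1.0, r_s=6.0), 4))
```

 At high b the ball term is almost fully attenuated, so the remaining signal is shared between the stick and sphere terms, which is why high b-values help separate fis and rs.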
 
 (a) Potential degeneracy between fis and rs. We agree that partial coupling (or degeneracy) between the soma fraction fis and soma radius rs is possible when the acquisition does not provide strong sensitivity to restricted sphere size (e.g., in the low b-value regime). Our protocol benefits from high b-values (up to 6000 s/mm²) enabled by the Connectom gradient system, which increase sensitivity to signal attenuation from restricted compartments and reduce the fis-rs coupling/degeneracy. However, we acknowledge that the specific choice of fixed diffusion timing (in our case δ = 7 ms, Δ = 24 ms) can further modulate the fis-rs coupling/degeneracy in a protocol-dependent way. To reflect this appropriately, we now explicitly state that rs should be interpreted as an “apparent soma radius” under our protocol, and that our inferences focus on relative group differences and spatial patterns rather than absolute histological soma radii.
 
 We have now added a paragraph in the limitations section acknowledging this point.
 
 (b) Dot compartment. We did not include an explicit “dot” (immobile) compartment, because there is no evidence that this is required in the in vivo human brain (see, for example, the negligible contribution reported in Tax C. et al. NeuroImage 2020: https://www.sciencedirect.com/science/article/pii/S1053811920300215). Accordingly, our fits did not include a dot term, and we now state this explicitly in the Methods. However, we would like to clarify that our fitting method (described in detail at https://github.com/palombom/SANDI-Matlab-Toolbox-Latest-Release) accurately incorporates the impact of Rician noise and thus accounts for the corresponding rectified noise floor, which in high b-value applications is very often mistakenly attributed to a “dot” compartment. Therefore, no bias in the estimated fis and rs is expected from omitting a “dot” compartment.
 
 (c) Exchange modelling and fixed stick diffusivity. We agree that SANDI, as implemented here, does not explicitly model inter-compartment exchange during the diffusion encoding and uses simplified representations of neurites (sticks). However, the intra-stick diffusivity, Din, was not fixed but fitted. In diseased or aging tissue, deviations from these assumptions (e.g., altered membrane permeability) may bias compartment estimates. This has been investigated in depth in Schiavi S. et al. HBM 2023 (https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.26416), to which we refer the reader. We have added an explicit limitation statement noting that HD-related microstructural changes (e.g., changes to membrane permeability) could affect model parameter fidelity, and thus fis and rs should be treated as MRI-derived effective indices rather than direct quantitative measures of neuron loss or glial hypertrophy. Importantly, our analysis compares groups under an identical acquisition and fitting pipeline, so group-level contrasts remain informative even if absolute parameter values are biased.
 
 (d) Robustness / identifiability checks. We agree that reporting robustness strengthens confidence, particularly given subtle effects and biological heterogeneity. The SANDI Matlab Toolbox we used extensively investigates the robustness and identifiability of the model parameters using numerical simulations and synthetic signals that account for the specific experimental protocol and noise distribution. An example of the results supporting the robustness / identifiability is reported in the Author response images. These results show that the accuracy and precision of all SANDI model parameters, except Din, are very high (>~80%, Author response image 1)
 
 Author response image 1.
 
 Analysis of the accuracy and precision of SANDI model parameter estimation. We simulated 10⁴ synthetic diffusion signals using the SANDI model with random combinations of five parameters: fneurite (fin), fsoma (fis), Din, Rsoma (rs), and De. Parameters were sampled uniformly from: fneurite, fsoma ∈ [0, 1]; Din, De ∈ [0.5, 3.0] µm²/ms; Rsoma ∈ [1, 12] µm. Rician noise with experimentally estimated variance was added, and the SANDI model was then fit to the noisy signals. For each parameter, we report the relative percentage error between estimated and ground-truth values as a function of the parameter value (normalized to [0, 1]), together with goodness-of-fit (R²).
 
 and sensitivity to changes as small as 5% in each of the model parameters is correctly captured (Author response image 2A), with small to negligible degeneracy (except, once again, for Din), even in presence of exchange (Author response image 2B).
 
 Author response image 2.
 
 Sensitivity to 5% parameter modulations. The matrices show how a controlled perturbation in one parameter propagates into the estimated values of all model parameters. Each row corresponds to a 5% increase in the parameter on the y-axis; the resulting percentage change observed in each estimated parameter is reported along the x-axis. An ideal estimator would yield a purely diagonal matrix, with 5% on the diagonal and 0% elsewhere (no cross-talk). In (A), we used the same synthetic SANDI signals as in Figure 1. In (B), we additionally generated 10⁴ synthetic signals incorporating neurite–extra-cellular exchange using the NEXI model [https://doi.org/10.1016/j.neuroimage.2022.119277] and an exchange time representative of human cortex (τex ≈ 30 ms) [https://doi.org/10.1162/imag_a_00104].
 
 We have therefore revised the manuscript language to be more precise and appropriately cautious, describing fis and rs as apparent compartment indices and explicitly discussing potential confounds (e.g., parameter coupling, and unmodelled exchange), while clarifying the value of SANDI for detecting reproducible group-level microstructural differences in HD.
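 The rectified noise floor discussed under (b) can be illustrated with a short simulation. This is a sketch under stated assumptions, not the toolbox's noise model: the noise level sigma and the sample count are arbitrary, and the function name is chosen for this example. It shows that a fully attenuated signal (true value 0) still has a non-zero mean magnitude of about sigma · sqrt(π/2), which can be mistaken for an immobile "dot" signal if the noise distribution is ignored.

```python
import numpy as np

def add_rician_noise(signal, sigma, rng):
    """Magnitude of a complex signal with independent Gaussian noise on both channels."""
    real = signal + rng.normal(0.0, sigma, size=signal.shape)
    imag = rng.normal(0.0, sigma, size=signal.shape)
    return np.sqrt(real ** 2 + imag ** 2)

rng = np.random.default_rng(0)
sigma = 0.05                                   # assumed noise standard deviation (S0 = 1)
true_signal = np.zeros(200_000)                # fully attenuated signal at very high b
noisy = add_rician_noise(true_signal, sigma, rng)

observed_floor = noisy.mean()
expected_floor = sigma * np.sqrt(np.pi / 2.0)  # mean of a Rayleigh distribution
print(round(observed_floor, 4), round(expected_floor, 4))
```

 Training or fitting on signals that carry this kind of noise lets the estimator attribute the plateau to noise rather than to an extra tissue compartment.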
 
 (6) Clarify "not-classified" group in figures:
 
 It is not clear to me what the "not-classified" groups shown in Figures 3-4 represent, what criteria determined their inclusion, and whether their inclusion affects the comparability or interpretability of staging-based analyses.
 
 We have added to the legends of Figures 3 and 4 that not-classified refers to HD participants who could not be HD-ISS classified due to missing clinical data or their CAG repeat falling within the 36-40 range. As correlation analyses were conducted across the whole HD sample though, these datapoints were included in the scatterplot.
 
 (7) Figure labeling:
 
 There appears to be a mismatch between figure numbering and captions around Figures 3-4. Please ensure alignment.
 
 Mismatch between figure numbering and captions has been corrected.
 
 Minor suggestions:
 
 (1) Figures 1-2:
 
 (a) Label axis values meaningfully, e.g., negative vs. positive instead of 0 vs 1.
 
 (b) Add units to MD axes (e.g., ×10⁻⁴ mm²/s).
 
 (c) Figure 6 colors: Consider improving the color distinction between "Age" and "fis" predictors, which are currently hard to differentiate.
 
 The suggested adjustments have been made to Figures 1, 2, 5 and 6 and Figure 2 legend.
 
 (c) Discuss why apparent soma size decreases in some ROIs (e.g., pallidum), if unexpected.
 
 We offer the following speculation about the reduced soma size in the pallidum (pages 20/21): Changes in apparent soma size may reflect alterations in neural and glial cell proportions and/or morphology, including astrocyte and microglia swelling in response to neurodegeneration and soma shrinkage preceding neuronal cell death. Thus, increased apparent soma size in the striatum may indicate HD-related reorganisation of cell types driven by MSN loss and reactive glial cell swelling, whereas smaller soma size in the pallidum may result from infiltration of smaller glia cells prior to secondary neuronal loss following striatal MSN degeneration.
 
 Reviewer #3:
 
 (1) An important question is whether the SANDI measures, which require an expensive scanner and elaborate processing, are better biomarkers than the more traditional DTI measures. Can the authors compare the effect size of FA/MD with SANDI measures? In some of the plots and tables, FA/MD seem to have comparable, if not higher, correlations with QMotor or CAP scores. In the same vein, it is unclear whether DTI measures were included in the hierarchical stepwise regression. I wonder if the stepwise models might have picked up FA/MD instead of SANDI measures if given the chance. Overall, I hope the authors can also discuss their findings in light of the cost vs. benefit of adopting SANDI in future studies, which is an important topic for clinical trials.
 
 Effect sizes (ES) of group differences in all microstructural indices can be found in Table 4. ES of DTI and SANDI indices in the caudate and putamen were broadly comparable, with a trend for MD showing larger ES (FA: rrb = 0.38-0.55, MD: rrb = 0.51-0.61, fis: rrb = 0.32-0.45, rs: rrb = 0.45-0.53).
 
 This information is now reported in the result section on pages 15/16 and is being discussed in light of cost versus benefit considerations on pages 21 and 25.
 
 (2) Similar to the above point, it is very important to consider how strong the biomarking signal is from SANDI measures compared to the good old striatal volume. Some plots seem to indicate that volumes still have the highest correlation with QMotor and the highest effect size in group comparisons. It would be helpful for the community to know where the new SANDI measures stand compared to the most typically used volumes in terms of effect size.
 
 Effect sizes (ES) of group differences in volumes can be found in Table 2. ES in caudate and putamen volumes ranged between rrb = 0.49-0.55 and were comparable to the ES of apparent soma size (rrb = 0.45-0.53) but slightly larger than the ES of soma density (rrb = 0.32-0.45).
 
 This information is now reported in the result section on page 15/16 and is being discussed on pages 21 and 25.
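 The effect sizes quoted above are reported as rrb. Assuming this denotes the rank-biserial correlation that accompanies a Mann-Whitney U test, it can be computed directly from pairwise comparisons between the two groups, as in the sketch below. The function name and the toy values are illustrative and are not taken from the study.

```python
import numpy as np

def rank_biserial(group_a, group_b):
    """Rank-biserial correlation: (pairs where a > b minus pairs where a < b) / all pairs."""
    a = np.asarray(group_a, dtype=float)[:, None]
    b = np.asarray(group_b, dtype=float)[None, :]
    favourable = np.sum(a > b)
    unfavourable = np.sum(a < b)
    return (favourable - unfavourable) / (a.size * b.size)

# Toy example: hypothetical metric values in two small groups.
patients = [1.0, 2.0, 3.0, 4.0]
controls = [3.0, 4.0, 5.0, 6.0]
print(rank_biserial(patients, controls))   # negative: patients tend to have lower values
```

 The statistic ranges from -1 to 1 and is unit-free, which is what makes it usable for comparing volumes, DTI indices and SANDI indices on a common scale.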
 
 (3) The diffusion measures are inevitably correlated to some degree. Please provide a correlation matrix in the supplementary material, including all DWI measures, to enable readers to better understand how similar SANDI measures are to each other or vs. other DTI measures. Perhaps adding volumes to this correlation matrix may also be a good future reference.
 
 We have added cross-correlation matrices between all imaging measures (SANDI, DTI, Volumes) for the total sample as well as for HC and HD participants separately to the Supplementary material (Figure 3), providing an overview of the shared variance within SANDI parameters and between SANDI and DTI and volume metrics for each group.
 
 (4) ISS stages:
 
 (a) The online ISS calculator requires cut-offs derived from the longitudinal Freesurfer pipeline, while the authors do not have longitudinal data. Thus, the ISS classification might be inaccurate to some degree if the authors used the FS cross-sectional pipeline. Please review this issue and see if updated cut-offs should be used to classify participants.
 
 We acknowledge that our HD-ISS classifications may have been biased due to the use of cross-sectional rather than longitudinal FreeSurfer v6 volumes (page 23).
 
 (b) Were there really no participants with ISS 0 among the 56 HD individuals? Please clarify in the manuscript.
 
 We classified four individuals as ISS 0 based on their caudate and/or putamen z-scored volumes falling below 2SD of the healthy control mean. These analyses are described on pages 14-15 and were based on the cross-sectional data of this study.
 
 (5) A note on terminology that might be confusing to some readers. According to the creators of ISS, the ISS stages are created for research only; they are not used or applied in the clinic. On the other hand, the terms "premanifest" and "manifest" have a clinical meaning, typically based on the diagnostic confidence level. The assignment of ISS0-1 to premanifest and ISS2-3 to manifest may create some non-trivial confusion, if not opposition, in some segments of the HD community. The authors can keep their current terminology, but will need to at least clarify to the reader that this assignment is speculative, does not fully match the clinically-based categories, and should not be confused with similarly named groups in the previous literature.
 
 To avoid confusion about terminology, we have removed the labels “premanifest” versus “manifest” throughout the manuscript. We refer to HD-ISS 0-1 and HD-ISS 2-3 when referring to the exploratory comparisons between HD-ISS stages.
 
 (6) The population in the study seems to be obtained from different other studies or research projects, and there are missing scores for several participants due to the retrospective nature of sample gathering for the analyses. Please state clearly that this study was done with retrospective data to properly justify why there are missing data. Also, and this is important, please clarify for the reader whether there was any temporal bias in the acquisition of data of a certain group (HD) vs. another (HC). It is important to rule out that there were no scanner changes or upgrades that may confound the reported group differences.
 
 We can confirm there were no Connectom scanner changes or upgrades that may have confounded the reported group differences. This was added to the image acquisition section on page 10. We have added to the participant section on page 9 that data were retrospectively pooled from separate studies and explain this was the reason why HD-ISS classification was only available for a subset of participants.
 
 (7) Several of the significant results with SANDI scores seem to be driven by a subgroup of HD individuals that are more clearly different than the healthy control distribution. Not sure if this may help, but one idea the authors can consider is to check if HD individuals that deviate more than 2 SDs from the healthy control distribution of SANDI scores have also worse QMotor, worse atrophy, or higher CAP scores than those HD individuals that are practically within the 2SD boundary distribution of HDs. This is another way of showing that the new measures have potential for application in individualized medicine (the MRI Z score of a patient as a proxy of the clinical deterioration). It is not a request to authors but just a suggestion for their consideration.
 
 The data points in the scatterplots of Figures 3, 4, and 7 have now been color-coded according to HD-ISS stage, showing a stage-related worsening of microstructural and volumetric imaging markers and Q-Motor performance.
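 The reviewer's suggestion in (7), expressing each HD participant's imaging value as a z-score relative to the healthy control distribution and flagging those beyond 2 SD, can be sketched as follows. This is an illustration of the idea only; the function names and values are hypothetical and this analysis is not reported in the response.

```python
import numpy as np

def z_scores_vs_controls(patient_values, control_values):
    """Standardize patient values against the control mean and sample SD."""
    control = np.asarray(control_values, dtype=float)
    mean = control.mean()
    sd = control.std(ddof=1)          # sample standard deviation of the controls
    return (np.asarray(patient_values, dtype=float) - mean) / sd

def beyond_threshold(z, threshold=2.0):
    """Boolean mask of participants deviating more than `threshold` SDs in either direction."""
    return np.abs(z) > threshold

# Hypothetical values of one imaging metric.
controls = [8.0, 9.0, 10.0, 11.0, 12.0]
patients = [10.0, 14.0, 6.5]
z = z_scores_vs_controls(patients, controls)
print(np.round(z, 2), beyond_threshold(z))
```

 The resulting mask could then be used to compare clinical scores between participants inside and outside the 2 SD boundary.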
 
 (8) The variance explained in hierarchical regression is obtained by fitting models within the sample, and can be subject to overfitting. In the absence of a more robust cross-validated R2, the authors may want to at least briefly inform the reader that the current approach can be subject to overfitting and does not represent a true out-of-sample R2.
 
 We have added this point to the study limitations in the Discussion section on page 23.
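 The gap between in-sample and out-of-sample variance explained raised in (8) can be made concrete with a small simulation. The sketch below compares the in-sample R² of an ordinary least-squares model with a leave-one-out cross-validated R². The data are synthetic and the predictor count and sample size are arbitrary assumptions; this is not the hierarchical regression used in the study.

```python
import numpy as np

def fit_predict(x_train, y_train, x_test):
    """Ordinary least squares with an intercept; returns predictions for x_test."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef

def in_sample_r2(x, y):
    residuals = y - fit_predict(x, y, x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

def leave_one_out_r2(x, y):
    """R2 computed from predictions for each observation made without that observation."""
    n = len(y)
    predictions = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        predictions[i] = fit_predict(x[keep], y[keep], x[i:i + 1])[0]
    return 1.0 - np.sum((y - predictions) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic data: 30 participants, 5 predictors, only the first one is informative.
rng = np.random.default_rng(1)
x = rng.normal(size=(30, 5))
y = 0.5 * x[:, 0] + rng.normal(size=30)

r2_in = in_sample_r2(x, y)
r2_cv = leave_one_out_r2(x, y)
print(round(r2_in, 3), round(r2_cv, 3))
```

 With small samples and several predictors, the cross-validated value is typically noticeably lower than the in-sample value, which is the overfitting caveat the reviewer describes.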
 
 (9) There are two Figure 3 labels, and all figures thereafter do not match the manuscript.
 
 The Figure numbering has been corrected.
 
 (10) In (the currently labelled) Figure 8, there seem to be fewer than 56 data points in the scatterplots. Is there a reason why not all 56 HD individuals have the CAP100 score available? CAP needs only CAG and age, which all HD gene carriers should have, to be included in the study.
 
 Inclusion criteria for individuals with HD for the HD-DRUM project were a positive genetic test for the presence of the mutant huntingtin allele (CAG length ≥ 36 repeats) and/or a clinical diagnosis of HD. Thus, for a small number of participants CAG was not available for the calculation of CAP100 score.
 
 AuthorResponse
URL

medrxiv.org/content/10.1101/2025.03.17.25324107v3
www.biorxiv.org www.biorxiv.org

Mapping Visual Contrast Sensitivity and Vision Loss Across the Visual Field with Model-Based fMRI

5
1. Public_Reviews 02 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 Using fMRI-based pRF mapping, this important study presents a novel method for estimating visual field (VF) loss and potential restoration by analyzing contrast-sensitivity patterns in early visual cortex. The evidence supporting the main claims is convincing. This work will be of broad interest to researchers in vision and clinical vision, neuroscience, and brain imaging.
 
 Summary
2. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Integrating large-field stimulation with a retinotopic atlas, this study introduces an fMRI-based method for measuring contrast sensitivity across the visual field. Retinotopy was assessed using pRF mapping and a calibrated Benson atlas. The authors validate their method by replicating known patterns of contrast sensitivity across eccentricities and visual field quadrants in healthy subjects, and demonstrate its potential clinical utility through case studies of both simulated and real visual field loss.
 
 Comments on revisions:
 
 I appreciate the addition of the quadrant-scotoma condition and the authors' clarification that the goal is to demonstrate individual-level detection sensitivity. The 95% CI argument is reasonable, and I am satisfied with framing the simulated-scotoma work as proof-of-concept.
 
 Review 1
3. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary
 
 This study uses functional MRI to evaluate visual contrast sensitivity across the visual field at the level of the visual cortex, testing the method as a proof of principle in a small group of normally sighted individuals, modelling both normal vision and simulated vision loss, as well as a patient with independently verified vision loss. The results suggest a promising technique that measures vision objectively across the visual field and overcomes the requirement for careful fixation, which is often challenging in those with low vision or sight loss.
 
 Strengths
 
 • Objective measure of central vision: The proposed method may provide a more comprehensive and objective assessment of residual visual function in individuals with sight loss. This may be particularly useful for those with central visual field loss without the requirement of stable fixation or subjective motor responses.
 
 • More sensitive measure: The use of slope to calculate contrast sensitivity across a range of contrasts within the brain is clever and likely more sensitive than single threshold measurements or standard clinical measures of visual acuity using letter charts. Standard supra-threshold (high contrast) tests are not ideal for capturing residual vision or partial vision loss.
 
 • Good agreement with standard atlas: The Benson atlas provides a good estimate of visual field maps within V1 based on anatomical landmarks, and the authors take steps to refine this informed by cortical magnification and V1 surface area (brain size) for each individual participant. This could allow the technique to be generalised without the need to collect lengthy individual mapping data from every participant.
 
 • Within-subject reproducibility: The measurements appear to be sensitive and reproducible, particularly in those with normal vision, and are consistent with known features of visual sensitivity differences in different parts of the visual field.
 
 • Potential tool to measure visual field sensitivity in controls: Even if the proposed methods are not ideal for widespread clinical translation, they do offer an exciting tool to test hypotheses about visual field differences in healthy controls. For example, there seems to be an increase in sensitivity on either side of the simulated ring scotoma (Fig 6 - perhaps due to the release of lateral inhibition?). Reliability measures suggest that individual differences are consistent in healthy controls (although not tested statistically, perhaps due to the small sample size?). Whether they reflect behaviourally meaningful differences in visual field sensitivity could be tested in individuals by comparing them to behavioural measures across the visual field.
 
 • Potential tool to test novel treatments: The proposed techniques could be used to test within-subject changes in visual function in environments that are equipped to measure and analyse fMRI data, including clinical trials aimed at determining the success of novel treatments. Preliminary testing in healthy controls with eye movements also suggests that the method is suitable for testing low vision patients with unstable fixation (e.g., nystagmus), and the authors have modelled the effects of varying amounts and types of eye movements on functional outcome measures.
 
 Weaknesses
 
 • Questionable sensitivity to differences in patients. The variability in heat maps across healthy control participants is somewhat surprising, and it is uncertain whether they represent actual visual sensitivity differences or an artifact of the measurement technique, e.g., due to signal-to-noise differences introduced by local variations in brain anatomy. Thus, it is uncertain whether the substantial variance across controls will allow for a sufficiently stable baseline to detect meaningful differences in individual patients. Also, as the authors rightly point out, Benson atlas does not model differences along meridians, so that upper/lower field differences might not be detectable. However, the authors acknowledge that this is a pilot study, and further testing a wider range of scotoma types in patients and simulated in controls will only improve the methods. Furthermore, the ability to capture visual field representations in human visual cortex is also likely to improve with computational advances, making the use of atlases more feasible, obviating the need for individualised population receptive field mapping.
 
 • Potential for clinical translation. Although it is a sensitive measure, functional MRI is costly, is not available in all clinical settings, requires significant post-processing analyses, and may be contraindicated in some individuals due to safety (e.g., metallic implants) or other concerns (e.g., claustrophobia). These could present significant barriers to widespread clinical translation, if this were the ultimate goal of the study.
 
 • Limited range of spatial frequencies. The spatial frequencies tested were still quite low (0.3 and 3cpd) compared to measures such as visual acuity. Extending the measurements to higher spatial frequencies could allow better characterization of central vision, although not necessarily of peripheral vision. However, this may depend on the typical visual abilities of the patient population of interest.
 
 Appraisal and Impact:
 
 The authors used appropriate and robust methods to assess and model known features of visual sensitivity differences across the visual field in sighted controls. In addition, the assessment technique successfully captured sensitivity changes due to simulated and actual partial field loss but was also fairly resilient to eye movements and fixation instability, typical of patients with sight loss. Although currently providing a proof of principle, the method is likely to improve with further testing and increasing normative sample sizes, and as computational methods continue to advance visual field map predictions. Although it may not be adopted widely as a standard clinical assessment technique due to the expense and other obstacles, it would provide a valuable tool in assessing clinical populations, for example in the context of clinical trials to assess suitability for treatment interventions or monitor treatment outcomes.
 
 Review 2
4. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Chow-Wing-Bom et al. introduce an innovative wide-field visual stimulation setup for 3T experiments that enables stimulation up to a diameter of 40° visual angle while allowing continuous gaze tracking. Using this setup, the authors systematically investigate contrast sensitivity across the visual field by presenting subjects with sinusoidal gratings varying in contrast and spatial frequency. Their findings confirm the expected organization of contrast sensitivity, demonstrating a preference for high spatial frequencies in the central field and lower frequencies in the periphery. They also extend these measurements to eccentricities up to 20°, which exceeds previous fMRI-based reports. Moreover, the study explores the potential of using contrast sensitivity calculations as a method for detecting visual field defects, demonstrated in a healthy subject with simulated ring-shaped and upper-right-quadrant scotomas, and in a patient with LHON. The revised version additionally characterises the robustness of the approach to varying degrees of fixation instability.
 
 Strengths:
 
 - The manuscript is well written and provides comprehensive methodological details, ensuring high transparency and reproducibility.
 
 - The visual stimulation setup represents a significant technical advance by enabling wide-field stimulation with continuous eye tracking, which is crucial for both research and potential clinical applications.
 
 - The study confirms established findings regarding the organization of contrast sensitivity while extending them to a larger eccentricity range.
 
 - The effort to establish a measure for visual field losses aligns with current efforts to develop objective alternatives to conventional perimetry.
 
 - The revised manuscript includes an empirical assessment of how varying levels of eye movement affect cortical contrast sensitivity estimates, providing useful guidance on the tolerance of the approach to fixation instability.
 
 Weaknesses:
 
 - The original version left certain methodological aspects unclear, particularly the correction of eccentricity values from the Benson atlas and the V1 masks used in each analysis branch. The authors have added a dedicated figure illustrating the eccentricity correction procedure and now explicitly state that a manually delineated V1 mask was used for the pRF-based analyses while the Benson V1 label was used for the atlas-based analyses, together with a discussion of how this difference may influence the comparison.
 
 - Minor inconsistencies in reporting, such as the introduction of a second session in the Results section, have been corrected.
 
 The conclusion that high-contrast patterns, as used in pRF mapping, are not optimal for testing subtle but potentially clinically relevant changes in visual field coverage is valid. Contrast sensitivity may therefore be a well-suited parameter for estimating visual field losses. The presented work is an interesting starting point, and the proposed method of using contrast sensitivity as a measure of partial vision loss should be further explored.
 
 Comments on revisions:
 
 The authors have thoroughly addressed all points raised in my original review, and I have no further concerns.
 
 Review 3
5. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Reviewer #1 (Public review):
 
 The current claims should be better supported by more evidence.
 
 R1-1: In the first experiment, have the statistics undergone multiple comparison corrections (e.g., Line 441-442)? Given the small sample size, incorporating additional statistical tests (such as the Bayes Factor) could strengthen the analysis.
 
 We confirm that corrections for multiple comparisons are now applied where appropriate, particularly in the group-level ANOVA analyses.
 
 “Post-hoc tests using Holm-Bonferroni correction show that V1 neuronal populations receiving inputs from the central visual field (0.5-4.5°) showed greater contrast sensitivity to high spatial frequency as compared to low spatial frequency stimuli (steeper slope for the 3cpd versus 0.3cpd condition: 0.5-2.5º: t(6) = 4.35, pbonf = 0.0149; 2.5-4.5º: t(6) = 3.471, pbonf = 0.0266). Conversely, peripheral eccentricities in V1 (above 9.5°) showed higher contrast sensitivity to low as compared to high spatial frequency stimuli (steeper slope for 0.3cpd versus 3cpd condition: 9.5-15º: 𝑡(6) = −4.591, pbonf = 0.0149; 15-20º: t(6) = −6.615, pbonf = 0.0029). Between 4.5° and 9.5°, V1 contrast sensitivity was similar for both spatial frequencies (t(6) = −0.226, pbonf = 0.8286). Crucially, these effects remained when using retinotopic estimates based on structural scans derived from the Benson retinotopic atlas instead of the pRF-mapping measures (0.5-2.5º: 𝑡(6) = 5.768, pbonf = 0.0059 ; 2.5-4.5º: t(6) = 2.531, pbonf = 0.0892 ; 4.5-9.5º: 𝑡(6) = −0.293, pbonf = 0.7792; 9.5-15º: t(6) = −3.274, pbonf = 0.0509; 15-20º: t(6) = −3.528, pbonf = 0.0496; see Figure A2 and Table A3 in Appendix section).”
 
 “Post-hoc pairwise comparisons using Holm-Bonferroni corrections revealed that, as predicted, the cortical contrast response function had a higher slope – indicating better V1 sensitivity – along the horizontal versus vertical quadrants (Horizontal-Vertical Anisotropy – HVA: 𝑡(6) = 5.908, pbonf = 0.0031) and along the lower versus upper quadrant (Vertical Meridian Anisotropy – VMA: 𝑡(6) = 4.106, pbonf = 0.0126). Conversely, no difference in cortical contrast sensitivity was found between V1 neuronal populations encoding the left and right quadrants of the visual field (Left-Right Horizontal Meridian Anisotropy – LRHMA: t(6) = 0.7197, pbonf = 0.4988).”
 
 “We found that the horizontal-vertical anisotropy effect was recovered (HVA: t(6) = 3.584, pbonf = 0.0347), but that the vertical meridian anisotropy effect was not (VMA: t(6) = 0.744, pbonf = 0.9697) with this approach.”
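
 For readers unfamiliar with the correction named in the passages above, the Holm-Bonferroni step-down procedure can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `holm_adjust` and the example p-values are hypothetical, and the values are not taken from the manuscript.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values from three post-hoc comparisons.
example = holm_adjust([0.01, 0.04, 0.03])
print(example)
```

 Unlike a plain Bonferroni correction, which multiplies every p-value by the number of tests, the step-down multiplier shrinks with rank, so the procedure is never less powerful while still controlling the family-wise error rate.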
 
 R1-2a: The authors claim that "structure-based atlases can replace the need for pRF mapping in cases where it might otherwise be difficult or impossible to collect pRF data." This claim needs further scrutiny. Currently, only one simulated condition of visual field loss was examined in one subject.
 
 AR-R1-2a: We agree that further work is needed to fully establish the utility of structure-based atlases. As a first step, we have followed the reviewer’s suggestion and collected an additional dataset from one of the seven participants, in whom we simulated another condition of visual field loss – specifically, loss of the upper right quadrant. This participant is the same individual already presented in the manuscript (C5), but with a different simulated vision loss condition.
 
 This new condition has been introduced in the Methods, Results and Discussion sections, and a new Figure 10 has been added alongside Figure 9, which shows the 3º-8º scotoma. The relevant changes are as follows:
 
 “We also demonstrate the clinical relevance of this approach by recovering simulated scotomas (i.e., a ring of visual field loss around fixation and the loss of an entire visual field quadrant), as well as visual field loss in a patient with a neurodegenerative disorder causing large areas of visual field loss.”
 
 “Additionally, one participant (C5) repeated the task under two simulated vision loss conditions (ring or quadrant loss), and two others (C5, C6) completed it with different levels of eye movement.”
 
 “Simulated vision loss
 
 One healthy control participant (C5) also performed a version of the task designed to simulate two forms of visual input loss (i.e., artificial scotoma). These simulations were implemented by: (a) masking a region of the visual field with a grey, annular ring, covering 3º-8º eccentricity, and (b) masking the upper right visual quadrant using a grey quarter-sector overlay. The stimuli and contrast levels used in this task were identical to those described in the original task.”
 
 “A test-case of simulated loss of visual inputs
 
 In the previous sections, we showed that the slope of a square root function provides a reliable measure of contrast sensitivity in the brain of healthy controls. But can this brain-level model also quantify loss of visual inputs? To test this, we first simulated an artificial scotoma in one normally sighted participant, by (a) masking a region of the visual field with a grey, annular ring, covering 3°-8° eccentricity (Figure 9A), and (b) masking the upper-right visual quadrant using a grey quarter-sector overlay (Figure 10A). We expect smaller slope values in V1 neuronal populations that would under normal circumstances encode that part of the visual space.
 
 As expected, we observed reduced responses in V1 locations corresponding to the artificial scotoma (Figures 9 and 10), with increased responses along the edges of the mask for the ring scotoma condition (Figure 9B). This artificial loss of visual input was also clearly present in the cortical contrast sensitivity estimate, with significantly reduced slope steepness in V1 between 3-8° for the ring scotoma condition (Figure 9C&D) and in the upper-right quadrant for the quarter-sector scotoma condition (Figure 10B&C). Additionally, we could recover this scotoma using the calibrated Benson template, although less accurately (Figures 9E and 10D). These results show that this measure of V1 contrast sensitivity is sensitive enough to detect loss of visual inputs in the brain at an individual level, when a complete local loss of sight is simulated, and that this approach does not crucially rely on pRF mapping data from the individual. This supports the utility of our approach in recovering patterns of vision loss and recovery at a cortical level.”
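
 The slope measure referred to above can be illustrated with a small least-squares sketch. It assumes a model of the form response = slope × √contrast + intercept; the response does not state whether the authors include an intercept or how the fit is implemented, so the model form, the function name `fit_sqrt_contrast`, and the synthetic responses are all assumptions made for illustration. Only the four contrast levels come from the text.

```python
import math


def fit_sqrt_contrast(contrasts, responses):
    """Least-squares fit of response = slope * sqrt(contrast) + intercept."""
    xs = [math.sqrt(c) for c in contrasts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(responses) / n
    # Closed-form simple linear regression on the square-root-transformed contrast.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Contrast levels (%) quoted in the response; the responses are synthetic.
contrasts = [7.5, 42.2, 60.0, 100.0]
synthetic = [0.2 * math.sqrt(c) + 0.1 for c in contrasts]
slope, intercept = fit_sqrt_contrast(contrasts, synthetic)
print(slope, intercept)
```

 Under this assumed model, a vertex inside a simulated scotoma would show a slope near zero, because its response barely changes with stimulus contrast.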
 
 “Mapping Simulated and Pathology-Driven Vision Loss
 
 Our method successfully identified both simulated retinal loss in a healthy volunteer and real visual field loss in a patient with Leber Hereditary Optic Neuropathy (LHON). The signal drop observed in response to masking portions of the visual field in the healthy control was both large and significant at the individual level, as demonstrated by non-overlapping 95% confidence intervals (Figures 9B-C and 10B). This provides proof-of-concept evidence that our approach can detect signal changes in individual patients, which is a critical requirement for clinical translation.
 
 Unlike previous fMRI studies that used high-contrast stimuli (Farahbakhsh et al., 2022; Pawloff et al., 2023; Ritter et al., 2019), which may not accurately represent partial vision loss due to potential saturation effects and the stimulation of less sensitive retinal cells, our use of multiple contrast levels offers a more nuanced assessment of cortical contrast sensitivity.
 
 Combined with the large-field set-up allowing stimulation up to 20° eccentricity, this approach may be particularly well-suited for evaluating treatment efficacy in cases of widespread and variable vision loss.
 
 Future work will focus on further validating reconstruction accuracy under controlled conditions, including simulated scotomas of varying severity and location, expanding testing to larger patient cohorts, and establishing a normative dataset to contextualize patient data.”
 
 R1-2b: Also, in Figure 7, contrast sensitivity in the periphery differs between pRF mapping and the Benson atlas. How do the authors explain this discrepancy?
 
 AR-R1-2b: The discrepancy in periphery between pRF mapping and Benson atlas is caused by various factors. These include (a) individual differences in the retinotopy/structure relationship that are not captured in the template, (b) the fact that the Benson atlas at larger eccentricities was obtained with hemifield stimulation, and (c) a larger impact of any inaccuracies at larger eccentricities because of cortical magnification. As a result, peripheral vertices are more likely to be mis-assigned by the template than central ones. Note that this adds distortion in cortical visual field maps which will be consistent across timepoints (rather than noise). Critically, a reduction in accuracy does not preclude utility if meaningful differences in spatial patterns in cortical sensitivity can still be recovered, as is the case in our data. We cover this in the discussion.
 
 “Particularly at large eccentricities however, we initially observed inaccuracies between the template and individual retinotopy eccentricity estimates which led to substantial distortions in cortical visual field maps due to cortical magnification (see Figure A4 in Appendix section). To address this, we adjusted the Benson eccentricity estimates to align with the cortical magnification scaling function (Horton & Hoyt, 1991).”
 
 “Beyond ROI considerations, we still observed differences in cortical sensitivity between pRF mapping and the adjusted Benson atlas - particularly in the periphery. Several factors likely contribute to this. First, individual differences in the relationship between cortical structure and retinotopy are not fully captured by the template. Second, the Benson atlas has never been fit with empirical data more eccentric than approximately 20°, which naturally limits its precision in the far periphery. Third, because of cortical magnification, any small inaccuracy at larger eccentricities has a disproportionately large effect, making peripheral vertices more susceptible to mis-assignment than central ones. These influences introduce systematic distortions in cortical visual field maps rather than random noise and thus remain consistent across time points - an important point when assessing longitudinal changes (e.g., ageing or gene-therapy interventions). Importantly, the spatial gradients in cortical contrast sensitivity were preserved across both the pRF and Benson atlas approaches, indicating that minor ROI differences do not affect our conclusions. Together, these findings show that the Benson Atlas remains a useful alternative when pRF mapping is not feasible.”
 
 R1-3: Overall, the writing could be significantly improved.
 
 AR-R1-3: We have made edits throughout the manuscript and hope this has improved the writing.
 
 Reviewer #1 (Recommendations for the authors):
 
 R1-Recommendation 1a: The writing can be significantly improved for clarity.
 
 The introduction section is not well-organized, and the motivation for developing the current method (Paragraphs 2-3) is vague and lacks adequate documentation.
 
 Several references are missing (e.g., Lines 90-92) or incorrectly placed (e.g., Lines 108-109).
 
 AR-R1-Recommendation 1a: We have revised the Introduction to clarify the motivation for developing the current method and to correct missing or misplaced references.
 
 “Still, testing visual function across the visual field remains limited in clinical and therapeutic contexts, especially in patients with drastic central vision loss. In this study, we aimed to address this gap by introducing a novel fMRI-based approach to measure visual field sensitivity across a wide expanse of the visual field (40º diameter).”
 
 “Beyond visual acuity, functional impairment across the wider visual field can be measured using a range of visual field tests, from the finger counting visual confrontation field test to more complicated and/or computerized tests (e.g., standard automatic perimetry, kinetic perimetry, microperimetry; Rai et al., 2024). Computerized tests typically involve measuring sensitivity to the luminance contrast of a target relative to a background at different visual field locations while the participant’s gaze is fixed on a central point. In some cases (e.g., microperimetry), sensitivity measurements are paired with fundus imaging, offering greater precision in linking visual field functions to specific retinal locations (Rai et al., 2024). As a result, visual field assessments can reveal functionally relevant deficits – including localized sensitivity loss and scotomas – that are not captured by foveal acuity alone, and are therefore potentially valuable for tracking disease progression and therapeutic efficacy.
 
 Despite their clinical relevance, visual field testing comes with challenges and limitations, and as a result, the inclusion of visual field measures in sight-rescuing therapy trials is limited. Firstly, it requires prolonged fixation and sustained visual attention. This can be very challenging for patients with severe vision loss, who often struggle to fixate, and strain to detect even high intensity stimuli. This can lead to long and unpleasant testing sessions with unreliable results. Secondly, as perception of light stimuli is inherently subjective (Rai et al., 2024) and effortful, patients may vary in their criteria for visual recognition, and in their ability to report visual signals that are weakened or distorted by disease. Together, these constraints reduce the feasibility, robustness, and interpretability of conventional visual field testing in clinical trials, underscoring the need for alternative or complementary approaches that can assess functional vision while placing fewer demands on subjective reporting.”
 
 “Functional MRI (fMRI) has recently been proposed as a promising alternative to measure visual field loss, as it requires no overt task, and instead measures visual sensitivity directly from brain responses (Farahbakhsh et al., 2022; Prabhakaran et al., 2021; Ritter et al., 2019). Population receptive field (pRF) mapping fMRI can measure which parts of the cortex respond to which parts of the visual scene (Dumoulin & Wandell, 2008).”
 
 “Finally, most studies use a single maximum contrast stimulus to assess visual function (Broderick et al., 2022; Farahbakhsh et al., 2022; Liu et al., 2006; O’Connell et al., 2016; Ritter et al., 2019).”
 
 R1-Recommendation 1b: The strengths of the current method and its applicable scenarios are unclear. For example, in Lines 39-40: "We developed an fMRI-based approach to measure contrast sensitivity across the visual field without the need for precise fixation." To what extent can fixation be imprecise? Could this protocol be applied to patients with strabismus, who have biased fixation?
 
 AR-R1-Recommendation 1b: We agree with the reviewer that tolerance to fixation challenges is key here, and so we collected additional data to address the reviewer's points regarding the effects of eye movement on the cortical contrast sensitivity maps.
 
 In terms of biased fixation, the approach should be very robust to this, as this would just reduce the cortical visual field covered on one side and extend it on the other.
 
 We collected new data to test the tolerance to fixation instability across a wide range of eye movement, including severe nystagmus-level movement. Despite large eye movements, the cortical contrast-sensitivity pattern remained largely consistent, though extreme movements reduced slope estimates and flattened the cortical sensitivity pattern for 3cpd, indicating reduced measurement sensitivity for extreme eye movement to high spatial frequency gratings.
 
 These additions have been incorporated into the Abstract, Methods, Results, and Discussion sections as follows:
 
 Abstract
 
 “To assess the method’s tolerance to fixation variability, we further investigated how different levels of eye movement affect cortical sensitivity patterns in two participants. We found that cortical sensitivity patterns were largely preserved across eye movement, particularly at low spatial frequencies. This suggests that our approach can accommodate several degrees of fixation instability, making it suitable for populations with unstable or biased fixation for whom visual field maps are harder to acquire behaviorally (e.g., patients with dense central scotoma or strabismus).”
 
 Methods
 
 “Additionally, one participant (C5) repeated the task under two simulated vision loss conditions (ring or quadrant loss), and two others (C5, C6) completed it with different levels of eye movement.”
 
 Results
 
 “Effect of eye movement
 
 Participants C5 and C6 also performed a version of the task designed to test the effect of eye movements. In this version, saccades were elicited by randomly and rapidly shifting the fixation dot away from central fixation (C5: 2º and 5º from fixation and random motion; C6: up to 2º from fixation). Participant C5 was tested using 0.3 and 3cpd gratings at four contrast levels (7.5, 42.2, 60, 100%), while participant C6 was tested only under the low spatial frequency condition (0.3cpd).
 
 Fixation stability was assessed for each fMRI run using the bivariate contour ellipse area (BCEA), which estimates the area (in degrees² or arcmin²) of an ellipse that contains approximately 95% of fixation points. BCEA was calculated using the formula $\mathrm{BCEA} = 2k\pi\sigma_h\sigma_v\sqrt{1 - p^2}$, as described by Morales et al. (2016). In this expression, $\sigma_h$ and $\sigma_v$ represent the standard deviations of eye position in the horizontal and vertical directions, respectively, while $p$ corresponds to the Pearson correlation coefficient between horizontal and vertical eye positions. The constant $k$ determines the size of the ellipse based on the desired probability area, defined by the relationship $P = 1 - e^{-k}$, with $P$ set to 0.95 in this study. A smaller BCEA indicates greater fixation stability.”
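
 The BCEA computation described above can be sketched as follows. This is an illustration rather than the authors' code: it assumes the commonly used form BCEA = 2kπ·σh·σv·√(1 − p²) with k = −ln(1 − P), and the function names and gaze samples are hypothetical.

```python
import math


def bcea_from_stats(sd_h, sd_v, rho, prob=0.95):
    """BCEA from horizontal/vertical SDs and their Pearson correlation."""
    # P = 1 - exp(-k)  =>  k = -ln(1 - P)
    k = -math.log(1.0 - prob)
    return 2.0 * k * math.pi * sd_h * sd_v * math.sqrt(1.0 - rho ** 2)


def bcea_from_samples(xs, ys, prob=0.95):
    """BCEA from paired gaze positions (needs at least 2 samples, non-zero spread)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample (n - 1) variances and covariance of the gaze positions.
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_h = math.sqrt(var_x)
    sd_v = math.sqrt(var_y)
    rho = cov_xy / (sd_h * sd_v)
    return bcea_from_stats(sd_h, sd_v, rho, prob)


# Hypothetical gaze samples in degrees of visual angle.
gaze_x = [-1.0, 0.0, 1.0, 0.0]
gaze_y = [0.0, 1.0, 0.0, -1.0]
print(bcea_from_samples(gaze_x, gaze_y))
```

 With P = 0.95 the constant k is about 3.0, so for uncorrelated axes the BCEA is roughly 18.8 times the product of the two standard deviations.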
 
 “Effect of eye movements on V1 cortical sensitivity
 
 So far, we have demonstrated that our measure of cortical sensitivity can reliably recover known gradients in sensitivity across eccentricities and visual quadrants. We also showed that this measure was consistent across visits and sessions, suggesting its potential utility for monitoring changes over time. However, all prior tasks were conducted under conditions of central fixation, with participants instructed to maintain gaze on a central dot. A key motivation for this approach was its theoretical robustness to fixation instability. We therefore also aimed to investigate how varying degrees of eye movement might influence cortical sensitivity across the visual field.
 
 To address this, two participants (C5 and C6) completed a modified version of the contrast sensitivity task in which they made eye movements either by following a dot moving randomly at a radius of 2º or 5º around fixation, or by self-initiated very large eye movements. Eye movements across these conditions (Figure 7, bottom row; Figure 8, bottom row) were quantified using BCEA (C5 – Central fixation: mean±SD = 0.57±0.11 deg², 2º eye motion: 2.69±0.48 deg², 5º eye motion: 20.3±1.32 deg², random eye motion: 133.7±23.36 deg²; C6 – Central fixation: 0.96±0.56 deg², 2º eye motion: 1.28±0.15 deg²). For reference, in severe (idiopathic) nystagmus, the eye movement variability along the vertical and horizontal planes is on average 1.08 deg and 1.60 deg, respectively (Tailor et al., 2021). Assuming a moderate correlation between axes (p = 0.3), the average fixation stability would equate to a BCEA of ~21.46 deg² (i.e., ~5º eye motion condition in our data).
 
 Despite these very large levels of eye movements, we observed that the overall cortical contrast sensitivity spatial pattern across eccentricity remained remarkably consistent (Figure 7, top and middle rows; Figure 8, top row). However, at the most extreme movements, contrast sensitivity estimates (slope values) were lower; and while the overall cortical visual field map structure was still clearly present for low spatial frequencies, it appeared more flattened for 3cpd, suggesting reduced sensitivity of our measure for large eye movement and high spatial frequency stimuli.”
 
 Discussion
 
 “Crucially, one advantage of cortical visual field mapping is that the maps are inherently centered on the foveal confluence, providing a stable reference point for comparing responses across eccentricities. When combined with large-field, spatially homogeneous stimuli, this anchoring means that our approach should remain robust to moderate fixation variability and still quantify sensitivity changes across the visual field – provided that fixation instability does not exceed the stimulus extent (40º diameter).
 
 When measuring the impact of eye movements, we found that spatial sensitivity patterns were largely preserved, even for extreme eye movements (emulating severe nystagmus). However, under the most extreme conditions, sensitivity estimates (i.e., slope values) were reduced, especially for high spatial frequency (SF) stimuli. This likely reflects image blurring from large rapid eye movements, which degrades high-SF inputs and shifts activation toward neurons tuned to lower SFs. This aligns with evidence that nystagmus and large saccades impair perception of fine detail and grating stimuli due to retinal image slip (Abadi & Bjerre, 2002; Dickinson & Abadi, 1985; Hertle et al., 2017; Randall et al., 2020). While classic findings report suppression of low-SF signals during saccades (Burr et al., 1994; Ross et al., 2001), our results suggest that high SF sensitivity may be more vulnerable to large eye movements when participants are presented with 2Hz phase-flickering gratings. Validation in clinical groups with naturally occurring fixation instability would further strengthen these conclusions.”
 
 R1-Recommendation 1c: There are also some confusing descriptions, such as Lines 130-132.
 
 AR-R1-Recommendation 1c: We have also clarified ambiguous descriptions of the Benson atlas templates.
 
 “We therefore also evaluated the approach using the structure-based atlas of retinotopic values developed by Benson et al. (Benson et al., 2014; Benson & Winawer, 2018). This atlas predicts retinotopic organization by aligning individual cortical anatomy (e.g., surface curvature) to a group-average template that incorporates an algebraic model of retinotopy (Benson et al., 2014). Once the subject’s brain is aligned to this structural atlas, retinotopic maps defined by the model – i.e., polar angle and eccentricity maps – are projected onto the individual’s cortex. This allows estimation of visual field maps without requiring functional imaging, and provides a non-invasive, anatomy-driven approximation of visual field representations.”
 
 R1-Recommendation 1d: Line 361: "Assessing the brain's ability to discriminate shapes"-is the author referring to the functional relevance of contrast tuning assessment here? Since the task or stimuli are not related to shapes, this description is unclear.
 
 AR-R1-Recommendation 1d: We have revised the reference to “discriminating shapes” to more accurately reflect the functional relevance of contrast sensitivity mapping.
 
 “To measure visual field function, we developed a new measure of cortical contrast sensitivity, assessing the brain’s ability to discriminate gratings of varying spatial frequencies based on luminance variations.”
 
 R1-Recommendation 2a: Simulated visual loss experiment: only one condition of visual field loss was examined in a single subject. I encourage the authors to include additional subjects to meet statistical test criteria at group level. Simulated scotomas in more visual quadrants, including both central and peripheral areas, should be examined, as asymmetries may exist.
 
 AR-R1-Recommendation 2a: We agree that it is important to verify that the approach can also capture other types of scotomas. We have therefore now incorporated another simulated condition of visual field loss, namely loss of the upper right quadrant.
 
 Regarding adding more participants: The drop in signal is clearly large and significant at the individual level (error bars corresponding to 95% confidence interval do not overlap; Figures 9B-C & 10B). The ability to detect signal change at the individual level is what we need for clinical application, and here we are showing proof-of-concept of its feasibility with our approach. However, we do appreciate that it might be valuable to test cortical visual field loss reconstruction accuracy with simulated scotomas of varying levels of vision loss in variable locations. We now highlight this as a future direction.
 
 Please refer to our response to R1-2a, where we also detail the corresponding changes made in the manuscript.
 
 R1-Recommendation 2b: Additionally, why do the results from pRF mapping and the corrected Benson atlas differ, particularly in the far periphery?
 
 AR-R1-Recommendation 2b: Please refer to our response to R1-2b, where we also detail the corresponding changes made in the manuscript.
 
 R1-Recommendation 3: To validate the recovery of visual field loss in the case study, it would be necessary to include fundus imaging to characterize the structural loss and correlate it with the behavioral and fMRI results.
 
 AR-R1-Recommendation 3: We included Compass perimetry data for the LHON patient, which is fundus-tracked perimetry and uses fundus imaging to keep the visual stimulation fixed to retinal locations.
 
 In the context of LHON, the fundus image is not expected to provide more information than perimetry. This is because the visual deficit in LHON arises from optic nerve dysfunction, and retinal abnormalities are typically minimal. Aside from the characteristic pallor of the optic disc, the fundus usually appears normal.
 
 For illustration, Author response image 1 shows the Compass-acquired fundus image from the LHON patient included in this study. For comparison, we also show a normal fundus image from a 25-year-old male volunteer, reproduced from Häggström, Mikael (2014). "Medical gallery of Mikael Häggström 2014". WikiJournal of Medicine 1 (2). DOI:10.15347/wjm/2014.008. ISSN 2002-4436. Public Domain.
 
 Author response image 1.
 
 We do, however, recognize the importance of linking functional changes to structural alterations (e.g., retinal thickness measured with OCT), and we now highlight this as a key future direction in the discussion. This will be a central focus of a planned follow-up study involving a larger patient cohort.
 
 “Next steps in this work will therefore involve testing larger patient cohorts with diverse forms of vision loss, validating the approach for tracking pathology over time, and investigating how cortex-based visual field measures relate to and complement other visual field and retinal integrity indices including Compass measures and OCT-derived retinal layer thickness.”
 
 “Additionally, linking brain-based variations in function across the visual field to behavioral performance (e.g., perimetry, microperimetry) and retinal structure (fundus imaging, retinal thickness from Optical Coherence Tomography), could help bridge the gap between neural measures and functional outcomes. Such integration would provide deeper insights into developmental, learning, and vision loss mechanisms.”
 
 R1-Recommendation 4a: Why is a 0.5 mm smoothing applied to the contrast task data?
 
 AR-R1-Recommendation 4a: We have now clarified in the Methods section. This 0.5 mm FWHM smoothing kernel was applied to the contrast sensitivity task data to meet the minimum requirements of the GLM module in SPM.
 
 “To accurately capture neural activity across various eccentricities and polar angle locations, minimal smoothing (0.5mm FWHM Gaussian blur) was applied to the contrast sensitivity task data using AFNI’s 3dmerge program. This was done to meet the minimum requirements of the GLM module in SPM.”
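
 To give a sense of how narrow this kernel is, the standard relation between a Gaussian's full width at half maximum and its standard deviation, FWHM = 2√(2 ln 2)·σ, can be evaluated directly. The helper name `fwhm_to_sigma` is hypothetical and is not part of any neuroimaging package.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


print(round(fwhm_to_sigma(0.5), 4))  # 0.2123
```

 A 0.5 mm FWHM kernel therefore has a standard deviation of about 0.21 mm.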
 
 R1-Recommendation 4b: Is this the first time the cortical magnification calibration has been applied to the Benson atlas? I recommend including a figure to describe this method.
 
 AR-R1-Recommendation 4b: This is indeed the first time this correction has been applied to the Benson atlas. We have now added a figure (Figure 3) to illustrate the eccentricity adjustment procedure applied to the Benson atlas.
 
 R1-Recommendation 5: In Figure 5, the test-retest reliability can be reported by including r-values.
 
 AR-R1-Recommendation 5: We have now included Spearman correlation 𝜌-coefficients for test-retest and between-condition comparisons in Figure 6 (previously Figure 5).
 
 R1-Recommendation 6: Inconsistency in the reporting format of statistical values: e.g., the degrees of freedom are presented with, or without parentheses.
 
 AR-R1-Recommendation 6: Thank you for pointing this out. We have reviewed and standardized the reporting format of all statistical values throughout the manuscript to ensure consistency. Degrees of freedom are now all presented in parentheses, for example:
 
 “Using ANOVA, we found the expected interaction between spatial frequency and eccentricity (F(1.96,11.79) = 28.66, p < 0.001; Figure 4) as well as a main effect of eccentricity (F(2.33,13.99) = 12.67, p < 0.001).”
 
 “We found a main effect of visual field quadrant location on V1 sensitivity (F(2.46,14.76) = 20.71, p < 0.001).”
 
 “Moreover, there was no interaction between spatial frequency and visual field quadrant position (F(2.16,12.99) = 1.34, p = 0.298), suggesting V1 visual field anisotropies are relatively constant across spatial frequencies.”
 
 Reviewer #2 (Public reviews):
 
 R2-1a: Questionable sensitivity to differences in patients. The variability in heat maps across healthy control participants is somewhat surprising. Do differences between individuals represent actual visual sensitivity differences, or are they an artifact of the measurement technique, e.g., due to signal-to-noise differences introduced by local variations in brain anatomy? Will the substantial variance across controls allow for a sufficiently stable baseline to detect meaningful differences in individual patients?
 
 AR-R2-1a: We agree the variability across healthy controls is surprising. It is unclear whether this reflects true individual differences in visual sensitivity or arises from factors like local signal-to-noise introduced by local variations in brain anatomy. It will be really interesting to investigate this further by examining structural variations across the visual field and comparing them with behavioral measures.
 
 As for establishing a stable baseline for patient comparisons, this is inherently an empirical question and depends on the degree of vision loss. LHON patients typically show dense central scotomas (up to 15º) in the chronic phase, making them well suited for detecting sensitivity differences – e.g., between central versus peripheral locations. Detecting subtler changes – in the acute phase or other conditions – may be more challenging. We agree with the reviewer that a normative range will be essential for contextualizing patient data. We now mention this in the Discussion and aim to develop such a range in the future based on the present data.
 
 “Future work will focus on further validating reconstruction accuracy under controlled conditions, including simulated scotomas of varying severity and location, expanding testing to larger patient cohorts, and establishing a normative dataset to contextualize patient data.”
 
 R2-1b: Also, as the authors rightly point out, Benson atlas does not model differences along meridians, so upper/lower field differences might not be detectable.
 
 AR-R2-1b: We acknowledge the limitations of the Benson atlas, particularly its inability to model meridional asymmetries (e.g., upper vs. lower visual field). Still, our goal is to provide a method for tracking visual cortex changes over time. By consistently projecting longitudinal functional data onto the same structural image fitted with the Benson atlas, we maintain a stable anatomical reference, which supports reliable comparisons across timepoints – even with limited spatial accuracy. Future improvements could include shearing corrections, Bayesian updating, or alternative models such as DeepRetinotopy developed by Ribeiro et al.
 
 “Further enhancing the alignment between retinotopic template atlases and individual retinotopic tuning could improve this approach further, for example, by integrating them with functional measures using Bayesian methods (Benson & Winawer, 2018). In parallel, geometric deep learning frameworks such as DeepRetinotopy (Ribeiro et al., 2021) could also offer anatomy-driven predictions from structural MRI, and combining these strategies may yield more accurate and generalizable retinotopic reconstructions.”
 
 R2-2: Effects of unstable fixation/eye movements not explicitly tested: The methods state, 'In all tasks, participants were asked to report when the color of a central fixation dot changed', suggesting participants maintained fairly good fixation. Most of the results seem to pertain to measurements where central fixation is required. How does unstable fixation affect measurements?
 
 AR-R2-2: This is an important point. We have now extensively and systematically investigated the impact of eye movements on the cortical contrast sensitivity maps and updated the Abstract, Methods, Results, and Discussion sections (see R1-1b).
 
 R2-3: Potential for clinical translation. Although it is a sensitive measure, functional MRI is costly, is not available in all clinical settings, requires significant post-processing analyses, and may be contraindicated in some individuals due to safety (e.g., metallic implants) or other concerns (e.g., claustrophobia). These could present significant barriers to widespread clinical translation if this were the ultimate goal of the study.
 
 AR-R2-3: We agree that fMRI, while sensitive, has practical limitations for broad clinical adoption due to cost, accessibility, and contraindications. However, it remains a valuable tool in targeted contexts, where sensitive detection of visual field loss has large utility – for example for evaluating treatment effects in clinical trials. This application has been demonstrated in recent studies (Farahbakhsh et al., 2022; Maimon-Mor et al., 2025; Haal et al., 2016; Ritter et al., 2019).
 
 R2-4: Limited range of spatial frequencies. The spatial frequencies tested were still quite low (0.3 and 3cpd) compared to measures such as visual acuity. Extending the measurements to higher spatial frequencies could allow better characterization of central vision, although not necessarily of peripheral vision.
 
 AR-R2-4: We agree that extending to higher spatial frequencies could improve central vision characterization and note this can be readily incorporated into future studies using the current framework. However, LHON patients' acuity tends to be very low, and we found that 5cpd did not allow us to measure any cortical contrast sensitivity in a prior pilot. To characterize the visual field in LHON with fMRI, we therefore aimed to balance central and peripheral coverage: 0.3 cpd ensured broad detectability, while 3 cpd offered a middle ground to assess central vision without exceeding the acuity of this population. Additional approaches, such as neural contrast sensitivity functions (e.g., Roelofzen et al., 2025), may also offer complementary insights such as acuity and contrast sensitivity across the full spatial frequency range (area under the curve).
 
 Reviewer #2 (Recommendations for the authors):
 
 R2-Recommendation 1: It appears that the reliability measures, comparing differences in Spearman correlations between and within sessions, were not tested statistically, but evaluated qualitatively. What was the justification for this? The results only state Spearman values, but the discussion claims that the differences between the two comparisons were significant.
 
 AR-R2-Recommendation 1: The differences in Spearman correlations between and within sessions were tested statistically, and the omission of p-values was an oversight. We have now revised the Results section to report the results of the paired one-tailed t-test as follows:
 
 “We collected test-retest reliability measures from 4 out of 7 participants (Figures 6A-B) and benchmarked them against the correlations between the 0.3cpd condition and 3cpd spatial frequency condition, collected in the same session (Figure 6C). If measures are reliable, correlations should be higher for repeated measures with the same spatial frequency stimulus, collected on different days. We tested this prediction using a one-tailed paired t-test.”
 
 “This difference was statistically significant (t(3) = 2.62, p < 0.0395).”
 
 R2-Recommendation 2a: The variability of heat maps (visual field sensitivities) between healthy controls should also be discussed. What are potential explanations for this variability?
 
 AR-R2-Recommendation 2a: We have expanded the Discussion section to address the variability observed in cortical sensitivity maps across healthy controls.
 
 “We also observed intriguing variability in cortical visual field maps across healthy controls, and this variability was consistent across measures. This may reflect genuine individual differences in visual sensitivity that are relevant for behavioral performance. Alternatively, it could arise from factors such as local signal-to-noise differences driven by anatomical variability. However, the fact that maps derived from different spatial stimulus conditions showed markedly different patterns argues against a purely anatomical explanation and suggests that at least part of the variability is functional. Despite this inter-subject variability, variations in cortical contrast sensitivity across eccentricities and visual field quadrants were significant at the individual level indicating high sensitivity.”
 
 R2-Recommendation 2b: There should also be more discussion about any potential effects of eye movements/unstable fixation in order to address the suitability of the methods for these clinical populations.
 
 AR-R2-Recommendation 2b: Please refer to our response to R2-2, where we also detail the corresponding changes made in the manuscript.
 
 Reviewer #3 (Public review):
 
 R3-1: The authors should more strongly emphasize their findings on the organization of contrast sensitivity, particularly in light of the stimulation extent provided by the wide-field setup.
 
 AR-R3-1: Thank you for this important point – we have now emphasized more clearly in the manuscript that our method extends the measurement of contrast sensitivity to 20º eccentricity, which represents a significant advancement over previous studies.
 
 “These results demonstrate that our approach can detect subtle changes in visual sensitivity across eccentricities at the individual participant level. The ability to reveal these gradients was made possible by the large peripheral coverage provided by our large-field stimulation set-up (see Figure A1 in Appendix section), which enabled a more complete characterization of V1 sensitivity across the visual field. Importantly, the same effects were preserved when using retinotopic estimates derived from structure-based atlases, demonstrating that atlas-based methods can be used as an alternative to pRF mapping in cases where it might otherwise be difficult or impossible to directly collect pRF measures. Together, these results highlight both the validity of our approach and its potential to broaden the scope of visual neuroscience.”
 
 “Crucially, the ability to visualize these sensitivity gradients was made possible by the large peripheral coverage provided by our large-field stimulation set-up. Such coverage is particularly important for clinical applications, as it enables the detection of visual field losses beyond the macula (i.e., beyond 10º eccentricity) and the evaluation of residual peripheral vision in patients with macular-restricted damage. In doing so, this work provides a useful tool for advancing both basic visual neuroscience and translational research in clinical populations.”
 
 R3-2: Certain methodological aspects require further clarification, particularly regarding the correction of eccentricity values from the Benson atlas. It's not clear which V1 masks are used for the specific analysis which could have a substantial impact on the reported differences between the two approaches of pRF mapping and atlas-based pRF parameters.
 
 AR-R3-2: The correction of eccentricity values was performed using the V1 label provided by the Benson atlas. We have now explicitly stated this in the Methods section:
 
 “We collected data from 7 healthy controls (mean±SD: 29.6±4.7yo; 1M). All controls had either normal or corrected-to-normal vision, with no other ocular pathologies, and were recruited from the local staff and student pool at University College London. Each control completed both the population receptive field (pRF) mapping and the fMRI contrast sensitivity task. To assess measurement repeatability, four participants (C2, C4, C5, C6) performed the contrast sensitivity task twice. Additionally, one participant (C5) repeated the task under two simulated vision loss conditions (ring or quadrant loss), and two others (C5, C6) completed it with different levels of eye movement.”
 
 “Four participants (C2, C4, C5, C6) were invited for a second session in which they repeated the task to assess the reliability of the measures.”
 
 R3-4: The conclusion that high-contrast patterns as in pRF mapping are not optimal to test for subtle but potentially clinically relevant changes in the visual field coverage is very valid. The suggested use of contrast sensitivity can therefore be a potentially well-suited parameter for estimating visual field losses. The presented work is an interesting starting point and the proposed method of using contrast sensitivity as a measure for partial vision loss should further be explored.
 
 AR-R3-4: Thank you for the positive evaluation of our work.
 
 Reviewer #3 (Recommendations for the authors):
 
 R3-Recommendation 1: The shown organization of contrast sensitivities is consistent with previous studies; however, it extends the measurements to up to 20º eccentricity, which is, to my knowledge, much more than previously reported. The authors should therefore emphasize this more strongly.
 
 AR-R3-Recommendation 1: Please refer to our response to R3-1, where we also detail the corresponding changes made in the manuscript.
 
 R3-Recommendation 2: In the Methods section, it is not entirely clear why the eccentricity values originating from the Benson atlas need to be corrected using Horton & Hoyt cortical magnification. Do the authors consider these cortical magnification measurements as ground truth? Is the correction only applied to higher eccentricity values that are not mapped by the Benson atlas?
 
 AR-R3-Recommendation 2: The Benson et al. (2014) atlas predicts both polar angle and eccentricity from cortical anatomy (curvature, thickness) using a template pRF dataset and a mathematical retinotopic model. However, it does not incorporate a smooth parametric cortical magnification function such as Horton & Hoyt. Because the atlas is fit to an average map across subjects, and because the FreeSurfer alignment used to apply the template cannot incorporate functional information, the atlas cannot capture individual variability in eccentricity or cortical magnification. In practice, we therefore treat the Benson atlas as providing the correct topological layout of eccentricity, but not necessarily the correct eccentricity values for a given individual. Moreover, the data used to generate the Benson atlas have mainly been restricted to the central visual field (roughly 8º-12º), and the Benson atlas itself has never been fit with data more eccentric than 20º. Consequently, peripheral eccentricity values are more model-driven and less constrained by ground-truth data.
 
 To improve the correspondence between the atlas and expected cortical representations, we applied the Horton & Hoyt cortical magnification function to all eccentricities in the V1 Benson mask (from the foveal confluence to the periphery, up to 90º). We assume that the Horton & Hoyt model, adapted from physiology data, provides an accurate model of group-level cortical magnification (Benson et al., 2021) – even though it does not capture individual differences. This means it offers the best approximation of ground truth in the absence of individual pRF data, which are often not feasible to collect in patients with unstable fixation. We have now added a figure that showcases the method and shows how this correction affects the distribution of eccentricity values in the Benson atlas.
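
For orientation, the widely cited form of the Horton & Hoyt (1991) cortical magnification function is M(E) = 17.3 / (E + 0.75) mm/deg. The sketch below only evaluates that published function and its integral; the response does not spell out how the function was applied to the atlas values, so this should not be read as the authors' correction procedure. All function names are hypothetical.

```python
import math

A_MM = 17.3    # Horton & Hoyt (1991) scaling constant, in mm
E2_DEG = 0.75  # Horton & Hoyt (1991) eccentricity constant, in deg

def magnification(ecc_deg):
    """Linear cortical magnification (mm of cortex per deg) at a given eccentricity."""
    return A_MM / (ecc_deg + E2_DEG)

def cortical_distance(ecc_deg):
    """Distance from the foveal representation (mm): integral of M from 0 to ecc_deg."""
    return A_MM * math.log(1.0 + ecc_deg / E2_DEG)

def eccentricity_at(distance_mm):
    """Inverse mapping: eccentricity (deg) at a given cortical distance (mm)."""
    return E2_DEG * (math.exp(distance_mm / A_MM) - 1.0)

# Under this model, the stimulated 20 deg span a large share of the 0-90 deg extent.
d20 = cortical_distance(20.0)
d90 = cortical_distance(90.0)
print(f"0-20 deg: {d20:.1f} mm; 0-90 deg: {d90:.1f} mm; fraction: {d20 / d90:.2f}")
```

Because the function is a group-level model, any eccentricity values derived from it inherit the same limitation noted in the response: individual differences in magnification are not captured.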
 
 R3-Recommendation 3: For the analysis using the atlas-based retinotopy, it is not entirely clear whether the authors also use the provided V1 masks. In other words, differences between the original pRF-based and atlas-based analyses could originate from different borders of V1 rather than from the atlas-based pRF parameters. The authors could try using the same mask for both analyses, either the manually delineated one or the atlas-based one.
 
 AR-R3-Recommendation 3: This is a well-noted point that is important to clarify. We used a manually delineated V1 mask for each participant's own pRF map data and the Benson mask for the adjusted Benson atlas-based analysis – both restricted to the screen size. The difference in included vertices could indeed have introduced some additional error beyond the atlas/pRF mapping itself. We have opted not to correct this in this version of the manuscript because (1) the error introduced is likely small (on inspection, the manual V1 ROI delineations aligned well with the Benson ROIs, although using identical masks might slightly improve the mapping further, in particular at the very center and the outer periphery), and (2) our ROI selection for each respective approach is in line with typical procedures used in practice. Critically, the spatial gradients in cortical contrast sensitivity are preserved across the pRF and Benson atlas approaches with the different ROIs, so we believe that such improvements would not alter our conclusion that the Benson atlas offers a useful alternative when pRF mapping is not possible. However, we now highlight this important difference between the two approaches in the paper.
 
 “With this structure-based atlas, we successfully replicated key variations in visual field function (across eccentricity and polar quadrants), although sensitivity to more subtle differences (e.g., upper versus lower quadrant anisotropy) was reduced. This reduction may partly stem from differences in ROI definitions: a manually delineated V1 mask was used for the pRF-based data, while the Benson atlas mask was used for the adjusted Benson atlas analysis. Such differences could introduce minor error beyond the atlas/pRF mapping itself due to differences in the vertices included by each mask.”
 
 “Importantly, the spatial gradients in cortical contrast sensitivity were preserved across both the pRF and Benson atlas approaches, indicating that minor ROI differences do not affect our conclusions. Together, these findings show that the Benson atlas remains a useful alternative when pRF mapping is not feasible.”
 
 R3-Recommendation 4: The patient was measured monocularly. Given the widefield stimulation setup and the fact that the blind spot is located at about 15º eccentricity, do the authors expect to measure this blind spot with the given setup?
 
 Does this have an influence in binocular measurements?
 
 AR-R3-Recommendation 4: This is an interesting point. In theory, our wide-field setup could allow for the detection of the blind spot, which is located at around 12-15º eccentricity. However, in our LHON patient, the visual field defect typically extends to or beyond the blind spot, making it difficult to isolate its boundary, as shown in Figure 11 (previously Figure 7). Additionally, under binocular viewing, the brain integrates inputs from both eyes to create a unified percept, which may obscure blind spots unless specific paradigms are used (e.g., binocular rivalry or dichoptic tasks). Whilst this is outside the scope of this work, our setup could be adapted to map out the blind spot or explore phenomena like binocular rivalry more directly in future research.
 
 R3-Recommendation 5: How stable is the presented wide-field stimulation setup? In other words, does the eye tracker still capture the eye reliably after small head movements?
 
 AR-R3-Recommendation 5: While small head movements can occur, these were minimized by the use of padding cushions and monitored throughout the session, and the eye tracker maintained reliable tracking throughout the sessions.
 
 R3-Recommendation 6: Are the shown sine-wave gratings always oriented the same? We would expect orientation tuning curves in the early visual cortex; how could this influence the results?
 
 AR-R3-Recommendation 6: For six of the seven control participants (C1-C6), the sinewave gratings were presented with a fixed horizontal orientation. In an updated version of the task – used for participant C7, cases of simulated eye movements, cases of artificial scotoma, and the patient – the orientation of the gratings was varied every 5 seconds among four angles (−45º, 0º, 45º, 90º) during each 15-second stimulus block.
 
 We acknowledge that orientation tuning in the early visual cortex could influence responses, since V1 neurons are selective for specific stimulus orientations and respond most strongly to their preferred orientation. However, we replicated the same overall pattern of results in groups tested with a single orientation and with multiple orientations. Importantly, some participants completed both versions of the task, and the contrast sensitivity patterns remained consistent across conditions. This suggests that the results we report are robust across different orientation-tuned populations for the purposes of this study. A more fine-grained investigation of orientation effects would nevertheless be an interesting direction for future work.
 
 “For six control participants (C1–C6), gratings were initially presented with a fixed horizontal orientation. In an updated version of the task – used for C7, cases of simulated eye movement, cases of artificial scotoma, and the LHON patient – the orientation varied every 5 s among four angles (−45º, 0º, 45º, 90º). Contrast sensitivity patterns were consistent across single and multiple-orientation conditions, including in participants who completed both versions, indicating robustness across orientation-tuned populations.”
 
 R3-Recommendation 7: Are pRF centers also fitted outside the stimulated 20º radius? If yes, were they masked for the analysis?
 
 AR-R3-Recommendation 7: During pRF model fitting, pRF centers were allowed to extend beyond the stimulated visual field, up to approximately 1.5 times the maximum stimulus eccentricity (~30°), to improve model stability near stimulus boundaries. Eccentricity was sampled on a logarithmically spaced grid defined as 2^x, with x ranging from −5 to 0.6 in steps of 0.2, and then scaled by the maximum stimulus eccentricity (20°) to express pRF centers in degrees of visual angle. This spacing approach provided finer sampling near the fovea and progressively coarser sampling at larger eccentricities, consistent with cortical magnification principles. For all subsequent analyses of cortical contrast sensitivity, pRF centers located outside the stimulated 20° eccentricity were explicitly excluded. Likewise, although the Benson atlas provides eccentricity estimates extending well beyond the stimulated range (up to ~90°), only pRF centers within 20° were included to ensure consistency across pRF-based and atlas-based analyses.
 
 “During pRF model fitting, pRF centers were allowed to extend beyond the stimulated visual field to improve model stability near stimulus boundaries – up to approximately 1.5 times the maximum stimulus eccentricity (~30°). Eccentricity was sampled on a logarithmically spaced grid defined as 2^x, with x ranging from −5 to 0.6 in steps of 0.2, and then scaled by the maximum stimulus eccentricity (20°) to express pRF centers in degrees of visual angle. This sampling scheme provided finer resolution near the fovea and progressively coarser sampling at larger eccentricities, consistent with cortical magnification principles.”
 
 “For all subsequent analyses of cortical contrast sensitivity, pRF centers outside the stimulated 20° eccentricity were excluded. Similarly, although the Benson atlas provides eccentricity estimates extending far beyond the stimulated range (up to ~90°), only values within 20° were retained to maintain consistency across pRF-based and atlas-based analyses.”
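
The sampling grid and the 20° exclusion described above can be reproduced in a few lines. This is a minimal sketch using only the numbers quoted in the response (base 2, x from −5 to 0.6 in steps of 0.2, scaling by 20°); the function names are hypothetical and the actual fitting was done in the authors' own pipeline, which is not shown here.

```python
MAX_STIM_ECC_DEG = 20.0  # maximum stimulus eccentricity quoted in the response

def eccentricity_grid(max_ecc=MAX_STIM_ECC_DEG):
    """Candidate eccentricities (deg): max_ecc * 2**x for x = -5, -4.8, ..., 0.6."""
    # Integer steps avoid floating-point drift: x = i / 5 for i = -25 .. 3.
    return [max_ecc * 2 ** (i / 5) for i in range(-25, 4)]

def within_stimulus(grid, max_ecc=MAX_STIM_ECC_DEG):
    """Keep only eccentricities inside the stimulated field (<= max_ecc)."""
    return [ecc for ecc in grid if ecc <= max_ecc]

grid = eccentricity_grid()
kept = within_stimulus(grid)
print(f"{len(grid)} grid values spanning {grid[0]:.3f} to {grid[-1]:.2f} deg")
print(f"{len(kept)} values retained within {MAX_STIM_ECC_DEG:.0f} deg")
```

The largest grid value, 20 × 2^0.6, is about 30.3°, which matches the stated limit of roughly 1.5 times the maximum stimulus eccentricity.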
 
 R3-Recommendation 8: L212: Could the authors please clarify what "scaled across eccentricity to account for cortical magnification" means for the given stimulus?
 
 AR-R3-Recommendation 8: The pRF stimulus was scaled across eccentricity using a logarithmic transformation of retinal radius to approximate cortical magnification. Radial checker boundaries were defined in log eccentricity space (log(r)), resulting in an exponential increase in checker size with eccentricity (scaling factor = 3.2; ~1.37× increase per radial step). As a result, the spatial frequency content of the stimulus decreases with eccentricity (i.e., checker size increases), compensating for known changes in V1 spatial frequency preference across the visual field. This eccentricity-dependent scaling inherently relies on precise fixation to stimulate the intended retinal locations, which can be difficult for patients with central vision loss and therefore motivates the use of Benson templates.
 
 “This scaling was implemented by applying a logarithmic transformation of retinal radius, such that radial checker boundaries were defined in log eccentricity space (log(r), where r denotes eccentricity relative to the fixation target). This produced an exponential increase in checker size with eccentricity (scaling factor = 3.2; ~1.37 times increase per radial step), resulting in lower spatial frequency content at larger eccentricities – consistent with known variations in V1 spatial frequency tuning. Because this eccentricity-dependent scaling assumes precise fixation, it can be challenging for individuals with central vision loss, further motivating the use of Benson atlas templates in such populations.”
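
To make the log-space construction concrete, the sketch below places radial boundaries at equal steps in log(r). The text does not state how the scaling factor of 3.2 enters the stimulus code; the assumption made here is that the log-space step is 1/3.2, chosen because exp(1/3.2) ≈ 1.37 reproduces the quoted growth per radial step. The inner radius of 0.5° and the function name are illustrative choices, not values from the manuscript.

```python
import math

SCALING_FACTOR = 3.2  # quoted in the text; its role as 1/step in log(r) is an assumption

def ring_boundaries(r_min, r_max, scaling=SCALING_FACTOR):
    """Radial boundaries (deg) equally spaced in log(r), with step 1/scaling."""
    ratio = math.exp(1.0 / scaling)  # multiplicative growth per radial step
    bounds = [r_min]
    while bounds[-1] * ratio <= r_max:
        bounds.append(bounds[-1] * ratio)
    return bounds

bounds = ring_boundaries(0.5, 20.0)  # 0.5 deg inner radius is illustrative only
widths = [outer - inner for inner, outer in zip(bounds, bounds[1:])]
for inner, width in zip(bounds, widths):
    print(f"ring starting at {inner:6.2f} deg has radial width {width:5.2f} deg")
```

Each ring is about 1.37 times wider than the one inside it, so checker size grows exponentially with ring index while remaining a constant step in log eccentricity.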
 
 R3-Recommendation 9: L213: Three runs were measured per session, were they averaged before analysis or analyzed independently? If analyzed independently, how were the individual results handled?
 
 AR-R3-Recommendation 9: As described in the Methods, data from all three runs were first aligned to an alignment scan that had been co-registered to the MPRAGE image – typically the scan with the fewest outlier voxels, or alternatively, a single-band reference scan in cases of misregistration. The runs were then analyzed as separate regressors in a single design matrix in SPM to account for run-specific variation - following standard recommendations for this software (Author response image 2 shows the SPM design matrix for the GLM). We did not average the runs beforehand due to differences in the order of stimulus presentation across runs. Instead, the GLM modeled each run’s specific presentation sequence to estimate condition-specific beta values, capturing the average contribution of each spatial frequency and contrast level to the BOLD response.
 
 Author response image 2.
 
 R3-Recommendation 10: L289: Did the authors check for very small pRF sizes, as SamSrf is prone to fitting many small sizes?
 
 AR-R3-Recommendation 10: We did not apply an explicit filter to remove very small pRF sizes; we excluded only pRFs with σ > 6.
 
 R3-Recommendation 11: L384: p is missing before the value.
 
 AR-R3-Recommendation 11: Thank you for catching this oversight. We have now added the missing p-value in the revised manuscript.
 
 “Post-hoc tests using Holm-Bonferroni correction show that V1 neuronal populations receiving inputs from the central visual field (0.5-4.5°) showed greater contrast sensitivity to high spatial frequency as compared to low spatial frequency stimuli (steeper slope for the 3cpd versus 0.3cpd condition: 0.5-2.5º: t(6) = 4.35, p_bonf = 0.0149; 2.5-4.5º: t(6) = 3.471, p_bonf = 0.0266).”
 
 R3-Recommendation 12: I have a very subjective comment regarding the figures. I do not really like the use of the hot colormap in this setting, as I feel it is hard to interpret high and low values.
 
 AR-R3-Recommendation 12: We appreciate the suggestion, but we have had many heated discussions amongst the authors about this and have moved back and forth several times before settling. Hopefully the reviewer will be happy for us to stick with the authors' eventually agreed-on subjective preference, although we acknowledge that it is by no means a perfect color scheme.
 
 R3-Recommendation 13: L474: Suddenly, a second session appears in the Results section; please report this in Methods.
 
 AR-R3-Recommendation 13: Please refer to our response to R3-3, where we also detail the corresponding changes made in the manuscript.
 
 R3-Recommendation 14: Figure 5C: are the reported results from the first session of the same subjects?
 
 AR-R3-Recommendation 14: That is correct. The results shown in Figure 6C (previously 5C) reflect correlations between slope estimates obtained from the 0.3 and 3cpd conditions within the same session for each subject. We have updated the panel title to “C. 0.3cpd vs 3cpd (within session)” to clarify this point.
 
 R3-Recommendation 15: For the classic pRF mapping (Figure 6D), the artificial scotoma shows lower contrast sensitivity within the scotoma and increased values outside its borders. In contrast, using the retinotopic template (Figure 6E), the area of increased sensitivity is shifted inside the scotoma. Can the authors please comment on this discrepancy?
 
 Is this shift due to systematic differences between the eccentricity values estimated during the pRF run and those derived from the template?
 
 If such a shift exists, is it induced by the eccentricity correction step performed?
 
 AR-R3-Recommendation 15: The shift inside the scotoma observed in the atlas-based analysis (Figure 9E; previously Figure 6E) compared to the pRF-based analysis (Figure 9D; previously Figure 6D) likely reflects residual inaccuracies in eccentricity estimates from the adjusted Benson atlas. While the Horton & Hoyt correction improves the alignment of eccentricity values, it does not ensure perfect matching with the pRF data. Without the Horton & Hoyt correction, the misalignment and shift of activity in the scotoma region are even more pronounced (see below).
 
 We have added a sentence to the Methods section to justify the applied correction. Furthermore, to illustrate the impact of misalignment and its correction on cortical sensitivity maps, we have included an additional figure in the Appendix section showcasing the effect of applying the correction to improve mapping of the artificial scotoma.
 
 “We initially observed inaccuracies between the template and individual retinotopy eccentricity estimates which led to substantial distortions in cortical visual field maps due to cortical magnification – especially in peripheral locations (see Figure A4 in Appendix section).”
 
 R3-Recommendation 16: L532: The age and mutation type of the patient are already reported in the Methods. In general, many Methods and Discussion statements are embedded within the Results section.
 
 AR-R3-Recommendation 16: We are aware that it is a stylistic choice to recap methods in the Results section and to foreshadow the Discussion. We chose this approach to support the interpretability of the results for less specialist readers.
 
 R3-Recommendation 17: L636: Did the authors consider other options for estimating pRF parameters based on anatomical features, like Ribeiro et al. (2021; https://github.com/felenitaribeiro/deepRetinotopy_TheToolbox)?
 
 AR-R3-Recommendation 17: We agree that alternative approaches to estimating pRF parameters based on anatomical features, such as the DeepRetinotopy method proposed by Ribeiro et al. (2021), are promising and worth exploring. In this study, we used the Benson atlas as a starting point, along with an adjustment of eccentricity estimates based on cortical magnification. Future work could compare the performance of different retinotopic template fitting approaches, including deep learning-based methods, to further improve anatomical alignment and functional predictions.
 
 “Further enhancing the alignment between retinotopic template atlases and individual retinotopic tuning could improve this approach further, for example, by integrating them with functional measures using Bayesian methods (Benson & Winawer, 2018). In parallel, geometric deep learning frameworks such as DeepRetinotopy (Ribeiro et al., 2021) could also offer anatomy-driven predictions from structural MRI, and combining these strategies may yield more accurate and generalizable retinotopic reconstructions.”
 
 R3-Recommendation 18: Figure A4: This figure brings up a very important point, namely, whether small eye movements reduce the accuracy of pRF and contrast sensitivity estimates. However, these experiments and results are not reported in the manuscript. I would prefer the authors to add all necessary Methods and Results, or at least not leave this Figure unexplained.
 
 AR-R3-Recommendation 18: We thank the reviewer for highlighting the importance of this figure. To address this point, we collected additional data and have revised the manuscript to include a dedicated section on the effects of eye movements, with corresponding updates in the Abstract, Methods, Results, and Discussion.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.29.619403v3
elifesciences.org elifesciences.org

How attention simplifies mental representations for planning

1
1. Public_Reviews 02 Jun 2026
  
  in eLife (unscoped)
  
  eLife Assessment
  
  This important study utilizes behavioral data and computational modeling to show that spatial properties of visual attention affect human planning. The methodology and statistical analyses are convincing, though the way attention is conceptualized and modeled could be refined. The findings of this study will interest cognitive scientists studying attention, perception, and decision-making.
  
  Summary

Tags

Summary

Annotators

Public_Reviews

URL

elifesciences.org/reviewed-preprints/108034v1
www.biorxiv.org www.biorxiv.org

Arousal modulates functional connectivity through structured and hemispherically asymmetric community architecture during wakefulness

5
1. Public_Reviews 02 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  This study offers a valuable analysis of how moment-to-moment fluctuations in arousal are associated with structured, non-uniform patterns of brain-wide functional connectivity during wakefulness. Using data-driven analyses of resting-state and naturalistic fMRI with eye tracking, the authors present convincing evidence that arousal is a dynamic, continuous process that shapes brain activity in a structured way beyond a simple global effect. This paper sheds light on the link between brain activity and ongoing fluctuations in arousal and will be of interest to researchers studying large-scale brain functional organization and links between the brain and body.
  
  Summary
2. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this study, the authors aim to characterize how moment-to-moment fluctuations in arousal during wakefulness shape large-scale functional brain connectivity. Using pupil diameter as an index of arousal and high-field functional imaging, they seek to determine whether arousal-related modulation of connectivity is uniform across the brain or organized into structured patterns, and whether such patterns show hemispheric asymmetry. The work further aims to assess whether these organizational features generalize across resting-state and naturalistic viewing conditions.
  
  Strengths:
  
  The study addresses an important and timely question regarding how spontaneous variations in arousal influence whole-brain communication during wakefulness. The dataset is rich, combining high-field imaging with concurrent physiological measurements, and the analyses are ambitious in scope. A key strength is the attempt to move beyond region-based effects and to describe arousal-related modulation at the level of large-scale connectivity organization. The comparison across rest and movie viewing provides useful context and suggests a degree of consistency across behavioral states.
  
  Weaknesses:
  
  All analyses are based on 7T ultra-high-field imaging. The manuscript does not address whether the reported arousal-related patterns, including the community structure and hemispheric asymmetries, are expected to be reproducible at standard 3T field strengths. It therefore remains unclear whether the findings depend critically on the use of high-field data or whether they would generalize to more widely available datasets, limiting the broader applicability of the results.
  
  Review 1
3. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This manuscript addresses a clear and widely relevant question: how ongoing fluctuations in alertness during wakefulness relate to large scale patterns of coordinated brain activity. The authors combine high field magnetic resonance imaging with simultaneous pupil measurements, and they compute an edgewise measure of arousal-related coupling for every pair of regions. Their main contribution is to show that arousal-related coupling is low dimensional and organized into seven reproducible "connectivity communities", each with characteristic network pair compositions. A secondary contribution is the observation that these communities exhibit systematic but community-specific hemispheric asymmetries, including a striking left/right dissociation within the ventral attention network, where the left side participates broadly across communities while the right side forms a more cohesive, segregated arousal responsive module. A final contribution is cross-context generalization: the same organizational structure and lateralization signatures are largely preserved during naturalistic movie watching.
  
  Strengths:
  
  (1) The paper moves beyond state contrasts and quantifies arousal related modulation continuously within wakefulness, directly addressing a gap highlighted in the Introduction.
  
  (2) The hemispheric asymmetry result is not framed as a crude global dominance effect; the authors explicitly test and argue that the key signal lies in structured spatial heterogeneity rather than mean shifts.
  
  (3) The cross-paradigm replication in movie watching is a strong design choice and supports the claim that the organizational motifs are not limited to unconstrained rest.
  
  (4) Arousal effects on BOLD signals and on pupil size can have different delays. The authors have now tested lagged relationships (for example shifting the pupil series forward and backward) to show that the main community structure and lateralization results are not sensitive to an arbitrary temporal alignment.
  
  (5) Time resolved connectivity results are now shown to be robust to changes in parameters.
  
  Review 2
4. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The paper investigates neural fluctuations underlying arousal using a combination of resting state/naturalistic movie watching fMRI and eye tracking data. The authors have used several data driven approaches, including time varying sliding window analyses and clustering methods, to characterize large scale brain organization and hemispheric asymmetries associated with arousal fluctuations. This is an interesting study framing arousal as a dynamic, continuously varying process rather than a discrete state. Overall, the manuscript is well written and the authors have provided sufficient details about the methodological choices, their impact on the results, along with the limitations of the study.
  
  Strengths:
  
  This is an interesting study framing arousal as a dynamic, continuously varying process rather than a discrete state. Overall, the manuscript is well written and provides sufficient methodological and analytical details to evaluate the results.
  
  Weakness:
  
  While the study provides new insights into the neural processes underlying arousal, future studies may be needed to further examine the implications of the identified clusters and patterns.
  
  Review 3
5. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  (1) First, a central claim is that arousal modulates functional connectivity in a hemispherically asymmetric and community-specific manner. Although structured asymmetries are demonstrated at the group level, it remains unclear whether these effects reflect a stable neurobiological principle or arise from high-dimensional, connection-wise analyses that are sensitive to sampling variability. Given the interpretive weight placed on hemispheric lateralization, stronger evidence of robustness and individual-level consistency would be necessary to support this conclusion.
  
  We appreciate your critical comments on the robustness of our lateralization findings. We fully agree with you that it is essential to demonstrate that the observed hemispheric asymmetries reflect a stable neurobiological principle rather than an artifact of sampling variability or high-dimensional noise. To address this concern, we performed two rigorous validation analyses using 500-iteration resampling schemes, consisting of a split-half reliability test and a participant-level consistency assessment.
  
  First, to ensure our findings do not depend on specific sample compositions, we conducted a split-half reliability test where the dataset was randomly partitioned into two independent subgroups over 500 iterations. As shown in Figure S1A, the community labels maintained high spatial consistency across iterations (as evidenced by the confusion matrix and Dice coefficient distributions), and our original findings—including network-pair community architecture (Fig. S2A), regional affiliation patterns (Fig. S3A-B), and arousal–tvFC coupling lateralization (Fig. S4A-B)—were consistently situated at the center of the iteration distributions.
  
  Second, to account for potential within-participant dependencies in the HCP 7T dataset, we performed a participant-level resampling analysis (N = 139). By randomly selecting a different session for each participant across 500 iterations, we confirmed that the community architecture and hemispheric biases remain robust even under this strict control (Figure S1A, S2B, S3C-D and S4C-D). Collectively, these additional analyses provide strong evidence that the hemispheric lateralization we reported is not a byproduct of sampling bias, but instead represents a stable organizational principle of the arousal-modulated connectome.
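The split-half check described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names (`dice`, `community_dice`, `split_half`) are hypothetical, and it assumes the community labels from the two halves have already been matched to each other (for example via the confusion matrix), which the response does not spell out.

```python
import random


def dice(a, b):
    """Dice coefficient between two collections of indices: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def community_dice(labels_x, labels_y, k):
    """Per-community Dice between two labelings of the same edges.

    Assumes label c in labels_x corresponds to label c in labels_y
    (label matching is a separate step and is not shown here).
    """
    scores = []
    for c in range(k):
        in_x = [i for i, lab in enumerate(labels_x) if lab == c]
        in_y = [i for i, lab in enumerate(labels_y) if lab == c]
        scores.append(dice(in_x, in_y))
    return scores


def split_half(run_ids, rng):
    """Randomly partition run identifiers into two disjoint halves."""
    ids = list(run_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]
```

In a resampling scheme of this kind, `split_half` would be called once per iteration (500 times in the response), the clustering rerun on each half, and the resulting `community_dice` values collected into a distribution.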
  
  (2) Second, all analyses are based on ultra-high-field imaging. The manuscript does not address whether the reported arousal-related patterns, including the community structure and hemispheric asymmetries, are expected to be reproducible at standard field strengths. It therefore remains unclear whether the findings depend critically on the use of high-field data or whether they would generalize to more widely available datasets, limiting the broader applicability of the results.
  
  We appreciate your constructive comments on the generalizability of our findings across different field strengths.
  
  As you noted, our primary motivation for employing 7T ultra-high-field imaging was to leverage its superior signal-to-noise ratio (SNR) and significantly enhanced BOLD sensitivity. These technical advantages were instrumental in capturing the subtle, moment-to-moment coupling between spontaneous pupillary fluctuations and tvFC—signals that might be close to the detection threshold in standard field strength environments.
  
  However, we fully recognize your point that 3T remains the standard in most clinical and research settings. In the revised manuscript, we have added a dedicated discussion to address this (page 21, lines 447-456):
  
  “Fifth, the findings reported here were derived exclusively from ultra-high-field (7T) imaging data. The superior BOLD sensitivity of 7T fMRI was instrumental in resolving the fine-scale community architecture of arousal–tvFC coupling, which involves subtle signals that may be challenging to detect at lower field strengths. Given that 3T remains the most common parameter for neuroimaging research and clinical applications, future investigations are needed to determine the extent to which these organizational principles generalize to standard field strength data. Validating these motifs in large-scale 3T datasets will be essential to establish their broader applicability across different imaging environments.”
  
  (3) Third, arousal-connectivity coupling is assessed using zero-lag correlations between pupil diameter and time-resolved connectivity estimates. Physiological and hemodynamic considerations suggest that pupil-linked arousal and blood-based imaging signals may exhibit systematic temporal delays. The absence of analyses examining sensitivity to such delays raises the possibility that the reported coupling patterns depend on a specific temporal alignment assumption.
  
  Given the inherent delay of the hemodynamic response function (HRF) and the complex temporal relationship between pupillary dynamics and neural activity, we conducted an additional lagged cross-correlation analysis to test the sensitivity of our findings. Following established frameworks for linking BOLD signals with pupillometry (Yellin et al., 2015; Gonzalez-Castillo et al., 2022; Lloyd et al., 2023), we systematically shifted the pupil time series relative to the fMRI data by -3 TR to +3 TR (-3s to +3s) and evaluated the consistency of the community architecture across these different lags using Dice coefficients.
  
  As shown in Figure S5, these results demonstrate that the community organization remains stable across the tested range of physiological delays. This stability indicates that the arousal-modulated communities we reported are not specific to the zero-lag assumption but instead persist throughout the physiologically plausible lag window. Consequently, our findings reflect a robust neurobiological phenomenon rather than an artifact of a specific temporal alignment.
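The lag test can be illustrated as follows. This is a sketch under stated assumptions: `pearson` and `lagged_corr` are hypothetical helper names, both series are sampled once per TR, and the sign convention (positive lag means the pupil leads) is chosen here for illustration and may differ from the authors' implementation.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lagged_corr(pupil, fc, lag):
    """Correlate pupil[t] with fc[t + lag], trimming the non-overlapping ends.

    lag is in samples (TRs); a positive lag means the pupil leads the FC series.
    """
    if lag > 0:
        p, f = pupil[:-lag], fc[lag:]
    elif lag < 0:
        p, f = pupil[-lag:], fc[:lag]
    else:
        p, f = pupil, fc
    return pearson(p, f)


def lag_profile(pupil, fc, max_lag=3):
    """Coupling at every lag from -max_lag to +max_lag TRs."""
    return {lag: lagged_corr(pupil, fc, lag) for lag in range(-max_lag, max_lag + 1)}
```

Applying `lag_profile` to each edge would give one coupling matrix per lag, which could then be clustered and compared with the zero-lag solution using Dice coefficients.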
  
  (4) Fourth, the estimation of time-resolved connectivity relies on a single choice of sliding-window length. The manuscript does not examine whether the reported patterns are stable across different window sizes. Given ongoing concerns about parameter dependence in time-resolved connectivity analyses, sensitivity analyses would be important to establish that the findings are not artifacts of a particular analytical choice.
  
  To ensure that our findings are not artifacts of a specific analytical choice, we performed an exhaustive sensitivity analysis by repeating our entire pipeline across a wide range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then employed Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).
  
  As shown in Figure S5, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. These findings provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data rather than being driven by specific analytical choices in the sliding-window setup.
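For readers unfamiliar with the two parameters being varied, a sliding-window connectivity estimate for a single pair of regions might look like the sketch below. The function names are hypothetical, the window and step are expressed in samples, and a 1 s TR is assumed so that the original setting corresponds to `window=30, step=5`.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def sliding_window_fc(ts_a, ts_b, window=30, step=5):
    """Time-varying FC for one edge: one correlation per window position.

    window and step are in samples; only complete windows are used.
    """
    starts = range(0, len(ts_a) - window + 1, step)
    return [pearson(ts_a[s:s + window], ts_b[s:s + window]) for s in starts]
```

Longer windows give smoother but temporally coarser estimates, and smaller steps give more (and more strongly overlapping) windows, which is why both are worth varying in a sensitivity analysis.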
  
  (5) Finally, the identification of seven connectivity communities is a central result, yet the justification for this choice relies primarily on a single clustering quality measure. In practice, evaluation of clustering solutions typically draws on multiple complementary criteria, including measures of compactness and separation, approaches for selecting the number of clusters, and assessments of stability under resampling. Without such complementary evaluations, it is difficult to determine whether the reported community structure reflects a stable organizational feature or sensitivity to specific methodological decisions.
  
  We agree that relying on a single measure can be limiting, and in the revised manuscript, we have implemented a comprehensive multi-criteria evaluation to justify our selection of K=7. To ensure the robustness of the community partition, we expanded our analysis to include several complementary indices, such as the Davies-Bouldin Index, Calinski-Harabasz Score, and Silhouette Coefficient, alongside the original Within-Cluster Sum of Squares (WCSS), as detailed in Figure S7A.
  
  To further minimize subjective bias in "elbow" detection, we utilized the L-method (Salvador & Chan, 2004), which identifies the optimal K by minimizing the combined root-mean-square error (RMSE) of two linear regression segments. As illustrated in Figure S7B, the RMSE was minimized at K=7, providing a robust mathematical basis for our partition. Furthermore, we systematically visualized the community maps across a range of granularities from K=5 to 9 (Figure S7C). This stability analysis demonstrates that the fundamental topological features and the resulting hemispheric asymmetries are not transient artifacts of a specific K but are consistently preserved as the clustering granularity increases. These additional evaluations demonstrate that the seven-community structure reflects a stable organizational feature of arousal-modulated connectivity.
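The idea behind the L-method can be sketched as follows: every candidate knee splits the evaluation curve into a left and a right segment, a straight line is fitted to each, and the split with the lowest combined RMSE is chosen. The code below uses a size-weighted combination of the two RMSE values; whether the authors used exactly this weighting is not stated in the response, and the names `fit_rmse` and `l_method` are hypothetical.

```python
def fit_rmse(xs, ys):
    """RMSE of an ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sq_err = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (sq_err / n) ** 0.5


def l_method(ks, scores):
    """Return the K at the knee of the (ks, scores) curve.

    For each split, the first c points form the left segment and the rest
    form the right segment (each needs at least two points). The knee is
    the last K of the left segment for the split with the lowest
    size-weighted RMSE.
    """
    n = len(ks)
    best_k, best_err = None, float("inf")
    for c in range(2, n - 1):
        left = fit_rmse(ks[:c], scores[:c])
        right = fit_rmse(ks[c:], scores[c:])
        err = (c * left + (n - c) * right) / n
        if err < best_err:
            best_k, best_err = ks[c - 1], err
    return best_k
```

Here `scores` would be the WCSS (or another evaluation index) computed for each K in the tested range.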
  
  Reviewer #2 (Public review):
  
  (1) Arousal effects on BOLD signals and on pupil size can have different delays, so it would be valuable to test lagged relationships (for example, shifting the pupil series forward and backward) to show that the main community structure and lateralization results are not sensitive to an arbitrary temporal alignment.
  
  We agree with you that accounting for the varying delays between BOLD signals and pupillary dynamics is essential for ensuring the robustness of our results. We conducted a comprehensive lagged cross-correlation analysis to address it. Following established frameworks for linking BOLD signals with pupillometry (Yellin et al., 2015; Gonzalez-Castillo et al., 2022; Lloyd et al., 2023), we systematically shifted the pupil time series relative to the fMRI data by -3 TR to +3 TR (-3s to +3s) and evaluated the consistency of the community architecture across these lags using Dice coefficients.
  
  As shown in Figure S5C, these results demonstrate that the core community organization remains stable across the tested range of physiological delays. This stability confirms that our findings are not sensitive to an arbitrary temporal alignment but instead reflect a robust neurobiological phenomenon that persists throughout the physiologically plausible lag window.
  
  (2) Pupil diameter covaries with blinks, eye closure, and other factors that can covary with head motion and physiological noise. The Methods include substantial quality control and denoising, including motion regression and scrubbing, plus exclusions for eye closure.
  
  We appreciate your attention to these potential confounding factors. While we implemented rigorous preprocessing, including regressing confounds out of the fMRI images, we agree that physiological noise and motion may have influenced the pupil signals.
  
  To address this, we conducted an additional control analysis where we included head motion (framewise displacement, FD) and the global signal (defined as the mean signal across all gray matter voxels) as covariates when calculating the arousal–tvFC coupling. We then re-evaluated the similarity between the resulting community architecture and our original findings. As shown in Figure S4, the community structure remained stable after controlling for these variables.
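One common way to include covariates when estimating a coupling is a partial correlation, sketched below. This is an assumption for illustration only: the response does not say whether the authors used partial correlation, regression residuals, or another approach, and `pearson` and `partial_corr` are hypothetical names.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, covs):
    """Correlation between x and y after controlling for every series in covs.

    Uses the standard recursion
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied one covariate at a time.
    """
    if not covs:
        return pearson(x, y)
    z, rest = covs[0], covs[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
```

In the setting described above, `x` would be the pupil-derived arousal series, `y` the time-varying FC of one edge, and `covs` the framewise displacement and global signal series aligned to the same windows.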
  
  Regarding eye closure, we intentionally did not regress this out, as extensive literature demonstrates that eye closure is itself a reliable physiological proxy for arousal levels (Sommer & Golz, 2010; Chang et al., 2016; Gonzalez-Castillo et al., 2022); regressing it out would likely remove the very arousal-related coupling effects we aim to investigate.
  
  (3) The dataset is described in terms of runs retained (for example, 485 resting runs), and runs are treated as observations in clustering after z-scoring across runs. If multiple runs come from the same individuals, the manuscript would benefit from explicitly showing that results replicate at the participant level (for example, community structure stability within participant across runs, and participant-level summary statistics used for inference), rather than relying primarily on pooled run-level patterns.
  
  We fully agree with you that it is essential to demonstrate that the observed hemispheric asymmetries reflect a stable neurobiological principle rather than an artifact of sampling variability or high-dimensional noise. To address this concern, we performed two rigorous validation analyses using 500-iteration resampling schemes, consisting of a split-half reliability test and a participant-level consistency assessment.
  
  First, to ensure our findings do not depend on specific sample compositions, we conducted a split-half reliability test where the dataset was randomly partitioned into two independent subgroups over 500 iterations. As shown in Figure S1A, the community labels maintained high spatial consistency across iterations (as evidenced by the confusion matrix and Dice coefficient distributions), and our original findings—including network-pair community architecture (Fig. S2A), regional affiliation patterns (Fig. S3A-B), and arousal–tvFC coupling lateralization (Fig. S4A-B)—were consistently situated at the center of the iteration distributions.
  
  Second, to account for potential within-participant dependencies in the HCP 7T dataset, we performed a participant-level resampling analysis (N = 139). By randomly selecting a different session for each participant across 500 iterations, we confirmed that the community architecture and hemispheric biases remain robust even under this strict control (Figure S1A, S2B, S3C-D and S4C-D). Collectively, these additional analyses provide strong evidence that the hemispheric lateralization we reported is not a byproduct of sampling bias, but instead represents a stable organizational principle of the arousal-modulated connectome.
  
  (4) Time-resolved connectivity is estimated using a 30-second sliding window and 5 second step. It is reasonable to wonder whether the same conclusions hold with alternative estimators that do not rely on fixed windows. The Discussion acknowledges this limitation, but adding a small robustness analysis would make the paper more definitive.
  
  To ensure that our findings are not artifacts of a specific analytical choice, we performed an exhaustive sensitivity analysis by repeating our entire pipeline across a wide range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then employed Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).
  
  As shown in Figure S3, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. Furthermore, the core hemispheric asymmetry patterns were robustly preserved regardless of the specific windowing configuration used. These results provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data and are stable across a broad range of temporal scales.
  
  Reviewer #3 (Public review):
  
  (1) A major limitation of the study is the limited discussion of subcortical regions, which play a central role in arousal regulation according to extensive prior literature. Although the current analyses focus primarily on cortical organization, the authors should include a brief discussion of how their findings relate to subcortical arousal systems.
  
  We completely agree that subcortical structures are pivotal drivers of arousal regulation. While our study primarily utilized a symmetric cortical atlas to ensure a mathematically rigorous assessment of hemispheric lateralization, we recognize that the exclusion of subcortical regions limits the functional interpretation of the observed patterns.
  
  In the revised manuscript, we have added a dedicated discussion part (page 20, lines 412-428) to address this point:
  
  “First, to ensure a mathematically rigorous assessment of hemispheric asymmetry, our analysis was restricted to a symmetric cortical parcellation. Consequently, while we demonstrate that arousal-modulated connectivity follows a structured macroscopic architecture, we did not explicitly analyze the subcortical nuclei hypothesized to drive these patterns. We hypothesize that the presence of these low-dimensional cortical communities reflects coordinated motifs rather than a homogeneous gain modulation, potentially mirroring the differentiated projection patterns of subcortical neuromodulatory systems. For instance, the locus coeruleus–noradrenergic pathway (Chandler et al., 2014; Schwarz & Luo, 2015) and thalamus (Hwang et al., 2017; Shine, 2019; Müller et al., 2020; Shine et al., 2023) possess extensive yet non-uniform projections that may anchor the community-specific and hemispherically asymmetric patterns observed here.”
  
  (2) While sliding window methods can capture temporal changes in functional organization, they have limitations in characterizing moment-to-moment neural fluctuations. In particular, results can be highly sensitive to window length and step size. The manuscript would benefit from (a) a clearer discussion of these methodological limitations, (b) justification for the chosen window length and step size, and (c) a sensitivity analysis demonstrating whether the main findings are robust across different parameter choices.
  
  To ensure that our findings are not artifacts of a specific analytical choice, we performed an exhaustive sensitivity analysis by repeating our entire pipeline across a wide range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then employed Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).
  
  As shown in Figure S5, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. Furthermore, the core hemispheric asymmetry patterns were robustly preserved regardless of the specific windowing configuration used. These results provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data and are stable across a broad range of temporal scales.
  
  (3) The authors use k-means clustering to identify groups of brain regions and refer to these groupings as "communities." However, in general, community detection typically refers to graph-based algorithms that identify modules based on connectivity structure (e.g., modularity maximization). The clusters derived from k-means in feature space are not necessarily equivalent to graph-theoretic communities. The authors should explicitly clarify this distinction and adjust terminology accordingly to avoid conceptual ambiguity.
  
  We agree that the term "community detection" is often specifically associated with graph-based algorithms, such as modularity maximization, which define modules based on topological connectivity. In contrast, our implementation of k-means identifies groupings based on the similarity of arousal–FC coupling patterns within a high-dimensional feature space.
  
  To avoid any conceptual ambiguity or potential confusion, we have explicitly clarified this distinction in the Methods (pages 24-25, lines 533-542) section of the revised manuscript:
  
  “We employed the k-means clustering algorithm (Euclidean distance) to explore a range of cluster solutions from K = 2 to 15. To ensure the stability of the results and avoid local optima, each K was repeated 250 times with random initializations. The optimal number of clusters was determined by evaluating clustering quality and reproducibility (e.g., maximizing silhouette stability). It is important to clarify that "communities" in this context refer to clusters of edges that exhibit similar arousal-modulation motifs within a high-dimensional feature space, rather than topological modules typically derived from graph-theoretic algorithms like modularity maximization. This procedure consistently identified seven distinct communities, each representing a robust, arousal-sensitive connectivity motif that characterizes the large-scale organization of brain-pupil coupling.”
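The role of the repeated random initializations can be illustrated with a small, dependency-free k-means. This is a teaching sketch rather than the authors' pipeline: the function names are hypothetical, points are plain tuples, and the best of `n_init` runs is chosen by lowest within-cluster sum of squares (WCSS).

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, n_iter=100):
    """One k-means run (Lloyd's algorithm) from a random choice of k data points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def kmeans_restarts(points, k, n_init=250, seed=0):
    """Repeat k-means n_init times and keep the solution with the lowest WCSS."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_init))
    return min(runs, key=lambda run: run[1])
```

Because a single run can settle in a poor local optimum, keeping the lowest-WCSS solution across many restarts makes the final partition much less dependent on any one starting configuration.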
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) To strengthen confidence in the reported hemispheric effects, the authors should provide additional robustness analyses, such as subject-level consistency of lateralization measures, split-half or resampling reliability, and sensitivity to alternative preprocessing or analysis choices. Reporting the distribution of lateralization effects across individuals would help clarify whether the observed asymmetries reflect stable features or group-level averages driven by a subset of connections or participants.
  
  We agree that establishing the individual-level stability of lateralization is essential. We have now provided extensive validation, including split-half reliability tests and participant-level consistency analyses (500 iterations). These results confirm that the reported asymmetries are robust and consistent across the sample. Please refer to our response to Reviewer #1 Weakness1 for the full analysis and associated figures (Figures S1-S4).
  
  (2) The authors should examine whether arousal-connectivity coupling patterns are robust to plausible temporal delays between pupil diameter and BOLD signals. Lagged or time-shifted analyses would help establish that the findings do not depend on a specific zero-lag assumption.
  
  We agree that validating the coupling between pupil dynamics and the time-varying FC is essential. To address this, we conducted a lag sensitivity analysis by shifting the pupil-derived arousal signal within a physiologically plausible range (-3 to +3 TR). The community architecture remains highly consistent across these temporal offsets, showing high spatial correlation and Dice coefficients with our original findings. This stability confirms that the identified organizational motifs are robust and not dependent on a specific zero-lag assumption. For the full details of this validation and the associated figures, please refer to our response to Reviewer #1 Weakness3 and Figure S5 in the Supplementary Material.
  
  (3) Given reliance on a single sliding-window length, the authors should assess how key results vary across different window sizes. Demonstrating stability of the community structure and lateralization patterns across parameter choices would strengthen the methodological foundation of the study.
  
  We have conducted an exhaustive sensitivity analysis across various window lengths (30s, 35s, 60s, 90s) and step sizes (1s, 5s, 10s). The high Dice coefficients (>0.8) confirm that our findings are not dependent on specific windowing choices. Please refer to our response to Reviewer #1 Weakness4 and Figure S5 for the full results.
  
  (4) The justification for the chosen number of connectivity communities would benefit from additional clustering evaluations. Complementary criteria such as measures of compactness and separation, model selection approaches for determining the number of clusters, and stability or reproducibility under resampling would help establish whether the reported community structure is robust rather than method-dependent.
  
  To strengthen the mathematical basis for our partition, we have implemented a multi-metric evaluation and the L-method for objective K selection. These metrics consistently support the seven-community structure. Please refer to our response to Reviewer #1 Weakness5 and Figure S7 for the comprehensive evaluation.
  
  (5) The manuscript would benefit from a clearer discussion of why ultra-high-field imaging was required for the present analyses and whether similar results are expected at standard field strengths. If feasible, validation using lower-field data or reference to existing datasets would substantially enhance generalizability.
  
  We have expanded our discussion to clarify that 7T was instrumental for capturing the subtle, high-frequency arousal-tvFC coupling due to its superior SNR. We also explicitly discuss the potential and limitations of generalizing these findings to 3T datasets. Please refer to our response to Reviewer #1 Weakness2 for the full discussion (page 21, lines 447-456).
  
  (6) The authors should more explicitly report exclusion related to pupil measurements and discuss how missing or noisy pupillometry may affect the applicability of the approach in other datasets or experimental settings.
  
  We agree that transparency in data screening is essential for the reproducibility of our method. In the revised manuscript, we have clarified our quality control pipeline in the quality control section in Methods (page 23, lines 502-510):
  
  “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female). Runs were excluded if (a) more than 20% of frames exceeded motion thresholds, (b) eye tracking did not cover the full fMRI time series, or (c) more than 90% of samples were classified as eye closure. After applying these criteria, 485 of the initial 723 scans were retained for analysis. The same quality-control pipeline was applied to the movie-watching dataset, yielding 513 usable scans out of the original 725. Detailed information on data retention and run distribution per participant is summarized in Figure S9.”
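The three exclusion criteria quoted above can be written as a single predicate. The sketch below is illustrative only: the `RunQC` record and its field names are hypothetical, and the thresholds are taken directly from the quoted text (more than 20% high-motion frames, incomplete eye tracking, more than 90% eye closure).

```python
from dataclasses import dataclass


@dataclass
class RunQC:
    """Hypothetical per-run quality-control summary."""
    frac_high_motion: float      # fraction of frames exceeding motion thresholds
    eye_tracking_complete: bool  # eye tracking covers the full fMRI time series
    frac_eye_closed: float       # fraction of samples classified as eye closure


def keep_run(qc):
    """Return True if the run passes all three criteria."""
    return (qc.frac_high_motion <= 0.20
            and qc.eye_tracking_complete
            and qc.frac_eye_closed <= 0.90)


def retention(kept, total):
    """Fraction of runs retained after quality control."""
    return kept / total
```

With the counts reported above, `retention(485, 723)` is roughly 0.67 for the resting-state runs and `retention(513, 725)` is roughly 0.71 for the movie-watching runs.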
  
  Furthermore, we have added a discussion regarding how noisy or missing pupillary signals might affect the generalizability of our approach (pages 20-21, lines 437-447):
  
  “Fourth, the generalizability of our approach to external cohorts warrants caution regarding pupillary data integrity. In contexts where high-fidelity eye-tracking is technically demanding—such as in clinical settings involving patients with restricted compliance or in naturalistic fMRI studies—the prevalence of blink artifacts and signal dropouts may bias the estimation of arousal-modulated states. Excessive reliance on data interpolation in such cases could artificially smooth temporal fluctuations, leading to an overestimation of community stability. Future applications should therefore prioritize high-frequency sampling and potentially incorporate multi-modal physiological features (e.g., respiratory or cardiac signals) to cross-validate arousal dynamics when pupillary data is suboptimal (Meissner et al., 2023; Bolt et al., 2025; Weijs et al., 2025).”
  
  (7) The authors should ensure that all data and analysis code necessary to reproduce the results are made publicly available in accordance with eLife policies, including clear documentation of preprocessing steps, parameter choices, and clustering procedures.
  
  All analysis code and the necessary processed data required to reproduce our findings have been made publicly available through https://github.com/kongxy6478/Arousal-modulates-functional-connectivity. This repository includes documented pipelines for pupillometry cleaning and fMRI denoising, alongside the core Python scripts used for sliding-window connectivity calculation, k-means clustering, and hemispheric lateralization analysis.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Add a lag sensitivity analysis between pupil-derived arousal and time-resolved connectivity, and report whether the seven community structure and key lateralization findings are stable across a plausible lag range.
  
  We agree that validating the coupling between pupil dynamics and time-varying FC is essential. To address this, we conducted a lag sensitivity analysis by shifting the pupil-derived arousal signal within a physiologically plausible range (-3 to +3 TRs). The community architecture remains highly consistent across these temporal offsets, showing high spatial correlation and Dice coefficients with our original findings. This stability confirms that the identified organizational motifs are robust and not dependent on a specific zero-lag assumption. For the full details of this validation and the associated figures, please refer to our response to Reviewer #1 Weakness3 and Figure S5 in the Supplementary Material.
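
  The lag shift described above can be sketched as follows. This is a hedged illustration, not the repository's implementation: the function names are hypothetical, the signals are synthetic, and a single connectivity edge stands in for the full time-varying FC.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(pupil, edge, lag):
    """Correlate pupil[t] with edge[t + lag]; lag is in TRs and may be negative."""
    if lag >= 0:
        p, e = pupil[:len(pupil) - lag], edge[lag:]
    else:
        p, e = pupil[-lag:], edge[:len(edge) + lag]
    return pearson(p, e)


# Synthetic example: the edge time course follows the pupil signal by 2 TRs.
pupil = [math.sin(0.3 * t) for t in range(60)]
edge = [0.0, 0.0] + pupil[:-2]

by_lag = {lag: lagged_correlation(pupil, edge, lag) for lag in range(-3, 4)}
best_lag = max(by_lag, key=by_lag.get)
print(best_lag)
```

  In a full analysis the community detection would be repeated at each lag and compared with the zero-lag solution; the sketch only shows how the shifted pairs are formed.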
  
  (2) Quantify and report the extent to which residual head motion, blink rate, eye closure segments, and global signal changes explain arousal connectivity coupling, for example, via partial correlation or regression controls, and show that key effects persist.
  
  We agree that it is essential to demonstrate that the observed arousal-connectivity coupling is not driven by non-specific physiological or motion-related artifacts. As requested, we have quantified the influence of head motion (FD) and global signal on our primary results. By implementing partial correlation analyses, we confirmed that the identified arousal-modulated community structures persist even after strictly controlling for these variables. These results indicate that the arousal-tvFC coupling we report reflects a specific neuro-arousal process rather than a byproduct of motion or systemic physiological fluctuations. For the detailed quantitative results and control analysis figures, please refer to our response to Reviewer #2 Weakness3 and Figure S6 in the Supplementary Material.
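
  For readers unfamiliar with partial correlation, the control step can be sketched with the standard recursive formula. The variable names and toy values below are assumptions made for illustration. In this toy case the raw correlation is entirely explained by the covariate, so the partial correlation drops to zero, which is the opposite of the persisting effect reported above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, covariates):
    """Correlation of x and y after linearly controlling for each covariate.

    Uses the recursive formula, removing one covariate at a time.
    """
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[-1], covariates[:-1]
    r_xy = partial_correlation(x, y, rest)
    r_xz = partial_correlation(x, z, rest)
    r_yz = partial_correlation(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (not study data): both signals share a component driven by "motion".
motion = [1, 1, -1, -1]
arousal = [2, 0, 0, -2]       # motion + an independent component
connectivity = [2, 0, -2, 0]  # motion + a different independent component

raw = pearson(arousal, connectivity)
controlled = partial_correlation(arousal, connectivity, [motion])
print(raw, controlled)
```

  Passing several covariates, for example framewise displacement and the global signal, only requires extending the list given as the third argument.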
  
  (3) Add participant-level validation: demonstrate that community profiles and lateralization signatures are consistent within participants across runs, and consider participant-level statistical summaries rather than treating all runs as independent observations.
  
  We agree that demonstrating participant-level consistency is vital. In response, we performed two rigorous 500-iteration resampling schemes: a split-half reliability test and a participant-level consistency assessment (N = 139). These analyses, which involved randomly partitioning the sample and selecting single sessions per participant, confirm that our community architecture and hemispheric biases are remarkably stable and not driven by sampling variability or high-dimensional noise. For a comprehensive description of these validations and the associated statistical distributions, please refer to our detailed response to Reviewer #2 Weakness3 and Figures S1–S4.
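
  The two resampling schemes can be sketched as below. This is a minimal illustration under stated assumptions: the participant labels, the run names, and the fixed seed are hypothetical, and the downstream community comparison is omitted. Only the 500 iterations and N = 139 come from the response.

```python
import random


def split_half(participant_ids, rng):
    """Randomly partition participants into two disjoint halves."""
    ids = list(participant_ids)
    rng.shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def one_run_per_participant(runs_by_participant, rng):
    """Pick a single run per participant so observations are independent."""
    return {pid: rng.choice(runs) for pid, runs in runs_by_participant.items()}


rng = random.Random(0)  # fixed seed so the sketch is reproducible
participants = [f"sub-{i:03d}" for i in range(139)]  # hypothetical labels

# 500 random split-half partitions of the 139 participants.
splits = [split_half(participants, rng) for _ in range(500)]

# Hypothetical run lists; real participants contribute varying numbers of runs.
runs_by_participant = {pid: ["run-1", "run-2", "run-3"] for pid in participants}
selection = one_run_per_participant(runs_by_participant, rng)
print(len(splits), len(selection))
```

  Each iteration would then recompute the community structure on the resampled data and compare it with the full-sample solution.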
  
  (4) Provide an alternative dynamic connectivity estimator robustness check, or at a minimum, vary the window length and step size to show stability of the primary conclusions.
  
  We have conducted an exhaustive sensitivity analysis across various window lengths (30s, 35s, 60s, 90s) and step sizes (1s, 5s, 10s). The high Dice coefficients (>0.8) confirm that our findings are not dependent on specific windowing choices. Please refer to Reviewer #1 Weakness3 and Figure S5 for the full results.
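
  As a reminder of what the Dice coefficient measures here, the sketch below compares two binary community-membership vectors and lists the start indices of sliding windows. The function names and toy vectors are assumptions for illustration, and window and step sizes are expressed in samples rather than seconds.

```python
def dice(a, b):
    """Dice coefficient between two binary membership vectors."""
    overlap = sum(1 for p, q in zip(a, b) if p and q)
    total = sum(1 for p in a if p) + sum(1 for q in b if q)
    return 2 * overlap / total if total else 1.0


def window_starts(n_samples, window, step):
    """Start indices of all full windows of length `window`, taken every `step` samples."""
    return list(range(0, n_samples - window + 1, step))


# Toy membership vectors (1 = edge belongs to the community), not study data.
reference = [1, 1, 1, 0, 0, 0, 1, 0]    # e.g. the original windowing choice
alternative = [1, 1, 0, 0, 0, 0, 1, 0]  # e.g. a different window length

score = dice(reference, alternative)
print(round(score, 3))
print(window_starts(10, 4, 3))
```

  A Dice value of 1 means the two solutions assign exactly the same edges to the community; 0 means no shared edges.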
  
  (5) Consider validating the seven community solutions with at least one additional unsupervised approach, and report agreement with the main k-means solution.
  
  We agree that validating the clustering scheme is essential. To this end, we implemented a multi-criteria evaluation (including Davies-Bouldin and Silhouette indices) and utilized the L-method (Salvador & Chan, 2004) to mathematically confirm K=7 as the optimal granularity (Figure S7A–B). Furthermore, we verified that the core topological features and hemispheric asymmetries remain robustly consistent across a range of granularities from K=5 to 9 (Figure S7C). These analyses demonstrate that our findings are not dependent on a specific K or subjective bias. For the full quantitative evaluation and stability maps, please refer to our response to Reviewer #2 Weakness5 and Figure S7.
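
  The idea behind the L-method (Salvador & Chan, 2004) is to find the knee of an evaluation curve by fitting one straight line to its left part and another to its right part, then choosing the split with the lowest size-weighted fitting error. The sketch below follows that idea under simplifying assumptions: the weighting uses the fraction of points on each side, and the scores are synthetic values built to have a knee at K = 7, not values from the manuscript.

```python
import math


def fit_line_rmse(xs, ys):
    """Least-squares line through (xs, ys); returns the root-mean-square residual."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n)


def l_method_knee(ks, scores):
    """Return the K at which a two-line fit of the evaluation curve is best."""
    n = len(ks)
    best_k, best_err = None, float("inf")
    for c in range(2, n - 1):  # each side keeps at least two points
        left = fit_line_rmse(ks[:c], scores[:c])
        right = fit_line_rmse(ks[c:], scores[c:])
        err = (c / n) * left + ((n - c) / n) * right  # size-weighted error
        if err < best_err:
            best_k, best_err = ks[c - 1], err
    return best_k


# Synthetic evaluation curve: steep decrease up to K = 7, then nearly flat.
ks = list(range(2, 12))
scores = [100, 82, 64, 46, 28, 10, 8, 7, 6, 5]
knee = l_method_knee(ks, scores)
print(knee)
```

  On real clustering metrics the curve is noisier, so the selected knee is usually checked against other indices, as the response describes.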
  
  (6) State explicitly, early in Results, what the main inferential unit is (run or participant) for each key analysis, and clarify how repeated runs per participant are handled.
  
  We agree that defining the inferential unit is critical for methodological clarity. In the revised manuscript, we have explicitly stated at the beginning of the Results section (page 5, lines 113-116):
  
  “While our primary inferential analyses were conducted at the run level to leverage the high-density sampling of the HCP 7T dataset, we further validated the robustness of these findings using participant-level statistical summaries and resampling to account for within-participant dependencies (see Figures S1-S2 in Supplementary Material).”
  
  Specifically, all key findings—including community architecture and hemispheric asymmetries—were validated using participant-level statistics and resampling schemes (N = 139) to ensure that the results are not biased by within-participant dependencies.
  
  (7) When introducing the integration and segregation indices, add a brief intuitive explanation of what a positive or negative value means in plain language before the equations.
  
  We thank the reviewer for this suggestion to improve the accessibility of our methods. We have added brief, intuitive explanations for both indices in the Methods section (pages 26-27, lines 569-582):
  
  “The integration index provides a measure of the overall hemispheric dominance of arousal-modulated connections. A positive value indicates that arousal-related edges are preferentially concentrated in the left hemisphere (including its internal and outgoing connections) compared to the right.” and “The segregation index assesses whether arousal preferentially modulates local, intra-hemispheric communication versus long-range, inter-hemispheric communication. A positive value reflects a "segregated" left-hemisphere bias, where arousal strengthens within-hemisphere connections more than it strengthens across-hemisphere communication for that same hemisphere.”
  
  (8) In the Discussion, separate claims into "what we show" versus "what we hypothesize," especially when connecting findings to neuromodulatory pathways.
  
  In the revised manuscript, we have carefully separated our direct empirical findings from our mechanistic hypotheses. We have used more cautious and speculative language (e.g., "suggesting a potential role of," "may be mediated by," and "we hypothesize that") (page 17, lines 352-358):
  
  “Specifically, we show that the presence of low-dimensional, reproducible communities suggests that arousal modulates the connectome through coordinated motifs rather than homogeneous gain modulation. We hypothesize that this structured macroscopic architecture reflects the differentiated projection patterns of subcortical neuromodulatory systems, such as the locus coeruleus–noradrenergic pathway (Aston-Jones & Cohen, 2005; Jordan, 2024) and thalamus (Magnin et al., 2010; Lewis et al., 2015; Liu et al., 2018).”
  
  (9) Provide a clear participant-level summary (number of participants contributing to the retained runs, demographics if available, and distribution of runs per participant), alongside the reported run counts retained after quality control.
  
  We agree that clear reporting of participant-level data is essential. In the revised Methods section, we have added a detailed summary of participant demographics (age and sex) and clarified the sample composition (page 23, lines 502-503):
  
  “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female).”
  
  Furthermore, to provide a transparent view of the data retained after quality control, we have included Figure S9 to illustrate the distribution of valid runs per participant. This visualization confirms the amount of data contributing to our group-level inferences and accounts for exclusions due to motion or pupillary signal quality.
  
  (10) Report the robustness of results to reasonable changes in pupil preprocessing choices (for example, smoothing parameters or interpolation rules), since pupil diameter is the key arousal index.
  
  We agree that the robustness of pupil-derived arousal estimates is fundamental to our findings. To address this, we conducted an extensive validation analysis by comparing our original pupil preprocessing pipeline against 18 alternative combinations of parameters. These variations included different smoothing window sizes (100 ms, 200 ms, and 500 ms), interpolation methods (linear vs. cubic spline), and blink buffer durations (25 ms, 50 ms, and 100 ms). As shown in Figure S8, the pupil diameter time courses derived from these diverse pipelines remained highly correlated with our original estimates (all above 0.65). This demonstrates that our arousal-modulated connectivity results are remarkably robust to reasonable changes in pupil preprocessing choices.
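
  The count of 18 alternatives matches the full grid of the listed options (3 smoothing windows × 2 interpolation methods × 3 blink buffers). A minimal sketch of enumerating such a grid is shown below; the dictionary keys are hypothetical names, and whether the original pipeline is one of the 18 is not stated in the response.

```python
from itertools import product

# Parameter values taken from the response; key names are illustrative.
smoothing_ms = [100, 200, 500]
interpolation = ["linear", "cubic_spline"]
blink_buffer_ms = [25, 50, 100]

pipelines = [
    {"smoothing_ms": s, "interpolation": i, "blink_buffer_ms": b}
    for s, i, b in product(smoothing_ms, interpolation, blink_buffer_ms)
]
print(len(pipelines))  # 3 * 2 * 3 = 18
```

  Each configuration would then be run on the raw pupil trace and the resulting time course correlated with the original estimate.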
  
  Reviewer #3 (Recommendations for the authors):
  
  I have two additional minor comments:
  
  (1) Given the overall goal of this study to identify large-scale brain communities or clusters underlying arousal, the results may be sensitive to the choice of cortical parcellation. The authors should consider:
  
  (a) including analyses using additional parcellation schemes, or
  
  (b) discussing how the current findings might depend on the chosen parcellation and the implications for robustness and generalizability.
  
  We have addressed this by adding a dedicated point in the Discussion (page 21, lines 456-465):
  
  “Sixth, our findings were derived using a single high-resolution cortical parcellation. While the specific choice of atlas can influence fine-grained regional connectivity, it is important to note that our primary conclusions—such as hemispheric asymmetries and community-level preferences—were identified and interpreted at the macroscopic network and system level. By aggregating signals across broad functional systems, this approach likely mitigates the dependency on precise regional boundary definitions. Nevertheless, future studies employing alternative parcellation schemes would be valuable to further confirm that these organizational principles are not specific to the current atlas but represent a generalizable feature of the arousal-modulated connectome.”
  
  (2) Some key details, such as the number of participants included in the study, as well as basic demographic information, are not reported.
  
  We apologize for this omission. In the revised Methods section, we have now included a detailed summary of the participant demographics, including the final sample size (N = 139), age, and sex distribution (page 23, lines 502-503):
  
  “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female).”
  
  Furthermore, to ensure full transparency regarding data retention, we have added a new figure (Figure S9) illustrating the distribution of valid fMRI runs per participant following our quality-control procedures. We believe these additions provide a clear and complete overview of the study sample.
  
  References
  
  Aston-Jones, G., & Cohen, J. D. (2005). An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience, 28, 403–450. https://doi.org/10.1146/annurev.neuro.28.061604.135709
  
  Bolt, T., Wang, S., Nomi, J. S., Setton, R., Gold, B. P., deB.Frederick, B., Yeo, B. T. T., Chen, J. J., Picchioni, D., Duyn, J. H., Spreng, R. N., Keilholz, S. D., Uddin, L. Q., & Chang, C. (2025). Autonomic physiological coupling of the global fMRI signal. Nature Neuroscience, 28(6), 1327–1335. https://doi.org/10.1038/s41593-025-01945-y
  
  Chandler, D. J., Gao, W.-J., & Waterhouse, B. D. (2014). Heterogeneous organization of the locus coeruleus projections to prefrontal and motor cortices. Proceedings of the National Academy of Sciences, 111(18), 6816–6821. https://doi.org/10.1073/pnas.1320827111
  
  Chang, C., Leopold, D. A., Schölvinck, M. L., Mandelkow, H., Picchioni, D., Liu, X., Ye, F. Q., Turchi, J. N., & Duyn, J. H. (2016). Tracking brain arousal fluctuations with fMRI. Proceedings of the National Academy of Sciences, 113(16), 4518–4523. https://doi.org/10/f8ktgg
  
  Gonzalez-Castillo, J., Fernandez, I. S., Handwerker, D. A., & Bandettini, P. A. (2022). Ultra-slow fMRI fluctuations in the fourth ventricle as a marker of drowsiness. NeuroImage, 259, 119424. https://doi.org/10.1016/j.neuroimage.2022.119424
  
  Hwang, K., Bertolero, M. A., Liu, W. B., & D’Esposito, M. (2017). The Human Thalamus Is an Integrative Hub for Functional Brain Networks. The Journal of Neuroscience, 37(23), 5594–5607. https://doi.org/10.1523/JNEUROSCI.0067-17.2017
  
  Jordan, R. (2024). The locus coeruleus as a global model failure system. Trends in Neurosciences, 47(2), 92–105. https://doi.org/10.1016/j.tins.2023.11.006
  
  Lewis, L. D., Voigts, J., Flores, F. J., Schmitt, L. I., Wilson, M. A., Halassa, M. M., & Brown, E. N. (2015). Thalamic reticular nucleus induces fast and local modulation of arousal state. eLife, 4, e08760. https://doi.org/10.7554/eLife.08760
  
  Liu, X., De Zwart, J. A., Schölvinck, M. L., Chang, C., Ye, F. Q., Leopold, D. A., & Duyn, J. H. (2018). Subcortical evidence for a contribution of arousal to fMRI studies of brain activity. Nature Communications, 9(1), 395. https://doi.org/10.1038/s41467-017-02815-3
  
  Lloyd, B., De Voogd, L. D., Mäki-Marttunen, V., & Nieuwenhuis, S. (2023). Pupil size reflects activation of subcortical ascending arousal system nuclei during rest. eLife, 12, e84822. https://doi.org/10.7554/eLife.84822
  
  Magnin, M., Rey, M., Bastuji, H., Guillemant, P., Mauguière, F., & Garcia-Larrea, L. (2010). Thalamic deactivation at sleep onset precedes that of the cerebral cortex in humans. Proceedings of the National Academy of Sciences, 107(8), 3829–3833. https://doi.org/10.1073/pnas.0909710107
  
  Meissner, S. N., Bächinger, M., Kikkert, S., Imhof, J., Missura, S., Carro Dominguez, M., & Wenderoth, N. (2023). Self-regulating arousal via pupil-based biofeedback. Nature Human Behaviour, 8(1), 43–62. https://doi.org/10.1038/s41562-023-01729-z
  
  Müller, E. J., Munn, B., Hearne, L. J., Smith, J. B., Fulcher, B., Arnatkevičiūtė, A., Lurie, D. J., Cocchi, L., & Shine, J. M. (2020). Core and matrix thalamic sub-populations relate to spatio-temporal cortical connectivity gradients. NeuroImage, 222, 117224. https://doi.org/10.1016/j.neuroimage.2020.117224
  
  Salvador, S., & Chan, P. (2004). Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. 16th IEEE International Conference on Tools with Artificial Intelligence, 576–584. https://doi.org/10.1109/ICTAI.2004.50
  
  Schwarz, L. A., & Luo, L. (2015). Organization of the Locus Coeruleus-Norepinephrine System. Current Biology, 25(21), R1051–R1056. https://doi.org/10.1016/j.cub.2015.09.039
  
  Shine, J. M. (2019). Neuromodulatory Influences on Integration and Segregation in the Brain. Trends in Cognitive Sciences, 23(7), 572–583. https://doi.org/10.1016/j.tics.2019.04.002
  
  Shine, J. M., Lewis, L. D., Garrett, D. D., & Hwang, K. (2023). The impact of the human thalamus on brain-wide information processing. Nature Reviews Neuroscience, 24(7), 416–430. https://doi.org/10.1038/s41583-023-00701-0
  
  Sommer, D., & Golz, M. (2010). Evaluation of PERCLOS based current fatigue monitoring technologies. 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, 4456–4459. https://doi.org/10.1109/IEMBS.2010.5625960
  
  Weijs, M. L., Missura, S., Potok-Szybińska, W., Bächinger, M., Badii, B., Carro-Domínguez, M., Wenderoth, N., & Meissner, S. N. (2025). Modulating cortical excitability and cortical arousal by pupil self-regulation. Nature Communications, 16(1), 4552. https://doi.org/10.1038/s41467-025-59837-5
  
  Yellin, D., Berkovich-Ohana, A., & Malach, R. (2015). Coupling between pupil fluctuations and resting-state fMRI uncovers a slow build-up of antagonistic responses in the human cortex. NeuroImage, 106, 414–427. https://doi.org/10.1016/j.neuroimage.2014.11.034
  

URL

biorxiv.org/content/10.64898/2026.01.06.697875v2
www.biorxiv.org

TGF-β drives the conversion of conventional NK cells into uterine tissue-resident NK cells to support murine pregnancy

4
1. Public_Reviews 02 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 The importance of uterine natural killer (NK) cells in reproductive success has been demonstrated in mice and humans; however, it is still unclear how uterine NK cells develop. In this important manuscript, the authors provide convincing evidence that TGF-β signaling in NK cells supports normal pregnancy in mice by driving the conversion of conventional NK cells into uterine tissue-resident NK cells. Previous concerns have been addressed in this revised version.
 
2. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 This is an excellent paper from Dr. Yokoyama and colleagues. The experiments are technically demanding, given the very low cell numbers and the challenges of working with implantation sites at gestational days 6.5, 10.5, and 14.5. Overall, the impact of TGF-β receptor II deficiency in the NK lineage on uterine trNK cell numbers and litter size is convincing, and the authors' conclusions are well supported by the data. Less convincing, however, is the claim that the decrease in trNK cells is compensated by an increase in cNK cells; rather, the absence of TGF-β receptor II appears to result in an overall reduction of NK/ILC1 cells.
 
 Comments on revised version:
 
 I thank the authors for addressing all my comments from my initial review.
 
3. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 In their manuscript "TGF-β drives the conversion of conventional NK cells into uterine tissue-resident NK cells to support murine pregnancy", Yokoyama and colleagues investigate the role of Tgfbr2 expression by NK cells in the formation of tissue-resident uterine NK cells and subsequent importance in murine pregnancy. By transferring congenic splenic conventional NK cells into pregnant mice, they show conversion of circulating NK cells into uterine ivCD45 negative tissue-resident NK cells. When interfering with the formation of uterine trNK cells, spiral artery remodelling was impaired, fetal resorption rates were increased, and litter sizes were reduced.
 
 Generally, this is a research topic of high interest, yet the manuscript lacks detailed mechanistic insight and some questions remain open. In its current state, the data represent an interesting characterisation of the Tgfbr2-fl/fl Ncr1-Cre mice in pregnancy, but considering 1) the recent publication by the group (Ref#17) on the role of Eomes+ cNK cells during pregnancy, 2) the previously described role of Tgfbr2 and autocrine TGF-β expression for uterine NK cell differentiation in virgin mice (also cited by the authors), and 3) the well-known relevance of uterine NK cells during pregnancy, additional experiments addressing the specific role of TGF-β during pregnancy would help to improve the novelty and significance of the manuscript.
 
 Comments on revised version:
 
 In their revised version of the manuscript and their point-by-point response, the authors have very carefully addressed and discussed all of our concerns and suggestions.
 
4. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public review):
 
 (1) Figure 1A and B: Although a trend is evident, it does not appear that the absolute number of cNK cells at day 14 is significantly changed from day 6.5?
 
 We thank the reviewer for this careful observation. We had not originally performed a statistical comparison between the number of cNK cells present at gds 6.5 and 14.5. We have now conducted the appropriate statistical analysis for this dataset and found that the absolute number of cNK cells at day 14.5 is in fact significantly different from day 6.5 (p = 0.0005; unpaired t test, Mann-Whitney correction). The figure and corresponding legend have been updated to reflect this analysis. Please see Figure 1B:
 
 “Statistics were calculated using unpaired t tests with the Mann-Whitney correction. Error bars indicate SEM; *** p < 0.001.”
 
 (2) Figure 2E: The authors state, "This reduction of uterine trNK cells was accompanied by a concomitant increase in the absolute number and frequency of CD49b+Eomes+ cNK cells within the pregnant uterus of TGF-βRIINcr1Δ dams (Figure 2 D, E)." The number of cNK cells appears relatively low (visually ~1,000-1,300), and although the difference is statistically significant, its physiological relevance is unclear. More importantly, this modest increase does not correlate with the marked decrease in trNK and ILC1 populations, as cNK cells do not appear to accumulate. In my opinion, the conclusion "Collectively, these findings indicate that a TGF-β-driven differentiation pathway directs the conversion of peripheral cNK cells into uterine trNK cells during murine pregnancy" should be slightly toned down.
 
 We thank both reviewers for this suggestion. Regarding the absence of cNK cell accumulation in the absence of TGF-β signaling, we suggest that this may be related to the normal passage of cNK cells circulating in the placenta, i.e., these cells may not have acquired signals to remain in the uterus and are simply continuing to pass through without accumulating. Nonetheless, we have rephrased our wording to address this concern as follows:
 
 “This reduction of uterine trNK cells was accompanied by a small increase in the absolute number and frequency of CD49b+ Eomes+ cNK cells within the pregnant uterus of TGF-βRIINcr1∆ dams (Figure 2 D, E). Collectively, these findings suggest that a TGF-β–driven differentiation pathway directs the conversion of peripheral cNK cells into uterine trNK cells during murine pregnancy.”
 
 “The absence of cNK cell accumulation in the gravid uterus in the setting of impaired TGF-β signaling suggests a defect in tissue retention rather than recruitment. In the absence of TGF-β–mediated cues, circulating cNK cells that enter the uterine vasculature may fail to acquire the molecular programs required for residency and instead continue to transit through the tissue. This is consistent with a model in which TGF-β signaling promotes not only phenotypic conversion but also the acquisition of retention signals necessary for persistence within the uterine microenvironment, reinforcing that acquisition of tissue-residency in the gravid uterus is an actively instructed process [29,32].”
 
 (3) Figures 2-4: It is unclear whether the littermate controls are floxed mice or floxhet-Ncr1iCre mice? This distinction is important, as Ncr1iCre expression itself could potentially lead to a phenotype.
 
 To address these concerns, we characterized the uterine innate lymphoid cell compartment in the pregnant uterus of Ncr1iCre dams at gestational day 6.5. We did not observe a difference in the absolute number and frequency of trNK cells, cNK cells, and ILC1s in the gravid uterus of Ncr1iCre dams compared to wildtype CD45.1 C57BL/6 mice. Additionally, the number of implantation sites and resorption rates in Ncr1iCre dams were comparable to those in wildtype CD45.1 C57BL/6 mice. Together, these data indicate that Ncr1iCre expression itself does not influence the phenotype we report in TGF-βRIINcr1∆ dams. These additional findings have been included in Supplementary Figure 1 and in the text as follows:
 
 “To ensure we exclude a confounding effect of Ncr1iCre expression, we profiled the uterine innate lymphoid compartment in pregnant Ncr1iCre dams at gestational day 6.5. No differences were observed in the absolute number of trNK cells, cNK cells, or ILC1s relative to wildtype controls (Figure S1 A-D), and implantation site number and resorption rates were likewise unchanged (Figure S1 E-F). These data indicate that Ncr1iCre expression alone does not perturb uterine ILC composition or early pregnancy outcomes.”
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) Figure 1C & D: The adoptive transfer experiment is convincing. As a minor point, why is the gate setting for Eomes different between panels 1C and 1D?
 
 To clarify the phenotype of the adoptively transferred cNK cells, we included two additional gates depicting the expression of CD49a and CD49b in unlabeled (non-vascular) trNK cells and cNK cells in the pregnant uterus. Please see the revised Figure 1C and revised figure legend:
 
 “(C) Concatenated flow plots of implantation sites showing that adoptively transferred cNK cells in the pregnant uterus of wildtype dams upregulate CD49a and downregulate CD49b by gd 10.5, acquiring a CD49a+ CD49b- Eomes+ phenotype characteristic of uterine trNK cells (C57BL/6 dams n=4). Here, 2.5 × 10^6 CD45.2+ CD3- CD19- NK1.1+ NKp46+ CD49b+ splenic cNK cells were adoptively transferred into pregnant C57BL/6-CD45.1 dams at gd 0.5, and the receptor profile of these cells was subsequently assessed at gd 10.5. Gating strategy: Live, Single Cells; CD3- CD19- CD45.1- CD45.2–PE-Cy7- CD45.2–PE+ NK1.1+ NKp46+ cells.”
 
 (2) Figure 3: Has the pup ratio male/female changed?
 
 We did not observe a statistically significant difference in the female-to-male pup ratio between groups.
 
 Reviewer #2 (Public review):
 
 (1) The authors suggest cNK extravasation and local differentiation into iv- trNK. Can it be estimated how much this process contributes to the trNK pool vs. a potential local proliferation of already existing trNK? How do absolute numbers of CD49a+ Eomes+ trNK change during pregnancies? (In Figure 1A, the cell numbers of CD49a+ Eomes+ trNK seem to go down dramatically between gd 6.5 and 14.5). The plot in 1B could also include absolute numbers of ILC1s and trNKs. Would recruited cNK cells compensate for a potential loss of CD49a+ Eomes+ trNK?
 
 Our prior work, as well as that of others, has tracked the changes in uterine trNK cells, cNK cells, and ILC1s over the course of murine pregnancy. Consistent with these studies, the absolute number of uterine CD49a+ Eomes+ trNK cells peaks during early pregnancy (roughly between gds 5.5 and 7.5) and subsequently declines until term. The decrease in uterine trNK cells between gd 6.5 and gd 14.5 observed in Figure 1A is therefore consistent with the known physiological contraction of the decidual NK compartment as pregnancy progresses. Thus, it is unlikely that cNK cells recruited within the uterine tissue compensate for the loss of CD49a+ Eomes+ trNK cells observed. To address the reviewer’s request, we have now included the absolute number of uterine trNK cells and ILC1s in Figure 1; please see the updated Figure 1C and D and the corresponding figure legend (provided below). With respect to the relative contribution of cNK cell extravasation versus local proliferation of trNK cells, our data do not allow us to quantitatively distinguish between these mechanisms. Moreover, previous studies have demonstrated that uterine trNK cells express Ki67, suggesting that they exhibit proliferative activity during this period. Thus, we hypothesize that both local proliferation of existing trNK cells and recruitment of circulating cNK cells contribute to the population of uterine trNK cells during early pregnancy.
 
 “(C) Concatenated flow plots of implantation sites showing that adoptively transferred cNK cells in the pregnant uterus of wildtype dams upregulate CD49a and downregulate CD49b by gd 10.5, acquiring a CD49a+ CD49b- Eomes+ phenotype characteristic of uterine trNK cells (C57BL/6 dams n=4). Here, 2.5 × 10^6 CD45.2+ CD3- CD19- NK1.1+ NKp46+ CD49b+ splenic cNK cells were adoptively transferred into pregnant C57BL/6-CD45.1 dams at gd 0.5, and the receptor profile of these cells was subsequently assessed at gd 10.5. Gating strategy: Live, Single Cells; CD3- CD19- CD45.1- CD45.2–PE-Cy7- CD45.2–PE+ NK1.1+ NKp46+ cells. (D) Proportion of uterine ILC subsets derived from adoptively transferred splenic cNK cells in the pregnant uterus of wildtype dams. Statistics were calculated using unpaired t tests with the Mann-Whitney correction. Error bars indicate SEM; ***p < 0.001.”
 
 Barahona, J.D., Yang, L. and Yokoyama, W.M., 2025. Eomesodermin defines uterine NK cells crucial for pregnancy success in mice. The Journal of Immunology, 214(10), pp.2549-2556.
 
 Filipovic, I., Chiossone, L., Vacca, P., Hamilton, R.S., Ingegnere, T., Doisne, J.M., Hawkes, D.A., Mingari, M.C., Sharkey, A.M., Moretta, L. and Colucci, F., 2018. Molecular definition of group 1 innate lymphoid cells in the mouse uterus. Nature Communications, 9(1), p.4492.
 
 (2) Figure 1C: 2.5 million cNK cells have been transferred, but only very few cells can be detected within the uterus (concatenated FACS plot shown). What may represent the limit to generating uterine trNK out of cNK? Is the niche supporting cNK-trNK differentiation limited? Is it only a specific subset of (splenic) cNK capable of differentiating into trNK? Is gd 0.5 the optimal timepoint for the transfer? Is there continuous recruitment of cNK into the uterus and differentiation into trNK, or is it enhanced at specific timepoints of pregnancy? Could there be local proliferation of cNK-derived trNK? This could be studied by proliferation dye dilution of WT cNK cells in this transfer setup.
 
 We recognize that transferring cNK cells at gestational day 0.5, prior to placental formation, may partially account for the low uterine reconstitution observed. At this time point, the local signals necessary for efficient recruitment and retention of cNK cells in the uterus may not yet be fully established, potentially resulting in preferential homing to peripheral tissues such as the spleen and liver. Consistent with this possibility, we do observe a robust population of adoptively transferred cNK cells in the spleen and liver of our pregnant dams. We decided to transfer cNK cells at gestational day 0.5 to ensure that the cells were present throughout most of early pregnancy, particularly during implantation and the initial stages of decidualization. We also did not transfer cells before mating, to minimize the number of mice that did not get pregnant. Additionally, performing the transfer at this early time point minimized repeated manipulation of pregnant dams, as procedural stress itself has been shown to affect physiological processes of gestation and could thereby confound the pregnancy outcomes we were assessing. Furthermore, Filipovic et al. 2018 previously showed that both trNK cells and cNK cells in the pregnant uterus expressed Ki67 at gestational day 9.5, suggesting that there could be local proliferation of cNK-derived trNK cells in the gravid uterus that could limit the migration of circulating cNK cells into this microenvironment. We have discussed this in more depth in our Discussion section as follows:
 
 “Interestingly, the inability to fully reconstitute the uterine trNK cell compartment following adoptive transfer suggests that only a subset of circulating cNK cells may be capable of differentiating into trNK cells during pregnancy, or alternatively that trNK cells already present in the virgin uterus may undergo in situ proliferation in the gravid uterus. Previous studies from our lab as well as others show that trNK cells within the pregnant murine uterus express marked levels of Ki67, supporting a model in which local proliferation of uterine trNK cells is a major contributor to the uterine trNK cell pool during pregnancy [7,32]. Prior studies have also described hematopoietic precursors within endometrial and decidual tissues that generate uterine trNK cells, suggesting that the compartment may be also sustained by local precursor differentiation [33-35]. Together, these findings suggest that uterine trNK cell ontogeny may be more complex than a single-source model and raise the possibility that distinct developmental pathways may operate at different stages of reproductive life. Therefore, defining the relative contribution and developmental timing of hematogenous versus locally maintained sources in vivo could provide relevant insights into the developmental trajectories and transcriptional programs that underlie decidual NK cell heterogeneity.”
 
 Zhai, Q.Y., Wang, J.J., Tian, Y., Liu, X. and Song, Z., 2020. Review of psychological stress on oocyte and early embryonic development in female mice. Reproductive Biology and Endocrinology, 18(1), p.101.
 
 Wiebold, J.L., Stanfield, P.H., Becker, W.C. and Hillers, J.K., 1986. The effect of restraint stress in early pregnancy in mice. Reproduction, 78(1), pp.185-192.
 
 Sánchez-Rubio, M., Abarzúa-Catalán, L., Del Valle, A., Méndez-Ruette, M., Salazar, N., Sigala, J., Sandoval, S., Godoy, M.I., Luarte, A., Monteiro, L.J. and Romero, R., 2024. Maternal stress during pregnancy alters circulating small extracellular vesicles and enhances their targeting to the placenta and fetus. Biological Research, 57(1), p.70.
 
 Filipovic, I., Chiossone, L., Vacca, P., Hamilton, R.S., Ingegnere, T., Doisne, J.M., Hawkes, D.A., Mingari, M.C., Sharkey, A.M., Moretta, L. and Colucci, F., 2018. Molecular definition of group 1 innate lymphoid cells in the mouse uterus. Nature Communications, 9(1), p.4492.
 
 (3) The authors should consider inducible Tgfbr2 deletion (e.g. with Tamoxifen-inducible Cre) to enable development of the uterine NK compartment in virgin mice and only ablate trNK differentiation during pregnancy. This could help to estimate the turnover of cNK into trNK, or to understand if constant cNK recruitment is required to form the uterine trNK compartment during pregnancy.
 
 Thank you for this suggestion. We did initially consider incorporating a mouse model with a tamoxifen-inducible deletion of TGF-βRII to examine the differentiation of peripheral cNK cells into uterine trNK cells more precisely. However, the administration of tamoxifen during murine pregnancy has well-established deleterious effects on implantation, fetal viability, and placentation, which would confound our interpretation of any adverse pregnancy outcome observed in our studies. Because our goal was to assess NK cell-specific contributions to murine gestation without introducing additional pregnancy-related perturbations, we elected to use an Ncr1iCre-based mouse model in our studies.
 
 Ved, N., Curran, A., Ashcroft, F.M. and Sparrow, D.B., 2019. Tamoxifen administration in pregnant mice can be deleterious to both mother and embryo. Laboratory animals, 53(6), pp.630-633.
 
 Sun, M.R., Steward, A.C., Sweet, E.A., Martin, A.A. and Lipinski, R.J., 2021. Developmental malformations resulting from high-dose maternal tamoxifen exposure in the mouse. PLoS One, 16(8), p.e0256299.
 
 Ilchuk, L.A., Stavskaya, N.I., Varlamova, E.A., Khamidullina, A.I., Tatarskiy, V.V., Mogila, V.A., Kolbutova, K.B., Bogdan, S.A., Sheremetov, A.M., Baulin, A.N. and Filatova, I.A., 2022. Limitations of tamoxifen application for in vivo genome editing using Cre/ERT2 system. International Journal of Molecular Sciences, 23(22), p.14077.
 
 (4) Did the authors consider transfer of Tgfbr2-floxed Ncr1-Cre cNK in the same setup as in Fig. 1C? This experiment could confirm the requirement of Tgfbr-dependent signaling for cNK to trNK conversion during pregnancy versus effects of Tgfb signals on trNK numbers in the uterus at steady state (before pregnancy).
 
 We thank the reviewer for this mechanistically insightful suggestion. We did consider performing reciprocal transfer experiments using TGF-βRIIfl/fl Ncr1iCre cNK cells in the same adoptive transfer system as in Figure 1C. However, our current adoptive transfer experiments already directly address this question. Transfer of congenically labeled wild-type splenic cNK cells into TGF-βRIINcr1Δ dams at gestational day 0.5 resulted in partial reconstitution of the uterine trNK compartment and, importantly, this was sufficient to rescue the adverse pregnancy outcomes observed at midgestation. These findings indicate that TGF-β–competent cNK cells can differentiate and function appropriately within the pregnant uterine environment, supporting a requirement for TGF-β–dependent signaling in cNK-to-trNK conversion during pregnancy. Because restoration of TGF-β–sufficient cNK cells rescues these pregnancy outcomes, we believe this experiment functionally demonstrates the importance of TGF-β signaling in this process and therefore did not pursue reciprocal transfer of TGF-βRII–deficient cNK cells.
 
 “Partial reconstitution of uterine trNK cells restores midgestational pregnancy outcomes in TGF-βRIINcr1∆ dams
 
 To determine whether restoring uterine trNK cells could rescue the midgestational pregnancy defects observed in TGF-βRIINcr1∆ dams, we adoptively transferred wildtype, congenically labeled splenic cNK cells into pregnant TGF-βRIINcr1∆ dams at gd 0.5. By gd 10.5, donor cNK cells were detected in the pregnant uterus, where a subset upregulated CD49a and downregulated CD49b, consistent with acquisition of a uterine trNK cell phenotype (Figure 5 A). However, adoptively transferred splenic cNK cells only partially reconstituted the uterine trNK cell population in the gravid uterus of TGF-βRIINcr1∆ dams, as evidenced by reduced absolute number and frequency of donor-derived trNK cells in reconstituted TGF-βRIINcr1∆ dams (Figure 5 A-C). Notably, this partial reconstitution was sufficient to rescue the gestational defects caused by impaired TGF-β–mediated uterine trNK cell differentiation. Reconstituted TGF-βRIINcr1∆ dams exhibited implantation site numbers and fetal resorption rates at gd 10.5 comparable to those observed in littermate controls (Figure 5 D, E). Together, these findings suggest that even partial restoration of the uterine trNK cell in pregnant TGF-βRIINcr1∆ dams is sufficient to restore pregnancy outcomes at midgestation, supporting a central role for uterine trNK cells as the principal NK cell subset required for successful murine pregnancy.”
 
 (5) Figures 2D/E: The authors should state that ILC1s are reduced in the virgin uterus of female Tgfbr2-floxed or Tgfb1-floxed Ncr1-Cre mice and cite the relevant work (the Ref #29 discussed in this context did not show that?). It would be helpful to include an analysis of all three uterine ILC subsets in steady state. This could help to answer the question if the cNK cell changes are pregnancy-specific or a general phenomenon in Tgfbr2-floxed Ncr1-Cre mice.
 
 We thank the reviewer for this important comment and for noting the miscitation. We regret the error and have corrected the reference in the revised manuscript to cite the appropriate study demonstrating reduced ILC1s in the virgin uterus of Tgfb1fl/fl Ncr1iCre mice {Sparano, C. et al. 2024. Autocrine TGF-β1 drives tissue-specific differentiation and function of resident NK cells. Journal of Experimental Medicine, 222(3), p.e20240930}. Please see Line 148. Importantly, the steady-state ILC compartment in virgin Tgfb1fl/fl Ncr1iCre mice has already been carefully characterized in the previously published work, including analysis of all three uterine ILC subsets. Because the steady-state uterine ILC landscape in this mouse model has already been established by Sparano, C. et al. 2024, our study focuses specifically on the pregnancy-associated changes in the uterine ILC landscape occurring in the absence of TGF-β signaling in Ncr1-expressing cells and their subsequent effects on gestational outcomes. In the absence of TGF-β signaling there appears to be a higher frequency of cNK cells in both the virgin uterus and pregnant uterus, suggesting that this is more of a general phenomenon.
 
 “However, in the pregnant uterus, CD49a+ Eomes- ILC1s were markedly reduced in implantation sites of TGF-βRIINcr1∆ dams, paralleling the reduction of ILC1s previously reported in the virgin uterus of TGF-βRIINcr1∆ female mice [26].”
 
 (6) Figure 2E: Please phrase more carefully about the "concomitant increase" of cNKs, since this increase is much less pronounced compared to the very strong reduction (absence) of trNKs in Tgfbr2-floxed Ncr1-Cre mice. Do the authors suggest that cNKs are halted at this stage and cannot differentiate into trNK, based on these data?
 
 We thank both reviewers for this suggestion, and we have rephrased our wording to address this concern as follows:
 
 “This reduction of uterine trNK cells was accompanied by a small increase in the absolute number and frequency of CD49b+ Eomes+ cNK cells within the pregnant uterus of TGF-βRIINcr1∆ dams (Figure 2 D, E). Collectively, these findings suggest that a TGF-β–driven differentiation pathway directs the conversion of peripheral cNK cells into uterine trNK cells during murine pregnancy.”
 
 Please also see our response to Reviewer #1, Comment #2.
 
 (7) Can the reduced litter size and the abnormal spiral artery formation be rescued by transfer of WT cNK into Tgfbr2-floxed Ncr1-Cre mice?
 
 We thank the reviewers for this interesting question. In subsequent experiments, we transferred congenically labeled splenic cNK cells from wildtype female mice into TGF-βRIINcr1∆ dams at gestational day 0.5. We observed only partial reconstitution of the uterine trNK cell population; however, the number of viable implantation sites and the resorption rates in reconstituted TGF-βRIINcr1∆ dams were comparable to those in HBSS-treated littermate controls at gestational day 10.5. Given that partial reconstitution of the uterine trNK cell compartment in reconstituted TGF-βRIINcr1∆ dams was sufficient to rescue the defects in implantation site number and fetal resorption rates observed at midgestation, we hypothesize that this level of restoration may permit partial but functionally sufficient spiral artery remodeling to reestablish maternal-fetal blood flow adequate to support fetal viability, although spiral artery remodeling was not directly assessed in this transfer study.
 
 “Partial reconstitution of uterine trNK cells restores midgestational pregnancy outcomes in TGF-βRIINcr1∆ dams
 
 To determine whether restoring uterine trNK cells could rescue the midgestational pregnancy defects observed in TGF-βRIINcr1∆ dams, we adoptively transferred wildtype, congenically labeled splenic cNK cells into pregnant TGF-βRIINcr1∆ dams at gd 0.5. By gd 10.5, donor cNK cells were detected in the pregnant uterus, where a subset upregulated CD49a and downregulated CD49b, consistent with acquisition of a uterine trNK cell phenotype (Figure 5 A). However, adoptively transferred splenic cNK cells only partially reconstituted the uterine trNK cell population in the gravid uterus of TGF-βRIINcr1∆ dams, as evidenced by reduced absolute number and frequency of donor-derived trNK cells in reconstituted TGF-βRIINcr1∆ dams (Figure 5 A-C). Notably, this partial reconstitution was sufficient to rescue the gestational defects caused by impaired TGF-β–mediated uterine trNK cell differentiation. Reconstituted TGF-βRIINcr1∆ dams exhibited implantation site numbers and fetal resorption rates at gd 10.5 comparable to those observed in littermate controls (Figure 5 D, E). Together, these findings suggest that even partial restoration of the uterine trNK cell in pregnant TGF-βRIINcr1∆ dams is sufficient to restore pregnancy outcomes at midgestation, supporting a central role for uterine trNK cells as the principal NK cell subset required for successful murine pregnancy.”
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) Figure 1C: The shown gate seems to "cut" into the CD49b staining; staining for all transferred cells should be shown; have cNK cells been stained in parallel with the same panel to provide a positive and compensation control?
 
 To clarify the phenotype of the adoptively transferred cNK cells, we included two additional gates depicting the expression of CD49a and CD49b in unlabeled (non-vascular) trNK cells and cNK cells in the pregnant uterus. Please see the revised Figure 1C.
 
 “(C) Concatenated flow plots of implantation sites showing that adoptively transferred cNK cells in pregnant uterus of wildtype dams upregulate CD49a and downregulate CD49b by gd 10.5, acquiring a CD49a+ CD49b- Eomes+ phenotype characteristic of uterine trNK cells (C57BL/6 dams n=4). Here, 2.5 × 10^6 CD45.2+ CD3- CD19- NK1.1+ NKp46+ CD49b+ splenic cNK cells were adoptively transferred into pregnant C57BL/6-CD45.1 dams at gd 0.5, and the receptor profile of these cells was subsequently assessed at gd 10.5. Gating strategy: Live, Single Cells; CD3- CD19- CD45.1- CD45.2–PE-Cy7- CD45.2–PE+ NK1.1+ NKp46+ cells.”
 
 (2) Figure 2A: The authors could include an isotype control or a staining in a genetic knockout as a control staining.
 
 Thank you for this suggestion. As suggested, we included staining in a genetic TGF-βRIINcr1∆ knockout as additional control staining. Please see the revised Figure 2A.
 
 “Representative histograms depicting TGF-β Receptor II expression on splenic NK cells from virgin TGF-βRIINcr1∆ and wildtype mice as well as splenic and uterine NK cell subsets from pregnant wildtype mice at gd 10.5 (virgin TGF-βRIINcr1∆ mice, n=2; virgin mice: C57BL/6, n=5; gd 10.5: C57BL/6 dams, n=8, implantation sites n=8). MFI, median fluorescent intensity. Gating strategy: Live, Single Cells; CD3- CD19- CD45.1- CD45.2+ NK1.1+ NKp46+ cells.”
 
 AuthorResponse
URL

biorxiv.org/content/10.1101/2025.11.18.688992v2
www.biorxiv.org

Intron Retention Controls Localization of lncRNAs PURPL and MALAT1 to Promote Cell Proliferation and Migration

4
1. Public_Reviews 02 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  This manuscript provides important insights into how U2AF2-dependent intron retention regulates the localization and function of long noncoding RNAs, with evidence supported by multiple complementary approaches. The work is notable for linking intron retention to nuclear speckle localization and cellular phenotypes, including proliferation and migration, although the mechanistic basis remains incompletely resolved. Overall, the study presents a compelling dataset with clear biological implications but would benefit from additional analyses to strengthen mechanistic interpretation and generality.
  
  Summary
2. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Intron retention is observed in many long noncoding RNAs. The authors here used a powerful genome-wide screening strategy to identify proteins controlling intron retention in the long noncoding RNA PURPL. Surprisingly, one of the top hits across multiple cell lines was U2AF2, which is well known to bind the polypyrimidine tract close to the 3' splice site to promote splicing. Nonetheless, U2AF2 is working in the opposite direction here. Convincing follow-up RT-PCR experiments confirmed that knocking down U2AF2 does indeed lead to reduced intron retention of PURPL. The authors then show that this intron retention event is functionally important for both the nuclear retention of PURPL as well as its ability to enhance cell proliferation.
  
  The authors then used transcriptome-wide analyses to look for additional intron retention events affected by U2AF2. Among the ~250 genes with decreased intron retention (more splicing) upon U2AF2 knockdown was MALAT1, a well-established long noncoding RNA that normally localizes to nuclear speckles. Depletion of U2AF2 or removal of the MALAT1 2nd intron resulted in reduced speckle localization and cell migration, revealing a critical and fascinating role for this intron retention event. Overall, the authors have used a set of complementary approaches to clearly demonstrate a very intriguing role for U2AF2 in controlling intron retention and functionality of a set of long noncoding RNAs.
  
  I feel the current work has revealed an important role of intron retention in controlling the localization and functionality of long noncoding RNAs, which is likely broad in scope and is likely regulated by cell state.
  
  One experimental suggestion: The authors show that expressing intron 2-containing PURPL in PURPL-depleted cells is sufficient to induce faster proliferation, but a valuable comparison would be the phenotype of cells expressing the spliced PURPL transcript.
  
  Review 1
3. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study identified U2AF1/2 as a regulator of pre-mRNA splicing that either promotes or suppresses the splicing of introns in different genes. The authors then focused on two genes, PURPL and MALAT1, in which U2AF1/2 promotes the retention of specific introns, and characterized the biological implications of these U2AF1/2-regulated introns.
  
  Strengths:
  
  (1) The experiments in this manuscript are relatively rigorously designed and performed, often with validation checks such as verifying the knockout, verifying the treatment itself doesn't have an effect, etc.
  
  (2) The experiments provided comprehensive support for the claims that these specific introns are important for the stability or nuclear localization of the RNA, as well as that U2AF1/2 suppresses the splicing of these introns.
  
  (3) The writing of the manuscript is very clear and doesn't overstate the conclusions that can be drawn from the experiments.
  
  Weaknesses:
  
  I think one main weakness of this study is the lack of a deeper analysis of the mechanisms. Whether studying the mechanism is within the scope of this paper is probably debatable, but with the current experiment setup and data, I believe there are some analyses that can be relatively easily done to enhance the value or significance of this study. My detailed questions and suggestions are listed below:
  
  (1) Line 194-195 and Figure 2A: How many RBPs are included in "other RBPs" in line 194? Does "other RBPs" only include PTBP1, PRPF8 and SRSF1 in Figure 2A, or do they include all the ~100 RBPs with HepG2 eCLIP data available on ENCODE? If U2AF1/2 have the highest occupancy around the intron 2 region among the ~100 RBPs, it would be nice to visualize it.
  
  (2) Figure 2A and 2B: Why didn't U2AF2 show interaction with exon 2 and 3 in RNA-IP but showed enrichment over exon 2 and exon 3 regions in the eCLIP data?
  
  (3) Figure 3C - 3F: Maybe I misinterpreted the experiments, but to my understanding, these experiments showed that the exogenous PURPL with intron 2 promoted cell proliferation compared to when the exogenous PURPL wasn't induced, but didn't compare to the effect of the same amount of PURPL with intron 2 removed. Wouldn't it be clearer to compare the effects of exogenous PURPL with intron 2 and exogenous PURPL without intron 2 to pinpoint whether the effect is related to intron 2? Without an intron 2-specific experiment, the current experiments don't seem to provide much added value beyond "PURPL promotes cell proliferation".
  
  (4) It's not very clear what proportion of these introns are retained in the endogenous PURPL and MALAT1 in various tissues, cell types and conditions. I think it will be valuable to provide this background (either from previous research, public database or data from this study).
  
  (5) Since U2AF1/2 have a wide range of targets as demonstrated by Figure 4A, I think it would be valuable to have some experiments that directly disrupt the interaction between U2AF1/2 and PURPL and MALAT1 and test the effect on splicing outcomes, such as by mutating the sequence that U2AF1/2 bind to. The section on the weak py-tract of PURPL touched upon this topic but focused more on how the weak py-tract causes the intron 2 retention in the background rather than how U2AF1/2 binding and action were affected by sequence mutations. I think experiments on disrupting the direct binding between U2AF1/2 on targets can provide valuable mechanistic insights.
  
  (6) Across all the target genes of U2AF1/2, it might be feasible to do some systematic analysis to find what correlates with whether U2AF1/2 have a promoting or suppressing effect on intron splicing. For example, do genes with decreased IR after U2AF2 depletion systematically have a weak py-tract compared to genes with increased IR? This dataset can potentially provide many hypotheses for understanding the dual role of U2AF1/2.
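  As a rough illustration of the kind of comparison suggested in (6), the sketch below scores py-tract strength as the fraction of pyrimidines in a fixed window immediately upstream of the 3' splice-site AG and averages that score over two groups of introns. The 20-nt default window, the function names `py_tract_score` and `compare_groups`, and the input format (the 3' end of each intron as a plain string) are all assumptions made for illustration; none of them come from the manuscript, and a real analysis would use the authors' intron coordinates, a better-motivated tract definition and a proper statistical test.

```python
from statistics import mean


def py_tract_score(intron_3prime_seq, window=20):
    """Fraction of pyrimidines (C/T/U) in the last `window` nt upstream of the 3' splice-site AG.

    Hypothetical scoring scheme for illustration only.
    """
    seq = intron_3prime_seq.upper().replace("U", "T")
    # Drop the terminal AG dinucleotide of the intron, if present.
    if seq.endswith("AG"):
        seq = seq[:-2]
    tract = seq[-window:]
    if not tract:
        raise ValueError("sequence too short to score")
    return sum(base in "CT" for base in tract) / len(tract)


def compare_groups(decreased_ir, increased_ir, window=20):
    """Mean py-tract score for introns with decreased vs increased IR after U2AF2 depletion."""
    return {
        "decreased_ir_mean": mean(py_tract_score(s, window) for s in decreased_ir),
        "increased_ir_mean": mean(py_tract_score(s, window) for s in increased_ir),
    }
```

  If introns with decreased IR showed a consistently lower mean score than introns with increased IR, that would be in line with the weak py-tract hypothesis, although a difference in means alone would not establish it.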
  
  Review 2
4. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This manuscript characterized the splicing regulation of two long non-coding RNAs relevant to cancer, starting with a focus on PURPL and ending with insights into MALAT1. A CRISPR screen for the regulators of PURPL intron retention revealed a role for the U2AF heterodimer in inducing this retention, with U2AF2 as the actual hit. This is surprising, because the canonical function of U2AF is to recognize the polypyrimidine tract (PPT) and 3' splice site junction to induce splicing at the site. The brief mechanistic characterization of this phenomenon showed that this intron retention accounts for the nuclear localization and instability of the PURPL transcript, and seems to confer the enhanced cell proliferation feature. U2AF2 also induces retention of two introns in MALAT1, and one of them is essential for its nuclear speckle localization and enhanced cell migration.
  
  Strengths:
  
  These findings about PURPL and MALAT1 are clear and interesting.
  
  Weaknesses:
  
  The results are not sufficiently connected to each other, because one regulation is nuclear-speckle dependent but not the other.
  
  Here are my specific comments:
  
  Major comments:
  
  The main issue is the lack of focus because of the distinct and incomplete analysis pertaining to the two long noncoding RNAs, PURPL and MALAT1. The paper starts with a very good genetic screen on the former, and immunofluorescence and functional analysis on the latter, with U2AF2 as the main link to induce intron retention. The first one does not show clear localization while the second docks to nuclear speckles, apparently because of the retained intron. Hence the two mechanisms are related yet distinct. Here are some suggestions to enhance the characterization and connection between the two cases:
  
  (1) As retention of MALAT1 intron 2 contributes to its speckle localization, whereas the retained PURPL intron does not, the retained introns or their 3' splice site sequences should be swapped to see if they determine the localization.
  
  (2) Figure 3: the rescue of the PURPL knockout by the intron-retained RNA to induce proliferation is a powerful experiment, but it lacks the rescue with the RNA without the intron as a control. This must be done and shown.
  
  (3) The weakness of the PPT of PURPL intron 2 appears as a clear feature of its retention dependent on U2AF2, which appears direct, as backed by CLIP data. It would be good to show direct binding by EMSA or equivalent techniques. Furthermore, the data is also consistent with other determinants. The exon and upstream intronic sequences, including the branch point, could also be involved, so mutations in these are also required.
  
  (4) In brief, what are the commonalities and differences between PURPL and MALAT1 with regard to their U2AF2-dependent intron retention?
  
  Review 3
URL

biorxiv.org/content/10.64898/2026.02.19.706780v3
www.biorxiv.org

Complimentary vertebrate Wac models exhibit phenotypes relevant to DeSanto-Shinawi Syndrome

4
1. Public_Reviews 02 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This important study establishes the first vertebrate models of DeSanto-Shinawi Syndrome, revealing conserved craniofacial and social and behavioral phenotypes across mouse and zebrafish that mirror key clinical features. The convincing evidence is supported by behavioral, anatomical, and molecular analyses of Wac animal mutants. This study sets a baseline for future mechanistic studies and reports a platform to test approaches to reverse phenotypes.
 
 Summary
2. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]
 
 Summary:
 
 The authors generated mouse and zebrafish models for DeSanto-Shinawi Syndrome, caused by loss-of-function variants in the WAC gene. Using these vertebrate systems, they demonstrate conserved craniofacial and social-behavioral phenotypes that parallel human clinical features, along with deficits in GABAergic markers. They observe increased seizure susceptibility and male-biased brain volumetric changes in Wac mutant mice. Together, these findings begin to define the biological consequences of Wac haploinsufficiency and provide valuable resources for future mechanistic studies.
 
 Strengths:
 
 WAC is a high-confidence neurodevelopmental disorder gene and one of the genes identified by large-scale exome sequencing efforts, including the Satterstrom et al. (2020) autism spectrum disorder cohort. This study establishes the first vertebrate Wac models, addressing a major gap in the understanding of DeSanto-Shinawi Syndrome, and provides a framework for studying other syndromic forms of autism. The models generated will be impactful and useful to the community to study and understand DeSanto-Shinawi Syndrome.
 
 The cross-species analysis is important and well executed, and reveals both conserved and divergent phenotypes. The behavioral and anatomical assays are rigorously executed and well-controlled, and the inclusion of RNA-sequencing analyses adds valuable insights into the mechanisms underlying brain function in Wac mutants. Notably, the RNA-seq data reveal upregulation of several clustered protocadherins, genes central to neuronal identity and cell-cell interactions, which are known to be regulated by dynamic developmental regulation of chromatin architecture. This observation provides an intriguing hint that could link Wac function to higher-order chromatin organization and neuronal connectivity.
 
 Weaknesses:
 
 The evidence is solid, though the study remains incomplete in its mechanistic depth and molecular interpretation. The authors compellingly describe behavioral, anatomical, and transcriptomic phenotypes associated with WAC loss, yet do not explore how WAC mechanistically regulates chromatin or transcription. Given prior evidence that WAC interacts with the RNF20/40 ubiquitin ligase complex and promotes histone H2B ubiquitination and transcriptional elongation, the paper would benefit from a discussion of these functions as a potential link between Wac haploinsufficiency and the observed changes in neuronal gene expression. Similarly, the authors mention WAC's WW and coiled-coil domains but do not consider how these domains could mediate nuclear interactions or recruitment of transcriptional cofactors that shape gene regulation and chromatin organization in neurons.
 
 The transcriptomic analysis is rich but largely descriptive. Although the upregulation of clustered protocadherins is particularly intriguing, these findings are not validated or localized to specific neuronal populations. The study would be strengthened by independently validating the most significant RNA-seq changes, such as protocadherin gamma genes, using in situ hybridization methods to confirm the spatial and cellular specificity of expression changes.
 
 Review 1
3. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 The authors describe the first deep neurological characterization of WAC mutation in two vertebrate species (zebrafish and mouse). They examine these at various levels, guided by the work in humans that has associated a heterozygous WAC mutation with DeSanto-Shinawi Syndrome (DESSH). Therefore, they investigate the animals for a variety of phenotypes, following a template for what is seen when characterizing a new mouse/fish model of a developmental disability gene. Investigations include analysis of skull and jaw for abnormalities (both species), MRI of brain structure (in mice), electrophysiology (mice), assessment of signaling pathways (by Western blot, in mice), cell counts (both, more in mice), transcriptomics (mice), and behavior (both).
 
 Generally, this describes an important first characterization of the consequences of the mutation. Most of the studies appear well-conducted and reasonably powered, thus solid or convincing.
 
 Review 2
4. Public_Reviews 02 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors generated mouse and zebrafish models for DeSanto-Shinawi Syndrome, caused by loss-of-function variants in the WAC gene. Using these vertebrate systems, they demonstrate conserved craniofacial and social-behavioral phenotypes that parallel human clinical features, along with deficits in GABAergic markers. They observe increased seizure susceptibility and male-biased brain volumetric changes in Wac mutant mice. Together, these findings begin to define the biological consequences of Wac haploinsufficiency and provide valuable resources for future mechanistic studies.
 
 Strengths:
 
 WAC is a high-confidence neurodevelopmental disorder gene and one of the genes identified by large-scale exome sequencing efforts, including the Satterstrom et al. (2020) autism spectrum disorder cohort. This study establishes the first vertebrate Wac models, addressing a major gap in the understanding of DeSanto-Shinawi Syndrome, and provides a framework for studying other syndromic forms of autism. The models generated will be impactful and useful to the community to study and understand DeSanto-Shinawi Syndrome.
 
 The cross-species analysis is important and well executed, and reveals both conserved and divergent phenotypes. The behavioral and anatomical assays are rigorously executed and well-controlled, and the inclusion of RNA-sequencing analyses adds valuable insights into the mechanisms underlying brain function in Wac mutants. Notably, the RNA-seq data reveal upregulation of several clustered protocadherins, genes central to neuronal identity and cell-cell interactions, which are known to be regulated by dynamic developmental regulation of chromatin architecture. This observation provides an intriguing hint that could link Wac function to higher-order chromatin organization and neuronal connectivity.
 
 Weaknesses:
 
 The evidence is solid, but the study remains incomplete in its mechanistic depth and molecular interpretation. The authors compellingly describe behavioral, anatomical, and transcriptomic phenotypes associated with WAC loss, yet do not explore how WAC mechanistically regulates chromatin or transcription. Given prior evidence that WAC interacts with the RNF20/40 ubiquitin ligase complex and promotes histone H2B ubiquitination and transcriptional elongation, the paper would benefit from a discussion of these functions as a potential link between Wac haploinsufficiency and the observed changes in neuronal gene expression. Similarly, the authors mention WAC's WW and coiled-coil domains but do not consider how these domains could mediate nuclear interactions or recruitment of transcriptional cofactors that shape gene regulation and chromatin organization in neurons.
 
 We agree that many of the mechanisms by which Wac loss causes both the animal model phenotypes and the human symptoms still need to be worked out. Because a great deal of data was needed to first describe these models in this manuscript, we plan to follow up with mechanistic papers to fully address the gap that remains. We have now added a paragraph in the discussion to bring up these important points regarding the roles of Wac during transcription and how its protein domains might be involved in these processes.
 
 The transcriptomic analysis is rich but largely descriptive. Although the upregulation of clustered protocadherins is particularly intriguing, these findings are not validated or localized to specific neuronal populations. The study would be strengthened by independently validating the most significant RNA-seq changes, such as protocadherin gamma genes, using in situ hybridization methods to confirm the spatial and cellular specificity of expression changes.
 
 We have greatly expanded the analyses of the bulk RNA-seq data, including a more rigorous look into the differences in gene expression between sexes, which has additionally revealed males to be more impacted by Wac loss of function. We have also added new western blot data for pan protocadherin alpha, which is now validated to be upregulated in the cortex (new Figure 7I and 7J). We are holding back any additional data from this report as we have single nucleus RNA-seq data that will be reported on in follow-up papers with targeted conditional deletion models.
 
 Finally, while the behavioral and MRI results add valuable breadth, their interpretation would be improved by clearer reporting of sample sizes, statistical corrections, and effect sizes to support claims of sex-specific and regional brain volume differences.
 
 Some additional details have been added to the methods section. In addition, we have now provided sample sizes assessed in each figure legend.
 
 Reviewer #2 (Public review):
 
 The authors describe the first deep neurological characterization of WAC mutation in two vertebrate species (zebrafish and mouse). They examine these at various levels, guided by the work in humans that has associated a heterozygous WAC mutation with DeSanto-Shinawi Syndrome (DESSH). Therefore, they investigate the animals for a variety of phenotypes, following a template for what is seen when characterizing a new mouse/fish model of a developmental disability gene. Investigations include analysis of skull and jaw for abnormalities (both species), MRI of brain structure (in mice), electrophysiology (mice), assessment of signaling pathways (by Western blot, in mice), cell counts (both, more in mice), transcriptomics (mice), and behavior (both).
 
 Generally, this describes an important first characterization of the consequences of the mutation. Most of the studies appear well-conducted and reasonably powered, thus solid or convincing. However, there are a few places where the data presentation could be improved for clarity, and a few concerns about some choices in analytical approach for a couple of the experiments, where improved statistical approaches could improve their sensitivity and/or better rule out false positives, and thus the support of some of these claims is currently incomplete. There is also some lack of clarity about the rationale for some decisions regarding the fish genetics. Nonetheless, this is an important and useful first characterization of many phenotypes of these lines. Such experiments form a baseline for future mechanistic studies in the same lines and a platform to test approaches to reverse phenotypes.
 
 Individual claims and their strength & weaknesses:
 
 (1) The authors developed mouse and zebrafish models of WAC deletion
 
 They used the existing KOMP floxed WAC line to generate a null allele. For the mouse, there is a Western showing that it is indeed null for the protein. The fish data is less robustly validated - they don't confirm the allele is null at the protein or RNA level, and fish have two paralogs (waca and wacb), and this paper only characterizes one of these. So this evidence is less clear. The evaluated mice are heterozygous (Het), similar to patients, while the fish appear to be evaluated as homozygous mutants.
 
 We agree with the reviewer’s comments on zebrafish genetics. Since antibodies against zebrafish Wac proteins are not available, we could not examine protein levels in zebrafish. We predicted frameshift mutations from DNA analyses in waca and wacb KO zebrafish. We made waca KO, wacb KO, and waca/wacb double KO zebrafish. waca/wacb double KO zebrafish showed a lethal phenotype, similar to homozygous mouse mutants. Since wacb KO zebrafish did not show any detectable phenotype, we do not report those here. However, we now show examples of the wacb and dKO zebrafish in Figure S1. Since waca KO zebrafish showed craniofacial and behavioral phenotypes that are comparable to those of mouse Hets and human patients, they are the focus of this report.
 
 (2) The authors show that both species show altered craniofacial features
 
 These data appear well powered, and the findings are robust.
 
 We appreciate this confirmation.
 
 (3) Each model altered GABAergic neurons
 
 In mice, the authors stained with PV antibodies and saw a decrease in cells positive for this staining. A second marker, Lhx6, does not show a difference, suggesting this might be a change in PV expression rather than cell number. They could maybe look into the literature to see if this loss of just the protein also occurs in other models. Overall, the sample size here is a bit smaller than other parts of the paper (n=3), and the methods on the cell counts were less clear, so it is not as clear that this finding is as robust. The authors counted several other broad classes of cells, and those appear normal. Interestingly, there might also be some TBR1 mislocalization in layer 6 that might be significant with added power.
 
 Thank you for these suggestions. Yes, other models also show this lack of PV expression even when MGE-lineage interneurons are present at normal levels. We mention in the discussion a previous study on the ASD gene CNTNAP2 that showed this. We also agree that there is a trend in the Tbr1 population. We assessed another WT and Het pair for Tbr1 laminar distribution and were able to determine that these changes held up and are now significantly different; the person counting these numbers was blind to the genotypes. Finally, we added more details to the methods to describe how the counting was performed.
 
 The fish data is based on an in situ hybridization for GAD. The measure shown is the width of the positive area in the forebrain. This measure is not one I have seen much before, and has potential to be driven by something unrelated to GABA (e.g., if the whole forebrain were simply a bit smaller). So this analysis could use a couple of other approaches (density of signal?) and/or a control probe for some other brain gene showing the measure is normal, and thus it is not just a size issue.
 
 To compare altered GABAergic neurons in mice and zebrafish, we tried to isolate zebrafish PV genes and examined their expression by whole-mount in situ hybridization (now included in Figure S3), but found no differences, and we could not find any zebrafish PV gene useful as a marker for GABAergic neurons. We therefore chose to examine gad1b expression in the forebrain of WT and waca KO zebrafish and found differences in the brain area with gad1b expression. Since WT and waca KO brain sizes are generally the same, we believe this measurement is a reasonable basis for this conclusion and have added text to the results section to justify it.
 
 (4) Mice were more susceptible to the seizure-inducing agent PTZ
 
 These data appear well powered, and the findings are robust. The authors also did a fair amount of useful electrophysiology that was all normal, but appeared to be well executed.
 
 Thank you, we appreciate this confirmation.
 
 (5) Mice had changes in brain volume that interact with sex
 
 The authors conducted an MRI on a good number of mice and reported a slight increase in global volume just in males. Sample size is fair, but the statistical approach here may be better if it puts males and females in the same model (to boost power and explicitly test for sex by genotype interaction that they report), and there is some chance that the brain region level differences that they report could include some false positives. They tested many regions, and it is not clear whether or not they corrected for the number of tests. Often, an FDR correction would be used in such imaging studies. It may be that only the most robust regional findings will survive those corrections. It is interesting data either way, but the analysis could be improved.
 
 Given the 80 regions (bilaterally) that we used and the number of mice, i.e. 6-7, we are underpowered to robustly undertake FDR-type corrections. In the data presented we used t-tests between sex and regions to illuminate putative regional changes. However, we did revisit our MRI data and found three data sets where the results were not normally distributed. We thus changed our statistical test to Mann-Whitney for male retrosplenial cortex, male parietal cortex and female corpus callosum; these changes are now reflected in the figures, and the differing statistics are noted in the figure legends.
 
 (6) Several behaviors are altered in the mice as well
 
 These studies were fairly well-powered (n=15,16), and they found several positive and negative results, including alterations in memory and sociability in both species. There is a minor statistical flaw in the three-chamber analysis (they don't actually compare the Hets directly to the wildtypes in their statistical testing - a common mistake in neuroscience that should be addressed), but the data look like they will probably still be significant when correctly analyzed. In the supplement, the authors could do a bit more with the data they have to look at hyperactivity (i.e., show total motion in open field, not just time in center vs. periphery), and adding sex to their model might improve sensitivity for genotype effects.
 
 Thank you for these suggestions. We have done several things to address this behavioral paradigm. First, we increased the sample size and also switched from comparing the mouse vs. object to just comparing genotypes as a variable. In addition, we switched to quantifying a discrimination index, described in Phillips et al., 2019 (PMID: 31112129), for our measurement. These new data are shown in Figure 3A. Open field total distance traveled has now been added to Figure S2A. For all other measurements, we did first assess for sex differences but found none and thus compiled both sexes for the graphs.
 
 (7) Some biochemical signaling pathways are altered in the brain
 
 These are n=4 immunoblots, and show altered phospho ERK, but no changes in other signaling events predicted from prior WAC literature like H2B ubiquitination. They appear well done, and the authors share the full blots in the supplement.
 
 Thank you, we appreciate this confirmation. Since Wac is an adaptor protein we needed to test these reported molecular changes in neurons that were previously only reported in cell lines and drosophila. We were not surprised that some of these previously reported changes would not be the same in brain cells. However, it is possible that these changes might arise in more discrete brain regions or at different times during development, which will be tested in our future conditional knockout models.
 
 (8) WAC deletion also alters gene expression in the brain
 
 These studies were well-powered for RNAseq, with 10 and 14 samples, using neonates (P2), just the forebrain. The sequencing quality metrics all looked good, and the approach to analysis was okay. It would be stronger to again include sex in the model, rather than separate by sex. There were some typos in this part of the paper that made part of the conclusions unclear, but the RNAseq nicely confirmed the mutation of the mice, and discovered many differentially expressed genes, consistent with the role of this gene as a regulator of transcription. The presentation could be expanded to make more use of the data. Overall, though, this is a useful first characterization of the transcriptome in the line.
 
 Thank you for the suggestions. We have greatly expanded our assessments of the RNA-seq data. Upon analyzing the data we found many differences between males and females and now show combined and sex-separated data. Our new data isolate several more extreme and some unique changes in males that are better shown as stand-alone figure panels. In addition to these edits, we have also reworked all the text in this section of the results for better reading.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) The cause and timing of lethality in the homozygous Wac knockout should be reported or discussed. Investigating Wac homozygous knockout embryos, if viable at early stages, could provide valuable insight into the developmental origins of the neuroanatomical and behavioral phenotypes described in the heterozygous animals. Even a brief histological or transcriptomic characterization of embryonic brains would strengthen the mechanistic understanding of Wac function during neurodevelopment.
 
 We agree and have collected embryos as early as embryonic day 12.5 from multiple litters but never detected a knockout. We have added this text to the animal methods sections so that readers understand that an effort was made to determine when death occurs. While we don’t currently explore this further in mice, we now include zebrafish waca; wacb double knockouts. Notably, while we were able to generate a few of these mutants, most died. However, some zebrafish were aged long enough to observe lethal deficits in heart formation and swim bladder development, suggesting that early loss of Wac could impact these critical organs in ways that lead to death.
 
 (2) A better description of the data reported in Supplementary Tables 3 through 5 is needed. Supplementary Table 3 does not report any statistically significantly differentially expressed genes in the FDR column, and Supplementary Table 5 reports only two, and the reader should understand what the columns are indicating.
 
 We have now added figure legend text to the supplementary file to explain each Table mentioned here.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) Page 3, last paragraph. The description of wacb is confusing. I recommend that the authors provide the unshown data they mention and also further explanation of the breeding scheme and result. Indeed, if wacb is homozygous lethal, does that make it more like the mouse WAC gene, and thus potentially the more relevant paralogue to study? Are both waca and wacb expressed in the same tissues? How does that compare to mouse and human WAC expression? Such figures about gene expression (even when adapted with permission from public resources like Allen brain atlas or GTEX) are common in this sort of paper, as they can be helpful to understand when and where the gene is thought to act. For waca vs. wacb, they may help determine which gene is more relevant to the brain (for example, if only one is expressed in the brain).
 
 First, this is a great question and we have now added whole mount in situ for the waca and wacb genes as Figure S1. These data show low to no wacb expression in brain regions while waca is highly expressed there. Since the waca mutants showed phenotypes relevant to DESSH but wacb mutants did not, this correlates with observed expression patterns without fully excluding wacb from any role. Thus, we also made waca/wacb double KO zebrafish that showed a lethal phenotype, similar to homozygous mouse mutants. Only a few waca; wacb double knockouts survived partway through development and are now shown in Figure S1. Since wacb KO zebrafish did not show any detectable phenotype on their own, we did not include the data since there are already several figures/tables in this manuscript. However, the waca KO zebrafish did show phenotypes similar to humans with DESSH and are the ones we focused on.
 
 (2) Why did the authors cross the mice into the outbred CD1 background? Usually, most labs keep the lines on an inbred background. Was there a particular rationale here? I am not saying that they could not outcross them. It is just a bit puzzling why. Perhaps a sentence of explanation in the methods section would be warranted.
 
 This is a great question and we have now added text to the animal methods section. Many labs that study development, especially on genes critical for survival/life like the Wac gene, use a more robust strain like CD-1. By doing this, we have a better chance of evaluating mutants at more mature ages and getting enough progeny to do more reproducible studies.
 
 (3) A typical first experiment in a new knockout (fish or mouse) is to establish that the deletion does indeed result in a loss of RNA and protein. In the absence of this, the rest of the paper cannot be as confidently interpreted.
 
 We did this for the mouse model and found reduced protein expression in the constitutive Het; however, these data are part of the western blots in Figure 5. We now mention in the early results section that protein levels were reduced in the Hets but maintain that the presentation of the western blot is better suited to Fig. 5, where it can be compared to the other western blots. For zebrafish this was attempted but was more difficult. Available antibodies don’t work in zebrafish. RNA expression analysis was attempted in both models and due to Wac being a critical gene for life, there are checks in place to upregulate faulty and normal RNA in the waca model. We screened for frameshift mutations in multiple KO lines and confirmed them by genomic DNA sequencing. In making many KOs and large-scale mutagenesis in zebrafish, we usually depend on phenotype-genotype segregation in Mendelian inheritance for many generations.
 
 (4) Are these new lines indeed knockouts? I did find a WAC western as part of a later figure for the mouse. The authors may want to mention that earlier, or present at least that data right away. What about in the fish? Is there a way to confirm at the RNA or protein level that it is indeed a null allele?
 
 Yes, as mentioned in the above response we have now mentioned our Wac western blot results early when introducing the mouse mutants and the issues with doing this in fish are presented above as well.
 
 (5) Why are fish used that are KO while mice are Hets? Are WAC homozygous mice not viable? This should be mentioned. Regardless, the rationale for examining heterozygous mice and homozygous mutant fish should be provided. Each kind of experiment is useful, but they are interpreted in different ways. Hets will genocopy the patients, who are generally hets, while KOs are often useful for a study of the essential roles of the genes, even if they are not really modeling the patient gene dose.
 
 Wac homozygous mice in our hands are embryonic lethal, now mentioned in the animal methods section, but we found early on that the Hets mimic several features of human DESSH patients. In zebrafish it is more complicated. We analyzed waca and wacb hets in zebrafish but found no phenotypes. This could be in part due to some complementation between the waca and wacb genes. It is also possible that a full waca KO could resemble a human DESSH individual since wacb may complement somewhat, even though deleting wacb entirely does not have a measurable phenotype. We have added more text to the discussion to explore these complexities. We also made waca/wacb double KO (dKO) zebrafish, but they showed a lethal phenotype, similar to homozygous mouse mutants, suggesting some complementation by the wacb gene even though its deletion alone did not produce phenotypes.
 
 (6) Figure 3A: It does not appear that the authors are directly statistically comparing the two groups (genotypes) that they are drawing conclusions about. This is an unfortunately common mistake in the neuroscience literature across papers. There is a nice older review about it here. https://pubmed.ncbi.nlm.nih.gov/21878926/. To draw conclusions about the differences between the mouse genotypes, they need to compare the two genotypes directly with a statistical test. See Nygard et al for a recommended approach, like comparing social preference indexes
 
 (https://onlinelibrary.wiley.com/doi/abs/10.1002/aur.2154).
 
 Thank you for this information. Previous reviewers at a different journal asked for this particular evaluation. We have now made changes to address the assessment, and graphs now reflect comparisons of genotypes instead of a single genotype between time with a mouse or object. We have also moved to using a social discrimination index to compare the genotypes, similar to the study mentioned.
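 As an illustration of the genotype comparison discussed in point (6), the sketch below computes a per-animal discrimination index and summarizes it by genotype. The formula used here, (time with mouse - time with object) / (total investigation time), is a common convention and is an assumption; the exact index used in the manuscript is not stated in this response. All times and function names are hypothetical.

 ```python
 from statistics import mean

 def discrimination_index(time_social, time_object):
     """Assumed index: (social - object) / (social + object), ranging from -1 to 1."""
     total = time_social + time_object
     if total == 0:
         raise ValueError("animal did not investigate either stimulus")
     return (time_social - time_object) / total

 # Hypothetical investigation times in seconds: (mouse, object) per animal.
 wildtype = [(150, 50), (120, 60), (140, 70)]
 het = [(100, 100), (90, 110), (105, 95)]

 # One index per animal; these per-animal values are what a between-genotype
 # test (for example a t-test or Mann-Whitney test) would then compare.
 wt_index = [discrimination_index(s, o) for s, o in wildtype]
 het_index = [discrimination_index(s, o) for s, o in het]

 print(mean(wt_index), mean(het_index))
 ```

 Reducing each animal to a single index lets the two genotypes be compared directly, rather than testing mouse vs. object within each genotype separately.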
 
 (7) MRI - it is a bit weird to separate the male and female brains just for the MRI. Was there a premise from human data to do so? If not, the authors should probably pool them. If they are concerned there are sex effects (or, more likely, a sex by genotype interaction) I recommend that they use a two-factor ANOVA and simply put both sex and genotype into the model. This will also have the advantage of increasing their statistical power for genotype effects a bit. If their current results are robust, they will still show up as a significant sex x genotype interaction.
 
 All data in the manuscript were initially compared between the sexes. We have now added this text to the animal section of the methods: for MRI, some zebrafish behaviors, and now the RNA-seq data, sex differences were observed, and sex was (or now is) therefore presented independently for these measurements. We now state that if no sex differences were observed the data were pooled.
 
 (8) Also, did the authors correct for multiple testing in the MRI analysis? Since they are testing many regions, there is a risk of false positives if they do not. This could be confounded further by their splitting the data by sex, thus doubling the number of tests.
 
 As noted above, we did not apply multiple-testing corrections given the large number of regions and low number of replicates.
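 For readers unfamiliar with the FDR correction raised in point (8), the following is a minimal sketch of the Benjamini-Hochberg adjustment. It is not the analysis performed in the manuscript (the authors state that no correction was applied); the p-values are hypothetical and the function name is illustrative.

 ```python
 def benjamini_hochberg(p_values):
     """Return Benjamini-Hochberg adjusted p-values in the original order."""
     m = len(p_values)
     # Indices of the p-values sorted from smallest to largest.
     order = sorted(range(m), key=lambda i: p_values[i])
     adjusted = [0.0] * m
     running_min = 1.0
     # Walk from the largest p-value down so adjusted values stay monotone.
     for rank in range(m, 0, -1):
         idx = order[rank - 1]
         running_min = min(running_min, p_values[idx] * m / rank)
         adjusted[idx] = running_min
     return adjusted

 # Hypothetical per-region p-values from uncorrected tests.
 p = [0.001, 0.008, 0.039, 0.041, 0.20]
 q = benjamini_hochberg(p)
 print(q)
 ```

 A region would be reported as significant at a chosen FDR (for example 0.05) only if its adjusted value falls below that threshold, which is why weaker regional findings may not survive when many regions are tested.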
 
 (9) How many images per animal were analyzed for the cell counts? This detail is absent from the methods and would help with evaluating the robustness of these findings. What other approaches were used to make sure the counting was unbiased?
 
 We analyzed 3-4 images per animal for counts and counted hundreds of cells per image. In addition, the person counting was blinded to avoid any bias. These details have now been updated in the methods.
 
 (10) As with the MRI, for the DEG analysis, I recommend the authors simply put sex and genotype into the same model as two factors (with an interaction), to increase their sensitivity to genotype effects, as well as be able to report on robust genotype x sex differences, if there are any. They may also consider testing the model with and without excluding the three outlier animals on their PCA. It may be that the noise of those outliers is detracting from their sensitivity for DEGs somewhat.
 
 We greatly expanded our analyses and found more robust and unique changes in males that are now added to Figure 7 and supplemental files. After considering the data, we decided to highlight the sex differences separately.
 
 (11) A few more relatively simple things could readily be done with the RNAseq data to add some depth and interpretation. For example, do the hits here overlap other published IDD/autism DEG lists from mouse knockouts studies of genes like FoxP2, Chd8, Dnmt3a, Myt1l, Tcf4, etc? Do autism genes show up in the lists of hits here? And if so, more than expected by chance? Can they provide some visualization of their GO results in the main figure?
 
 When we looked into the sex differences further, we found that only the males showed significant upregulation of other autism risk genes, an increase that was previously unappreciated when the sexes were assessed together. Yes, several autism genes do show up, but this is heavily biased to males. Our main Figure 7 and new supplemental files show new GO term analyses and provide additional data looking at not only autism but other factors.
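 The question in point (11) of whether autism genes appear among the hits "more than expected by chance" is commonly framed as a hypergeometric (one-sided Fisher) overlap test. The sketch below is one such formulation under stated assumptions; the gene counts are hypothetical and are not taken from the manuscript.

 ```python
 from math import comb

 def overlap_p_value(n_background, n_set_a, n_set_b, n_overlap):
     """P(overlap >= n_overlap) if set B were drawn at random from the background.

     n_background: number of genes tested; n_set_a: e.g. autism risk genes in the
     background; n_set_b: e.g. differentially expressed genes; n_overlap: observed
     number of genes in both sets.
     """
     upper = min(n_set_a, n_set_b)
     total = comb(n_background, n_set_b)
     tail = sum(
         comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
         for k in range(n_overlap, upper + 1)
     )
     return tail / total

 # Hypothetical counts: 15000 expressed genes, 200 autism risk genes,
 # 300 DEGs, 12 genes shared between the two lists.
 p_enrichment = overlap_p_value(15000, 200, 300, 12)
 print(p_enrichment)
 ```

 The background should be the set of genes actually tested for differential expression, since using the whole genome would overstate the enrichment.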
 
 (12) It appears the IMPC has phenotyped this mouse somewhat, including craniofacial abnormalities. They also report on some blood cell differences. Anyway, if no one has written about that data yet (as it was generated in the context of a big consortium effort), their guidelines may allow you to include some of their data as Supplementary Figures here with proper attribution. It might help to at least summarize useful findings from there in your discussion.
 
 Due to the large number of figures/tables already in this report we don’t think this will be helpful. However, we do refer readers to the consortium in the animal methods section so they can explore data already generated by the IMPC.
 
 (13) Minor/Typos:
 
 (a) Figure 2K: I am confused by the description of three genotypes in the legend, but only two in the panel?
 
 Corrected.
 
 (b) I found it a little distracting that some results figures were embedded in the introduction.
 
 We have moved the figures further in the manuscript to start in the results section.
 
 (c) I don't understand this sentence: "Due to reduced sample size, sex-stratified DE was performed without model corrections at FDR < 0.1, 7 and found genes significantly upregulated and downregulated, respectively;" The sample size here seemed robust, so I am not sure what they were referring to? Are there missing numbers from this sentence? What is the 7? I think there are enough typos here that I am not sure how to evaluate this claim. Thus, the writing and clarity of this part could be improved.
 
 This section had several typos that have now been corrected.
 
 (d) "Marwan Shinawi, (unpublished results)" is a bit atypical of a citation. Are these results being reported with his permission? If so, then it should say 'personal communication' (if the journal permits this - some do not). If not, they should not report someone else's unpublished results without their explicit permission. It might upset some people to have their results presented this way.
 
 We have changed unpublished results to personal communication. Marwan Shinawi is an author on this manuscript and has approved of everything we have reported.
 
 (e) In all figures, consider shape or color coding for sex, even when pooling the data (e.g, the data points in the behavior figures).
 
 This is a good idea but since we found no difference when analyzing the data we don’t see how this extra work will make a difference. Since we now mention that sex differences were only presented as separate graphs when observed in the methods we think this should be acceptable.
 
 AuthorResponse
URL

biorxiv.org/content/10.1101/2024.05.26.595966v5
www.biorxiv.org www.biorxiv.org

Differential interfacial tension between oncogenic and wild-type populations forms the mechanical basis of tissue-specific oncogenesis in epithelia

4
1. Public_Reviews 01 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  This important study reports that an oncogenic population in an epithelium can either be repressed or spread, depending on the tissues. This work provides convincing evidence, supported by pharmacological perturbations and numerical simulations using the vertex model, that the principle of "high heterotypic interfacial tension" that appears to drive cell sorting and tissue segregation in embryonic models similarly applies to cancer cell behaviour.
  
  Summary
2. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]
  
  Summary:
  
  The behaviour of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within a MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vertex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.
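  The role of high heterotypic interfacial tension can be illustrated with a deliberately simplified toy calculation. This is not the authors' model: it is a one-dimensional ring of cells in which each cell-cell contact contributes a fixed tension, and all names and values are hypothetical. It only shows why a sorted arrangement costs less interfacial energy than a mixed one when heterotypic contacts are more expensive than homotypic ones.

  ```python
  def interfacial_energy(ring, gamma_homo, gamma_hetero):
      """Sum contact tensions around a closed ring of cells labelled 'W' or 'R'."""
      n = len(ring)
      energy = 0.0
      for i in range(n):
          neighbour = ring[(i + 1) % n]  # wrap around to close the ring
          energy += gamma_hetero if ring[i] != neighbour else gamma_homo
      return energy

  sorted_ring = "WWWWRRRR"  # oncogenic (R) cells form one compact cluster
  mixed_ring = "WRWRWRWR"   # oncogenic cells fully intermixed with wild type

  # Hypothetical tensions: heterotypic contacts cost three times more.
  e_sorted = interfacial_energy(sorted_ring, gamma_homo=1.0, gamma_hetero=3.0)
  e_mixed = interfacial_energy(mixed_ring, gamma_homo=1.0, gamma_hetero=3.0)
  print(e_sorted, e_mixed)
  ```

  With equal homotypic and heterotypic tensions the two arrangements have the same energy in this toy, which mirrors the free mixing described for the bronchial cells.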
  
  Strengths:
  
  Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.
  
  Review 1
3. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.
  
  Strengths:
  
  (1) Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia.
  
  (2) Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia.
  
  Review 2
4. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The behaviour of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within a MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vertex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.
  
  Strengths:
  
  Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.
  
  Weaknesses:
  
  Although not calling into question the main message of this study, there are a few issues that one may want to address:
  
  (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).
  
  As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in segregation of oncogenic cells.
  
  (2) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.
  
  (3) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.
  
  (4) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Figure 2b). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.
  
  Comments on revisions:
  
  There is still one last point that should be made even clearer:
  
  The system is being modelled based on the principle of INTERFACIAL TENSION, a description pioneered by the works of Steinberg and of Harris, and nicely conceptualized by Brodland (2002). Now the observed behaviour is a perfect case of sorting based on higher interfacial tension AT the boundary between cell types (with nice additional documentation of local actin and myosin enrichment in the revised manuscript). What needs to be made crystal clear is that this is NOT equivalent to the model of DITH ("DIFFERENTIAL INTERFACIAL TENSION HYPOTHESIS") (Brodland 2002, Krieg et al 2008). It is important to stop using DITH in this context, as it leads to confusion and misinterpretations. Indeed, DITH predicts cell/tissue sorting based on differences in interfacial tension WITHIN the two cell types. While DITH accounts for relative POSITIONING (one tissue engulfing the other), it is now established that this is not the motor for cell sorting and tissue segregation; the key parameter is the tension at the heterotypic interface. I thus invite the authors to avoid the terms "differential"/DITH, and rather use either "interfacial tension", or specifically "HIGH HETEROTYPIC INTERFACIAL TENSION".
  
  Related: the authors correctly cite Canty et al NatComm2017 when discussing this phenomenon. I suggest adding a further key supporting reference: "D.M. Sussman, J.M. Schwarz, M.C. Marchetti, M.L. Manning, Soft yet sharp interfaces in a vertex model of confluent tissue, Phys. Rev. Letters 120 (2018) 058001". One may also include another pioneering work in Drosophila: "M. Aliee, J.C. Roper, K.P. Landsberg, C. Pentzold, T.J. Widmann, F. Julicher, C. Dahmann, Physical mechanisms shaping the Drosophila dorsoventral compartment boundary, Curr. Biol. 22 (2012) 967-976."
  
  We thank the reviewer for this important clarification. We fully agree that the mechanism underlying the observed segregation in our system is best described in terms of elevated heterotypic interfacial tension, rather than the classical Differential Interfacial Tension Hypothesis (DITH). As the reviewer correctly points out, DITH in its original formulation refers to differences in intrinsic interfacial tensions within each cell population, which primarily governs relative positioning (e.g., tissue engulfment), rather than the local sorting dynamics we observe here.
  
  In contrast, our experimental and modeling results support a scenario in which segregation is driven by increased tension specifically at heterotypic interfaces between HRasV12 and wild-type cells. We agree that continued use of the term “Differential interfacial tension” in this context may lead to conceptual ambiguity.
  
  Accordingly, we have revised the manuscript throughout to replace references to “differential interfacial tension” with more precise terminology, namely “interfacial tension” or “heterotypic interfacial tension”, wherever appropriate. We have also updated the Discussion to explicitly clarify this distinction and its implications for interpreting our results.
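
As an illustration of the terminology only, the sketch below writes down a generic vertex-model energy in which an extra line tension is applied solely to edges shared by cells of different types. The functional form, the function and parameter names (`tissue_energy`, `k_a`, `k_p`, `a0`, `p0`, `lam_het`) and all values are assumptions in the style of standard vertex models; this is not the authors' implementation.

```python
import math


def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon given as [(x, y), ...]."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        twice_area += x0 * y1 - x1 * y0            # shoelace formula
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def tissue_energy(cells, cell_types, shared_edges,
                  k_a=1.0, k_p=1.0, a0=1.0, p0=4.0, lam_het=0.5):
    """Hypothetical vertex-model energy with heterotypic line tension.

    cells        : list of polygons (one vertex list per cell)
    cell_types   : list of labels, e.g. "wt" or "mut", one per cell
    shared_edges : list of (cell_i, cell_j, edge_length) tuples
    lam_het      : assumed extra tension on edges between unlike cells
    """
    energy = 0.0
    # Area and perimeter elasticity, identical for every cell.
    for verts in cells:
        area, perim = polygon_area_perimeter(verts)
        energy += k_a * (area - a0) ** 2 + k_p * (perim - p0) ** 2
    # The extra term acts only where the two neighbours differ in type.
    for i, j, length in shared_edges:
        if cell_types[i] != cell_types[j]:
            energy += lam_het * length
    return energy


# Two unit squares sharing one edge of length 1.
cells = [[(0, 0), (1, 0), (1, 1), (0, 1)],
         [(1, 0), (2, 0), (2, 1), (1, 1)]]
edges = [(0, 1, 1.0)]
e_homotypic = tissue_energy(cells, ["wt", "wt"], edges)
e_heterotypic = tissue_energy(cells, ["wt", "mut"], edges)
```

In this toy configuration the two cells are mechanically identical, so the energies differ only through the edge they share. That is the sense in which the tension is heterotypic rather than a difference in intrinsic properties of the two populations.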
  
  We thank the reviewer for suggesting additional relevant literature, which we have now included.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.
  
  Strengths:
  
  Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia.
  
  Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia.
  
  Weaknesses:
  
  It is unclear what is the mechanistic origin of the shape-tension coupling, which is used in the vertex model, and how important that coupling is for the presented results. Authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure and stress fibers would not form. Authors should better justify the use of the shape-tension coupling in the model, since most of the observed behavior is already captured by the differential tension even if there is no shape-tension coupling.
  
  We thank the reviewer for this comment. We agree that we did not provide a mechanistic origin for the shape-tension coupling. In our model, stress fiber formation, along with actin ring formation, indicated that cells at the interface were elongated. Hence, we hypothesised that an interfacial force could induce nematic alignment at the interface. However, such an activity would only be feasible if the interface interaction were sufficiently high. Thus, the isotropic pressure at the heterotypic interface served as a proxy for cell-cell interactions in our model. However, inspired by recent work [1], we have tested whether activation of cells at the interface by shear stress would produce similar results. Exploring this aspect will require additional simulations.
  
  (1) Pérez-Verdugo, F., Maniou, E., Galea, G. L., & Banerjee, S. (2026). Mechanosensitive feedback organizes cell shape and motion during hindbrain neuropore morphogenesis. Current Biology.
  
  The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way it would be easier to determine whether the observed differences in simulations are statistically significant.
  
  The observed differences in shape indices between interfacial and bulk cells in simulations in the zero-line-tension case (Lambda=0) remain non-zero at the zero-stress threshold because the interface cells are still subject to the shape-dependent contribution gamma_ij, since the current model treats gamma_ij as independent of Lambda. We are exploring the possible relationship between Lambda and gamma_ij, and we will update this in the next version of the manuscript.
  
  Recommendations for the authors:
  
  The editor recommends considering the new comment made by reviewer #1 in his/her report:
  
  "There is still one last point that should be made even more clear:
  
  The system is being modelled based on the principle of INTERFACIAL TENSION, a description pioneered by the works of Steinberg and of Harris, and nicely conceptualized by Brodland (2002). Now the observed behaviour is a perfect case of sorting based on higher interfacial tension AT the boundary between cell types (with nice additional documentation of local actin and myosin enrichment in the revised manuscript). What needs to be made crystal clear is that this is NOT equivalent to the model of DITH ("DIFFERENTIAL INTERFACIAL TENSION HYPOTHESIS") (Brodland 2002, Krieg et al 2008). It is important to stop using DITH in this context, as it leads to confusion and misinterpretations. Indeed, DITH predicts cell/tissue sorting based on differences in interfacial tension WITHIN the two cell types. While DITH accounts for relative POSITIONING (one tissue engulfing the other), it is now established that this is not the motor for cell sorting and tissue segregation; the key parameter is the tension at the heterotypic interface. I thus invite the authors to avoid the terms "differential"/DITH, and rather use either "interfacial tension", or specifically "HIGH HETEROTYPIC INTERFACIAL TENSION".
  
  Related: the authors correctly cite Canty et al NatComm2017 when discussing this phenomenon. I suggest adding a further key supporting reference: "D.M. Sussman, J.M. Schwarz, M.C. Marchetti, M.L. Manning, Soft yet sharp interfaces in a vertex model of confluent tissue, Phys. Rev. Letters 120 (2018) 058001". One may also include another pioneering work in Drosophila: "M. Aliee, J.C. Roper, K.P. Landsberg, C. Pentzold, T.J. Widmann, F. Julicher, C. Dahmann, Physical mechanisms shaping the Drosophila dorsoventral compartment boundary, Curr. Biol. 22 (2012) 967-976."
  
  Please see response to Reviewer 1
  
  Reviewer #2 (Recommendations for the authors):
  
  The authors have improved the manuscript and addressed some of my concerns. However, some of the questions were not adequately addressed.
  
  (1) I appreciate additional justification regarding the need for the shape-tension coupling in the vertex model. However, the authors have not answered my question regarding why the shape-tension coupling model should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched, but it is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form.
  
  We thank the reviewer for pointing this out. We agree that we did not provide a mechanistic origin for the shape-tension coupling. In our model, stress fiber formation, along with actin ring formation, indicated that cells at the interface were elongated. Hence, we hypothesized that an interfacial force could induce nematic alignment at the interface. However, such an activity would only be feasible if the interface interaction were sufficiently high. Thus, the isotropic pressure at the heterotypic interface served as a proxy for cell-cell interactions in our model.
  
  However, inspired by recent work [1], we have tested whether activation of cells at the interface by shear stress would produce similar results. Exploring this aspect will require additional simulations.
  
  (1) Pérez-Verdugo, F., Maniou, E., Galea, G. L., & Banerjee, S. (2026). Mechanosensitive feedback organizes cell shape and motion during hindbrain neuropore morphogenesis. Current Biology.
  
  (2) I appreciate that the authors provided additional statistics related to simulations. I am still very concerned about the observed difference in the shape indices between the cells at the interface and the bulk, when the interfacial line tension is exactly zero (Lambda=0). In that case, the cells at the interface and in the bulk are identical, and there should be no difference in the shape indices. Are cells at the interface for the zero-line-tension case (Lambda=0) still subject to the shape-dependent contribution gamma_ij? If that contribution is still included for the cells at the interface, then this could explain why cells at the interface are still different from cells in the bulk even when Lambda=0.
  
  The observed differences in shape indices between interfacial and bulk cells in simulations in the zero-line-tension case (Lambda=0) remain non-zero at the zero-stress threshold because the interface cells are still subject to the shape-dependent contribution gamma_ij, since the current model treats gamma_ij as independent of Lambda. We are exploring the possible relationship between Lambda and gamma_ij, and we will update this in the next version of the manuscript.
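
For readers who want to reproduce the kind of comparison discussed here, the following is a minimal sketch of how a per-cell shape index and its run-to-run spread might be computed. It assumes the common definition p = P / sqrt(A) (perimeter over the square root of area); the helper names `shape_index` and `mean_and_std` and the example numbers are hypothetical and are not taken from the manuscript.

```python
import math
import statistics


def shape_index(vertices):
    """Dimensionless shape index p = P / sqrt(A) of a simple polygon."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        twice_area += x0 * y1 - x1 * y0            # shoelace formula
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return perimeter / math.sqrt(abs(twice_area) / 2.0)


def mean_and_std(values):
    """Mean and sample standard deviation over independent runs (needs >= 2 values)."""
    return statistics.mean(values), statistics.stdev(values)


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
regular_hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3))
                   for k in range(6)]

p_square = shape_index(unit_square)
p_hexagon = shape_index(regular_hexagon)

# Made-up per-run averages for one cell population, for illustration only.
run_means = [3.9, 4.1]
avg, spread = mean_and_std(run_means)
```

Reporting `avg` and `spread` separately for interfacial and bulk cells across repeated simulations would make it easier to judge whether a residual difference at Lambda=0 exceeds run-to-run variation.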
  
  (3) Authors included several additional supplemental figures (Figs. S4, S5, S6, S7), but they are not discussed in the manuscript text. These new supplemental figures were only discussed in the rebuttal letter. These figures should also be discussed in the manuscript text.
  
  We have cited the new supplementary figures in the main text.
  
  (4) Authors have answered in the rebuttal letter what experimental data was used in Fig. 4c. This information also needs to be provided in the manuscript text.
  
  We have added this information to the caption of Figure 4.
  
  (5) Supplementary Figure 3 is missing. That figure got moved to the appendix.
  
  This has been rectified in the Supplementary file and the citations have been updated accordingly in the main text.
  
  (6) At the end of section 4 in the main text, the authors introduced a new sentence regarding simulations of the vertex model with interfacial tension and mechanochemical feedback. The details of that model are described in the appendix, but it would be helpful to add a sentence or two already in the main text describing the mechanism of the mechanochemical feedback.
  
  We have added a line describing the mechanism of mechanochemical feedback.
  
  (7) In the definition of the eccentricity, 'a' should be the minor axis and 'b' the major axis, i.e., 'a' and 'b' should be swapped.
  
  We have corrected this.
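
The manuscript's formula is not reproduced in this exchange. Assuming it is the standard ellipse eccentricity, e = sqrt(1 - a^2/b^2) with a the minor and b the major axis, the corrected convention can be sketched as follows. The function name and the argument check are illustrative assumptions.

```python
import math


def eccentricity(minor_axis, major_axis):
    """Standard ellipse eccentricity, assuming minor_axis <= major_axis."""
    if minor_axis <= 0 or major_axis <= 0:
        raise ValueError("axis lengths must be positive")
    if minor_axis > major_axis:
        # Swapping the axes would make the square-root argument negative.
        raise ValueError("minor_axis must not exceed major_axis")
    return math.sqrt(1.0 - (minor_axis * minor_axis) / (major_axis * major_axis))


e_circle = eccentricity(1.0, 1.0)      # a circle is not elongated at all
e_elongated = eccentricity(3.0, 5.0)   # a 3:5 ellipse
```

With the axes swapped, the term under the square root becomes negative whenever the cell is elongated, which is why the order of 'a' and 'b' matters.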
  
  (8) There is a typo at the end of the vertex model description in the methods section. "The details of the shape-tension coupling is described in the interface." The word "interface" should be "appendix".
  
  We have fixed the typo.
  
  (9) In the appendix section describing the shape-tension coupling, the authors should explain how the cell's director n is defined.
  
  We have added a line in the appendix section describing shape-tension coupling explaining how the cell’s director n is defined.
  
  (10) In Appendix Fig. 1, the two angles are defined as theta and theta' but the figure caption is defining angles theta_1 and theta_2. These angles need to be consistent.
  
  This has been fixed.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.14.643229v6
www.biorxiv.org www.biorxiv.org

A validated antibody toolbox for ALS research

3
1. Public_Reviews 01 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  Overall, this is a manuscript with solid evidence that delivers an important community resource for those performing experimental research in amyotrophic lateral sclerosis. The authors address the lack of validated tools for the detection and quantification of proteins associated with amyotrophic lateral sclerosis (ALS) through an extensive screening of 303 commercially available antibodies to 33 protein targets. The effort invested in generating the knockout lines for validation experiments is a clear strength of the study.
  
  Summary
2. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors address the lack of validated tools for the detection and quantification of proteins associated with amyotrophic lateral sclerosis (ALS) through an extensive screening of 303 commercially available antibodies to 33 protein targets. Their ALS-Reproducible Antibody Platform (ALS-RAP) delivers a validated antibody toolbox for ALS research, which will provide an advantageous starting point for researchers in this field. Ayoubi R. et al. showcase the characterization workflow, presenting as an example the characterization of antibodies targeting Galectin-1, encoded by the LGALS1 gene. A selection of these antibodies was also used to profile protein levels across human induced pluripotent stem cell (iPSC)-derived and primary neurological cell types, and the findings support that the ALS disease mechanism involves both neuronal and glial cells.
  
  Strengths:
  
  The knockout (KO)-based approach is definitely the major strength of this study, providing a high level of confidence in the data collected in human induced pluripotent stem cell (iPSC)-derived and primary neurological cell types. The focus on renewable reagents (monoclonal and recombinant antibodies) is also important. The extensive characterization of this set of antibodies will benefit any scientist interested in any of the 33 target proteins, even in fields other than neuroscience.
  
  The authors perform an interesting protein profiling study assessing 27 proteins, comparing RNA and protein expression data, and using two independent WB preparations of the same cell types.
  
  The conclusions that can be drawn from this first assessment might not be final, but the data are compelling because they have been collected with reliable and validated antibodies.
  
  Another strength of this work is the data dissemination strategy, which includes the Only Good Antibodies (OGA) platform, where YCharOS data are curated and presented in an easy and intuitive manner that facilitates antibody selection by the end user for WB, IP and IF applications.
  
  Weaknesses:
  
  The authors mention the development of single-chain variable fragment (scFv) recombinant antibodies raised by the SGC against the six proteins (ANXA11, OPTN, MATR3, PFN1, UBQLN2 and VCP) for which few renewable antibodies are commercially available. The development was optimized to generate antibodies particularly suitable for IP, and the clone selection process was carried out using IP coupled to mass spectrometry. Even though the generation of these novel reagents is not the focus of this work, the authors do not provide any data on this aspect.
  
  The protein profiling study is limited to WB data, and the authors did not provide any explanation on why there was no integration with IP and IF data, not even for those targets that have validated antibodies. Also, not all the cell types have been screened by chemiluminescence-based detection and by fluorescence-based WB, and the authors do not elaborate on the reason for such a choice.
  
  Review 1
3. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Overall, this is a solid manuscript that delivers an important community resource. The execution is relatively simple, but the value is real, the work is rigorously performed, and the open dissemination through Zenodo, the F1000Research YCharOS Gateway and OGA is well executed. The effort invested in generating the knockout lines for validation experiments is a clear strength of the study. I have a number of comments that I think would strengthen the resource and the conclusions drawn from it.
  
  Below, I list specific points.
  
  (1) The rationale for the selection of these 33 genes is insufficient. The authors lean on the Nijs & Van Damme classification and on PubMed entry counts, but the number of PubMed entries is not a meaningful criterion for what constitutes an important ALS protein - some of the most disease-relevant genes are precisely those with fewer publications, while heavily cited genes such as CAV1 carry weak ALS-specific evidence. The authors should provide a more transparent and biologically motivated rationale for inclusion and exclusion (ClinGen evidence tier, replicated GWAS signals, large meta-analyses, ALSoD) and explain why specific risk genes outside this list were not part of ALS-RAP.
  
  (2) "107 of 231 (46%) demonstrated specific target staining in IF." The criteria used to define "specific target staining" at the IF level are not stated. From the Galectin-1 example, the mosaic WT/KO strategy provides a binary readout, but for proteins with low expression, weak punctate staining or unusual subcellular distributions, a single threshold is unlikely to capture specificity uniformly across 231 antibodies.
  
  (3) Several claims in the manuscript depend on differential protein abundance across cell types. As presented, these claims are supported by qualitative Western blot images only. They should be substantiated by quantification across multiple biological replicates.
  
  (4) This manuscript represents a unique opportunity to address antibody recognition of splicing variants, which is of considerable value to the community. For each target, the predicted isoforms in Ensembl could be cross-referenced against the observed bands, and the pattern of bands compared across cell types could be informative about which isoforms each antibody captures. This would convert ambiguous "extra bands" into useful biological information and would substantially increase the value of the resource. I strongly encourage the authors to include this analysis.
  
  (5) The iPSC-derived microglia receive a comprehensive QC panel (IBA1/PU.1 IF, CD45/CD11b flow, qRT-PCR for nine canonical markers; Figure S4), which allows the reader to assess culture purity. The other iPSC-derived lineages - motor neurons, dopaminergic neurons, oligodendrocytes and astrocytes - are validated by a single marker each in WB (Figure S3) without purity quantification. Given that several conclusions of the manuscript rest on the cell-type-specific detection of ALS-associated proteins, equivalent quality control should be performed for the other lineages so that the reader can evaluate the purity of each preparation.
  
  (6) The robustness of the resource would be substantially increased by validating at least a subset of the targets in a second iPSC background, in at least some of the cell types analysed.
  
  (7) The newly developed SGC scFv antibodies are arguably the most novel reagent contribution of this manuscript, yet they receive a single sentence in the body of the paper. A more thorough description is warranted.
  
  (8) Accessibility of the resource through Zenodo is not straightforward - the reader currently has to navigate to individual antibody characterization reports one by one to extract recommendations for a given target. While the use of an established public repository is important for permanence, a dedicated ALS-RAP website with an interactive, searchable interface - filterable by target, application, host species and clonality - would meaningfully improve uptake. The relationship between such a portal and the existing OGA platform should also be clarified.
  
  Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.14.676084v6
www.biorxiv.org www.biorxiv.org

Depletion of extracellular asparagine impairs self-reactive T cells and ameliorates autoimmunity in a murine model of multiple sclerosis

4
1. Public_Reviews 01 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  Non-essential amino acids such as glutamine are known to be required for general T cell activation by sustaining basic biosynthetic processes, including nucleotide biosynthesis, ATP generation, and protein synthesis. In this important study, the authors found that extracellular asparagine (Asn) is required not only for T cells to fuel metabolic reprogramming, but also to produce helper T cell lineage-specific cytokines, for instance IL-17. In particular, the importance of Asn in IL-17 production was convincingly demonstrated in the mouse experimental autoimmune encephalomyelitis (EAE) model, which mimics human multiple sclerosis.
  
  Summary
2. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, the authors reveal that the availability of extracellular asparagine (Asn) represents a metabolic vulnerability for the activation and differentiation of naive CD4+ T cells. To deplete extracellular Asn, they employed two orthogonal approaches: activating naive CD4+ T cells in either PEGylated asparaginase (PEG-AsnASE)-treated medium or custom-formulated RPMI medium specifically lacking Asn. Importantly, they demonstrate that Asn depletion not only impaired metabolic reprogramming associated with CD4+ T cell activation but also reduced CD4+ helper T cell lineage-specific cytokine production, thereby ameliorating the severity of experimental autoimmune encephalomyelitis.
  
  The experiments presented here are comprehensive and well-designed, providing compelling evidence for the conclusions. The conclusions will be important to the field.
  
  Comments on revised version:
  
  The authors have sufficiently addressed my previous comments. The manuscript represents an excellent contribution to the field.
  
  Review 1
3. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  While the importance of asparagine in the differentiation and activation of CD8 T cells has been previously reported, its role in CD4 T cells remained unclear. Using culture media containing specific amino acids, the authors demonstrated that extracellular asparagine promotes CD4 T cell proliferation. Consistent with this, depletion of extracellular asparagine using PEG-AsnASE suppressed CD4 T cell activation. Proteomic analysis focusing on asparagine content revealed that, during the early phase of T cell activation, most asparagine incorporated into proteins is derived from extracellular sources. The authors further confirmed the importance of extracellular asparagine in vivo, demonstrating improved EAE pathology.
  
  While the data are well organized and convincing, the mechanism by which asparagine deficiency leads to altered T cell differentiation remains unclear. It is also necessary to investigate the transporters involved in asparagine uptake. In particular, elucidating whether different T cell subsets utilize the same or distinct transport mechanisms would provide important insight into the immunoregulatory role of asparagine.
  
  Comments on revised version:
  
  The authors have addressed the previous concerns, and the manuscript has been significantly improved.
  
  Review 2
4. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, the authors reveal that the availability of extracellular asparagine (Asn) represents a metabolic vulnerability for the activation and differentiation of naive CD4+ T cells. To deplete extracellular Asn, they employed two orthogonal approaches: activating naive CD4+ T cells in either PEGylated asparaginase (PEG-AsnASE)-treated medium or custom-formulated RPMI medium specifically lacking Asn. Importantly, they demonstrate that depletion not only impaired metabolic reprogramming associated with CD4+ T cell activation but also reduced CD4+ helper T cell lineage-specific cytokine production, thereby ameliorating the severity of experimental autoimmune encephalomyelitis.
  
  Strengths:
  
  The experiments presented here are comprehensive and well-designed, providing compelling evidence for the conclusions. The conclusions will be important to the field.
  
  We thank the reviewer for their assessment of our work and enthusiasm towards our findings.
  
  Weaknesses:
  
  (1) EAE is the prototypic T cell-mediated autoimmune disease model, and both Th1 and Th17 cells are implicated in its pathogenesis. In contrast, Th2 and Treg cells and their associated cytokines (such as IL-4 and IL-10) have been shown to play a role in the resolution of EAE, and potentially in the modulation of disease progression. Thus, it will be important to determine whether Asn depletion affects the differentiation of naive CD4+ T cells into corresponding subsets under Th2 and Treg polarization conditions, as well as the expression of lineage-specific transcription factors and cytokine production.
  
  We appreciate that the reviewer recognizes the functional relevance of our findings showing that Asn is important for proper Th17 differentiation and promotion of EAE (Figure 5 E-J, Figure 6). Given that multiple CD4+ T cell subsets play a role in both the initiation and resolution of EAE, we agree that it would be valuable to further support these findings with complementary Th2 and Treg differentiation experiments.
  
  To address this, we examined the effects of asparagine depletion during in vitro iTreg and Th2 differentiation. We found that the frequencies of FOXP3+ iTreg and GATA3+ Th2 cells were reduced when cultures were grown in asparagine-deficient media. These results have been added to Supplementary Figure 5.
  
  (2) EAE is characterized by inflammation and demyelination in the central nervous system (CNS), leading to neurological deficits. Myelin destruction is directly correlated with the severity of the disease. For Figure 6, did the authors perform spinal cord histological analysis by hematoxylin and eosin (H&E) or Luxol fast blue (LFB) staining? This is important to rigorously examine pathological EAE symptoms.
  
  We agree with the reviewer that histopathology, including H&E and/or LFB staining, is a useful indicator of EAE disease severity. However, we are no longer able to obtain PEG-AsnASE (Oncaspar) to perform these studies.
  
  Reviewer #2 (Public review):
  
  While the importance of asparagine in the differentiation and activation of CD8+ T cells has been previously reported, its role in CD4+ T cells remained unclear. Using culture media containing specific amino acids, the authors demonstrated that extracellular asparagine promotes CD4+ T cell proliferation. Consistent with this, depletion of extracellular asparagine using PEG-AsnASE suppressed CD4+ T cell activation. Proteomic analysis focusing on asparagine content revealed that, during the early phase of T cell activation, most asparagine incorporated into proteins is derived from extracellular sources. The authors further confirmed the importance of extracellular asparagine in vivo, demonstrating improved EAE pathology.
  
  While the data are well organized and convincing, the mechanism by which asparagine deficiency leads to altered T cell differentiation remains unclear. It is also necessary to investigate the transporters involved in asparagine uptake. In particular, elucidating whether different T cell subsets utilize the same or distinct transport mechanisms would provide important insight into the immunoregulatory role of asparagine.
  
  (1) The finding that asparagine supplementation promotes T cell proliferation under various amino acid conditions is highly significant. However, the concentration at which this effect occurs remains unclear. A titration analysis would be necessary to determine the dose dependency of asparagine.
  
  Our studies indicate that the concentration of asparagine present in conventional RPMI lymphocyte media is sufficient to support CD4+ T cell activation and proliferation in vitro (Figure 1, Supplementary Figure 1 & Figure 2). This concentration was consistently used throughout our studies. In line with the reviewer’s comments, however, we have not yet determined the dose dependency of Asn during CD4+ T cell activation.
  
  To address this, we performed a titration experiment in which asparagine was supplemented at varying concentrations in DMEM and Asn-deficient RPMI. Activation markers were measured 24 hours after TCR stimulation under these culture conditions. We found that the critical asparagine concentration lies between 3.78 and 37.8 µM. This concentration range is consistent with the physiological concentration of asparagine in murine plasma, which is approximately 50 µM (PMID: 24842860; PMID: 23853755). These data have been added to Supplementary Figure 1.
  
  (2) The effects of asparagine deficiency occur during the early phase of T cell activation. Thus, it is likely that the transporters responsible for asparagine uptake are either rapidly induced upon activation or already expressed in the resting state. Since this is central to the focus of the manuscript, it is interesting to identify the transporter responsible for asparagine uptake during early T cell activation. A recent paper (DOI: 10.1126/sciadv.ads350) reported that macrophages utilize Slc6a14 to use extracellular asparagine. Is this also true for CD4+ T cells?
  
  While a comprehensive characterization of the amino acid transporter network is certainly of interest, it is beyond the scope of the present study. As the reviewer notes, others have explored asparagine transport in lymphocytes. For example, Wu et al. (PMID: 33420490) determined that the asparagine transporter Slc1a5 is significantly upregulated in CD8+ T cells upon activation, based on qRT-PCR measurements comparing mRNA from naïve and activated CD8+ T cells. They further validated the functional role of Asn transporters in CD8+ T cells by measuring N15-labeled asparagine uptake in the presence of siRNAs targeting the asparagine transporters Slc1a5 or Slc38a2, and found that inhibition of either transporter significantly reduced intracellular N15-Asn accumulation.
  
  To gain additional insight into Asn transporters in distinct CD4+ T cell subsets, we reanalyzed a published RNA-seq dataset (Thakore et al., 2024; PMID: 39009838). We quantified the expression of transporters Slc1a5, Slc38a2, and Slc6a14 in naïve and activated CD4+ T cells polarized under Th1, npTh17, or pTh17 conditions at various time points. We observed that Slc1a5 expression increased upon activation in all subsets. Similarly, Slc38a2 expression increased during the early activation stage, but subsequently returned to basal levels similar to those of naïve cells. In contrast, Slc6a14 showed relatively low basal expression in naïve cells compared to the other transporters investigated, and its expression decreased over the differentiation period in all CD4+ T cell subsets examined. These results indicate that the Asn transporters Slc1a5 and Slc38a2 are expressed in CD4+ T cells during early activation and differentiation. These data have been included in Supplementary Figure 3.
  
  (3) Given that depletion of extracellular asparagine impairs differentiation of Th1 and Th17 cells, it is possible that TCR signaling is compromised under these conditions. This point should be investigated by targeting downstream signaling molecules such as Lck, ZAP70, or mTOR. Also, does it affect the protein stability of master transcription factors such as Tbet and RORgt?
  
  We agree with the reviewer that asparagine deprivation could impact several aspects of T cell function. In our study, we demonstrate that asparagine is crucial for CD4+ T cell protein synthesis and the expression of activation markers (Figure 1B-K, Figure 2K-L, and Figure 3A-C). We also highlight its importance in promoting CD4+ T cell subset differentiation and lineage-defining cytokine production (Figure 5B-J). Other studies have reported a role for asparagine in early activation marker expression in CD8+ T cells and in enhancing LCK function (PMID: 33822775; PMID: 33420490). Given its proposed function as a promoter of LCK signaling in CD8+ T cells, it will be important to determine in future studies whether a similar mechanism operates during CD4+ T cell activation.
  
  We appreciate the reviewer’s inquiry regarding the stability of critical transcription factors defining Th1 and Th17 subsets. We have examined the expression of the transcription factors RORγT and Tbet in Th17 and Th1 polarized cells and observed reduced expression in the absence of asparagine. We have included these findings in Supplementary Figure 5.
  
  (4) Is extracellular asparagine also important for the differentiation of helper T cell subsets other than Th1 and Th17, such as Th2, Th9, and iTreg?
  
  Please see our response to Reviewer 1 regarding iTreg and Th2. Investigation of Th9 cells is beyond the scope of the present study.
  
  (5) Asparagine taken up from outside the cell has been shown to be used for de novo protein synthesis (Figure 3E), but are there any proteins that are particularly susceptible to asparagine deficiency? This can be verified by performing proteome analysis, and the effects on Th1/17 subset differentiation mentioned above should also be examined.
  
  The investigation of specific proteins that exhibit asparagine dependency would indeed be interesting. Given our results showing that global protein synthesis is blunted with asparagine deprivation (Figure 3A-C), it would be particularly compelling to identify proteins with a specific requirement for asparagine. However, this level of analysis is beyond the scope of our study.
  
  (6) While the importance of extracellular asparagine is emphasized, Asns expression is markedly induced during early T cell activation. Nevertheless, the majority of asparagine incorporated into proteins appears to be derived from extracellular sources. Does genetic deletion of Asns have any impact on early CD4+ T cell activation? The authors indicated that newly synthesized Asns have little impact on CD8+ T cells in the Discussion section, but is this also true for CD4+ T cells? This could be verified through experiments using CRISPR-mediated Asns gene targeting or pharmacological inhibition.
  
  We appreciate the reviewer’s consideration of the contribution of endogenous asparagine to CD4+ T cell function. However, genetic perturbation of Asns is beyond the scope of our study, which is specifically focused on defining the requirements for extracellular asparagine and its role in CD4+ T cell activation.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.06.09.658561v2
www.biorxiv.org www.biorxiv.org

Microenvironmental arginine restriction sensitizes pancreatic cancers to polyunsaturated fatty acids by suppression of lipid synthesis

5
1. Public_Reviews 01 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This important study demonstrates that nutrient stress engenders metabolic vulnerabilities in pancreatic ductal adenocarcinoma (PDAC). By combining cell line and mouse models, the authors provide compelling evidence showing that arginine depletion from the microenvironment disrupts lipid homeostasis in PDAC resulting in ferroptosis upon exposure of tumors to polyunsaturated fatty acids. This report is likely to be of broad interest to researchers interested in studying cancer biology, metabolic adaptations and stress responses.
 
 Summary
2. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In this study, the authors set out to define how arginine availability regulates lipid metabolism and to explore the implications of this relationship in pancreatic ductal adenocarcinoma (PDAC), a tumor type known to exist in an arginine-poor microenvironment. Using a combination of rigorous genetic and metabolomic approaches, they uncover a previously underappreciated role for arginine in maintaining lipid homeostasis. Importantly, they demonstrate that arginine deprivation sensitizes PDAC cells to ferroptosis through lipidome perturbations, which can be exploited therapeutically via co-treatment with aESA and ferroptosis inducers (FINs). These findings have meaningful implications for the field. They not only shed light on the metabolic vulnerabilities created by nutrient restriction in PDAC, but also suggest a practical avenue for combination therapies that exploit ferroptosis sensitivity. This is particularly relevant in the context of pancreatic cancer, which is notoriously resistant to conventional treatments. The methods employed are broadly applicable to other nutrient-stress contexts and may inspire similar investigations in other solid tumor types.
 
 Strengths:
 
 One of the major strengths of the study is the use of complementary and well-controlled approaches (including metabolomic profiling, genetic perturbations, and in vivo models) to support the central hypothesis. The experiments are thoughtfully designed and clearly presented, and the conclusions are, for the most part, well supported by the data. The findings provide mechanistic insight into nutrient-lipid crosstalk and identify a potential therapeutic strategy for targeting arginine-deprived tumors.
 
 Comments on revised version:
 
 The authors have substantially strengthened the revised manuscript and have addressed my prior concerns, and the evidence supports the central conclusions. This work provides meaningful insight into how nutrient limitation in the tumor microenvironment creates metabolic liabilities that may be therapeutically exploited, and it should be of interest to investigators studying cancer metabolism, pancreatic cancer, lipid biology, and ferroptosis.
 
 Review 1
3. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 This study by Jonker et al. examines how the metabolic adaptations to the microenvironment by pancreatic ductal adenocarcinomas (PDAC) present vulnerabilities that could be used for therapeutic purposes. The evidence supporting the claims of the authors is mostly solid, and the multiplicity of models used, as well as the combination of in vitro and in vivo work, are appreciated, but some conclusions would benefit from additional substantiation. This work would be of interest to biologists working on the impact of microenvironment and metabolism in cancer, and especially those investigating pancreatic cancer.
 
 In this study, the authors use mostly "doublings per day" as an indicator of cell death, notably for figures 4 to 6. However, proliferative arrest (or a decrease in the proliferative rate) is not necessarily synonymous with cell death. It might be nice to complement these experiments with a true measure of cell death (e.g. PI uptake).
 
 Review 2
4. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 This important study investigates the impact of nutrient stress in the tumor microenvironment (TME), focusing on lipid metabolism in pancreatic ductal adenocarcinoma (PDAC). Understanding TME composition is crucial, as it highlights cancer vulnerabilities independent of intracellular mutations, particularly because PDAC tumors are often exposed to limited nutrient availability due to reduced perfusion. By utilizing a medium that mimics the nutrient conditions of PDAC tumors, the authors convincingly show that TME nutrient stress suppresses SREBP1, leading to reduced lipid synthesis, with low arginine levels identified as a key driver of this suppression. Importantly, mice with arginine-starved pancreatic tumors respond to a polyunsaturated fatty acid-rich diet. This discovery uncovers a synthetic lethal interaction in the tumor microenvironment that could be leveraged through dietary interventions.
 
 Comments on revised version:
 
 The authors have satisfactorily resolved all previously raised concerns through the inclusion of additional data and clarifications in the discussion.
 
 Review 3
5. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In this study, the authors set out to define how arginine availability regulates lipid metabolism and to explore the implications of this relationship in pancreatic ductal adenocarcinoma (PDAC), a tumor type known to exist in an arginine-poor microenvironment. Using a combination of rigorous genetic and metabolomic approaches, they uncover a previously underappreciated role for arginine in maintaining lipid homeostasis. Importantly, they demonstrate that arginine deprivation sensitizes PDAC cells to ferroptosis through lipidome perturbations, which can be exploited therapeutically via co-treatment with aESA and ferroptosis inducers (FINs). These findings have meaningful implications for the field. They not only shed light on the metabolic vulnerabilities created by nutrient restriction in PDAC, but also suggest a practical avenue for combination therapies that exploit ferroptosis sensitivity. This is particularly relevant in the context of pancreatic cancer, which is notoriously resistant to conventional treatments. The methods employed are broadly applicable to other nutrient-stress contexts and may inspire similar investigations in other solid tumor types.
 
 Strengths:
 
 One of the major strengths of the study is the use of complementary and well-controlled approaches (including metabolomic profiling, genetic perturbations, and in vivo models) to support the central hypothesis. The experiments are thoughtfully designed and clearly presented, and the conclusions are, for the most part, well supported by the data. The findings provide mechanistic insight into nutrient-lipid crosstalk and identify a potential therapeutic strategy for targeting arginine-deprived tumors.
 
 We thank the reviewer for their positive assessment of our manuscript.
 
 Weaknesses:
 
 A key weakness of the study lies in the mechanistic connection between arginine levels and SREBP1 activation. While the authors show that arginine restriction leads to reduced SREBP1 expression, the magnitude of this effect appears modest relative to the substantial changes observed in the lipidome. The study would benefit from a deeper analysis of SREBP1 regulation, particularly whether nuclear translocation or activation is affected. This could be addressed by examining the nuclear pool of SREBP1, using either subcellular fractionation or improved immunofluorescence imaging in both cell lines and tissue samples.
 
 We thank the reviewer for this comment and, in our revised manuscript, have undertaken several new studies to assess how the nuclear pool of SREBP1 is regulated by arginine starvation. We further identified one mechanism by which arginine starvation suppresses SREBP1 protein levels, namely GCN2 activation. We believe these additional studies strengthen the manuscript, and we appreciate the reviewer suggesting them.
 
 Another area where additional context would strengthen the manuscript is in the transcriptomic profiling of PDAC cells cultured in a tumor interstitial fluid mimic (TIFM). While the study emphasizes lipid-related pathways, highlighting the most significantly upregulated and downregulated pathways in Figure 1B would give readers a broader perspective on how arginine restriction reprograms the PDAC transcriptome. For instance, because polyamines are downstream of arginine and are known to influence lipid metabolism, it would be worth discussing whether these metabolites contribute to the phenotypes observed. Similarly, an evaluation of whether Dgat1/2 expression is altered could help delineate the full scope of lipid metabolic rewiring.
 
 We thank the reviewer for suggesting this change to our manuscript. We now provide a much more extensive analysis of our transcriptomic data in Figure 1 – Figure supplement 1, which we think will make our manuscript more useful to readers.
 
 Finally, it is worth noting that the KPC mouse model used in this study is based on conditional deletion of p53, which leads to faster-growing tumors and a distinct tumor microenvironment compared to models harboring the p53^R172H point mutation. Including a brief discussion of this distinction would help readers contextualize the translational relevance of the findings.
 
 We have revised the manuscript to include a discussion of this point.
 
 Reviewer #2 (Public review):
 
 This study by Jonker et al. examines how the metabolic adaptations to the microenvironment by pancreatic ductal adenocarcinomas (PDAC) present vulnerabilities that could be used for therapeutic purposes. The evidence supporting the claims of the authors is mostly solid, and the multiplicity of models used, as well as the combination of in vitro and in vivo work, are appreciated, but some conclusions would benefit from additional substantiation. This work would be of interest to biologists working on the impact of microenvironment and metabolism in cancer, and especially those investigating pancreatic cancer.
 
 We thank the reviewer for their positive assessment of our manuscript.
 
 In this study, the authors use mostly "doublings per day" as an indicator of cell death, notably for Figures 4 to 6. However, proliferative arrest (or a decrease in the proliferative rate) is not necessarily synonymous with cell death. It might be nice to complement these experiments with a true measure of cell death (e.g., PI uptake).
 
 We thank the reviewer for this important comment and have performed extensive additional experiments to measure cell death directly via viability markers in addition to our indirect measurements of cell number at the start and end of experiments. We believe these additions strengthen our claims that PUFAs cause arginine starved PDAC cells to undergo ferroptotic cell death.
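 
 As an illustration only (not the authors' analysis code), the "doublings per day" readout discussed here can be sketched from cell counts at the start and end of an experiment. The function name and the cell numbers below are hypothetical. The sketch shows why the reviewer's point matters: a value near zero can reflect either proliferative arrest or proliferation balanced by death, so only a direct viability measurement separates the two.

```python
import math

def doublings_per_day(n_start, n_end, days):
    """Net population doublings per day from start and end cell counts.

    Hypothetical helper for illustration; assumes simple exponential
    growth or decay between the two counts.
    """
    if n_start <= 0 or n_end <= 0 or days <= 0:
        raise ValueError("counts and duration must be positive")
    return math.log2(n_end / n_start) / days

# Illustrative numbers only (not data from the study).
growing = doublings_per_day(1000, 8000, 3)   # three doublings in three days
arrested = doublings_per_day(1000, 1000, 3)  # no net change in cell number
dying = doublings_per_day(1000, 250, 2)      # net loss of cells
print(growing, arrested, dying)
```

 A negative value indicates net cell loss, but a value of zero or a small positive value does not by itself rule out cell death.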
 
 The composition of Tumor Interstitial Fluid Medium (TIFM) was published previously, but nonetheless a reminder of the composition of this medium in a Supplemental file of this study might be helpful. In particular, at the start of the Results section, the nature of serum/lipids in the different media should be specifically noted, especially given that the subsequent focus of the work is on lipids/SREBP. It is known that differences in the extracellular availability of lipids can profoundly alter de novo lipid biosynthesis pathways.
 
 We thank the reviewer for this comment. We have edited the text to provide additional context on the composition of TIFM, especially lipid availability. We further have provided a supplemental file with the composition of TIFM. We hope this will make the manuscript more useful and readily interpretable for readers.
 
 Reviewer #3 (Public review):
 
 This important study investigates the impact of nutrient stress in the tumor microenvironment (TME), focusing on lipid metabolism in pancreatic ductal adenocarcinoma (PDAC).
 
 Understanding TME composition is crucial, as it highlights cancer vulnerabilities independent of intracellular mutations, particularly because PDAC tumors are often exposed to limited nutrient availability due to reduced perfusion.
 
 By utilizing a medium that mimics the nutrient conditions of PDAC tumors, the authors convincingly show that TME nutrient stress suppresses SREBP1, leading to reduced lipid synthesis, with low arginine levels identified as a key driver of this suppression. Importantly, mice with arginine-starved pancreatic tumors respond to a polyunsaturated fatty acid-rich diet. This discovery uncovers a synthetic lethal interaction in the tumor microenvironment that could be leveraged through dietary interventions.
 
 The conclusions of this paper are mostly well supported by data; however, below are some aspects that could be further clarified.
 
 We thank the reviewer for their positive assessment of our manuscript.
 
 This study uses PDAC cells from the LSL-Kras G12D/+ ; Trp53 ; Pdx-1-Cre PDAC model. The authors convincingly demonstrate that the cell-extrinsic stimuli of low arginine availability suppress lipid synthesis and thus exert a dominant effect over the cell-intrinsic oncogenic Ras mutation, which is known to enhance fatty acid synthesis. Could the effect of low arginine on lipid synthesis be specific for certain mutations in PDAC? It would be interesting to investigate or discuss whether different mutations show the same SREBP1 reduction caused by low arginine levels, and whether these low SREBP1 levels can be ameliorated by arginine re-supplementation. Here, Jonker et al. show that human PDAC cells cultured in TIFM have reduced SREBP1 levels (Figure 1 - Figure supplement 1C). It would be further supportive of their conclusions if the authors could show that arginine re-supplementation is sufficient to restore SREBP1 levels in human PDAC cells.
 
 We thank the reviewer for this comment. In response, we have now shown that arginine supplementation increases SREBP1 levels and fatty acid synthesis in human PDAC cells (Figure 2 – Figure supplement 2). Further, we have also updated the manuscript to discuss that using the LSL-Kras G12D/+; Trp53; Pdx-1-Cre PDAC model limits our ability to assess how genetic differences influence the response to arginine starvation. We additionally discuss the genetic diversity of the human PDAC cell lines used in these studies, which do include different oncogenic mutations. We believe that these results provide some evidence that our findings regarding arginine deprivation and SREBP in our genetically defined murine PDAC cell line are applicable to human PDAC cells with more diverse oncogenic lesions.
 
 The authors demonstrate that mPDAC cells cultured in RPMI and subsequently implanted into an orthotopic mouse model exhibit reduced expression of SREBP target genes when compared to in vitro cultured mPDAC-RPMI cells. This finding is in line with the observation that culturing PDAC cells in TIFM downregulates SREBP target genes compared to PDAC cells cultured in RPMI. However, caution is needed when directly comparing mPDAC-RPMI cultured cells to those in the orthotopic model, as the latter may include non-tumor cells and additional factors that could confound the results. The authors should explicitly acknowledge this limitation in their study.
 
 We thank the reviewer for this important caveat, and we have revised the text to address this point. Importantly, we note that for all comparisons between in vitro and in vivo cultures, we carefully sort malignant cancer cells from orthotopic tumors prior to analysis. We believe this approach mitigates the impact of stromal contamination on our analyses.
 
 The in vivo evidence demonstrating that PUFA-rich tung oil reduces tumor size is compelling. However, the specific in vitro findings regarding its impact on doubling rates per day, particularly in the context of arginine-dependent PUFA supplementation, require further explanation. To enhance the robustness of their data and conclusions, the authors could consider conducting additional cell viability and proliferation assays. Moreover, it would be valuable to assess whether the observed effects on doubling rates per day remain significant after normalizing the data to the initial doubling time prior to PUFA supplementation. This is in particular important regarding the statement that "Addition of arginine significantly decreases sensitivity to a-ESA" as these cells already start with a higher doubling rate prior to a-ESA treatment.
 
 We thank the reviewer for this important comment and have performed additional experiments to measure cell death directly via viability markers in addition to our indirect measurements of cell number at the start and end of experiments. Furthermore, to address the issue of different rates of cell growth in cultures affecting the response to perturbations, we also used growth rate-corrected metrics (PMID: 27135972) to ensure that the effects of perturbations on cell growth and viability are not confounded by the baseline proliferative kinetics of the cells under various media conditions. We believe these additions strengthen our claims that arginine starvation sensitizes PDAC cells to PUFAs.
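 
 For readers unfamiliar with growth rate correction, the following is a minimal sketch, assuming the metric meant is the normalized growth rate inhibition (GR) value, in which treated growth is expressed relative to untreated growth over the same interval. The function name and all cell counts are hypothetical, and this is not the authors' analysis code.

```python
import math

def gr_value(x0, x_ctrl, x_treated):
    """Growth rate inhibition value, assuming the standard definition:
    GR = 2 ** (log2(x_treated / x0) / log2(x_ctrl / x0)) - 1

    x0        : cell count at the time of treatment
    x_ctrl    : untreated control count at the end of the assay
    x_treated : treated count at the end of the assay
    """
    k_ctrl = math.log2(x_ctrl / x0)
    if k_ctrl <= 0:
        raise ValueError("control cells must grow for GR to be defined")
    k_treated = math.log2(x_treated / x0)
    return 2 ** (k_treated / k_ctrl) - 1

# Illustrative numbers only (not data from the study).
print(gr_value(1000, 4000, 4000))  # treated grows like control
print(gr_value(1000, 4000, 1000))  # no net growth under treatment
print(gr_value(1000, 4000, 250))   # net cell loss under treatment
```

 Under this definition, a value of 1 means no effect, 0 means complete cytostasis, and negative values indicate net cell loss, regardless of how fast the untreated culture divides in a given medium.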
 
 Overall, this paper presents a compelling study that significantly enhances our understanding of the PDAC tumor microenvironment and its complex interactions with the tumor lipid metabolism.
 
 We again thank the reviewer for their positive assessment of our manuscript.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 In this study, the authors employ rigorous genetic and biochemical (metabolomic) approaches to uncover a previously unappreciated role for arginine in regulating lipid homeostasis. They further demonstrate the relevance of this pathway in pancreatic tumors, a solid tumor type often characterized by limited access to extracellular arginine. The authors present compelling evidence that arginine deprivation creates a metabolic liability, rendering tumors more susceptible to lipidome perturbations. This vulnerability can be therapeutically exploited through co-treatment with aESA and FIN to induce ferroptosis. Overall, the conclusions are convincing, the manuscript is well-written, and the figures are clearly presented.
 
 We again thank the reviewer for their positive assessment of our manuscript.
 
 The key weakness of the study lies in the mechanistic link between arginine levels and SREBP1 expression. While the data support the authors' argument, the observed changes in SREBP1 expression following arginine restriction appear modest relative to the more pronounced changes in the lipidome. To strengthen this connection, the authors may consider performing cellular fractionation to focus their analysis on the nuclear (active) pool of SREBP1. Improved immunofluorescence imaging and quantification of nuclear SREBP1 levels in tissues would also provide additional support for their model.
 
 We thank the reviewers for this helpful comment. To strengthen this study, we both examined the nuclear levels of SREBP1 in TIFM-cultured cells and worked to identify the mechanistic link connecting arginine levels to SREBP1 expression.
 
 First, we found that arginine starvation does not lead to nuclear exclusion of SREBP1. We believe this finding strengthens our conclusion that arginine starvation regulates SREBP1 at the level of protein expression. We agree with the reviewer that the change in SREBP1 protein level is modest, but we show that the effects of arginine on PDAC cell lipid metabolism are SREBP1 dependent (Figure 3O-P, Figure 5F, Figure 5 – Figure supplement 2D). Thus, we interpret these data to mean that even the relatively modest change in SREBP1 protein levels is sufficient to cause large changes in the output of this transcription factor and in the cellular lipidome.
 
 Second, we determined if the arginine-responsive GCN2 signaling pathway, which is known to regulate SREBP1, could contribute to the suppression of SREBP1 observed in PDAC cells. We found that GCN2 signaling is activated in PDAC cells in TIFM culture by arginine starvation and is active in animal tumors. We further found that activation of GCN2 is in part responsible for suppression of SREBP1, which is consistent with prior literature describing a role for GCN2 activation in suppressing SREBP1 translation (PMID: 17276353). Thus, while other mechanisms are at play in transducing arginine starvation to reduced SREBP1 protein levels, we have identified one mechanism (activation of GCN2) by which arginine starvation suppresses SREBP1, leading to the lipidomic changes we observed upon starvation of this amino acid.
 
 In addition, it would be helpful for the authors to highlight the most significantly upregulated and downregulated pathways in Figure 1B to give a more comprehensive view of transcriptomic changes in PDAC cells cultured under TIFM conditions. For example, since polyamines are downstream of arginine and known to regulate lipid metabolism, could some of the observed effects be attributed to changes in polyamine levels? Similarly, do arginine levels affect the expression of Dgat1 or Dgat2?
 
 We have added an additional Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM, identified via GSEA. We also added KEGG metabolic pathway analysis via GATOM (PMID: 35639928). We hope these additions will be useful for readers and point their attention to other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation, beyond those related to lipid metabolism that we investigated here.
 
 From this analysis, we did not specifically note strong changes in the expression of polyamine metabolic enzymes or DGATs.
 
 Finally, the KPC model used in this study involves conditional deletion of p53, which is known to produce tumors with a faster progression and a distinct tumor microenvironment compared to the more commonly used p53^R172H knock-in model. Including this point in the discussion would help contextualize the findings.
 
 We thank the reviewers for mentioning this limitation of our study. We now include in the text a discussion of the limitations of the mouse model used in this work. We also now highlight in the text that, in addition to the murine p53 deletion model, our studies make use of human PDAC lines that contain p53 mutations. We believe that these results provide some evidence that our findings regarding arginine deprivation and SREBP in our genetically defined murine PDAC cell line are applicable to human PDAC cells with more diverse oncogenic lesions.
 
 Minor comments to improve clarity:
 
 (1) In Figure 3C, it would be helpful to annotate the PE-linked TG for clarity.
 
 We do not understand exactly what PE-linked TGs refers to. We note in Fig. 3C that ether-linked triglycerides are labeled in orange and annotated as O-TG and vinyl ether-linked triglycerides are labeled in grey and annotated as P-TG.
 
 (2) Is Figure 3P mislabeled? Both conditions are labeled as +Arg / -lipid.
 
 We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) Figure 1B: Misspelling in Y axis "Normalized enrichment score".
 
 We thank the reviewers for catching this mistake and have corrected this error.
 
 (2) Figure 1B: Could the authors elaborate on why they decided to focus specifically on these three hits, which are not the most downregulated genes (the "top hits") appearing in the GSEA?
 
 We chose to focus on lipid metabolism as multiple transcriptomic analysis tools, namely GSEA and GATOM, which specifically focuses on enrichment in KEGG annotated metabolic pathways, highlighted lipid synthesis as being the most transcriptionally regulated metabolic pathway in TIFM. To make this apparent to readers, we added an additional Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM from GSEA and GATOM analysis. We hope these additions will make the logic for our focus on lipid synthesis clear and will be useful for readers in highlighting other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation.
 
 (3) Figure 1: It might improve the clarity of the text if the three pairs of murine cell lines (mPDAC1, mPDAC2, mPDAC3) were introduced in a bit more detail in the main text and not just in the figure legend.
 
 We have added more detail describing the three mouse cell lines used in the main text.
 
 (4) Figure 1E: The authors may wish to comment on why they chose to perform transcriptomic analyses with the mPDAC3 derived models, and not mPDAC1 or mPDAC2, given that mPDAC3 appears to exhibit the most distinct phenotype of the three, according to the results presented in Figure 1 J-L.
 
 The transcriptional analysis described in Fig. 1E was performed on a previously acquired dataset using mPDAC3 cell lines (PMID: 37254839), which is why this line was used. We have revised the text to make it clear that this transcriptional analysis uses pre-existing data from a previous publication.
 
 (5) Figure 1L: The authors may wish to clarify why they only show relative palmitate to assess global fatty acid biosynthesis in these cell lines. There is a decrease in labeled palmitate of mPDAC3 cells cultured in TIFM in comparison to the cells cultured in RPMI media, showing a decrease in the lipid biosynthesis of these cells in these conditions. However, there also seems to be lower palmitate levels in the TIFM-cultured mPDAC3 cells specifically, in comparison to their mPDAC1 and mPDAC2 counterparts. Why is that? Could the authors comment on this result?
 
 We thank the reviewers for this helpful observation. In Figure 1L (now Figure 1N), we wanted to show how culture conditions (RPMI/TIFM) affected both the total amount of palmitate in PDAC cells and the fraction that is labeled (i.e. arising from de novo synthesis). We think this provides more information for readers by allowing them to assess both changes in pool size of palmitate and changes in the fraction of palmitate that is synthesized. We like this presentation as it shows clearly that while total palmitate levels behave differently across cell lines (with TIFM culture reducing levels in mPDAC1-2 but increasing levels in mPDAC3), the amount of palmitate that is synthesized de novo is decreased in all three cell lines when cultured in TIFM. To highlight this, we also present the fraction of palmitate that is labeled in Fig. 1O.
 
 We are unsure why TIFM culture reduces total palmitate levels in some PDAC cell lines, while others are able to maintain total palmitate pools. We assume that TIFM cultures increase lipid uptake to compensate for lack of synthesis, and potentially differences in lipid scavenging capacity between the lines could explain this difference. We are currently working on experiments to test these hypotheses and will present the results in a future study.
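 
 The distinction drawn above between palmitate pool size and the labeled fraction can be illustrated with a minimal sketch. It assumes isotopologue abundances that have already been corrected for natural isotope abundance; the function name and the numbers are hypothetical and are not taken from the manuscript, and the vectors are shortened for readability (palmitate itself would have entries for M+0 through M+16).

```python
def summarize_labeling(isotopologues):
    """Summarize a tracer experiment for one metabolite.

    `isotopologues[i]` is the (hypothetical) abundance of the M+i species,
    so index 0 is the unlabeled pool. Returns a tuple of
    (total pool, labeled amount, labeled fraction).
    """
    total = sum(isotopologues)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    labeled = total - isotopologues[0]  # everything except M+0
    return total, labeled, labeled / total


# Hypothetical values only: a line whose total pool rises in TIFM
# while the de novo synthesized (labeled) amount falls.
rpmi = summarize_labeling([50.0, 10.0, 15.0, 25.0])
tifm = summarize_labeling([110.0, 4.0, 3.0, 3.0])

print("RPMI total, labeled, fraction:", rpmi)
print("TIFM total, labeled, fraction:", tifm)
```

 Reporting the total and the labeled fraction side by side is what lets a reader see that the pool can grow while synthesis falls, which a single normalized value would hide.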
 
 (6) Figure 2 - Figure Supplement 1A: It would be informative and appreciated to know which nutrients are actually represented and correspond to certain points on the graph, in particular for the ones that are the most differentially present in the two different media.
 
 We have now updated this graph to highlight key metabolites that are most differentially abundant between the two media. We also now provide as a Supplementary file the composition of TIFM, which provides readers with all the information needed to understand which metabolites are differentially abundant in TIFM and any media they wish to compare.
 
 (7) Figure 2 - Related to Figure supplement 1D: It would be useful to know how or why arginine was selected for further investigation from the subset of amino acids. The authors could elaborate on this, by showing or highlighting the data that drew attention to this amino acid initially.
 
 We thank the reviewers for this note. We have tried to make Figure 2 – Figure supplement 1 clearer as to how arginine was selected for further investigation. We have updated the figure to improve clarity for the comparisons of different media that enabled us to identify differences in amino acids between RPMI and TIFM as driving the difference in lipid metabolism. We have also highlighted in Figure 2 – Figure supplement 1A that arginine is the most differentially abundant amino acid and edited the text to explain that this high degree of differential abundance is why we focused on arginine amongst all the amino acids as a likely candidate for regulation of SREBP1.
 
 (8) The legends for Figures 2G and 2H could be improved, i.e., making clearer that 2H shows incorporation in the circulating fatty acids, unlike 2G.
 
 We have updated the figure with improved labeling as the reviewer suggested to denote which panels correspond to which sample type.
 
 (9) Figure 3E and 3G: The heatmaps displayed here show that the addition of arginine to TIFM culture medium restores fatty acid synthesis; however, it appears that the nature of the lipids synthesized in this condition may differ from the ones synthesized in RPMI cultured conditions.
 
 We have added additional text highlighting that arginine supplementation to TIFM and RPMI culture led to induction of different SREBP1-target genes, but that both lead to activation of fatty acid synthesis and desaturation genes, which contributes to our focus on de novo synthesis of saturated and monounsaturated fatty acids in this study.
 
 (10) Figure 3O: The SREBP1 immunoblot still seems to show some residual bands for the cells transduced with SREBP1 targeting sgRNAs, therefore, the authors may want to be more nuanced and present this model as a KD, instead of a KO, as mentioned in the text?
 
 We agree with the reviewer’s suggestion, and we have changed the text to describe these as knockdowns rather than full knockouts.
 
 (11) Figure 3P: Is it possible that there is an error in the legend of the figure (Lipids + for the first bar and - for the second one?). The figure could also be improved by a legend that explains what the different colored bars represent.
 
 We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.
 
 (12) Figure 4: The authors are stating in Figure 4 - Figure supplement 1A-F, that arginine-restricted mPDAC cells are not sensitized to xCT or GPX4 inhibitors that trigger ferroptosis and that therefore SREBP1 suppression by arginine restriction in the TME does not sensitize PDAC cells to ferroptosis inducers. However, this does not appear to be so clear with the data shown. This might be due to the limitations associated with the population doubling measurements instead of the lethality measures noted above. Likewise, later it is proposed that arginine restriction sensitizes both mPDAC cells and human PDAC cells to α-ESA induced ferroptosis. These results would benefit from a direct measure of cell death. Related to the above point, it would be useful to better understand why cells cultured in arginine-deprived TIFM do not appear to be sensitized to ferroptosis inducers, but these same cells die from ferroptosis when treated with α-ESA. It would be useful to present some thoughts.
 
 We thank the reviewers for bringing up this important point. To the reviewers' first point, we repeated xCT and GPX4 inhibitor treatment experiments to include both growth-corrected (PMID: 27135972) proliferation assays and Sytox-based viability assays. In both cases, we did not find consistent sensitization to xCT or GPX4 inhibitors across multiple PDAC lines when cultured in TIFM. In contrast, we found consistent sensitization to PUFA treatment across multiple murine and human PDAC cell lines cultured in TIFM. Together, this analysis suggests that arginine starvation specifically sensitizes PDAC cells to PUFAs, but not other ferroptosis inducers.
 
 We agree with the reviewer that this is an interesting and unexpected observation. We do not have a mechanistic understanding as to why this is the case. However, we believe this suggests that PUFAs may be a better method of inducing ferroptosis in certain conditions than other ferroptosis-inducing approaches. We have added text to the discussion to highlight this unexplained observation.
 
 (13) Figure 6: The authors mention that α-ESA is used here at sublethal doses, which do not affect viability or proliferation, but this is not shown in either the main or supplementary data. These data should be provided somewhere. It might also be nice to mention in the main text (not just in the legend) the dose of α-ESA used for the combination treatments.
 
 We thank the reviewers for this helpful suggestion. To illustrate that α-ESA is used at a sublethal dose, we altered each panel to be on a linear rather than logarithmic x-axis, therefore including the DMSO control arm for each ferroptosis inducer in combination with α-ESA. We hope this now clearly illustrates that this dose of α-ESA is not perturbing cell growth or viability in these assays.
 
 (14) Figure 6B: Fer-1 treatment does not seem to rescue the phenotype very clearly. This could again be because cell death is being conflated (to a degree) with effects on proliferation, and Fer-1 is not expected to affect cell proliferation. Again, measuring cell death directly would be better than measuring population doublings.
 
 We thank the reviewers for this helpful comment. To address this concern, we have added Sytox-based viability assays to figure 6. These assays indicate that Fer-1 treatment rescues the viability of PDAC cells treated with ferroptosis inducers, α-ESA, or the two in combination.
 
 Reviewer #3 (Recommendations for the authors):
 
 General notes:
 
 (1) It would be easier for the reader if one condition were consistently placed in the same position throughout the graphs. For example, RPMI results should always appear first and TIFM second. Currently, this is inconsistent throughout the manuscript (e.g., Figure 1 - Figure Supplement 1: RPMI is first and TIFM second; Figure 2 - Figure Supplement 1: TIFM is first and RPMI second).
 
 We thank the reviewers for this note. We have updated the figures to remain consistent in their ordering throughout the manuscript.
 
 (2) Please briefly explain the differences between PDAC1-3 and clarify why most follow-up experiments were conducted using PDAC1. Presumably, this was because PDAC1 showed the most robust effect on fatty acid synthesis.
 
 We have added additional text in the results section of the manuscript describing the different murine PDAC lines used in this study. We performed most studies with mPDAC1 as this line has robust differences in fatty acid synthesis between culture conditions. However, murine PDAC lines recapitulate the transcriptional subtype diversity of PDAC (PMID: 29364867), so we critically repeat key experiments in multiple mPDAC lines to determine if a given finding is translatable to other PDAC subtypes.
 
 (3) Are only SREBP1 protein levels affected or are SREBP1 RNA levels also decreased in low arginine TME?
 
 We appreciate this important comment. We have added SREBP1 RNA levels to Figure 1 to show that RNA levels do not differ between conditions, whereas protein levels of SREBP1 change significantly.
 
 (4) What was the rationale for investigating lipid metabolism even though it was not the top changed metabolic gene signature? It would be interesting to briefly discuss which pathways were the most enriched.
 
 We chose to focus on lipid metabolism as multiple transcriptomic analysis tools, namely GSEA and GATOM, which specifically focuses on enrichment in KEGG annotated metabolic pathways, highlighted lipid synthesis as being the most transcriptionally regulated metabolic pathway in TIFM. To make this apparent to readers, we added an additional Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM from GSEA and GATOM analysis. We hope these additions will make the logic for our focus on lipid synthesis clear and will be useful for readers in highlighting other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation.
 
 Further comments:
 
 (1) Figure 1 Supplement 1A: It is not clear which SREBP target genes are significant. Please indicate this more clearly.
 
 The analysis in this section was done on the expression level of all the indicated genes between groups (tumor/normal) rather than testing for significance of individual genes between the two groups. We have updated both the text and the figure legend to clarify that this was the statistical analysis performed.
 
 (2) Figure 1J and 2C: The Western blot loading control (Actin) does not appear equal across all samples. It would be helpful to include a quantification normalized to the Actin loading control.
 
 We have included quantification of each western blot to help interpret these immunoblots.
 
 (3) Supplementary Figure 2: How often has this experiment been performed? The TIFM results appear to consistently show the same values. If this is the case, it needs to be labeled appropriately.
 
 Thank you for pointing out that how we presented the data was confusing as to how the experiment described was performed. Initially, we performed multiple separate experiments to identify arginine starvation as the TIFM-driver of SREBP1 suppression. To compare across all the separate media conditions, we performed one experiment with all the relevant media conditions together, which is the experiment that is described in the manuscript. Thus, there was one set of control TIFM/RPMI conditions to which we compared all of the different media conditions. As we initially presented the data, it appeared as if we had performed multiple experiments in which the TIFM/RPMI controls had exactly the same behavior, which is not the case. We have updated the data presentation in this figure to make it clear that this was the experimental design for the data presented.
 
 (4) Figure 3P: Please add a legend for this panel.
 
 We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.
 
 (5) Figure 4 - Figure Supplement 1: Please review the legend carefully. The legend currently includes only circles, but some of the graphs (A and F) display squares.
 
 Thank you for catching this mistake. We have updated the panels and legends for this figure so they are concordant.
 
 (6) Figure 4D: The effect of a-ESA treatment on the doubling delta of arginine-treated versus non-treated TIFM cells looks similar. It looks like the difference is because cells treated with arginine start at higher doubling values from the beginning. I would suggest looking at the delta and subsequently tone down the statement: "Addition of arginine significantly decreases sensitivity to a-ESA."
 
 Thank you for this helpful comment. To avoid any confounding effects of differences in basal growth rate between mPDAC cells grown in different media, we have converted all of our data to GR values as described in (PMID: 27135972) which enables us to take into account the basal growth rates of cultures when calculating the effects of treatments/perturbations on culture growth and viability. We hope this addition makes the effect that arginine has on α-ESA sensitivity clear beyond the impact that arginine has on basal growth rate.
 
 In addition, we also measured the viability of α-ESA treated mPDAC cells with and without supplemental arginine (current Fig. 5E) by Sytox-exclusion assay. We believe this new data supports the claim that arginine makes PDAC cells resistant to the addition of exogenous PUFAs.
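 
 For readers unfamiliar with the growth-rate correction referred to above, the following is a minimal sketch of the GR metric, GR = 2^(log2(x_treated/x_start) / log2(x_control/x_start)) - 1, which is the definition we understand PMID: 27135972 to use. The function name and all cell counts are hypothetical and are not data from the manuscript; the authors' actual pipeline may differ.

```python
import math


def gr_value(x_treated, x_control, x_start):
    """Normalized growth-rate inhibition (GR) value.

    x_start   : cell count at the time of treatment
    x_control : count of the untreated control at the endpoint
    x_treated : count of the treated culture at the endpoint

    GR = 1 means no effect, GR = 0 means complete cytostasis, and
    GR < 0 means net cell loss.
    """
    if min(x_treated, x_control, x_start) <= 0:
        raise ValueError("cell counts must be positive")
    control_doublings = math.log2(x_control / x_start)
    if control_doublings <= 0:
        raise ValueError("control culture must grow for GR to be defined")
    treated_doublings = math.log2(x_treated / x_start)
    return 2 ** (treated_doublings / control_doublings) - 1


# Hypothetical counts: a slow-growing and a fast-growing culture in which
# treatment halves the number of doublings in both cases.
slow = gr_value(x_treated=100 * 2 ** 0.5, x_control=200, x_start=100)
fast = gr_value(x_treated=200, x_control=400, x_start=100)
print("GR (slow culture):", slow)
print("GR (fast culture):", fast)
```

 In this toy case the treated-to-control count ratio differs between the two cultures (about 0.71 versus 0.5) even though the per-division effect is identical, which is the kind of basal growth rate confound that the GR conversion is meant to remove.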
 
 AuthorResponse

URL

biorxiv.org/content/10.1101/2025.03.10.642426v2
www.biorxiv.org www.biorxiv.org

Divergent spatiotemporal integration of whole-field visual motion in medaka and zebrafish larvae

4
1. Public_Reviews 01 Jun 2026
  
  in eLife
  
  eLife Assessment
  
  This important study provides a quantitative comparison of how zebrafish and medaka larvae process visual motion, revealing clear differences in how they integrate information across space and time. The evidence is convincing, combining a broad set of behavioral assays with response decomposition and mechanistic modeling that together support the central conclusions. Some aspects remain incomplete, particularly the link between the spatial and temporal findings, the extent to which the model accounts for the full range of behavioral results, and the framing of broader evolutionary or social interpretations. Overall, the work offers a careful and informative analysis that should be of broad interest to researchers studying visual processing, sensorimotor computation, and comparative neuroscience.
  
  Summary
2. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study investigates how two closely related fish species differ in their processing of visual motion, with a focus on spatial and temporal integration underlying behavior. Using a series of behavioral assays combined with computational modeling, the authors identify clear species-specific differences in how visual information is integrated to guide movement.
  
  Strengths:
  
  A major strength of the work is the systematic and quantitative behavioral analysis, which reveals robust differences between species, including broader spatial integration and longer temporal persistence in medaka compared to zebrafish. The decomposition of behavior into distinct components provides a useful framework for interpreting these differences.
  
  Weaknesses:
  
  The computational modeling captures several key aspects of the observed temporal dynamics, particularly differences in response persistence. However, the modeling framework is primarily focused on temporal processing and does not incorporate spatial integration, which is a central finding of the study. In addition, some experimental observations, such as responses to short-duration stimuli and certain frequency-dependent features, are only partially reproduced. These limitations indicate that the link between the model and the full range of behavioral results remains incomplete.
  
  Review 1
3. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This manuscript presents a comparative analysis of optomotor behavior in zebrafish and medaka larvae. Using multiple behavioral paradigms, the authors argue that the two species differ in both the spatial and temporal integration of visual motion. They further decompose turning behavior into large- and small-turn components and use a simple mechanistic model to capture several of the main response features. Overall, the study addresses an interesting question, and the comparative framework gives the work a clear conceptual appeal.
  
  Strengths:
  
  A major strength of the manuscript is the breadth of the behavioral analysis. The authors use several stimulus paradigms to probe spatial extent, temporal persistence, and response dynamics, which makes the cross-species comparison richer and more informative than a single-assay study. The decomposition into large and small turn components is also a useful feature of the work, as it provides a more structured account of where the species differences may arise. The modeling further helps organize the results and offers a useful framework for interpreting the behavioral differences.
  
  Weaknesses:
  
  The main limitations are in presentation and clarity rather than in the overall motivation or approach. In several places, it is difficult to determine exactly how some quantities are summarized statistically, and some figures and legends would benefit from clearer explanations. In addition, a few of the more specific interpretive claims would be strengthened by more explicit statistical framing and slightly clearer presentation. These issues appear addressable and do not detract from the overall interest of the study.
  
  Review 2
4. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Author response:
  
  We appreciate the constructive feedback from the reviewers and are currently working diligently to address all concerns raised in both the public reviews and the recommendations for the authors. Below, we outline the revisions planned for the revised manuscript.
  
  (1) We acknowledge the limitations of the current modeling framework regarding spatial integration, and we agree that the present model does not account for the short lifetime of the dot stimuli.
  
  For spatial integration, our current data suggest a relatively narrow, center-weighted integration function in zebrafish, compared to a broader integration function in medaka. While incorporating such spatial weighting into the model would improve its completeness, we do not expect it to substantially alter our current interpretation of the underlying mechanisms.
  
  Regarding the responses to short-lifetime dot stimuli, we hypothesize that medaka may possess local retinal receptive units that function as low-pass filters, as illustrated schematically in Figure 3e. At present, however, we believe that explicitly modeling this component would remain largely uninformative and would not substantially increase the explanatory power of the model.
  
  In the revised manuscript, we will discuss these limitations and the possible neural implementations more explicitly in the Discussion section.
  
  (2) We appreciate the reviewer’s comments regarding the clarity of data presentation and statistical descriptions.
  
  In the revised manuscript, we will improve the clarity of the figures and legends and provide more explicit explanations of the statistical analyses and summary metrics used throughout the study. We will also revise several sections of the text to improve the framing and interpretation of the results.
  
  AuthorResponse

URL

biorxiv.org/content/10.1101/2025.08.22.671687v2
www.biorxiv.org www.biorxiv.org

Modality-Specific and Amodal Language Processing by Single Neurons

5
1. Public_Reviews 01 Jun 2026
 
 in eLife
 
 eLife Assessment
 
 This study presents a large-scale characterization of single-neuron responses during reading and listening, enabling examination of both 'low-level' (orthographic/phonological) and 'higher-level' (syntactic) features, as well as links between single-neuron activity and multi-scale field potentials, making it a valuable resource for bridging micro- and macroscale accounts of language processing. The analyses identify modality-specific and putatively modality-independent responses across distributed brain regions, offering an intriguing framework for understanding how sensory-specific and abstract representations may relate. However, the evidence supporting the central claims is currently incomplete, due to limited population-level quantification, insufficient statistical characterization of how many neurons encode the relevant features, ambiguity in the interpretation of encoding model results, and a lack of rigorous tests of cross-modal generalization and alternative accounts, which together weaken the conclusions about amodal representations and hierarchical processing.
 
 Summary
2. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This paper presents rare and unique recordings of single neurons, LFPs, and SEEG data from human patients performing reading and listening tasks. They identify single neurons in temporal and ventral occipito-temporal cortex that respond specifically to spoken and written language, and primarily encode either phonological or orthographic features of the stimuli. They also identify neurons in the middle temporal and inferior frontal cortex that respond to both modalities, which they interpret as amodal language responses. In general, neuronal population firing rates are correlated with both micro- and macro- scale broadband gamma responses, though they observe some dissociations, particularly with the macro-scale. The results are interpreted to support a model of modality-specific to amodal processing throughout many distributed brain areas for language.
 
 Strengths:
 
 (1) The data are truly unique, providing a large-scale characterization of single neuron responses from the human brain during written and spoken language processing.
 
 (2) The task and stimulus conditions allow for examination of both low-level (e.g., orthographic/phonological) and higher-level (e.g., syntactic) encoding.
 
 (3) Showing relationships between single neuron and multi-scale LFP recordings from the same sites helps bridge neuronal and meso/macroscale literatures.
 
 Weaknesses:
 
 (1) My main comment about the paper is that it feels like a collection of somewhat random descriptions of a very small number of hand-picked single neurons. I think that the task and stimulus design shown in Figure 1A sets up some clear hypotheses that could be tested rigorously across the full neuronal population, but instead, the authors pick a few neurons and fit encoding models that don't take advantage of the contrasts. I agree that encoding models are a powerful approach, but with only 508 total words and what appears to be a limited set of variability across the various features, it's not clear to me that the stimuli, which were apparently designed as minimal pairs, provide enough power to find robust results. Perhaps this is why the majority of the results only show a very small number of units (most of which are actually buried in the supplement), but it's odd to me that they don't show the results of the minimal contrasts other than for length.
 
 (2) Related to point (1), other than Figure 2H and Figure 6A-B, the results are only shown for a tiny number of units. This is great for demonstrating qualitatively what the effects look like, but there is no quantification of the findings across the population, which undermines the point in the abstract that 1000 neurons were recorded. This is acknowledged in some places, but as a reader, it leaves me wondering how seriously to take the interpretations if they seemingly cannot be replicated. I understand this is a challenge with human single neuron recordings, but as presented, the paper as a whole comes across as largely anecdotal.
 
 (3) Some of the key claims rest on the idea that neurons were recorded from the superior temporal gyrus and fusiform gyrus. For the STG claim, I don't understand how this was done, or what specifically they mean by STG, since the microwire locations do not appear to be anywhere near the lateral surface. This makes sense given the profile of the Behnke-Fried electrodes, but if they want to claim that there are neurons from the STG, they need to be more specific and show where precisely these wires are. If they are more medial as it appears, they need to explain how they dissociated STG from Heschl's gyrus. Similarly, for the fusiform neurons, I can only see a couple of probes that appear to have their tips near where I would think this area is. Perhaps this is more of a visualization issue with Figure 1F, but overall, I am not convinced that the neurons are exactly where they say they are.
 
 (4) Related to point (3), some of the authors have made strong claims in prior work about the precise coordinates of the VWFA, so it would help to know how many units are within this exact region. The ROIs marked in Figure 2 are quite large, and given results like Vinckier et al. 2007, it's important to know where along the hierarchy the recordings were actually performed. Similarly, given the framing in the intro around the VWFA as a key area, the idea that some of the best example neurons are from the right fusiform is a bit confusing. I don't think they can make the claims about visual hemifields since it does not appear that they recorded eye tracking to verify constant central fixation, and it may be a bit surprising to see such strong orthographic selectivity in the right hemisphere (though, as a result, it may suggest a more nuanced view of lateralization of reading at the single-neuron level).
 
 (5) In many sections of the paper, there are vague and unquantified claims like "many neurons" or "a large number of units". This needs to be made explicit. It would also help to show where statistical threshold cutoffs are on plots like Figure 2H, since the "brain-score" is used to select units for many analyses.
 
 (6) More detail on the TRF models is needed in the methods. At the very least, a complete list of the features in each group is necessary to evaluate claims about very broad sets of features like "syntax". It would also help to know how the features were coded, especially where there is a mixture of continuous and discrete features within the model.
 
 (7) Depending on how exactly the features were defined, I'm skeptical of some of the claims, like position-specific "w". There are some obvious confounds that need to be controlled here, like whether word-initial "w" is strongly associated with shorter, higher frequency words (like "wh-" words). There are other examples, like whether specific forked letters tend to appear in certain syllables in English words. While it may be the case that these kinds of patterns are uniformly distributed, it needs to be established in this particular stimulus set.
 
 (8) The claim that there is monotonic encoding of word length does not seem strongly supported in the data. In both PC1 and the single neuron examples, it seems like there may be a non-linear relationship, which could suggest that another correlated feature (e.g., word frequency) is involved.
 
 Minor Points:
 
 (1) What are "boundaries"? They are not described anywhere I could find, but they are a feature group that was used in the TRFs.
 
 (2) The caption for Figure 6C says MTG and insula, but the text says MTG and IFG. Similar to the above comment about STG and fusiform, it's not clear to me how they achieved single-unit recordings with Behnke-Fried probes in these areas.
 
 (3) The somewhat less robust correlations between firing rate and BGA in macro vs micro contacts are potentially interesting. However, did they verify that the closest macro contact was always in the gray matter of the same gyrus as the microwire?
 
 Review 1
3. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This manuscript, "Modality-Specific and Amodal Language Processing by Single Neurons," presents an intracranial electrophysiology study investigating how language is represented in the human brain across spoken and written modalities. The authors analyze activity from over one thousand single neurons and local field potentials recorded in twenty-one neurosurgical patients while participants read and listened to sentences. Using encoding models based on temporal receptive fields, they examine whether neural responses track modality-specific features, such as phonological and orthographic information, as well as higher-level linguistic features. The results are interpreted as evidence for a dissociation between modality-specific processing in sensory regions and modality-independent ("amodal") representations in temporal and frontal cortices, supporting a two-stage model of language processing.
 
 Strengths:
 
 This study uses a rare and valuable dataset, combining single-neuron recordings with broader field potential measures in human participants. The large-scale recording, in terms of both neuron count and anatomical coverage across multiple regions and individuals, represents a significant technical achievement for intracranial research.
 
 The use of encoding models to relate neural activity to multiple levels of linguistic representation is methodologically rigorous and provides a unified framework to compare phonological, orthographic, and higher-level features. This approach allows the authors to systematically test how different aspects of language are represented across neurons and regions.
 
 Another key strength is the attempt to directly link concepts from Linguistics to neural data. By framing the results in terms of modality-specific versus amodal representations, the study engages with longstanding theoretical questions and offers a potential bridge between linguistic theory and systems neuroscience.
 
 The manuscript is also very well written, and the data are presented clearly and effectively. The inclusion of raw data and raster plots is particularly valuable, as it allows readers to directly assess the neural responses and strengthens the transparency of the analyses.
 
 Weaknesses:
 
 Despite these strengths, the central claims of the paper are not fully supported by the analyses presented, and several key issues limit the strength of the conclusions.
 
 A primary concern is the lack of clear reporting and statistical characterization of the proportion of neurons that significantly encode the tested linguistic features. While the paper presents illustrative examples and regional patterns of encoding, it does not systematically quantify how many neurons exhibit significant effects across conditions, nor does it provide formal statistical comparisons of these proportions across brain regions or feature types. As a result, it is difficult to determine whether the reported dissociations reflect robust population-level phenomena or relatively sparse subsets of neurons identified through model fitting. Figure 2H offers a visual depiction of the distribution of Brain-Score (a measure of model evaluation) across the fusiform gyrus and superior temporal gyrus, but it falls short of providing formal statistical testing or quantitative summaries, limiting its interpretability in supporting the authors' claims. Given that the authors employ temporal receptive field (TRF) analyses, the framework naturally allows for straightforward quantification of the proportion of neurons that significantly encode any linguistic features in the model, which could be reported by region as well as by stimulus condition (auditory vs. visual). Including such analyses would further strengthen the population-level interpretation of the results.
 
 Relatedly, the interpretation of "amodal" neurons is not sufficiently substantiated. The classification of neurons as modality-independent relies on encoding model performance across conditions, but the statistical criteria for establishing cross-modal generalization are not always clearly defined or rigorously tested. Without explicit comparisons (e.g., testing whether the same neurons significantly encode features in both modalities above chance, and whether this exceeds what would be expected under appropriate null models), the claim of modality-independent representation remains somewhat underdetermined.
 
 More generally, the reliance on encoding models introduces some interpretational ambiguity. Although the observed dissociation between fusiform and superior temporal regions is consistent with orthographic and phonological processing, respectively, the feature spaces used in the models are partially linked to lower-level sensory properties (e.g., visual form and acoustic features). The authors' single-neuron results suggest these effects reflect genuine linguistic selectivity, but the findings do not uniquely distinguish between linguistic and perceptual explanations. While fully disentangling these factors may be beyond the scope of the current study, the manuscript could benefit from a brief discussion acknowledging these correlations or clarifying how lower-level sensory contributions were considered.
 
 Another limitation is that the proposed two-stage model of language processing is not directly tested against competing hypotheses. While the dissociation between modality-specific and amodal representations is consistent with this model, the authors note that higher-level features, such as syntax, may be encoded in a distributed or overlapping manner. These possibilities are not systematically tested, so the conclusions risk overinterpreting correlational patterns as evidence for a specific processing hierarchy. A more explicit discussion or quantitative consideration of these alternative accounts would strengthen the interpretation, while still allowing the two-stage model to be presented as a plausible framework.
 
 Review 2
4. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary
 
 This paper analyzes human single-neuron activity recorded with Behnke-Fried electrodes during naturalistic listening and reading. The authors demonstrate a double dissociation between superior temporal gyrus neurons (responsive during listening but not reading) and fusiform gyrus neurons (responsive during reading but not listening), and report that these two classes of neurons show selectivity to specific phonological and orthographic features of the stimulus, respectively. Across the language network, the authors also report neurons whose responses are amodal (active during both listening and reading), which they organize into a modal-to-amodal processing hierarchy. A separate thread of analyses tracks the relationship between single-neuron spiking, micro-wire, and macro-wire signals across these regions. The authors interpret their findings as evidence for hierarchical processing across the language network and for a "compositional code" for orthography in reading.
 
 Strengths
 
 The dataset is rare and valuable. Simultaneous single-neuron, micro-wire, and macro-wire recordings during naturalistic reading and listening in the same patients are difficult to obtain, and the experimental design reflects substantial care. The cross-modality comparison at single-neuron resolution is a novel measurement, and the paper presents these results while also situating them against prior neuroimaging and intracranial work. The simultaneous availability of signals at three spatial scales within the human language network is an unusual and potentially important resource for the field.
 
 Weaknesses
 
 (1) Framing and novelty
 
 The paper appropriately situates its modality-selectivity findings against prior neuroimaging and intracranial work (citing Buchweitz et al. 2009 among others) and frames its novel contribution as bringing single-neuron resolution to a question that has previously been examined at population scales. This framing is fair as far as it goes. However, two issues remain. First, the paper does not engage with neuroimaging evidence that complicates its clean modality-selectivity story - most notably Wilson, Bautista, & McCarron (2018), who found that the dorsal superior temporal sulcus is activated by both intelligible and unintelligible inputs in both modalities. Several reconciliations of single-neuron modality selectivity with population-level cross-modal activation are possible (sparse coding, BOLD-vs-spiking dissociations, etc.), and the paper should engage with these possibilities. Second, the paper's discussion extends well beyond the modality-selectivity result that is its headline contribution, into broader claims about a "compositional code" for orthography and "hierarchical processing" across the language network. These broader claims are not supported by the analyses presented (see Weakness 3), and their inclusion distracts from and weakens the core finding rather than building on it. The paper would be stronger if these claims were either subjected to the population-level analyses they require or scaled back to exploratory observations.
 
 These framing issues are compounded by writing problems that obscure what the paper is claiming. Some passages, such as the assertion that the dataset "suggests an unprecedented examination of linguistic features across various brain regions at various resolutions," are not interpretable as written and should be rewritten.
 
 (2) Methodological concerns about the TRF analyses
 
 The selectivity findings in Figures 3 and 5 rest on temporal response function / temporal receptive field (TRF) analyses with several core issues.
 
 2.1) First, the construction of the TRF feature stream for the reading condition is not specified in the methods. Reading stimuli are presented in RSVP, with all letters of a word appearing simultaneously. How letter or letter-position features are mapped to a time-varying regressor reflects a substantive hypothesis about the psychological mechanisms of reading, with statistical consequences for what the TRF can recover and how reading and listening analyses can be compared.
 
 2.2) Second, the stimulus distribution limits which effects can be reliably estimated. While the design appears balanced for some features (e.g., subject gender and number), the features that drive the TRF analyses - particularly letter identity and position in the orthographic TRF - are unlikely to be well covered in a small stimulus set. This raises a concern about high-variance feature importance estimates.
 
 2.3) Third, the TRF feature set includes syntactic, semantic, and discourse predictors alongside phonological and orthographic features. The paper does not justify this choice in fitting single-neuron responses in STG and FSG, and the consequences for the unique-variance analyses are not discussed. Because syntactic features are correlated with phonological and orthographic features in natural stimuli (function words are short, have characteristic phoneme distributions, and so on), the unique variance attributed to each feature set depends on what is being controlled for. Including syntactic predictors when fitting STG or FSG neurons also risks inflating overall TRF fit by chance, particularly in the absence of cross-neuron correction.
 
 2.4) Fourth, there seems to be no correction for multiple comparisons across the neuron × feature grid. The within-neuron feature-importance procedure briefly described in the Figure 3 caption may help combat overestimates of feature importance within a single fit, but does not address the question of how many of the "selective" neurons reported across the paper would survive correction at the population level. With many neurons, many features, and a limited stimulus set, some neurons will appear selective to some features by chance alone, and these are likely to be the ones that appear as example panels in figures.
 
 Together, these issues mean the per-feature selectivity results cannot be interpreted as the paper currently interprets them. This is consequential because the per-feature selectivity findings underpin the paper's broader claims about a compositional code for orthography and about hierarchical processing across feature levels.
 
 (3) Claims that outrun the evidence
 
 Several of the paper's broader claims are not supported by the analyses presented.
 
 3.1) The authors claim a "compositional code" for orthography, in which single neurons code for the combination of letter identity and position. This claim is illustrated with two example neurons. A claim about a coding scheme is a population-level claim and requires a population-level analysis. A natural test would be a per-neuron model comparison between a TRF with letter identity alone and a TRF including letter identity × position interactions, controlled for model complexity, asking how many neurons show improved prediction with the interaction features. As noted above in §2.2, this analysis would also need to grapple with which letters and positions the data can support estimating. There is a potential connection to the data sparsity worries here: the n=2 example neurons may have the only selectivity profiles for which the relevant interactions could be estimated at all.
 
 3.2) The "hierarchical processing" claim is motivated by neurons selective to features at multiple levels - graphemes and sub-graphemes in reading, single phonemes and diphthongs in listening. This claim is not specified mechanistically. The paper does not state what kind of structural linguistic hierarchy is intended (segmental phonology to syllabic structure?), what kind of hierarchical neurocomputational mechanism is being proposed, or why selectivity at multiple levels of a feature hierarchy is evidence for that mechanism rather than for any other mechanism (e.g., parallel feature detectors). As written, the claim is too underspecified to evaluate.
 
 3.3) The "forked letters" finding (selectivity to k, v, w, y, z) is potentially confounded with letter frequency and co-occurrence structure. These letters are low-frequency, with some exhibiting strong positional asymmetries, and they infrequently co-occur with other letters. Under the unique-variance analysis, decorrelation from other features inflates apparent unique variance even in the absence of genuine selectivity.
 
 3.4) The word-length effect in Figure 4 is established by PCA on the top five fusiform neurons, with no analysis showing the effect is qualitatively similar across a broader selection. Beyond establishing that something varies with word length, the paper makes no substantive claim about what the neural code represents - for instance, whether it reflects letter- or word-specific processing or a more general visual response to stimulus extent. Prior intracranial work has reported word-length effects in regions posterior to the VWFA but not within it (Thesen et al. 2012), raising the question of whether the effect reported here reflects letter-specific processing or a more general visual response that happens to correlate with stimulus extent.
 
 (4) Missed opportunities
 
 Several aspects of the paper are not so much wrong as underdeveloped, in ways that the authors are well-positioned to address.
 
 4.1) The cross-scale comparison between single-neuron, micro-wire, and macro-wire signals is presented descriptively, without articulating what conclusion these analyses support about the relationship between scales of measurement. Given the rarity of simultaneous recordings at these scales, this is a substantial missed opportunity. The rasters in Figure 2 visually suggest a tight relationship between spiking and micro-population activity that is not evident in the summary in Figure 2g. This discrepancy is not explained. Characterizing the functional and temporal relationship linking spike rates to micro- and macro-HGA is a substantive scientific question, and the paper is well-positioned to address it.
 
 4.2) The stimuli include controlled grammatical manipulations, but these manipulations are used as nuisance regressors in the TRF analyses rather than as the object of structured analysis. A design with controlled comparisons is being treated as if it were unconstrained naturalistic stimulation, which underuses the experimental structure the authors built.
 
 4.3) Finally, the paper foregrounds the dataset as a contribution but does not describe data sharing plans. Given that several of this review's recommendations call for analyses the authors have not yet done, the long-term value of the dataset to the community will depend substantially on what is shared and how.
 
 Buchweitz, A., Mason, R. A., Tomitch, L. M., & Just, M. A. (2009). Brain activation for reading and listening comprehension: An fMRI study of modality effects and individual differences in language comprehension. Psychology & neuroscience, 2(2), 111-123.
 
 Jobard, G., Vigneau, M., Mazoyer, B., & Tzourio-Mazoyer, N. (2007). Impact of modality and linguistic complexity during reading and listening tasks. Neuroimage, 34(2), 784-800.
 
 Thesen, T., McDonald, C. R., Carlson, C., Doyle, W., Cash, S., Sherfey, J., Felsovalyi, O., Girard, H., Barr, W., Devinsky, O., Kuzniecky, R., & Halgren, E. (2012). Sequential then interactive processing of letters and words in the left fusiform gyrus. Nature communications, 3, 1284.
 
 Wilson, S. M., Bautista, A., & McCarron, A. (2018). Convergence of spoken and written language processing in the superior temporal sulcus. Neuroimage, 171, 62-74.
 
 Review 3
5. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Author response:
 
 We thank the editors and reviewers for their constructive feedback on our manuscript. We accept the reviewers' recommendations and will implement them fully in our revised manuscript and include all of the suggested literature references. Below, we highlight several key points raised during the evaluation and outline exactly how we will address them. We will also explicitly address every other point and minor recommendation raised by the reviewers in our final, comprehensive point-by-point response.
 
 Population-level quantification and statistical thresholds: The reviewers noted that our manuscript relied on single-neuron examples without fully demonstrating how widespread these patterns are across the recorded population. To address this, we will add population-level quantification across the recorded units using standard False Discovery Rate (FDR) corrections for multiple comparisons. We will include summary tables in the text and add statistical threshold lines to the distribution figures to report the proportion of significant neurons per region.
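
 As an illustration of the kind of population-level correction described above, the following is a minimal sketch of the Benjamini-Hochberg FDR procedure applied to per-unit p-values, followed by a per-region summary. The function names, the region labels, and the assumption of one p-value per unit are hypothetical and are not taken from the manuscript; the authors' actual pipeline may differ.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of p-values rejected under the BH procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # BH critical values: (k / m) * alpha for rank k = 1..m
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank that passes
        k_max = np.nonzero(below)[0].max()
        reject[order[:k_max + 1]] = True
    return reject

def proportion_significant(p_values, regions, alpha=0.05):
    """Proportion of units surviving FDR correction, per (hypothetical) region label."""
    reject = benjamini_hochberg(p_values, alpha)
    regions = np.asarray(regions)
    return {str(r): float(reject[regions == r].mean()) for r in np.unique(regions)}
```

 Applying the correction once across all units and features, rather than separately within each figure panel, is what addresses the neuron × feature grid concern raised by Reviewer #3.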
 
 Identifying amodal neurons: Reviewers raised concerns that our classification of amodal language neurons required a more direct test. We will provide additional measures of modality and, in particular, we will implement a cross-modal generalization analysis where our encoding models are trained on one modality (e.g., listening) and evaluated on the other (e.g., reading). This additional procedure will classify neurons as amodal if their cross-modal predictive performance exceeds a baseline null model.
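
 The cross-modal generalization test described above could be sketched as follows. This is an assumption-laden illustration, not the authors' method: it uses closed-form ridge regression on pre-centered features, Pearson correlation as the score, and a circular-shift null on the held-out response. All function and variable names are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge weights (no intercept; inputs assumed centered)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def cross_modal_p_value(X_train, y_train, X_test, y_test,
                        n_null=200, lam=1.0, seed=0):
    """Train on one modality, score on the other, compare with a shift null.

    Returns the observed Pearson r and a one-sided p-value.
    """
    rng = np.random.default_rng(seed)
    w = fit_ridge(X_train, y_train, lam)
    pred = X_test @ w
    observed = np.corrcoef(pred, y_test)[0, 1]
    n = len(y_test)
    null = np.empty(n_null)
    for i in range(n_null):
        # Shift is never 0, so stimulus-response alignment is always broken
        shift = rng.integers(1, n)
        null[i] = np.corrcoef(pred, np.roll(y_test, shift))[0, 1]
    p = (1 + np.sum(null >= observed)) / (1 + n_null)
    return float(observed), float(p)
```

 A circular shift is used here instead of a full permutation because it preserves the autocorrelation of the response; whether that is the appropriate null for these recordings is a choice the analysis would need to justify.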
 
 Isolating linguistic features from sensory confounds: A point was raised regarding whether some neurons were tracking low-level sensory properties (like sound amplitude or visual text size) rather than language features. We will address this by running encoding analyses that include additional basic acoustic envelopes and visual baseline properties as control variables. This will allow us to evaluate the unique variance explained by linguistic features after accounting for these low-level sensory baselines.
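
 One common way to express "unique variance after accounting for low-level baselines" is as the drop in held-out R² when a feature group is removed from the full model. The sketch below assumes that definition, a single train/test split, and ridge regression; the group names and function names are hypothetical, and the manuscript's own unique-variance procedure may be defined differently.

```python
import numpy as np

def heldout_r2(X_train, y_train, X_test, y_test, lam=1.0):
    """Held-out R^2 of ridge regression with an unpenalized intercept."""
    Xtr = np.column_stack([np.ones(len(X_train)), X_train])
    Xte = np.column_stack([np.ones(len(X_test)), X_test])
    penalty = lam * np.eye(Xtr.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    w = np.linalg.solve(Xtr.T @ Xtr + penalty, Xtr.T @ y_train)
    resid = y_test - Xte @ w
    return 1.0 - resid @ resid / np.sum((y_test - y_test.mean()) ** 2)

def unique_variance(groups, y, target, split=0.8, lam=1.0):
    """R^2(full model) minus R^2(model without `target`), on held-out samples.

    `groups` maps a feature-group name to an (n_samples, n_features) array.
    """
    n_train = int(split * len(y))
    full = np.column_stack([groups[name] for name in groups])
    reduced = np.column_stack([groups[name] for name in groups if name != target])
    tr, te = slice(0, n_train), slice(n_train, None)
    r2_full = heldout_r2(full[tr], y[tr], full[te], y[te], lam)
    r2_reduced = heldout_r2(reduced[tr], y[tr], reduced[te], y[te], lam)
    return r2_full - r2_reduced
```

 Under this definition, a linguistic feature group that is merely correlated with the acoustic envelope should show near-zero unique variance once the envelope is in the model, which is the dissociation the control analysis aims to test.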
 
 Evaluating the "Compositional Code" in the Fusiform Gyrus: Reviewers pointed out that our claim regarding a "compositional code" (neurons tracking a combination of letter identity and position) was supported primarily by individual examples. To provide population-level context, we will perform a model comparison across our fusiform gyrus neurons. We will compare a baseline letter-only model against a model that includes letter-by-position interactions to report how many neurons statistically support this compositional structure.
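
 The proposed comparison between a letter-only model and a letter-by-position model can be illustrated with a cross-validated sketch, in which held-out prediction accuracy penalizes the larger model for its extra parameters. The encoding below (one-hot letters versus one column per letter-position pair), the contiguous folds, and all names are assumptions made for illustration only.

```python
import numpy as np

def one_hot(idx, n):
    """One-hot encode an integer index array into an (len(idx), n) matrix."""
    out = np.zeros((len(idx), n))
    out[np.arange(len(idx)), idx] = 1.0
    return out

def interaction_design(letters, positions, n_letters, n_positions):
    """One column per (letter, position) pair."""
    return one_hot(letters * n_positions + positions, n_letters * n_positions)

def cv_r2(X, y, n_folds=5, lam=1.0):
    """Held-out R^2 of ridge regression, pooled over contiguous folds."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    pred = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu = y[train].mean()  # intercept estimated on training data only
        A = X[train].T @ X[train] + lam * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ (y[train] - mu))
        pred[test] = X[test] @ w + mu
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

 A unit would count as supporting a compositional code only if the interaction model predicts held-out responses better than the letter-only model; the per-unit differences would then need a significance test and correction across units, and the result depends on how well each letter-position pair is sampled in the stimulus set.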
 
 TRF feature and procedure explanation: Reviewers requested clarification on the construction of our TRF features. We will update the Methods section to explicitly detail how the features were constructed for both modalities. We will also include a feature correlation matrix in the Supplementary Materials. Furthermore, in order to contrast possible low-level confounds with high-level linguistic features, we will also conduct a control analysis tracking, e.g., specific affixes across different structural roles – for example, comparing how neurons respond to the phoneme /-s/ when it functions as a plural number marker versus when it appears as part of a lexical item (e.g., pass) or as third-person verb agreement. We will conduct such analyses in addition to fitting the main TRF models with these additional confounds included, ensuring a clear dissociation between high- and low-level features.
 
 AuthorResponse
URL

biorxiv.org/content/10.1101/2024.11.16.623907v2
www.biorxiv.org www.biorxiv.org

Contrasting walking styles map to discrete neural substrates in the mouse brainstem

4
1. Public_Reviews 01 Jun 2026
 
 in eLife
 
 eLife assessment
 
 This is a valuable survey of movements and locomotor patterns produced by circuits in the medial reticular formation (MRF) of the brainstem. The authors provide solid evidence that activation of GABAergic MRF neurons slowed down walking, activation of glutamatergic neurons induced a specific "shuffle" limb trajectory, and the activation of serotonergic neurons increased locomotor speed without affecting walking signature. This study adds to the growing body of knowledge about the effects of brainstem circuits on specific aspects of locomotor function.
 
 Summary
2. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #1 (Public Review):
 
 The medial reticular formation (MRF) in the brainstem has long been implicated in the regulation of locomotion. One common - albeit very simple - model often presents the MRF as a major relay station receiving inputs from MLR circuits, among other brain regions, that together convey locomotor signals through efferent projections targeting the caudal brainstem and the spinal cord. Yet, the MRF is a particularly large brain area whose cellular complexity is far from understood. How molecularly distinct MRF ensembles contribute to the regulation of locomotor behaviors is largely unknown. Here, the authors apply focal activation of either glutamatergic, GABAergic, or serotonergic neurons throughout the MRF using a chemogenetic gain-of-function approach to uncover the putative modulatory properties of these neuronal ensembles during walking. Using kinematic analysis of mouse limbs during self-paced over-ground walkway locomotion, the authors find that activation of GABAergic MRF neurons can selectively slow down walking, whereas activation of glutamatergic neurons can induce a specific "shuffle" limb trajectory, altogether revealing that distinct MRF populations may retain the capability to engage divergent walking signatures, whose behavioral relevance is not yet clear. In contrast, the activation of serotonergic neurons did not affect walking signatures as described for the other two subgroups but led to an increase of locomotor speed. Interestingly, MRF neurons in each regional activation "hotspot" appear to target different domains in the lumbar spinal cord, suggesting that distinct circuit mechanisms are at play for the slowmo vs shuffle effects.
 
 Major points:
 
 1. While the experiments are carefully done and the results are well analyzed and clearly presented in a series of beautiful figures, several aspects of the methodology remain very confusing. In particular, the initial choice for the injection coordinates is not justified and the authors don't leverage the mapping of spinal projection neurons to drive their chemogenetic screen. Similarly, the authors group very different injection schemes (unilateral or bilateral targeting of MRF neurons), that should be analyzed separately. The choice of Z score cutoff that dictates the in-depth analysis of the chemogenetic phenotypes appears arbitrary and is not grounded in a set of objective criteria.
 
 2. One issue that arises from the work presented here is that we don't know if these MRF neurons are active during locomotion in normal, unperturbed conditions. Knowing the recruitment profile of these MRF neurons would clarify whether the chemogenetic activation boosts the firing of neurons that are already active during walking, or activates neurons that are otherwise silent. Disentangling between these possibilities may have a profound impact on the overall interpretation of the results.
 
 3. The results should be discussed in the broader context of historic stimulation experiments, notably in cats and other species, as well as more recent circuit mapping approaches in rodents. For instance, the notion that focal stimulation of distinct areas within the MRF can elicit or modify the pattern of locomotion is not really new, nor is the notion that some of these modulations are phase-specific and can influence the duration of single muscle activation during stance or swing phases. This last point has for instance already been assessed through individual muscle recordings paired with MRF stimulation in cats. Perhaps better introducing these key studies and a thorough discussion of what the results presented in this manuscript bring in terms of novelty will help readers ground this work into a more comprehensive and larger body of work.
 
 Review 1
3. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Reviewer #2 (Public Review):
 
 This paper is an interesting conceptual work where certain hotspot areas were found to induce unique gait patterns. These patterns differed from a classic change in speed or gait pattern from a walk to a gallop. From this, a hypothesis was formed that these areas could be important for possible alternative walking patterns seen, for example, during pathologies such as Parkinson's disease or perhaps related to stalking behaviors.
 
 While I liked the work and found it interesting, it remains descriptive in that the actual behaviors observed can't be causally related to a particular behavior such as stalking or shuffling. If the necessity or sufficiency of this region was related to a specific hunting behavior, for example, its interest to the field would be greater.
 
 Nevertheless, this paper does contribute to growing evidence that specific behaviors can be triggered by specific neuronal populations within the brainstem.
 
 Review 2
4. Public_Reviews 01 Jun 2026
 
 in eLife
 
 Author response:
 
 Reviewer #1 (Public Review):
 
 The medial reticular formation (MRF) in the brainstem has long been implicated in the regulation of locomotion. One common - albeit very simple - model often presents the MRF as a major relay station receiving inputs from MLR circuits, among other brain regions, that together convey locomotor signals through efferent projections targeting the caudal brainstem and the spinal cord. Yet, the MRF is a particularly large brain area whose cellular complexity is far from understood. How molecularly distinct MRF ensembles contribute to the regulation of locomotor behaviors is largely unknown. Here, the authors apply focal activation of either glutamatergic, GABAergic, or serotonergic neurons throughout the MRF using a chemogenetic gain-of-function approach to uncover the putative modulatory properties of these neuronal ensembles during walking. Using kinematic analysis of mouse limbs during self-paced over-ground walkway locomotion, the authors find that activation of GABAergic MRF neurons can selectively slow down walking, whereas activation of glutamatergic neurons can induce a specific "shuffle" limb trajectory, altogether revealing that distinct MRF populations may retain the capability to engage divergent walking signatures, whose behavioral relevance is not yet clear. In contrast, the activation of serotonergic neurons did not affect walking signatures as described for the other two subgroups but led to an increase of locomotor speed. Interestingly, MRF neurons in each regional activation "hotspot" appear to target different domains in the lumbar spinal cord, suggesting that distinct circuit mechanisms are at play for the slowmo vs shuffle effects.
 
 Major points:
 
 (1) While the experiments are carefully done and the results are well analyzed and clearly presented in a series of beautiful figures, several aspects of the methodology remain very confusing.
 
 A) In particular, the initial choice for the injection coordinates is not justified and the authors don't leverage the mapping of spinal projection neurons to drive their chemogenetic screen.
 
 Thank you for pointing this out. To clarify this, we now start the results with an extra paragraph and accompanying figures (Figure 2 and its supplementary figures) in which we define the region of interest (ROI) within the mRF. The ROI is based upon the distribution of reticulospinal neurons in the brainstem mRF that connect directly with the lumbosacral enlargement (whether or not this ROI projects to other CNS sites), which contains the main networks important for hindlimb control during locomotion, including walking gait. Reticulospinal neurons in the mRF in the caudal pons and medulla oblongata form longitudinal columns that together occupy up to more than half of the entire brainstem. While the morphology of the medulla and caudal pons varies little from level to level, in contrast to rapid changes at the midbrain level, this doesn’t necessarily mean that the neuronal populations, even within neurotransmitter classes, are homogeneous in connectivity and function. We have now clearly denoted the rostrocaudally extensive field with its dorsoventral and mediolateral dimensions that comprises the anatomical region of interest in the new figure. While this dataset is rather basic, it allows us to directly refer back to it and clarify additional queries that came up related to the anatomy (i.e. that the hotspots for slomo- and shuffle-like gaits only cover a small portion of the reticulospinal field).
 
 We then included detailed anatomical mapping of the spinal projections for the identified hotspots for changes in walking quality (phenomenology), the central theme of the study, and immediately adjacent regions to highlight contrasting location-connectivity-functional properties between these adjacent sites. To better incorporate these mapping results, we now present them directly following the walking-function-based transfection-site mapping, but before delving into the details of the walking gait phenotypes. We did not systematically include mapping results from all sites in the mRF ROI in this manuscript, as this was beyond the scope of this already very large functional-anatomical study.
 
 B) Similarly, the authors group very different injection schemes (unilateral or bilateral targeting of MRF neurons), that should be analyzed separately.
 
 We now clarify early in the results section how uni- and bilateral groups were composed and what the rationale was for this. As pilot data suggested that the slomo gait style was only seen following bilateral activation in VGaT-cre mice, but not in all bilateral cases, we designed the VGaT cohort to contain mainly bilateral injections, spread across the mRF region of interest, with a smaller group of unilateral injections to verify the pilot data.
 
 For the shuffle gait style, pilot data suggested that both uni- and bilateral activation of VGluT2 neurons could elicit this style, but only in a subset of uni- and bilateral cases. Therefore we mainly included unilateral injections in this group with a smaller bilateral cohort for verification. This approach served the main goal of the study, which was to map the walking style changes to subregions in the mRF.
 
 However, laterality is indeed very important when it comes to locomotor control. The effects of laterality on the walking gait styles generated from the hotspots were included in supplemental figures and accompanying Tables. We have now better highlighted these in the body of the text and we have added analyses of the motor tests for uni- or bilateral groups.
 
 Furthermore, it should be noted that the uni- and bilateral groups are heterogeneous when it comes to rostrocaudal and dorsoventral placement within the mRF ROI. As such, we were not able to rigorously compare uni- versus bilateral activation effects while at the same time separating cases out by dorsoventral and rostrocaudal location (which would be needed to do justice to the functional anatomical organization of the mRF) as we do not have sufficient power in each of the subgroups (i.e. 3 rostrocaudal levels, with each a dorsal, intermediate and ventral region to target, which each would have to be injected unilaterally and bilaterally). This was beyond the scope of this already very large study. Further studies designed to balance ipsi- and contralateral groups will be necessary to map out the hotspots for mobility phenotypes that may be driven by the mRF beyond the slomo- and shuffle-hotspots or to systematically study the impact of laterality on mobility from the mRF.
 
 To summarize, analyses of uni- vs bilateral stimulation demonstrate that bilateral inhibition within the slomo hotspot is necessary to create the slomo walking phenotype, and that unilateral inhibition within the shuffle hotspot is sufficient to create the shuffle walking phenotype (with bilateral stimulation not enhancing the phenotype further). Unilateral activation of the slomo hotspot did not induce asymmetries in gait or a reduction in motor performance, whereas unilateral activation of the shuffle hotspot induced an asymmetry in swing time but not stride length, with laterality affecting horizontal ladder but not other motor tests. Mice with transfection sites within the mRF region of interest but outside of the slomo and shuffle hotspots did not display these walking phenotypes but did display slowed walking without qualitative changes. The connectivity to spinal and other supraspinal substrates differed between these sites, providing clues for the substrates that mediate these differential functions.
 
 C) The choice of Z score cutoff that dictates the in-depth analysis of the chemogenetic phenotypes appears arbitrary and is not grounded in a set of objective criteria.
 
 We are sorry that the Z score cutoff appeared arbitrary as that was not our intention.
 
 The values to separate mice with and without a significant change were simply set at 2 standard deviations from the population mean in the control mice (i.e. Z=2). Two standard deviations from the population mean is widely used in all types of statistical analyses. We have now included the rationale for the cutoff of Z=2 in the text. Where group size allowed, to increase contrast between positive and negative groups in terms of gait characteristics, other behavioral assays and mapping, we used data from Z scores >3 (or < -3), but we can assure the reviewer that all moderately positive data (i.e. from mice with gait style Z scores between 2 and 3, and between -3 and -2) were reported as well in the statistical tables or supplementary figures. We have now included the links to these supplementary tables and figures in the text, rather than only in the figure legends.
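
 As an illustration of the cutoff logic described above, the following is a minimal sketch rather than the analysis code used in the study. It assumes that each mouse's gait metric is standardized against the mean and sample standard deviation of the control group; the function names, the example numbers, and the handling of values that fall exactly on a threshold are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values, control):
    """Standardize values against the control group's mean and SD.

    Assumption: the sample standard deviation of the control mice is used;
    the source only states "2 standard deviations from the population mean".
    """
    mu = mean(control)
    sigma = stdev(control)
    if sigma == 0:
        raise ValueError("control group has zero variance")
    return [(v - mu) / sigma for v in values]


def classify(z, moderate=2.0, strong=3.0):
    """Label a Z score using the |Z| = 2 and |Z| = 3 thresholds.

    Assumption: |Z| > 3 is 'strong', 2 <= |Z| <= 3 is 'moderate',
    and anything below 2 is 'unchanged'.
    """
    magnitude = abs(z)
    if magnitude > strong:
        return "strong"
    if magnitude >= moderate:
        return "moderate"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical control values (mean 10, sample SD 2) and test values.
    control = [8.0, 10.0, 12.0]
    for z in z_scores([15.0, 17.0, 11.0, 3.0], control):
        print(f"Z = {z:+.1f}: {classify(z)}")
```

 In this sketch, the moderate group corresponds to the mice that were reported in the statistical tables and supplementary figures, and the strong group to the higher-contrast subset used where group size allowed.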
 
 The Z scores for the different gait styles indeed appear to map to discrete sites, but the Z score cutoff was not informed by these sites or by anatomical data. Similarly, Z scores for changes in tonic muscle activity elicited by activation of inhibitory neurons also mapped to a hotspot in the same rostrocaudal column as the slomo gait style, but further caudally. This further demonstrates the strength of function-based mapping.
 
 (2) One issue that arises from the work presented here is that we don't know if these MRF neurons are active during locomotion in normal, unperturbed conditions. Knowing the recruitment profile of these MRF neurons would clarify whether the chemogenetic activation boosts the firing of neurons that are already active during walking, or activates neurons that are otherwise silent. Disentangling these possibilities may have a profound impact on the overall interpretation of the results.
 
 We agree that this knowledge would improve our ability to interpret and apply the findings of the current study. It is indeed important to learn when these mRF sites are being recruited, whether part of normal modulatory strategies in order to navigate through a complex environment or as part of specialized behavioral modules or both. Another question is how loss of function in these sites impacts behavior and function. This concept has been added to the discussion and these questions can now be pursued in future experiments.
 
 (3) The results should be discussed in the broader context of historic stimulation experiments, notably in cats and other species, as well as more recent circuit mapping approaches in rodents. For instance, the notion that focal stimulation of distinct areas within the MRF can elicit or modify the pattern of locomotion is not really new, nor is the notion that some of these modulations are phase-specific and can influence the duration of single muscle activation during stance or swing phases. This last point has, for instance, already been assessed through individual muscle recordings paired with MRF stimulation in cats. Perhaps better introducing these key studies and thoroughly discussing what the results presented in this manuscript bring in terms of novelty will help readers ground this work in a larger, more comprehensive body of work.
 
 There is indeed a rich series of meticulous work done in cats, which included effects from stimulation of inhibitory and excitatory neurons on limb EMG, and rodent work focusing on excitatory mRF neurons. These studies show that distinct neurons or sites within the mRF drive distinct changes in motor readouts, albeit not described in terms of modulation of walking gait as we do here in terms of gait signatures. Despite this solid body of prior work, the notion of phase specificity and separate modulation of swing versus stance phase metrics has been underappreciated and therefore deserves to be emphasized. We have expanded the discussion to better highlight prior work and the interpretation of phase specificity has been enriched.
 
 Reviewer #2 (Public Review):
 
 This paper is an interesting conceptual work where certain hotspot areas were found to induce unique gait patterns. These patterns differed from a classic change in speed or gait pattern from a walk to a gallop. From this, a hypothesis was formed that these areas could be important for possible alternative walking patterns seen, for example, during pathologies such as Parkinson's disease or perhaps related to stalking behaviors.
 
 While I liked the work and found it interesting, it remains descriptive in that the actual behaviors observed can't be causally related to a particular behavior such as stalking or shuffling. If the necessity or sufficiency of this region was related to a specific hunting behavior, for example, its interest to the field would be greater.
 
 Nevertheless, this paper does contribute to growing evidence that specific behaviors can be triggered by specific neuronal populations within the brainstem.
 
 We thank the reviewer for their thoughtful comments. We agree that more studies are necessary to understand how the slomo and shuffle hotspots serve behavioral repertoires (such as stalking or other internally driven activities) and adaptations (such as object avoidance or more subtle adjustments to terrain or internal cues). The experimental details of the present study leave ample leads for the research community to pursue these new directions.
 
 AuthorResponse
Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.19.537568v1
May 2026
www.biorxiv.org

Impacts of DNA methylation on H2A.Z deposition and nucleosome stability

5
1. Public_Reviews 29 May 2026
 
 in eLife
 
 eLife Assessment
 
 This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodelling complexes, and (ii) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention. Through a series of well-designed and carefully executed experiments, solid support is presented for the first hypothesis. The evidence supporting the second hypothesis is less complete, and the extent to which either mechanism is responsible for H2A.Z exclusion from methylated DNA remains not entirely clear. This work will be of broad interest to researchers in chromatin biology and epigenetics.
 
 Summary
2. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions. Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.
 
 Strengths:
 
 The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.
 
 Weaknesses:
 
 The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.
 
 Comments on revisions:
 
 The authors have addressed all previously raised concerns and propose a revised version of the manuscript. Notably, the abstract and discussion sections have been improved, and new experimental data have been incorporated. Collectively, these revisions enhance the rigor and clarity of the data interpretation and discussion.
 
 Given these improvements, this reviewer believes that the manuscript could be published, particularly if this publication is accompanied by the critical points discussed in the rebuttal letter.
 
 Review 1
3. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. The revised manuscript addresses a number of previous concerns, and the manuscript has therefore improved accordingly. However, several limitations remain.
 
 Comments on revisions:
 
 The authors have addressed a number of my previous concerns, and the manuscript has improved accordingly. However, several limitations remain that, in my view, constrain the strength of the conclusions. In particular, a direct comparison with a canonical nucleosome assembled on the same DNA template is still absent. This control is essential to determine whether the observed effects are specific to H2A.Z or reflect more general properties of methylated DNA-nucleosome interactions. Notably, even within the authors' own data, there is a trend suggesting that methylated canonical H2A nucleosomes may also exhibit increased accessibility. Although this does not reach statistical significance, the authors themselves argue that subtle differences can be biologically meaningful; it is therefore plausible that extended digestion conditions (e.g., longer HinfI exposure) could reveal a significant effect. Unless a direct structural comparison with a canonical nucleosome is performed, the possibility that the reported phenomenon is not specific to H2A.Z remains. This is compounded by the reliance on a single restriction enzyme-based assay, which represents a limited experimental approach. Such an approach is insufficient to unequivocally support the central claim that DNA methylation increases accessibility of H2A.Z-containing nucleosomes. Additional orthogonal assays would be required to substantiate this conclusion. With respect to the cryo-EM analysis of methylated and unmethylated 601L H2A.Z nucleosomes, and in general, the authors still do not adequately consider the positional context of CpG methylation. Extensive literature demonstrates that the effects of DNA methylation on canonical nucleosome structure and stability are highly position-dependent. Without accounting for the location of methylated CpGs relative to key DNA-histone contact sites, the structural data remain difficult to interpret mechanistically.
 Overall, while the manuscript has improved, it remains a relatively limited study that draws broad mechanistic conclusions from minimal experimental data.
 
 Review 2
4. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using a methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found that their distributions differ between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially binds to unmethylated DNA to deposit H2A.Z.
 
 Strengths:
 
 The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of the human H2A.Z nucleosome with methylated DNA. Although the effect of DNA methylation on the physical stability of the H2A.Z nucleosome is subtle, this would be an important finding that warrants further functional investigation. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.
 
 Weaknesses:
 
 The authors have satisfactorily addressed my concerns.
 
 Review 3
5. Public_Reviews 29 May 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the current reviews.
 
 Reviewer #1.
 
 We appreciate the constructive comments, which greatly improved this manuscript.
 
 Reviewer #2.
 
 We appreciate Reviewer #2's thorough analysis of our manuscript. However, we are concerned that the reviewer criticized a conclusion different from the one we claim in the manuscript. Although Reviewer #2's public comment stated, "Such an approach is insufficient to unequivocally support the central claim that DNA methylation increases accessibility of H2A.Z-containing nucleosomes", we did not draw such a bold conclusion. In the Abstract, we cautiously described that the impact of DNA methylation we observed was subtle and based on satellite II-derived DNA sequences. We made a nuanced proposal regarding this observation, stating, "Altogether, we propose that SRCAP drives the biased association of H2A.Z to unmethylated DNA, while additional mechanisms, potentially taking advantage of the subtle DNA methylation-induced physical effects, further assist the exclusion of H2A.Z from methylated DNA". We believe our analysis will contribute valuable insights into the mechanistic basis behind the antagonism between DNA methylation and H2A.Z.
 
 Reviewer #3.
 
 We appreciate the constructive comments, which greatly improved this manuscript.
 
 The following is the authors’ response to the original reviews.
 
 eLife Assessment
 
 This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodeling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.
 
 We would like to thank the editor and reviewers for their critical assessments of our manuscript. While we do acknowledge the limitations of our work, we believe that our results provide important mechanistic insights into the long-standing question of how H2A.Z is preferentially enriched in hypomethylated genomic DNA regions. First, our structural and biochemical data suggest that DNA methylation increases the openness and physical accessibility of H2A.Z nucleosomes, although the effect is relatively subtle and sequence-dependent. Second, using Xenopus egg extracts and synthetic DNA templates, we provide the first clear and direct evidence that DNA methylation-sensitive H2A.Z deposition is due to the H2A.Z chaperone SRCAP-C, corroborated by our discovery that SRCAP-C binding to DNA is suppressed by DNA methylation. Although the molecular details by which DNA methylation inhibits binding of SRCAP-C are an important area of future study, in our current manuscript we do provide evidence that directly links the presence of SRCAP-C to the establishment of the DNA methylation/H2A.Z antagonism in a physiological system. Thanks to criticisms by the reviewers, we realized that we did not clearly state in our Abstract that the impact of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle, although we did explain these observations and limitations in the main text. In our revised manuscript, we are willing to edit the text to better address the criticisms raised by the reviewers.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.
 
 Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.
 
 Strengths:
 
 The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.
 
 Weaknesses:
 
 The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.
 
 Although we acknowledge the limitations of our study and are willing to expand our discussion to more thoroughly discuss these points, we believe our manuscript provides several important mechanistic insights which this reviewer may not have fully appreciated.
 
 Our first conclusion that H2A.Z nucleosomes on methylated DNA are more open and accessible compared to their unmethylated counterparts is supported by both our cryo-EM study and the restriction enzyme accessibility assay. Although the physical effect of DNA methylation is relatively subtle and is likely sequence dependent, as we clearly noted within the manuscript, the difference does exist and is valuable information for the chromatin field at large to consider.
 
 The second major conclusion of our manuscript is that SRCAP-C exhibits preferential binding to unmethylated DNA over methylated DNA, and that SRCAP-C represents the major mechanism that can explain the biased deposition of H2A.Z to unmethylated DNA in Xenopus egg extracts. Furthermore, our experiments using Xenopus egg extract clearly demonstrated that H2A.Z is deposited by both DNA methylation-sensitive and -insensitive mechanisms. Depletion of SRCAP-C almost completely eliminated DNA methylation-sensitive H2A.Z deposition and reduced the total level of H2A.Z on chromatin to less than half of that seen in non-depleted extract. This result demonstrated that DNA methylation-sensitive H2A.Z loading is primarily regulated by SRCAP-C, at least in our experimental context where transcription, replication, and other epigenetic modifications are not involved. It is likely that additional mechanisms contribute further, as implicated by our sequencing experiments, particularly at regions with active transcription, and we have noted these possibilities and the rationale for their existence in the Discussion.
 
 Our study also suggests that a SRCAP-independent, DNA methylation-insensitive mechanism of H2A.Z loading exists, which we suspect to be mediated by Tip60-C. In line with this possibility, our data suggest that Tip60-C binds DNA in a DNA methylation-insensitive manner in Xenopus egg extract. Since antibodies to deplete Tip60-C from Xenopus egg extract are currently unavailable, we were unable to directly test that hypothesis and decided not to include Tip60-C into our final model as we lacked experimental evidence for its role. However, whether or not Tip60-C is the complex responsible for the DNA methylation-insensitive pathway does not influence our final conclusion that SRCAP-C plays a major role in DNA methylation-sensitive H2A.Z loading. We are planning to edit our manuscript to more comprehensively discuss these points.
 
 Please note that while Berta et al. reported that DNA methylation increases at H2A.Z loci in tumors defective in SRCAP-C, they selected those regions based on where H2A.Z is typically enriched within normal tissues (Berta et al., 2021). They did not show data indicating whether H2A.Z is still retained specifically at those analyzed loci upon mutation of SRCAP-C subunits. Thus, although we greatly admire their work and are pleased that many of our findings align with theirs, their paper did not directly address whether SRCAP-C itself differentiates between DNA methylation states, or the impact that this has on H2A.Z and DNA methylation colocalization. In contrast, our Xenopus egg extract system, where de novo methylation is undetectable (Nishiyama et al., 2013; Wassing et al., 2024), offers a unique opportunity to examine the direct impact of DNA methylation on H2A.Z deposition using controlled synthetic DNA substrates. Together with our demonstration that DNA binding of SRCAP-C is suppressed by DNA methylation, we believe that our manuscript provides a specific mechanism that can explain the preferential deposition of H2A.Z at hypomethylated genomic regions.
 
 Reviewer #2 (Public review):
 
 This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:
 
 We appreciate the critical assessment of our manuscript by this reviewer. Although we acknowledge the limitations of our study and will revise the manuscript to better describe them, we would like to respectfully argue against the statement that our "conclusions […] are not supported by the data".
 
 (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.
 
 The fact that the DNA resolution in the methylated structure was not high enough to resolve the positions of methylated CpGs despite a high overall resolution of 2.78 Å implies that 1) the Sat2R-P DNA was not as stably registered as the 601L sequence, requiring us to create two alternative Sat2R-P atomic models to account for the variable positioning in our samples, and 2) that the presence of DNA methylation increases that positional variability. We understand that one may prefer to see highly resolved density around each methylation mark, but we do believe that our inability to accomplish that is actually a feature rather than a weakness and has important biological implications. The decrease in local DNA resolution on the methylated Sat2R-P structure compared to its unmethylated counterpart is meaningful and suggests to us that DNA methylation weakens overall DNA wrapping and positioning on the nucleosome, supported by the increased flexibility seen at the linker DNA ends as well as an increase in the population of highly shifted nucleosomes amongst the methylated particles. Additionally, one major view in the DNA methylation/nucleosome stability field is that the presence of DNA methylation can make DNA stiffer and harder to bend, causing opening and destabilization of nucleosomes (Ngo et al., 2016). The increased opening of linker DNA ends and accessibility of methylated H2A.Z nucleosomes in our hands also aligns with such an idea, again suggesting decreased histone-DNA contact stability on methylated DNA substrates. We plan to revise the writing in our manuscript to better reflect these ideas.
 
 The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.
 
 The reviewer raises a fair question about whether canonical H2A would experience the same DNA methylation-dependent structural effects. We had considered solving the H2A structures; however, we ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes, with and without DNA methylation, on a DNA sequence highly similar to our Sat2R-P (PDB: 5CPI and 5CPJ). The authors of that study did not see any physical differences between their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis.
 
 One of the primary reasons we chose the Sat2R-P sequence was, as noted above, that there already was a published study examining how DNA methylation affects nucleosome structure using a variant of this sequence which we could compare to our results, as the reviewer has suggested. We did have to modify the sequence, namely by making it palindromic, in order to increase the final achievable resolution. We viewed the Sat2R-P sequence as an attractive candidate because it is physiologically relevant; the initial sequence was taken directly from human satellite II. Several modifications were made for technical reasons, including making the sequence palindromic as described above and also ensuring that each CpG is recognizable by a methylation-sensitive restriction enzyme so that we could be certain about the degree of methylation on our substrates. For us, these practical concerns outweighed the need to maintain a strictly physiological sequence. However, we still believe the final Sat2R-P more closely mimics physiological sequences than Widom 601. Additionally, human satellite II is a highly abundant sequence in the human genome that is known to undergo large methylation changes at the onset of many disorders, like cancer, as well as during aging. Thus, there are interesting biological questions surrounding how the methylation state of this particular sequence affects chromatin structure.
 
 Furthermore, it has been reported that satellite II is devoid of H2A.Z (Capurso et al., 2012). Beyond those reasons, the satellite II sequence is generally interesting to our lab because we have been studying genes involved in ICF syndrome, where hypomethylation of satellite II sequences forms one of the hallmarks of this disorder (Funabiki et al., 2023; Jenness et al., 2018; Wassing et al., 2024). We understand that sequence context plays a large role in nucleosome wrapping and stability. This is why we strived to test multiple sequences in each of our assays. We do agree that it would be interesting to use DNA sequences where H2A.Z binding has already been described to be affected in a DNA methylation-dependent manner, forming an exciting future study to pursue.
 
 Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.
 
 We did not argue that only the global methylation level is relevant. We also would appreciate it if the reviewer could provide specific references that "clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent". We are aware of a series of studies conducted by Chongli Yuan's group, including one testing the effect of placing methylated CpGs at different positions along the Widom 601 sequence. In that study (Jimenez-Useche et al., 2013), they did find that positioning of mCpGs has differential impacts on the salt resistance of the nucleosomes, with 5 tandem mCpG copies at the dyad causing the most dramatic nucleosome opening whereas having mCpGs only at the DNA major grooves, but not elsewhere, increased nucleosome stability. However, they did also find that methylation of the original Widom 601 sequence also caused destabilization, albeit to a lesser degree, and another study by the same group (Jimenez-Useche et al., 2014) also found that CpG methylation decreased nucleosome-forming ability for all tested variants of the Widom 601 sequence, regardless of CpG density or positioning.
 
 Other studies monitored how distribution of methylated CpGs correlates with nucleosome positioning (Collings et al., 2013; Davey et al., 1997; Davey et al., 2004). However, these studies assessed the sequence-dependent effects specifically on nucleosome assembly during in vitro salt dialysis, which is a different physical process than the one our manuscript focuses on, especially when considering the fact that H2A.Z is deposited onto preassembled H2A-nucleosome. Our cryo-EM analysis examines the structural changes induced by DNA methylation on already formed nucleosomes rather than the process of formation. Thus, probing accessibility changes using a restriction enzyme was the more appropriate biochemical assay to verify our structures.
 
 We do very much agree that DNA context can influence nucleosome stability under different conditions. A study of molecular dynamics simulations concluded that the "combination of overall DNA geometrical and shape properties upon methylation" makes nucleosomes resistant to unwrapping (Li et al., 2022), while another modeling study suggests that DNA methylation impacts nucleosome stability in a manner dependent on DNA sequence, where "[s]trong binding is weakened and weak binding is strengthened" (Minary and Levitt, 2014). While G/C-dinucleotides are preferentially placed at major groove-inward positions in the nucleosomes in vivo (Chodavarapu et al., 2010; Segal et al., 2006) and G/C-rich segments are excluded from major groove-outward positions in Widom 601-like nucleosomes (Chua et al., 2012), methylated CpG dinucleotides are preferably, if not exclusively, located at major groove-outward positions in vivo. Mechanisms behind this biased mCpG positioning on the nucleosome remain speculative, likely caused by a combination of multiple factors, but the fact that we did not observe clear structural impacts using the Widom 601L sequence, where mCpGs are located at the major groove-outward and -inward positions ((Chua et al., 2012) and our structure), deserves a space for discussion. On the other hand, positioning of mCpG on satellite II-derived sequences that we used in this study was based on a physiological sequence, and thus it may not be appropriate to say that those CpGs are placed at multiple "random" positions. Although we decided not to discuss the position of 5mC on our Sat2R nucleosome structure due to ambiguous base assignments, neither of our two atomic models is consistent with an idea that DNA methylation repositions the CpG to the outward major grooves. 
As the potential contribution of how DNA methylation affects the nucleosome structure via modulating DNA stiffness has been extensively studied (Choy et al., 2010; Li et al., 2022; Ngo et al., 2016; Perez et al., 2012), we believe that it is appropriate to consider overall DNA properties along the whole DNA sequence, though we are willing to discuss potential positional effects in the revised manuscript.
 
 Perhaps one of the most important points that we did not emphasize enough in our original manuscript was that, in contrast to the subtle intrinsic effect of DNA methylation that was DNA sequence dependent, we observed SRCAP-dependent preferential H2A.Z deposition to unmethylated DNA over methylated DNA in both 601 and satellite II DNAs. In the revised manuscript, we will clarify the value of comparative studies on 601 and satellite II for these two distinct mechanisms.
 
 Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.
 
 We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing said question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008) but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract the relatively subtle effect of DNA methylation on the intrinsic H2A.Z nucleosome stability. Therefore, we will revise the Abstract accordingly to make this point clearer.
 
 (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.
 
 We believe the inclusion and factual reporting of negative data is important for the scientific community as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. We believe this data rather emphasizes the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.
 
 (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.
 
 Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).
 
 (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).
 
 (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.
 
 We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.
 
 While the degree of separation seems to be dependent on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited to hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates), but an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.
 
 (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.
 
 As depicted in Figure 6 and described in the Discussion, we clearly indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system.
 
 As noted in our response to (2), the lack of a clear impact on our 601L structures is likely due to the extraordinarily strong artificial nucleosome positioning capacity of the 601 sequence and its variants. Since 601 is heavily used in chromatin biology, including within DNA methylation research, such negative data are still useful to include and publish.
 
 (7) The SRCAP depletion is insufficiently validated i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.
 
 We are willing to address this concern. However, please note that our data showed that methylation-dependent H2A.Z deposition is almost completely erased upon SRCAP depletion, indicating functionally effective depletion. The specificity of the custom antibody against Xenopus SRCAP was verified by mass spectrometry. Additionally, we have obtained the same effect using another commercially available SRCAP antibody, though we did not include this preliminary result in our original manuscript. Due to its relatively low abundance and high molecular weight, SRCAP western blot signals are weak, making it challenging to quantify the degree of depletion. We also believe that the value of quantification in this context, with the points noted above, is rather limited. In the past, our lab has published papers on depleting the H3T3 kinase Haspin from Xenopus egg extracts (Ghenoiu et al., 2013; Kelly et al., 2010) but were never able to detect Haspin via western blot. This protein was only detected by mass spectrometry specifically on nucleosome array beads with H3K9me3 (Jenness et al., 2018). However, depletion of Haspin was readily monitored by erasure of H3T3ph, the enzymatic product of Haspin. In these experiments, it was impossible, and not critical, to quantitatively monitor the depletion of Haspin protein in order to investigate its molecular functions. Similarly, in this current study, the important fact is that depletion of SRCAP suppressed methylation-sensitive H2A.Z deposition and quantifying the degree of SRCAP depletion would not have a major impact on this conclusion.
 
 (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.
 
 We are aware that the Tip60 complex is a very likely candidate for mediating DNA methylation insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive. Therefore, the reviewer's statement that we "completely overlooked" Tip60-C’s role does not fairly report on our efforts. We wished to test the potential contribution of Tip60-C, but, unfortunately, the antibodies we currently have available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role Tip60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating Tip60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA and thus remains a subject for future study.
 
 (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).
 
 We understand that the H2A.Z stability field is highly controversial. We have introduced the many conflicting reports that have been published in the field but can further expand on the controversies if desired. We also understand that the term “nucleosome stability” is broad and encompasses many physical aspects. As noted in a prior response, we will better specify our use of the term within the manuscript. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible compared to canonical H2A on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite findings while others have obtained results similar to us. We reported on our findings of the general H2A.Z stability with the hopes to help clarify some of the field’s controversies.
 
 In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.
 
 We appreciate this reviewer’s advice. However, please note that the first author who led this project has already successfully defended their PhD thesis primarily based on this project, making it impractical and unrealistic to completely change the focus of this manuscript to include an entirely new avenue of research. We believe that our data provide important insights into the mechanisms by which H2A.Z is excluded from methylated DNA, particularly via the DNA methylation-sensitive binding of SRCAP-C, which has never been described before. We agree that many questions are still left unanswered, including the exact molecular mechanism behind how DNA methylation prevents SRCAP-C binding. We have preliminary data that suggest none of the known DNA-binding modules of SRCAP-C, including ZNHIT1, by themselves can explain this sensitivity. This implies that domain dissection in the context of the holo-SRCAP complex is required to fully address this question. We believe this represents a very exciting future avenue of study; however, it does not negate our finding that SRCAP-C itself is important for maintaining the DNA methylation/H2A.Z antagonism. Therefore, we respectfully disagree with this reviewer's summary statement, which misleadingly undermines the impact of our work.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found that their distributions differ between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.
 
 Strengths:
 
 The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.
 
 We are grateful that this reviewer recognizes the importance of our study.
 
 Weaknesses:
 
 The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.
 
 (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.
 
 The altered H4 tail orientation and extra density seen on the acidic patch were incidental findings that we thought could be interesting for the field to be aware of but decided not to follow up on as there were other structural differences that were more directly related to our central question. We do believe that the above two differences are linked to each other because we used a highly purified and homogenous sample for cryo-EM analysis and the H4 tail/acidic patch interaction is a well characterized contact that mediates inter-nucleosome interactions. Additionally, other groups have reported that the presence of DNA methylation causes condensation of both chromatin and bare DNA (cited within our manuscript), though the mechanics behind this phenomenon remain to be elucidated. We believed that our structure data may also align with those findings. However, the reviewer is fair in pointing out that we do not provide further experimental evidence in verifying the existence of these increased interactions. We can revise our writing to clarify that these points are currently hypotheses rather than validated results.
 
 (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.
 
 We would like to thank the reviewer for bringing up these issues. Although our current data cannot explicitly clarify these possibilities, we favor an idea that DNA methylation specifically alters histone to DNA contacts and that this effect is felt globally across the entire nucleosome rather than only at specific locations. The intrinsic flexibility of linker DNA ends means that that region tends to exhibit the greatest differences under different physical influences, hence the focus on characterizing that area; flexibility of a thread on a spool is most pronounced at the ends. However, we also found that the DNA backbone of H2A.Z on methylated DNA had a lower local resolution compared to its unmethylated counterpart, despite that structure having a higher global resolution, which suggested to us that DNA positioning along the nucleosome is overall weaker under the presence of DNA methylation. This is corroborated by the increased population of open/shifted structures in our classification analysis. The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.
 
 References
 
 Berta, D.G., H. Kuisma, N. Valimaki, M. Raisanen, M. Jantti, A. Pasanen, A. Karhu, J. Kaukomaa, A. Taira, T. Cajuso, S. Nieminen, R.M. Penttinen, S. Ahonen, R. Lehtonen, M. Mehine, P. Vahteristo, J. Jalkanen, B. Sahu, J. Ravantti, N. Makinen, K. Rajamaki, K. Palin, J. Taipale, O. Heikinheimo, R. Butzow, E. Kaasinen, and L.A. Aaltonen. 2021. Deficient H2A.Z deposition is associated with genesis of uterine leiomyoma. Nature. 596:398–403.
 
 Capurso, D., H. Xiong, and M.R. Segal. 2012. A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics. 13:630.
 
 Chodavarapu, R.K., S. Feng, Y.V. Bernatavichute, P.Y. Chen, H. Stroud, Y. Yu, J.A. Hetzel, F. Kuo, J. Kim, S.J. Cokus, D. Casero, M. Bernal, P. Huijser, A.T. Clark, U. Kramer, S.S. Merchant, X. Zhang, S.E. Jacobsen, and M. Pellegrini. 2010. Relationship between nucleosome positioning and DNA methylation. Nature. 466:388–392.
 
 Choy, J.S., S. Wei, J.Y. Lee, S. Tan, S. Chu, and T.H. Lee. 2010. DNA methylation increases nucleosome compaction and rigidity. J Am Chem Soc. 132:1782–1783.
 
 Chua, E.Y., D. Vasudevan, G.E. Davey, B. Wu, and C.A. Davey. 2012. The mechanics behind DNA sequence-dependent properties of the nucleosome. Nucleic Acids Res. 40:6338–6352.
 
 Collings, C.K., P.J. Waddell, and J.N. Anderson. 2013. Effects of DNA methylation on nucleosome stability. Nucleic Acids Res. 41:2918–2931.
 
 Davey, C., S. Pennings, and J. Allan. 1997. CpG methylation remodels chromatin structure in vitro. J Mol Biol. 267:276–288.
 
 Davey, C.S., S. Pennings, C. Reilly, R.R. Meehan, and J. Allan. 2004. A determining influence for CpG dinucleotides on nucleosome positioning in vitro. Nucleic Acids Res. 32:4322–4331.
 
 Funabiki, H., I.E. Wassing, Q. Jia, J.D. Luo, and T. Carroll. 2023. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. Elife. 12.
 
 Ghenoiu, C., M.S. Wheelock, and H. Funabiki. 2013. Autoinhibition and polo-dependent multisite phosphorylation restrict activity of the histone h3 kinase haspin to mitosis. Mol Cell. 52:734–745.
 
 Jenness, C., S. Giunta, M.M. Muller, H. Kimura, T.W. Muir, and H. Funabiki. 2018. HELLS and CDCA7 comprise a bipartite nucleosome remodeling complex defective in ICF syndrome. Proc Natl Acad Sci U S A. 115:E876–E885.
 
 Jimenez-Useche, I., J. Ke, Y. Tian, D. Shim, S.C. Howell, X. Qiu, and C. Yuan. 2013. DNA methylation regulated nucleosome dynamics. Sci Rep. 3:2121.
 
 Jimenez-Useche, I., D. Shim, J. Yu, and C. Yuan. 2014. Unmethylated and methylated CpG dinucleotides distinctively regulate the physical properties of DNA. Biopolymers. 101:517–524.
 
 Kelly, A.E., C. Ghenoiu, J.Z. Xue, C. Zierhut, H. Kimura, and H. Funabiki. 2010. Survivin reads phosphorylated histone H3 threonine 3 to activate the mitotic kinase Aurora B. Science. 330:235–239.
 
 Li, S., Y. Peng, D. Landsman, and A.R. Panchenko. 2022. DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res. 50:1864–1874.
 
 Minary, P., and M. Levitt. 2014. Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A. 111:6293–6298.
 
 Ngo, T.T., J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha. 2016. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun. 7:10813.
 
 Nishiyama, A., L. Yamaguchi, J. Sharif, Y. Johmura, T. Kawamura, K. Nakanishi, S. Shimamura, K. Arita, T. Kodama, F. Ishikawa, H. Koseki, and M. Nakanishi. 2013. Uhrf1-dependent H3K23 ubiquitylation couples maintenance DNA methylation and replication. Nature. 502:249–253.
 
 Osakabe, A., F. Adachi, Y. Arimura, K. Maehara, Y. Ohkawa, and H. Kurumizaka. 2015. Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA. Open Biol. 5.
 
 Perez, A., C.L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M.L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez, and M. Orozco. 2012. Impact of methylation on the physical properties of DNA. Biophys J. 102:2140–2148.
 
 Segal, E., Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I.K. Moore, J.P. Wang, and J. Widom. 2006. A genomic code for nucleosome positioning. Nature. 442:772–778.
 
 Wassing, I.E., A. Nishiyama, R. Shikimachi, Q. Jia, A. Kikuchi, M. Hiruta, K. Sugimura, X. Hong, Y. Chiba, J. Peng, C. Jenness, M. Nakanishi, L. Zhao, K. Arita, and H. Funabiki. 2024. CDCA7 is an evolutionarily conserved hemimethylated DNA sensor in eukaryotes. Sci Adv. 10:eadp5753.
 
 Zilberman, D., D. Coleman-Derr, T. Ballinger, and S. Henikoff. 2008. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 456:125–129.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 The authors designed two sets of experiments to explore the molecular mechanisms underlying the mutually exclusive distribution of H2A.Z and DNA methylation previously reported by several groups.
 
 First, they examined how DNA methylation affects the physical stability of H2A.Z-containing nucleosomes. Although their results point to subtle differences between nucleosomes assembled on methylated versus unmethylated DNA, the authors did not extend their analyses to directly test the stability of these H2A.Z-containing nucleosomes under more challenging conditions. Prior studies have demonstrated that certain nucleosomes, such as those containing H3.3-H2A.Z or H2A.Z-H3K56Q, exhibit specific instability, but such instability is only revealed under challenging conditions, for example, altered salt concentrations or the presence of additional factors like FACT (PMID: 17575053; PMID: 19633671; PMID: 19639024; PMID: 41303375). In light of this literature, the observable structural features noted here for nucleosomes containing H2A.Z and methylated DNA are suggestive of increased instability, yet the authors did not employ comparable approaches to rigorously test whether such instability might explain the absence of H2A.Z from methylated genomic regions.
 
 As a result, at this stage of analysis, the idea that nucleosomes containing both H2A.Z and methylated DNA are intrinsically unstable, and that this instability accounts for the depletion of H2A.Z from methylated regions, remains unsubstantiated.
 
 We thank the reviewer for the constructive criticisms. Through our response to these points, we were able to significantly improve our manuscript, including major rewriting of the Abstract and Discussion as well as incorporation of new data.
 
 We agree that combinations with other histone variants, modifications, and mutations could further affect our observed impact of DNA methylation on H2A.Z-nucleosome stability. What we observed based on satellite II-derived DNA was that DNA methylation made H2A.Z nucleosomes (with H3.2) more open, although the effect of DNA methylation is relatively small (as compared to the general impact of H2A.Z incorporation). We readily admit that such a subtle physical effect is unlikely to be the main driver of the antagonistic distribution of H2A.Z and DNA methylation, though small physical changes have been known to influence larger biological functions, and we therefore sought to describe additional regulatory factors that could play major roles.
 
 We also agree that H3.3 is of major interest when discussing H2A.Z. In our Xenopus egg extract experiments using DNA beads, the primary H3 variant deposited is H3.3, as no DNA replication occurs on the beads to allow for H3.1/.2 replication-coupled deposition. From those experiments, we demonstrated that preferential loading of H2A.Z can be primarily explained by SRCAP. In other words, in the absence of SRCAP, loading/retention of H2A.Z on H3.3 nucleosomes was not noticeably affected by DNA methylation, indicating that DNA methylation’s physical effects on H2A.Z nucleosomes play little, if any, role in the preferential accumulation of H2A.Z on unmethylated DNA, at least in the context of synthetic DNA beads incubated in Xenopus egg extract lacking active transcription. Our sequencing data hint at the interesting possibility that transcription, along with other factors missing in egg extract, may be involved in further pruning H2A.Z from methylated DNA, which conceivably could take advantage of subtle physical alterations. However, we agree that we lack firm supporting evidence for such a mechanism, which led us to forgo including it in our final model figure; instead, we only report our observations, with discussion of potential biological implications and limitations. Of note, it has been reported that the H2A.Z nucleosome is more accessible than the H2A nucleosome, while inclusion of H3.3 does not further enhance accessibility of the H2A.Z nucleosome (PMID 38920622). We have now noted these points in the Discussion of our revised manuscript.
 
 We appreciate and agree with this reviewer’s point that nucleosome instability sometimes requires challenging conditions to be fully revealed. However, in our system, use of H2A.Z was itself the challenge, as we find in our hands that H2A.Z by itself substantially destabilizes histone-DNA contacts compared to canonical H2A. It is only with this already destabilized nucleosome that we see further enhancement of accessibility/openness in the presence of DNA methylation. This is similar to a previous report (PMID: 23260052) that only an intrinsically destabilized sub-population of canonical H2A nucleosomes on 601 DNA experienced detectable physical changes in the presence of DNA methylation.
 
 In response to this reviewer's comment, we edited the Abstract and Discussion to clearly note the subtlety of the impact of DNA methylation on H2A.Z nucleosome structure, and that the potential functional significance remains an open question.
 
 Second, the authors investigated whether SRCAP-C contributes to preferential H2A.Z incorporation into unmethylated DNA. The absence of H2A.Z from methylated regions does not necessarily imply that it cannot be incorporated there; it may instead reflect the chromatin environment associated with DNA methylation, which could disfavor SRCAP-C activity, whereas open chromatin environments strongly promote SRCAP-dependent H2A.Z deposition.
 
 This reviewer suggested an alternative model in which SRCAP prefers to act on open chromatin and the apparent preferential H2A.Z deposition on unmethylated DNA is due solely to the increased accessibility associated with unmethylated DNA. Following such a model, one would predict that SRCAP-C's preference for unmethylated DNA would be eliminated on nucleosome-free DNA in Xenopus egg extracts. To test this alternative model, we repeated the SRCAP-C binding experiment in egg extracts depleted of the HIRA complex, the H3.3-H4 chaperone responsible for de novo nucleosome assembly on exogenously added DNA in egg extracts. Contrary to this prediction, both SRCAP and ZNHIT1 still display preferential binding to unmethylated DNA substrates in HIRA-depleted extracts in which nucleosome assembly is suppressed (newly added Suppl Fig 16). These results argue that SRCAP-C's discrimination against methylated DNA is not due to a potential effect of chromatin compaction by DNA methylation. Furthermore, our new result is in line with the idea that SRCAP employs 1D diffusion on the linker DNA before engaging the H2A nucleosome (PMID 39131301), implying that SRCAP-C's discrimination against methylated linker DNA contributes to this process. This is now illustrated in the new model Figure 6.
 
 Please note we also indicate in both our model and in text that there exists an additional methylation-insensitive mechanism that drives H2A.Z deposition on methylated DNA, leading to a substantial amount of colocalized H2A.Z and DNA methylation. Why two different deposition pathways for H2A.Z differing in their methylation sensitivities must exist is an interesting topic for future work and has not been described prior to our report.
 
 This interpretation is consistent with the authors' own comparative mapping of H2A.Z and DNA methylation in sperm pronuclei incubated in egg extract versus a transcriptionally active Xenopus fibroblast line. They observed that about 40% of H2A.Z-associated genomic DNA is methylated in sperm pronuclei, but only 3% in fibroblasts. As they note, the major difference between these systems is the presence of transcription in fibroblasts, a process known to drive H2A.Z eviction/recycling, and which is absent in the egg-extract system. Thus, no specific inhibition of SRCAP-C by methylated DNA needs to be invoked: H2A.Z deposition on both methylated and unmethylated accessible regions, followed by preferential eviction from methylated sites in active nuclei, could fully account for the observed patterns.
 
 As the reviewer correctly notes here, we proposed that transcription is likely to play an important role in pruning H2A.Z from methylated DNA. Our observations and proposed mechanism do not argue against the possible existence of a DNA methylation-insensitive, transcription-dependent mechanism that promotes dissociation of H2A.Z from methylated DNA, which we believe likely would be correlated to gene body methylation. In fact, we did propose in our Discussion that such a transcription-mediated mechanism may conceivably take advantage of the subtly destabilized DNA wrapping of H2A.Z nucleosomes on methylated DNA to further selectively prune H2A.Z at colocalized regions. However, such a mechanism would be an additional component to what we have already described and does not explain the observed preferential recruitment of SRCAP-C to unmethylated DNA in Xenopus egg extracts in the absence of active transcription.
 
 In this respect, studies from the Felsenfeld laboratory showing that double-variant nucleosomes are highly unstable under physiological ionic conditions are particularly relevant (PMID: 19633671; PMID: 19639024). They demonstrated that such unstable nucleosomes are only evident under low ionic strength extraction conditions, emphasizing that the apparent absence of H2A.Z may reflect facilitated removal rather than failure of assembly.
 
 The authors may also have been influenced by the study of Berta et al. (cited in the manuscript), which examined uterine leiomyomas harboring somatic or germline mutations in SRCAP-C subunits. In those tumors, the normal association of H2A.Z with accessible, active chromatin, and its exclusion from methylated regions, was lost. However, this observation does not demonstrate that SRCAP-C actively prevents H2A.Z incorporation into methylated DNA. Instead, it may simply reflect that in the absence of SRCAP-C, a default, less efficient deposition pathway operates regardless of whether the chromatin environment is normally permissive or restrictive for SRCAP-dependent activity.
 
 Even if one accepts the more straightforward interpretation proposed by the present authors, that SRCAP-C is actively inhibited by methylated DNA, as suggested by their pull-down experiments from Xenopus egg extracts using unmethylated and methylated DNA, the hypothesis lacks mechanistic support.
 
 Considering this reviewer's criticism, we have expanded our discussion to indicate the possibility that SRCAP-C may have an alternative mechanism to find open chromatin independent of DNA methylation status. However, our data show that SRCAP-C preferentially binds to unmethylated DNA in a manner independent of transcription or other epigenetic status in Xenopus egg extracts, and that SRCAP-C provides the major mechanism that explains preferential deposition of H2A.Z on unmethylated DNA. Therefore, we believe that our study for the first time offers a mechanistic explanation of how H2A.Z discrimination against methylated DNA is accomplished through SRCAP-dependent H2A.Z deposition.
 
 The following points summarize the issues discussed above:
 
 (1) The authors did not sufficiently test the hypothesis that H2A.Z-methylated DNA nucleosomes are inherently unstable and could explain the exclusion of H2A.Z from methylated genomic regions.
 
 We stand by our conclusion that DNA methylation has an intrinsic capacity to make the H2A.Z nucleosome more open and accessible, even though the effect is subtle. We did not argue that this subtle effect can fully explain the exclusion of H2A.Z from methylated genomic regions. Rather, our Xenopus egg extract experiment suggested that in the transcriptionally inactive egg extract setting, such a mechanism plays little or no role and it is SRCAP-C instead that is the major driver. Whether this physical mechanism also contributes to their exclusion in cells with active transcription remains a future subject of study.
 
 (2) The proposed active role of SRCAP-C in preventing H2A.Z assembly on methylated DNA is supported only by limited experimental data and lacks a mechanistic explanation. In particular, this hypothesis does not account for the significant H2A.Z assembly observed on methylated DNA regions in sperm nuclei after incubation in egg extract.
 
 We respectfully disagree with this summary assessment. Our conclusions are well aligned with the substantial H2A.Z association with methylated DNA seen in sperm pronuclei assembled in Xenopus egg extracts. We demonstrated that:
 
 (1) In transcriptionally silent Xenopus egg extracts using synthetic DNA beads, DNA binding of SRCAP-C is inhibited by DNA methylation.
 
 (2) In this setup, H2A.Z is preferentially, if not exclusively, loaded onto unmethylated DNA over methylated DNA.
 
 (3) Depletion of SRCAP-C almost completely eliminated preferential association of H2A.Z with unmethylated DNA, while leaving some DNA methylation-insensitive H2A.Z loading.
 
 (4) These data indicate the presence of a SRCAP-C-dependent, DNA methylation-sensitive mechanism as well as a SRCAP-C-independent, DNA methylation-insensitive mechanism to load H2A.Z onto chromatin. This conclusion matches well with our genomic analysis showing that H2A.Z is preferentially but not exclusively loaded onto hypomethylated genomic segments in sperm pronuclei in Xenopus egg extracts.
 
 (5) As we clearly discussed, this SRCAP-C-dependent mechanism by itself is insufficient to explain the much clearer exclusion of H2A.Z in somatic cells. We discussed the possibility that transcription contributes to further pruning of H2A.Z from methylated DNA.
 
 To deliver this overall message with nuances that we noted above, we have heavily revised the Abstract, the model Figure 6, and Discussion. Thanks to the criticisms raised by this reviewer, we believe that our revised manuscript has been significantly improved.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) A major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.
 
 We had considered solving the H2A structures but ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes using a DNA sequence highly similar to our Sat2R-P, with and without DNA methylation (PDB: 5CPI and 5CPJ). The authors of that study did not see any physical differences between their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis, considering the cost and effort of this additional cryo-EM analysis.
 
 (2) The reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.
 
 We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing this question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008), but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract that the effect of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle. We will accordingly revise the Abstract, the model Figure 6, and Discussion to make this point clearer.
 
 (3) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value and should be removed.
 
 We believe the inclusion and factual reporting of negative data is important for the scientific community, as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript; however, we believe that these data instead emphasize the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.
 
 (4) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.
 
 Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).
 
 (5) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).
 
 (6) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.
 
 We appreciate recognition of the importance of our finding by this reviewer. We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.
 
 While the degree of separation seems to depend on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited on hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates), but that an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.
 
 (7) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript. The authors need to discuss this in more detail.
 
 As depicted in Figure 6 and described in the Discussion, we indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system. In the revised manuscript, we heavily edited the Discussion to better clarify these points.
 
 (8) The SRCAP depletion is insufficiently validated, i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.
 
 In response to this, quantification of the SRCAP depletion is now included as Supplementary Figure 13A and B. Since our anti-ZNHIT1 antibodies reproducibly detected ZNHIT1 on DNA beads isolated from egg extracts, we have conducted additional verification of the SRCAP depletion by probing for SRCAP and ZNHIT1 on DNA beads, confirming that these proteins were depleted on DNA beads upon immunodepletion with anti-SRCAP antibodies (Author response image 1). To further validate this conclusion, we added data showing that the effect of SRCAP depletion on methylation-sensitive H2A.Z deposition was reproduced through use of a different commercially available antibody raised against human SRCAP (newly added Suppl Fig 14).
 
 Author response image 1.
 
 Verification of SRCAP depletion using DNA beads. DNA beads were incubated in interphase-cycled Xenopus egg extract that had been depleted with either our custom SRCAP antibody or an IgG negative control. SRCAP and ZNHIT1 association was then assessed via Western Blot.
 
 (9) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.
 
 Thank you very much for raising this interesting point. We were aware that the TIP60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive (shown in the revised Supplementary Figure 15). We wished to test the potential contribution of TIP60-C, but, unfortunately, the antibodies we currently have available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role TIP60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating TIP60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA, and thus it remains a subject for future study. However, we have now added descriptions to note that TIP60-C is a likely candidate to execute the SRCAP-independent and methylation-insensitive mechanism of H2A.Z loading in Xenopus egg extracts. In the model figure, we initially did not include TIP60-C, but we now indicate TIP60-C as a likely candidate in the revised model (Figure 6) to facilitate future research in the field.
 
 (10) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1). These references should be considered.
 
 We appreciate that the reviewer points out this important issue. Although we had described that controversy exists regarding how H2A.Z and DNA methylation contribute to nucleosome stability, it was not clearly explained. We understand that this confusion was in part due to the term “nucleosome stability”, which is broad and encompasses many physical aspects. As noted in a prior response, we now better specify our use of the term within the manuscript, emphasizing nucleosome openness and accessibility, particularly at the nucleosome core particle entry/exit sites. As noted by published studies (PMID 38920622), the impact on nucleosome stability may differ between the internal and external segments of nucleosomal DNA. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible at DNA ends compared to canonical H2A on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite findings while others have obtained results similar to ours. This may be caused by the use of different assays (for example, nucleosome assembly during salt dialysis or salt sensitivity vs openness/accessibility of preassembled nucleosomes). In the Discussion of the revised manuscript, we now explain these factors, with the hope that our study will help clarify some of the field’s controversies.
 
 Reviewer #3 (Recommendations for the authors):
 
 (1) Since the cryo-EM structure determined by single-particle analysis represents only one major population, it would be important to determine the dyad axis position by complementary biochemical assays, such as MNase-seq or chemical digestion by the Fenton reaction (PMID: 22929776).
 
 We would like to thank the reviewer for bringing up this important issue. We agree that the high-resolution structure represents only a subpopulation, in which we specifically selected for the most stably wrapped nucleosomes in each sample. This issue is why we supplemented our high-resolution structure with our in-silico classification analysis to survey the overall structure distribution of the full nucleosome particle population. The classification input contains all nucleosome-like particles picked from both unmethylated and methylated sample micrographs mixed together, ensuring that all particles are taken into consideration and that both samples have been analyzed in an identical manner. From our sorting analysis, we find an increased population of open and shifted nucleosome structures present in our methylated DNA sample, indicating destabilization of DNA-histone wrapping with DNA methylation. This is corroborated by the lower local resolution seen on the DNA backbone of our high-resolution H2A.Z on methylated DNA structure, despite it having a higher global resolution compared to its unmethylated counterpart. This suggested to us that DNA positioning along the nucleosome is overall weaker in the presence of DNA methylation.
 
 The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We realized that we did not explain how we decided to place the HinfI site in the context of our solved cryo-EM structure. In the revised Figure 3B, we now illustrate that the HinfI site is located at a segment where H2A/H2A.Z directly contacts the DNA and explained that this segment belongs to the region that exhibited clear methylation-induced flexibility in our cryo-EM structures. Thus, our structure helped us design this experiment.
 
 We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes, as subtle technical errors in the MNase concentration can have significant effects. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.
 
 (2) I assume that the authors confirmed complete DNA methylation by restriction enzyme digestion. It would be helpful to include this validation in supplementary figures.
 
 We would like to thank the reviewer for pointing out that this critical verification was missing from our initial manuscript. DNA methylation of Sat2R-P and Sat2R was verified via BstBI digestion (Suppl Fig 1B and 7D, respectively); 601L was verified with HpaII digestion (Suppl Fig 6B); and 19x601 DNA was verified via BstUI digestion (Suppl Fig 11A). All data have been added to the specified figures. Unfortunately, the 16xHSat2 DNA substrate we used in our assays does not contain appropriate cut sites for methylation-sensitive restriction enzymes. Because of this, we always prepared the 16xHSat2 DNA in parallel with the 19x601 substrate under identical conditions and then used digestion of the 19x601 substrate to verify the quality of methylation for each batch. To more directly verify methylation of 16xHSat2 DNA, we used Xenopus laevis ZHX2 and ZHX3, which we recently identified as proteins that selectively associate with methylated DNA in Xenopus egg extracts. Although identification and characterization of Xenopus ZHX2/3 will be described elsewhere, previously published proteomic studies have also identified mammalian ZHXs as proteins that enrich on methylated DNA (PMID 21029866, 23434322). By incubating DNA beads in Xenopus egg extract and probing for endogenous ZHX2/3 (our antibody recognizes both ZHX2 and ZHX3), we verified that ZHXs selectively bind to methylated 16xHSat2 but not unmethylated DNA (Author response image 2). Although this does not necessarily verify that all CpGs in 16xHSat2 were methylated, we observed comparable methylation-induced inhibition of SRCAP binding between 16x601 and 16xHSat2, supporting our conclusion.
 
 Author response image 2.
 
 Verification of 16xHSat2 methylation status via ZHX2/3 protein binding. 16xHSat2 DNA beads were incubated in Xenopus egg extract and endogenous ZHX2/3 protein binding assessed via Western Blot with a custom generated antibody that recognizes both ZHX2 and ZHX3.
 
 (3) Figure 1A: The dyad position is difficult to identify. Please indicate it clearly using a distinct color (not green).
 
 We now directly indicate each sequence midpoint with a black triangle and also changed the font of DNA sequences to further clarify that the dyad resides at the palindromic center.
 
URL

biorxiv.org/content/10.1101/2025.07.31.667981v3
www.biorxiv.org www.biorxiv.org

Deciphering interferon functions in avian influenza: Insights from receptor knockout models in the natural host

3
1. Public_Reviews 29 May 2026
 
 in eLife
 
 eLife Assessment
 
 This study reports on the development and characterization of chickens with genetic deficiencies in type I or type III interferon receptors, which is an important contribution to the field of avian immunology. The data reflecting the development of the new interferon-receptor-deficient chickens is compelling. The initial characterization of IFN biology and infection responses in these knockout chickens provides a solid foundation for future studies on the distinct contributions of type I and type III interferon signaling to antiviral responses.
 
2. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This is a laudable effort to help dissect the contributions of type I and type III IFNs to the antiviral response in chicken and therefore represents an important piece of work, not least in the light of birds being a key carrier and worldwide distributor of influenza virus. The first part of the study characterises the generation of IFNAR and IFNLR KO chicken strains and describes basic differences. Four different viruses are then tested in chicken embryos, while the subsequent analysis of the antiviral response in vivo is performed with one influenza H3N1 strain.
 
 Strengths:
 
 Having these two KO chicken strains as a tool is a great achievement. The initial analysis is solid. Clear effect of IFNAR deficiency in in vivo infection, less so for IFNLR deficiency.
 
 Weaknesses:
 
 (1) The antibody induction by KLH immunisation: We still don't know whether or not this vaccination induces IFN responses in wt mice, so it is still not possible to judge whether the effects observed are due to steady-state differences or to differential effects of IFN induced during the vaccination phase. Pre-immune results are now shown and are indeed zero. As suggested, the whole of Figure 4 is now condensed into one or two panels by proper calculation of Ab titers - would these titres be significantly different? This, like all of the other in vivo experiments, has not been repeated, if I understand the methods section correctly. I understand that there are 3R restrictions that are tighter in some countries, and I accept that with the numbers used here, some statistical significance is reached, but this is, for instance, not the case for survival.
 
 (2) The basic conundrum here and in later figures is now addressed by the authors in the discussion: Situations where IFN type 1 and 3 signalling deficiency each have an independent effect (i.e., Fig. 4d) suggest that they act by separate, unrelated mechanisms. However, all the literature about these IFN families suggests that they show almost identical signalling and gene induction downstream of their respective receptors. How can the same signalling, clearly active here downstream of the receptors for IFN type 1 or type 3, be non-redundant, i.e., why does the unaffected IFN family not stand in? The mouse studies, which showed a rather subtle phenotype when only one of the two IFN systems was missing, but a massive reduction in virus control in double KO mice, are discussed, but a clear-cut explanation for the differences has not been reached. Reasons could be a direct effect of IFNab on B cells and an indirect effect of IFNL through non-B cells, timing issues, and many other scenarios can be envisaged. The authors do not address this question experimentally, which limits the depth of analysis; they have, however, now included a discussion of this dilemma.
 
 (3) In the one in vivo experiment performed with chickens, only one virus was tested; more influenza strains should be included, as well as non-influenza viruses. I appreciate that this is logistically difficult.
 
 (4) The basic conundrum of point 2 applies equally to Fig. 6a; both KOs have a phenotype. Again, in 6d, both IFNs appear to be separately required for Mx induction. An explanation has been attempted, but further experiments, for instance looking at different time points to understand whether we are dealing simply with different kinetics of the response, have not been attempted, despite the fact that such experiments are likely not covered by strict 3R rules.
 
 (5) The in vivo infection is the most interesting experiment, and the key outcome here is that IFN type 1 is crucial for anti-H3N1 protection in chickens, while type 3 is less impactful. However, this experiment suffers from the different time points when chickens were culled, so many parameters are impossible to compare (e.g. weight loss, histopathology). Some explanation is given as to the comparisons chosen here, but a more thorough analysis at several time points would have strengthened this study.
 
 Comments on revised version:
 
 In the rebuttal, the authors have gone to some length to add to the discussion of the experiments, and some aspects are better explained now than before. Many of these explanations remain speculative, however, so the study remains inconclusive in several aspects. As no new data were added, my overall judgement of this study remains unchanged.
 
3. Public_Reviews 29 May 2026
 
 in eLife
 
 Author response:
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This manuscript presents an extensive body of work and an outstanding contribution to our understanding of the IFN type I and III system in chickens. The research started with the innovative approach of generating KO chickens that lack the receptor for IFNα/β (IFNAR1) or IFN-λ (IFNLR1). The successful deletion and functional loss of these receptors was clearly and comprehensively demonstrated in comparison to the WT. Moreover, the homozygous KO lines (IFNAR1-/- or IFNLR1-/-) were found to have similar body weights, and normal egg production and fertility compared to their WT counterparts. These lines are a major contribution to the toolbox for the study of avian/chicken immunology.
 
 The significance of this contribution is further demonstrated by the use of these lines by the authors to gain insight into the roles of IFN type I and IFN-type III in chickens, by conducting in ovo and in vivo studies examining basic aspects of immune system development and function, as well as the responses to viral challenges conducted in ovo and in vivo.
 
 Solid, state-of-the-art methods and convincing evidence from studies comparing various immune system related functions in the IFNAR1-/- or IFNLR1-/- lines to the WT revealed that the deletion of IFNAR1 and/or IFNLR1 resulted in:
 
 (1) impaired IFN signaling and induction of anti-viral state;
 
 (2) modulation of immune cell profiles in the peripheral blood circulation and spleen;
 
 (3) modulation of the cecum microbiome;
 
 (4) reduced concentrations of IgM and IgY in the blood plasma before and following immunization with model antigen KLH, whereby also line differences in the time-course of the antibody production were observed;
 
 (5) decrease in MHCII+ macrophages and B cells in the spleen of IFNAR1 KO chickens, although the MHCII-expression per cell was not affected in this line; and
 
 (6) reduction in the response of αβ1 TCR+ T cells of IFNAR1 KO chickens as suggested by clonal repertoire analyses.
 
 These studies were then followed by examination of the role of type I and type III IFN in virus infection, using different avian influenza A virus strains as well as an avian gammacoronavirus (IBV) in in ovo challenge experiments. These studies revealed: viral titers that reflect virus-species and strain-specific IFN responses; no differences in the secretion of IFN-α/β in either KO line compared to the WT line; a predominant role of type I IFN in inducing the interferon-stimulated gene (ISG) Mx; and that an excessive and unbalanced type I IFN response can harm host fitness (survival rate, length of survival) and contribute to immunopathology.
 
 Based on guidance from the in ovo studies, comprehensive in vivo studies were conducted on host-pathogen interactions in hens from the three lines (WT, IFNAR1 KO, or IFNLR1 KO). These studies revealed the early appearance of symptoms and poor survival of hens from the IFNAR1 KO line challenged with H3N1 avian influenza A virus; efficient H3N1 virus replication in IFNAR1 KO hens; increased plasma concentrations of IFNα/β and mRNA expression of IFN-λ in spleens of the IFNAR1 KO hens; a pro-inflammatory role of IFN-λ in the oviduct of hens infected with H3N1 virus; increased proinflammatory cytokine expression in spleens of IFNAR1 KO hens; and impairment of negative feedback mechanisms regulating IFN-α/β secretion in IFNAR1 KO hens, together with a significant decrease in this group's antiviral state. Additionally, it was demonstrated that IFN-α/β can compensate for IFN-λ to induce an adequate antiviral state in the spleen during H3N1 infection, but IFN-λ cannot compensate for IFN-α/β signaling in the spleen.
 
 Strengths:
 
 (1) Both the methods and results from the comprehensive, well-designed, and well-executed experiments are considered excellent. The results are well and correctly described in the result narrative and well presented in both the manuscript and supplement Tables and Figures. Excellent discussion/interpretation of results.
 
 (2) The successful generation of the type I and type III IFN KO lines offers unprecedented insight and opens multiple new venues for exploring the IFN system in chickens. The new knowledge reported here is direct evidence of the high impact of this model system on effectively addressing a critical knowledge gap in avian immunology.
 
 (3) The thoughtful selection of highly relevant viruses to poultry and human health for the in ovo and in vivo challenge studies to examine and assess host-pathogen interactions in the IFNR KO and WT lines.
 
 (4) Making use of the unique opportunities in the chicken model to examine and evaluate the host's IFN system responses to various viral challenges in ovo, before conducting challenge studies in hens.
 
 (5) The new knowledge gained from the IFNAR1 and IFNLR1 KO lines will find much-needed application in developing more effective strategies to prevent health challenges like avian influenza and its devastating effects on poultry, humans, and other mammals.
 
 (6) The excellent cooperation and contributions of the co-authors and institutions.
 
 Weaknesses:
 
 No weaknesses were identified by this reviewer.
 
 We thank Reviewer #1 for the very positive and thoughtful evaluation of our manuscript. We appreciate the recognition of the effort involved in generating and characterizing the IFNAR1-/- and IFNLR1-/- chicken lines and for highlighting their significance as valuable tools for advancing avian immunology.
 
 We are grateful for the reviewer’s clear summary of our findings and for acknowledging the quality of the experimental design, data presentation, and interpretation. The encouraging feedback affirms the broader impact of our study and its contribution to understanding type I and type III interferon biology and antiviral defense mechanisms in chickens.
 
 We have carefully considered all reviewer comments and revised the manuscript accordingly to further clarify methodological details and improve the presentation of our results.
 
 Reviewer #1 (Recommendations for the authors):
 
 Minor suggestions/corrections:
 
 (1) Line 192, 193, 196 - the superscript "+" sign appears to be underlined.
 
 We corrected the formatting of all superscript "+" symbols (L 192-196).
 
 (2) L195: ...in the spleen "of both IFNR KO lines" (or some clarification of what you are comparing).
 
 The sentence was revised to read “in the spleen of both IFNR knockout lines” for clarity (L 195).
 
 (3) L198: replace "highlighting" with "and".
 
 “Highlighting” was replaced with “and” as suggested (L 198).
 
 (4) L231 and 235: change "monocytes" to "macrophages" as this description appears to refer to spleen cells. Also, make this change in Figure 3b and in the Figure 3 caption (e.g. monocytes/macrophages).
 
 “Monocytes” was replaced with “macrophages” to accurately describe spleen cells. The same correction was made in Figure 3b and the Figure 3 caption as well as in the supplementary Figure 4 (L 229-234).
 
 (5) L257: indicate this significant difference in Figure 5b.
 
 The significant difference has now been clearly indicated in Figure 5b.
 
 (6) L420, 421: change "monocytes" to "macrophages" as this discussion appears to refer to the spleen.
 
 “Monocytes” was replaced with “macrophages” to reflect the correct cell type discussed in the spleen context (L 226-227).
 
 (7) L564-565: has the anti-human MX antibody been shown to cross-react with chicken Mx?
 
 We thank the reviewer for this valuable comment. Yes, the cross-reactivity of the anti-human MxA monoclonal antibody (clone M143, mouse IgGκ; Merck, Germany) with chicken Mx protein has been previously demonstrated. This antibody has been used successfully to detect chicken Mx in several published studies, including Schusser et al., Journal of Virology (2011). Accordingly, supporting references have been added to the revised manuscript (L584-586).
 
 (8) L608: how were PBMC and splenocytes (mononuclear spleen cells?) isolated? Line 647 on page 14 mentions their isolation using Histopaque-1077 density gradient centrifugation.
 
 We thank the reviewer for this helpful comment. A detailed description of the isolation procedure for PBMCs and mononuclear spleen cells has now been added to the Materials and Methods section under the new subsection titled “Isolation of peripheral blood and splenic mononuclear cells”. In this section, we specify that both PBMCs and splenic mononuclear cells were isolated using Histopaque®-1077 density gradient centrifugation, as described on page 14, lines 668-676.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This study attempts to dissect the contributions of type I and type III IFNs to the antiviral response in chickens. The first part of the study characterises the generation of IFNAR and IFNLR KO chicken strains and describes basic differences. Four different viruses are then tested in chicken embryos, while the subsequent analysis of the antiviral response in vivo is performed with one influenza H3N1 strain.
 
 Strengths:
 
 Having these two KO chicken strains as a tool is a great achievement. The initial analysis is solid. Clear effect of IFNAR deficiency in in vivo infection, less so for IFNLR deficiency.
 
 Weaknesses:
 
 (1) The antibody induction by KLH immunisation: No data indicated whether or not this vaccination induces IFN responses in wt mice, so the effects observed may be due to steady-state differences or to differential effects of IFN induced during the vaccination phase. No pre-immune results are shown. The differences are relatively small and often found at only one plasma dilution - the whole of Figure 4 could be condensed into one or two panels by proper calculation of Ab titers - would these titres be significantly different? This, as all of the other in vivo experiments, has not been repeated, if I understand the methods section correctly.
 
 We thank the reviewer for the valuable comments and helpful suggestions.
 
 Regarding interferon induction by KLH immunisation, we agree that KLH is not known to strongly induce type I or type III interferon responses. Importantly, the goal of this experiment was not to quantify IFN induction per se, but to assess how the absence of IFN receptors affects adaptive antibody responses under standard immunisation conditions. KLH is a highly immunogenic, copper‑containing extracellular oxygen‑carrier protein derived from the marine gastropod Megathura crenulata and is widely used as a T cell–dependent model antigen to study B‑cell activation, antibody production, and class switching in vivo (Harris & Markl, Micron 1999, doi: 10.1016/s0968-4328(99)00036-0; Schusser et al., 2016, doi: 10.1002/eji.201546171). Because chickens are extremely unlikely to encounter KLH under natural conditions, KLH behaves as a neo‑antigen, and anti‑KLH antibodies can be considered to arise from de novo adaptive responses rather than pre‑existing antigen experience. Owing to its structural complexity and unusual glycosylation, KLH provides broad antigenic stimulation and engages adaptive immune mechanisms largely independently of pathogen‑specific innate pattern recognition, while still supporting robust T helper cell responses (Swaminathan et al., 2014, doi: 10.1111/bcp.12422; Geyer et al., 2004, doi: 10.1016/j.micron.2003.10.033). This makes KLH particularly suitable for dissecting intrinsic differences in adaptive immune responses between genotypes.
 
 We have now included pre-immune plasma controls (Figure 4 c, d), demonstrating that baseline antibody levels did not differ statistically between groups and were negligible prior to immunisation.
 
 As for the use of different plasma dilutions, this was necessary to ensure that all samples were measured within the linear detection range of our in-house ELISA. For example, after the primary immunisation, IgY concentrations were relatively low (e.g., day 5 post-immunisation), and plasma samples had to be diluted only 1:100 to detect measurable differences between groups. In contrast, after the booster immunisation, IgY concentrations increased substantially, and lower dilutions such as 1:100 led to signal saturation. Therefore, higher dilutions (up to 1:1600) were required to keep the values within the measurable range.
 
 Following the reviewer’s recommendation, we have now unified the presentation of results by showing data at a single representative dilution for each isotype: 1:100 for IgM (Figure 4C) and 1:1600 for IgY (Figure 4D). These dilutions fall within the linear part of the standard curve, which allows differences between groups to be resolved. We also calculated endpoint antibody titers, which confirmed that the observed differences remain statistically significant (p < 0.05).
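The response does not state how the endpoint titers were calculated. As a minimal sketch of one common convention, and not necessarily the authors' method, the endpoint titer can be taken as the highest reciprocal dilution whose ELISA signal still exceeds a cutoff derived from negative-control (for example, pre-immune) wells. The function names, the mean + 3 SD cutoff rule, and all numbers below are assumptions for illustration only.

```python
from statistics import mean, stdev


def cutoff_from_negatives(negative_ods, n_sd=3.0):
    """Cutoff = mean + n_sd * SD of negative-control ODs (assumed rule)."""
    return mean(negative_ods) + n_sd * stdev(negative_ods)


def endpoint_titer(dilutions, ods, cutoff):
    """Return the highest reciprocal dilution with OD above the cutoff.

    dilutions: reciprocal dilution factors (100 means 1:100).
    ods:       optical densities measured at those dilutions.
    Returns 0 when no dilution exceeds the cutoff.
    """
    positive = [d for d, od in zip(dilutions, ods) if od > cutoff]
    return max(positive) if positive else 0


# Hypothetical values, not data from the study.
pre_immune_ods = [0.05, 0.06, 0.04]
cutoff = cutoff_from_negatives(pre_immune_ods)

dilution_series = [100, 200, 400, 800, 1600, 3200]
sample_ods = [2.1, 1.6, 1.0, 0.55, 0.22, 0.07]
titer = endpoint_titer(dilution_series, sample_ods, cutoff)
print(f"cutoff={cutoff:.3f}, endpoint titer=1:{titer}")
```

Reducing each animal to one titer in this way lets groups be compared with a single test per time point instead of one comparison per dilution. The sketch assumes the signal falls monotonically along the dilution series; a noisy series would need interpolation or a curve fit.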
 
 Regarding experimental replication, the study design already incorporated sufficient biological replication and longitudinal sampling to ensure robustness of the findings. Each experimental group consisted of ten animals, including three animals that served as negative controls. In addition, animals were sampled at multiple time points following immunisation, allowing the dynamics of the antibody response to be monitored over time. This longitudinal design provides repeated biological measurements within the same experimental cohort and allows confirmation of consistent response patterns across time points. All ELISA measurements were performed in technical triplicates. Together, the combination of adequate group size, appropriate controls, repeated sampling over time, and technical replication provides sufficient statistical power and internal validation of the observed effects. Furthermore, all animal experiments were conducted under strict approval of the Government of Upper Bavaria and in accordance with German animal welfare regulations, which limit unnecessary repetition of in vivo experiments beyond the approved experimental design.
 
 (2) The basic conundrum here and in later figures is never addressed by the authors: Situations where IFN type 1 and 3 signalling deficiency each have an independent effect (i.e., Figure 4d) suggest that they act by separate, unrelated mechanisms. However, all the literature about these IFN families suggests that they show almost identical signalling and gene induction downstream of their respective receptors. How can the same signalling, clearly active here downstream of the receptors for IFN type 1 or type 3, be non-redundant, i.e., why does the unaffected IFN family not stand in? This is a major difference from the mouse studies, which showed a rather subtle phenotype when only one of the two IFN systems was missing, but a massive reduction in virus control in double KO mice (the correct primary paper should be quoted here, not only the review by McNab). Reasons could be a direct effect of IFNab on B cells and an indirect effect of IFNL through non-B cells, timing issues, and many other scenarios can be envisaged. The authors do not address this question, which limits the depth of analysis. 
 
 We thank the reviewer for this insightful comment. Indeed, this represents one of the most interesting and novel findings of our study. Unlike in mice, where both type I and type III interferon systems need to be disrupted to observe clear susceptibility to influenza infection, in our chicken model the loss of IFNAR1 alone was sufficient to render the animals highly susceptible. This highlights a key difference between mammalian and avian interferon biology and supports the main goal of our work, to investigate the specific biological activities of avian interferons rather than directly transferring conclusions from mammalian systems.
 
 In relation to Figure 4d (anti-KLH IgY), we observed that both IFNAR1-/- and IFNLR1-/- animals showed reduced IgY levels compared to wild type at day 3 after the booster immunisation. However, by day 5 post-booster, IgY levels in IFNLR1-/- animals had returned to wild-type levels, while IFNAR1-/- animals still showed significantly lower IgY. This indicates that type III IFN contributes to the early phase of the IgY response but that its absence can later be compensated by type I IFN signalling. In contrast, loss of type I IFN cannot be compensated by type III IFN, suggesting that type I IFN plays a more dominant or sustained role in antibody induction.
 
 Although type I and type III IFNs share overlapping signaling pathways and induce similar sets of ISGs, their effects are not entirely redundant in chickens. A likely explanation is the difference in receptor distribution: IFNAR1 is broadly expressed across most cell types, while IFNLR1 expression is mainly confined to epithelial cells (Reuter et al. 2014, doi: 10.1128/jvi.02764-13; Santhakumar et al., 2017, doi: 10.3389/fimmu.2017.00049). This systemic versus localized receptor pattern likely determines the range of responsive cells and may account for the differential outcomes observed when either receptor is absent.
 
 Taken together, our findings indicate that while type I and type III IFNs share overlapping signaling mechanisms, they maintain distinct biological functions in chickens, consistent with their differing receptor expression and cellular responsiveness. This contrasts with mammalian models, where redundancy between these systems is more apparent and only double knockouts show strong phenotypes, especially during influenza infection (Mordstein et al., 2008, doi: 10.1371/journal.ppat.1000151; Mordstein et al., 2010, doi: 10.1128/jvi.00272-10). We have now cited these primary studies instead of the McNab review and expanded the Discussion to reflect this interpretation (Page 10, Line 463-467).
 
 (3) In the one in vivo experiment performed with chickens, only one virus was tested; more influenza strains should be included, as well as non-influenza viruses.
 
 We thank the reviewer for this valuable suggestion. The main objective of the present study was to generate and characterize novel chicken models lacking type I and type III interferon receptors in order to investigate their physiological relevance and to obtain the first insights into their roles during viral infection with more emphasis on avian influenza. As part of this manuscript, we performed detailed in ovo experiments using both influenza and non-influenza viruses (Figure 6). These included three influenza strains: H1N1, a mammalian-adapted strain; H3N1, a low pathogenic avian strain showing features of high pathogenicity; and H9N2, a low pathogenic avian strain, as well as a non-influenza virus, the infectious bronchitis virus (IBV). The in ovo analyses revealed clear strain-dependent modulation of interferon responses, and have provided a comprehensive overview of virus-specific interferon activity in chickens. The subsequent in vivo experiment was therefore designed as a proof of concept using the most suitable viral strain to robustly challenge the immune system and to identify the distinct functions of chicken interferons.
 
 (4) The basic conundrum of point 2 applies equally to Figure 6a; both KOs have a phenotype. Again in 6d, both IFNs appear to be separately required for Mx induction. An explanation is needed.
 
 We thank the reviewer for raising this important point. We have revised the Discussion (page 10, lines 442-454) and provided supporting references to clarify how the composition of the chorioallantoic membrane (CAM) and virus tropism together determine the apparent requirement for type I and type III interferons. The CAM contains both epithelial and mesodermal–vascular layers, which support complementary interferon functions: type I IFN acts mainly in systemic and vascular compartments, while type III IFN provides localized protection at the epithelial surface. Consequently, viruses that replicate in both compartments (e.g., WSN33, H3N1) require both IFN pathways for maximal Mx induction (Figures 6a, 6d), whereas viruses with a predominant or prolonged epithelial phase (e.g., H9N2, IBV) at the time point analyzed are effectively controlled by type I IFN signaling alone.
 
 These differences likely reflect virus-specific factors, including cell tropism, replication kinetics, and the spatial–temporal dynamics of receptor expression and signaling. Notably, our measurement of Mx expression at 24 hours post infection (hpi) may represent a phase when type I IFN signaling is dominant and can compensate for the absence of type III IFN. It remains possible that IFN-λ plays a more critical, non-redundant role at earlier stages post infection, when rapid antiviral protection is first required at the epithelial surface. Thus, the apparent redundancy observed at 24 hpi likely reflects temporal compensation and crosstalk between the IFN pathways rather than a lack of biological relevance for type III IFN.
 
 (5) Line 308, where are the viral titers you refer to in the text? The statement that the results demonstrate that excessive IFNab has a negative impact is overstretched, as no IFN measurements of the infected embryos are shown here.
 
 We thank the reviewer for this comment and would like to clarify that measurements of type I IFN (IFN-α/β) concentrations were indeed performed. The data are presented in Figure 6b and cited in the Results section (“Knockout of IFNAR1 and IFNLR1 did not affect IFN-α/β secretion in ovo”). To avoid misunderstanding, the Results section has been revised to explicitly reference the IFN-α/β measurements supporting this conclusion (line 302-309).
 
 These data indicate that all genotypes produced comparable IFN-α/β levels upon viral infection, with the IBV infection inducing approximately tenfold higher IFN-α/β secretion than the influenza strains tested (Figure 6b). The interpretation that an excessive type I IFN response can negatively affect host fitness is based on the combination of quantified IFN-α/β data (Figure 6b) and survival probability results (Supplementary Figure 10), where embryos exhibiting the highest IFN-α/β levels (embryos of all genotypes infected with IBV and IFNLR1-/- embryos infected with H9N2) showed the poorest survival despite moderate or low viral titers.
 
 (6) The in vivo infection is the most interesting experiment, and the key outcome here is that IFN type 1 is crucial for anti-H3N1 protection in chickens, while type 3 is less impactful. However, this experiment suffers from the different time points when chickens were culled, so many parameters are impossible to compare (e.g., weight loss, histopathology, IFN measurements, and more). Many of these phenomena are highly dynamic in acute virus infections, so disparate time points do not allow a meaningful comparison between different genotypes. What are the stats in 7b? Is the median rather than the mean indicated by the line? Otherwise, the lines appear in surprising places. SD must be shown, and I find it difficult to believe that there is a significant difference in weight, for e.g., IFNAR KO, unless maybe with a paired t test. What is the statistical test?
 
 We thank the reviewer for these thoughtful comments and agree that disease progression and sampling time can influence comparisons in acute infection studies. Hens were euthanized upon reaching predefined humane endpoint scores in full compliance with the Bavarian animal welfare regulations. Because the infection produced markedly different clinical kinetics among genotypes, all data were interpreted with reference to matched disease stages rather than absolute days post-infection.
 
 For matched comparisons: Viral titers in the trachea and cloaca, as well as plasma IFN-α/β concentrations, were compared between day 2 in IFNAR1-/- hens and day 3 in WT and IFNLR1-/- hens, which represent equivalent clinical stages before the sharp viral rise seen later in WT and IFNLR1-/- birds. At these comparable stages, viral titers were still low and IFN-α/β concentrations remained significantly lower in WT and IFNLR1-/- than in IFNAR1-/- hens (Figure 7c, d, f), indicating that uncontrolled viral replication and IFN-α/β secretion in the absence of type I signaling occur earlier and more intensely.
 
 For Figure 7b: Because chickens reached humane endpoints at different days post infection (2 dpi for IFNAR1-/- and 5–7 dpi for WT and IFNLR1-/-), statistical comparisons were performed within each genotype using paired t-tests and all data were shown together as mean ± SD.
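A within-genotype paired comparison of this kind tests whether the mean per-animal change differs from zero. The sketch below computes the paired t statistic from first principles, assuming each hen contributes one measurement before infection and one at its endpoint. The function name and the weights are hypothetical, not values from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched samples.

    t = mean(d) / (SD(d) / sqrt(n)), where d = after - before per animal.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical body weights in grams for four hens of one genotype.
weight_before = [1500, 1520, 1480, 1510]
weight_endpoint = [1450, 1480, 1440, 1450]
t_stat, df = paired_t(weight_before, weight_endpoint)
print(f"t = {t_stat:.2f}, df = {df}")
```

Because the test uses each animal as its own control, a small but consistent weight loss can reach significance even when the spread between animals is large, which bears on the reviewer's question about the weight data. For a p-value, SciPy's `scipy.stats.ttest_rel` performs the same test.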
 
 We acknowledge that unequal survival times limit direct temporal comparison. However, the consistent pattern across all parameters, including early severe disease, high viral load, and excessive IFN-α/β secretion in IFNAR1-/- hens versus delayed onset in WT and IFNLR1-/- hens, supports the conclusion that type I IFN signaling is essential for early viral restriction and host survival, while type III IFN contributes mainly to localized inflammatory responses. The experiment cannot be repeated under the current animal welfare authorization.
 
 (7) Figures 7e,f: these comparisons are very difficult to interpret as the virus loads at these time points already differ significantly, so any difference could be secondary to virus load differences.
 
 We thank the reviewer for this valuable comment. We agree that viral load can influence interferon induction; however, our comparisons in Figures 7e and 7f were designed to reflect equivalent stages of disease progression rather than identical time points post-infection. For IFN-λ mRNA expression (Fig. 7e), spleens from IFNAR1-/- hens were sampled on day 2 post-infection, when viral titers were maximal, and compared to WT and IFNLR1-/- hens sampled on day 5 post-infection, at which point viral titers reached comparable levels. Thus, this comparison represents the phase of peak infection and systemic immune activation across all genotypes rather than an absolute temporal comparison.
 
 Similarly, for IFN-α/β concentrations (Fig. 7f), two levels of comparison were made: between IFNAR1-/- hens at day 2 post-infection (high viral titer) and WT and IFNLR1-/- hens at day 3 (low viral titer), and between WT and IFNLR1-/- hens at day 5 post-infection (high viral titer). In both cases, IFN-α/β levels remained disproportionately elevated in IFNAR1-/- hens, indicating that the excessive type I IFN response is primarily due to the loss of receptor-mediated feedback regulation rather than viral load alone.
 
 We have clarified this rationale in the legend of Figure 7 and in the Results (Line 338-345). We believe these results are valuable as they provide important insight into the temporal dynamics and regulatory interplay between type I and type III interferons during avian influenza infection.
 
 Reviewer #2 (Recommendations for the authors):
 
 Experiments need to be repeated. Comparisons in infection experiments must be done on the same day. More viruses need to be tested.
 
 We thank the reviewer for these constructive recommendations. All infection experiments were conducted under approved animal welfare regulations, which limited the number of replicates and prevented repeating in vivo challenges beyond the authorized design, in line with the 3R principles, particularly Reduction, to avoid unnecessary animal use. To ensure comparability, samples were analyzed at matched disease stages rather than identical time points, as clarified in the revised figure legends (figure 7) and Results (Line 338-345). The study already includes multiple influenza and non-influenza viruses (H1N1, H3N1, H9N2, and IBV) tested in ovo to capture virus-specific interferon responses, while the in vivo H3N1 infection served as a proof-of-concept to dissect genotype-specific immune dynamics.
 
 AuthorResponse
Tags: AuthorResponse, Review 1, Summary

Annotators: Public_Reviews

URL: biorxiv.org/content/10.1101/2025.01.24.634794v3
www.biorxiv.org

Constraints on the G1/S transition pathway may favor selection of multicellularity as a passenger phenotype

3
1. Public_Reviews 29 May 2026
  
  in eLife
  
  eLife Assessment
  
  This important study suggests that changes in cell regulation may contribute to the evolution of multicellularity. The evidence supporting the conclusions is convincing, with rigorous methods used to test alternative hypotheses. The work will be of broad interest to cell and evolutionary biologists and those studying the cell cycle and cancer.
  
  Summary
2. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Ducrocq et al. present research exploring the genetic link between simple multicellular group formation (ace2Δ/ace2Δ) and its interaction with cell-cycle progression mutants (e.g., cln3Δ/cln3Δ), demonstrating that this combination can provide fitness benefits during fluctuating resource conditions, resulting in a rapid increase in the fraction of multicellular cell-cycle mutants over unicellular yeast without selection for multicellular size. Because both the multicellular phenotype and the regulatory link enabling faster escape from the stationary phase are controlled by the ACE2 transcription factor, this work demonstrates that multicellular cluster formation can arise as a side effect of a completely independent fitness advantage unrelated to the benefits of group formation itself. As a "passenger phenotype," multicellularity could thus emerge for other selective reasons, potentially facilitating a later transition to more entrenched multicellularity if novel conditions arise that make multicellular group formation directly beneficial.
  
  Importantly, while the literature generally assumes that multicellular group formation incurs a cell-level fitness cost, this work demonstrates that certain gene-environment interactions can confer fitness benefits even at the level of individual cells forming multicellular groups. This finding should inspire both theoretical and empirical work exploring multicellular group formation selected for benefits at the level of individual cells, rather than for the benefits of larger organismal size that most work has relied on so far.
  
  Strengths:
  
  This work is novel and exciting for research exploring the very first steps of the transition from unicellularity to simple multicellularity. The formation of multicellular groups is almost always assumed to come at a cell-level fitness cost due to reduced reproductive fitness compared to remaining unicellular, which generally needs to be outweighed by the benefits of multicellular group formation (e.g., large size to escape predation) for the multicellular phenotype to be stable. However, this study presents an interesting case of a genetic and environmental condition under which individual cells forming simple multicellular clusters can actually have higher reproductive fitness than solitary living yeast cells. This contrasts with previous snowflake yeast studies where the multicellular phenotype was primarily beneficial due to strong selection for large groups (rather than cell-level fitness gains).
  
  The claims and interpretation of the results align well with the data presented. This is due to the careful and straightforward experimental design testing predictions with a clear, stepwise methodology. The authors rule out alternative explanations and provide support for the proposed link between the mutations (ace2, cln3, and others), their impact on faster exit from quiescence and earlier entry into reproduction in fresh media, and the resulting higher fitness in the snowflake yeast phenotype compared to unicellular yeast.
  
  This experimental framework (combining cell-cycle mutants under the same multicellular background) is very likely to be adopted by others in the community to explore downstream implications of these results in laboratory and environmental yeast isolates.
  
  Weaknesses:
  
  The authors show that the same multicellular phenotype with higher cell-level fitness due to faster exit from the stationary phase can also be observed with alleles found at other loci in non-laboratory yeast strains, implying that the results are likely not specific to a peculiar case genetically engineered in laboratory strains, but that similar phenotypes may be present in nature. However, this remains to be explored by examining the natural ecology of commercially available or wild yeast isolates and their genomes. This is not a weakness of this study per se, but rather a direction for future work. It does mean, however, that the relevance of these findings for early multicellularity in yeast, and even more so for nascent multicellularity in distinct taxa, remains to be explored in the future. Until then, it is difficult to make strong claims about how applicable these results would be for non-laboratory yeast and other taxa. Regardless, this work represents a very exciting finding.
  
  Comments on revised version:
  
  The authors addressed all concerns thoroughly.
  
  Review 1
3. Public_Reviews 29 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Ducrocq et al. present research exploring the genetic link between simple multicellular group formation (ace2Δ/ace2Δ) and its interaction with cell-cycle progression mutants (e.g., cln3Δ/cln3Δ), demonstrating that this combination can provide fitness benefits during fluctuating resource conditions, resulting in a rapid increase in the fraction of multicellular cell-cycle mutants over unicellular yeast without selection for multicellular size. Because both the multicellular phenotype and the regulatory link enabling faster escape from the stationary phase are controlled by the Ace2 transcription factor, this work demonstrates that multicellularity can arise as a side-effect of a completely independent fitness advantage unrelated to the benefits of group formation itself. As a "passenger phenotype," multicellularity could thus emerge for other selective reasons, potentially facilitating a later transition to more entrenched multicellularity if novel conditions arise where group formation becomes directly beneficial.
  
  Strengths:
  
  This work is novel and exciting for research exploring the very first steps of the transition from unicellularity to simple multicellularity. This is particularly significant because the formation of multicellular groups is almost always assumed to come at a cell-level fitness cost due to reduced reproductive fitness compared to remaining unicellular. This cell-level fitness cost generally needs to be outweighed by the benefits of multicellular group formation (e.g., large size escaping predation) for the multicellular phenotype to be stable, which is true for a large number of cases studied in the literature, where the multicellular phenotype can only evolve over unicellular competitors under strong selection for multicellular groups. However, this study presents an interesting case of a genetic and environmental condition under which individual cells (forming simple multicellular clusters) can actually have higher reproductive fitness than unicellular yeast. This demonstrates that the assumed cost at the single-cell level does not always apply. In summary, this work represents a unique example contrary to common assumptions regarding the costs of multicellular phenotypes, showing that simple multicellular phenotypes can evolve and remain stable without requiring strong selection for multicellular size or other benefits of group formation.
  
  The claims and interpretation of the results align well with the data presented. This is due to the careful and straightforward experimental design testing predictions with a clear, stepwise methodology, ruling out alternative explanations and providing support for the proposed link between the mutations (ace2, cln3, and others), their impact on faster exit from quiescence, and thus earlier entry into reproduction in fresh media, resulting in higher fitness in the snowflake yeast phenotype compared to unicellular yeast.
  
  Weaknesses:
  
  The authors show that the same multicellular phenotype with higher cell-level fitness due to faster exit from the stationary phase can also be observed with alleles found at other loci in non-laboratory yeast strains, implying that the results are likely not specific to a peculiar case genetically engineered in laboratory strains, but that similar phenotypes may be present in nature. However, this remains to be explored further by examining the natural ecology of commercially available or wild yeast isolates and their genomes. This is by no means a weakness of this study and, therefore, not necessarily something the current work can improve. It does mean, however, that the relevance of these findings for early multicellularity in yeast, and even more so for nascent multicellularity in distinct taxa, remains to be explored in the future. Until then, it is difficult to make strong claims about how applicable these results would be for non-laboratory yeast and other taxa. Regardless, this work does its part by representing a very exciting finding.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Here, the authors attempt to demonstrate that a simple model of multicellularity - snowflake yeast - exhibits key ecologically relevant changes in the regulation of the cell cycle. By examining the effects of the ace2 mutation in environments where multicellularity is not directly selected for or against, and combining it with mutations in key cell cycle regulators, they hope to show that mutations driving simple multicellularity can be selectively favored due to their effects on the release from quiescence rather than their effects on multicellularity itself.
  
  Strengths:
  
  The experiments performed are extensive and thorough. The yeast genotypes examined are judiciously chosen, so as to map out a functional model of the relationship between alterations to cell cycle control and changes to multicellularity phenotypes. Multiple possible interactions are examined, with the causal link and model of the relationship between the multicellular passenger phenotype and the selectable quiescence-release phenotype being well-supported. There are extensive controls demonstrating the separation between the 'passenger' multicellular phenotype and the cell cycle regulation phenotypes examined, including haploid/diploid strains with different multicellular phenotypes but similar cell cycle regulation phenotypes, and phenocopy strains in which downstream enzymes are deleted rather than key central regulators.
  
  Weaknesses:
  
  My only concerns about these results relate to the focus on selection on cell cycle control being examined in a model of multicellularity with key core cell cycle mutations rather than in a wild-type background, as this is a somewhat artificial system.
  
  I believe, however, that the authors convincingly make their case that this work on the multicellular phenotypes of yeast represents a potent proof-of-concept that simple multicellularity can be driven into existence or selected for as a passenger phenotype due to pleiotropic effects of mutations under selection from real-world ecological pressures. They are able to connect this phenotype back to known mutations of particular cell cycle regulators (RB) in other multicellular lineages and demonstrate that ecologically relevant changes to the cell cycle are connected to multicellular phenotypes. As a proof of concept of the connection between these phenotypes, rather than a study of a particular event in the past of a living lineage, it makes a strong case.
  
  A longstanding question in the field of multicellularity is the selective pressures that can drive simple multicellularity into existence and then act on simple multicells to drive their increased size and complexity. This work brings to the table tangible evidence of the possibility that, instead of being selected for on its own, simple multicellularity can be a side-effect of selection on other key phenotypes.
  
  This separates the question of the origins of multicellularity and the forces that drive its further evolution. This separation can reframe how the field is studied, especially in the context of the apparent dichotomy between dozens of origins of 'simple' multicellularity across the tree of life and a few origins of 'complex' multicellularity in the history of Earth. Especially in light of other evidence that multicellularity is connected to changes in cell cycle regulation, I believe that this is an important insight that will alter the way we think about the origins of this key evolutionary transition.
  
  We thank the reviewers for their insightful comments on our work.
  
  We agree with reviewer #1 that further experiments would be needed to determine how the observations made on lab strains apply to yeast in various ecological conditions, particularly in the wild. Here we provide a proof of principle that selection of multicellularity can arise as a side effect. This does not prove that it took place during yeast evolution, but we would like to emphasize that resource fluctuations are very common in ecological conditions, making it highly likely that the environmental conditions necessary for the selection of the side effects described have arisen.
  
  We agree with reviewer #2 that our work on yeast strains is “somewhat artificial”, as is often the case with model organisms under laboratory conditions. Importantly though, we showed that the effect found with the cln3 knock-out mutation can be phenocopied by overexpression of WHI5 (encoding the yeast equivalent of Rb). We propose that variations in the levels of cell cycle regulators during evolution may have played a role in the selection of multicellularity as a side effect. We agree that this is merely a hypothesis to explain the selection of multicellularity (just like predator escape) and that there is no direct evidence that this occurred in the history of the lineage. Nevertheless, our work provides the first evidence that such selection of multicellularity as a side effect is possible, and gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  As mentioned in my public review, I very much appreciate this work, its interpretation for early multicellularity as an example opposite to the assumed cost of multicellular phenotypes, and the robust design behind the premise and claims. Therefore, my suggestions below are mostly aimed at improving the readability and data presentation.
  
  (1) In the abstract, Lines 24-27 (the last sentence): This statement is worded too generally and therefore reads as too strong. I think the authors' work provides an example that multicellularity itself does not need to be beneficial all the time - this is really exciting and makes sense! However, there is a substantial body of work showing the origin and maintenance of multicellularity for its direct benefits. Relative to that body of work, this represents a special case, and therefore, while we should definitely reconsider the view that "multicellularity always comes at a cell-level fitness cost," we cannot overgeneralize these findings. Please consider reframing this statement.
  
  Done, now line 25 (addition of “in some cases”)
  
  (2) Line 48 (Introduction): "This mostly concerns two major regulators, RB and Cyclin D." Which organisms are you referring to? Please specify.
  
  Done.
  
  (3) In the Introduction, there are at least three sentences that need citations: L57-58, L59-60, and L65. For instance, I do not know what makes CLN3 the yeast functional equivalent of RB, and I wanted to verify this claim, but no references are cited. Please ensure citations are provided throughout the manuscript.
  
  Done: refs 11, 12, and 13 were added.
  
  (4) This is my main request regarding data collection and presentation. The authors share some microscopy images of mutant strains in Figure 2 for different purposes (e.g., Figure 2B compares the fraction of budded cells between two genotypes). However, I would appreciate seeing a collected microscopy figure showcasing the phenotypes of all genotypes that went into competition experiments, including the planktonic (WT lab strain) yeast, either where they appear or in a supplementary figure, all presented with the same magnification and scale to make them comparable. Because cell size, shape, and multicellular phenotype are all key aspects of the competition experiments, being able to see all those genotypes/phenotypes would prepare the reader to make predictions about the fitness assays and other experiments.
  
  Done: Supplementary Figure 1B-E was added.
  
  (5) Related to my previous point, I would appreciate seeing cell size measurements for the different genotypes (both single cells of planktonic genotypes and single cells forming multicellular clusters). Cell size is a key trait that directly impacts the results shown in the paper, and summary statistics comparing them would be helpful for interpreting the results.
  
  Done: Supplementary Figure 1F was added.
  
  (6) In competition experiments, the authors mix unicellular and multicellular yeast clusters at 50/50 and measure the fraction of a phenotype of interest (usually the % of snowflake). It took me a while to understand what is being counted under the "% snowflake yeast" category. This is because, while each cell in unicellular yeast should be counted as one unit, one can count a snowflake yeast composed of 50 cells as 50 units or as 1 unit. Please clearly state what is being counted for the Y-axis labeled "% of snowflake yeast" (or relabel those Y-axes in plots to make this clear).
  
  Done: Added in figure legend 1A and Y-axes of competition figures
  
  (7) I recommend editing the genotype labels in figures (see, for instance, Figure 1B, C, D). In Figure 1B, the bars are labeled as "CLN3/CLN3 co-culture" or "cln3Δ/cln3Δ co-culture," etc. These are actually co-cultures of SF vs. PK (with or without a CLN3 copy). Please consider using more representative labels that will be easier for readers to understand.
  
  Done: this has been changed in all concerned figures
  
  (8) In the Results, L225, you begin referring to AMN1368D as AMN1. I suggest using the full allelic form throughout the text so it will be clear each time that you are referring to that specific allele, as I was confused about whether you were discussing the allele or the gene AMN1 itself.
  
  This has been changed throughout the text.
  
  (9) Discussion, Lines 250-252, states that this is a "situation that is likely to happen very often under ecological conditions." Are there any examples you can cite?
  
  Done, as also requested by reviewer #2 (now lines 256-257).
  
  (10) Lines 272-275 contain a strong, general statement suggesting that co-evolution of cell cycle regulation and multicellularity could be more general (which is acceptable as speculation). However, the suggestion that this co-evolution could have "started very early in the evolution of eukaryotic cells" is too speculative. I would recommend sticking with the alternative, suggesting that the link between the two phenotypes may be a case of convergent evolution.
  
  Done
  
  (11) Lines 278-279 are both vague and too bold. The text mentions a link between cancer and multicellularity and then extends this link through cell cycle regulators. Without explaining the connection between cancer and multicellularity and then trying to link it to cell cycle regulators, all in a few words without background, this sentence is too vague. Please consider deleting this or spending more time clearly explaining the link, which would at best still be speculative.
  
  These speculative sentences were removed.
  
  (12) First, I wanted to note that I highlighted Lines 284-287, as this passage is clearly written and provides a nice argument. I also wonder if you could mention that your work shows simple multicellular cluster formation should not always come at a cost, contrary to the general assumption in the literature, and add a few citations to support that claim. This would highlight how significant this work is within the broader multicellularity literature.
  
  Changed in the Discussion (now lines 242-244, with additional references 30 and 31).
  
  (13) I recommend labeling the genotype of your "quintuple mutant" in Figure 3. You can refer to it as the quintuple mutant in the text, but I had to go back and forth to see what those mutations were when trying to think about potential genetic interactions. Even the legend of Figure 3 does not specify the genotype and refers to it only as the "quintuple mutant."
  
  Now explicitly stated in the title of the figure
  
  Reviewer #2 (Recommendations for the authors):
  
  I find the presented research to be of high quality, with very important implications. I have suggestions for improvement of the manuscript, but they are largely stylistic, with one paper that I believe deserves citation regarding the proteins involved. I see little need for additional experiments or analysis, just a clearer description of the results and their significance.
  
  (1) Line 62: Yeast CLN3 definitely performs the same role as cyclin D in the cell cycle, but has an unclear phylogenetic relationship with the rest of the cyclins. See Cross, Buchler, & Skotheim 2011 ("Evolution of networks and sequences in eukaryotic cell cycle control"). This reference also covers the functional relationship between RB and Whi5, referred to in nearby sentences, as does Medina, Walsh, and Buchler 2019 ("Evolutionary innovation, fungal cell biology, and the lateral gene transfer of a viral KilA-N domain").
  
  The reference has been added
  
  (2) Line 69: Is the question whether the evolution of G1/S regulation favors multicellularity, or whether the two are connected such that the evolution of one can affect the other?
  
  It is clearly the first of the two questions.
  
  (3) Line 73: Comma after Ace2.
  
  Done
  
  (4) Line 76: It would be clearer to specify that snowflake and ACE2 yeast were co-cultured without settling selection or other selection that explicitly favors multicellularity, unlike in experiments where multicellular evolution is observed, as in Ratcliff publications.
  
  This is now specified.
  
  (5) Line 80: Specify which phenotypes are observed for ace2 mutants, namely both the multicellularity and the release from quiescence.
  
  Done
  
  (6) Line 146: This observation should be noted as another indication that the multicellular phenotype is not behind the selective pressure, because it is so different between unicells and multicells.
  
  Overall, you have very strong evidence that this is the case, and emphasizing this would benefit the paper!
  
  Done.
  
  (7) Line 151: specify that you are maintaining yeast in proliferation in coculture.
  
  Done.
  
  (8) Line 181: This is another key experiment showing that the multicellular phenotype is not the causal reason for the change in quiescence. It might make things clearer to bring all these confirmatory experiments together, particularly the haploids and the sonicated single cells.
  
  This is now clearly stated at line 195.
  
  (9) Line 225: The choice of referring to the non-laboratory strain as the 'AMN1' wild type default may be confusing to readers, who may treat the genetic background you are using as the ground truth wild type. I recommend throughout the paper always specifying the allele's amino acid to avoid any confusion.
  
  The genotype is now clearly presented throughout the text.
  
  (10) Line 238: I would continue to specify that the multicellular phenotype has no selective advantage, specifically when no selection for size is applied.
  
  See the added sentence at lines 242-244 (revised version).
  
  (11) Line 243: I would say that the evolution of cell cycle regulation may interact with the multicellular phenotype.
  
  This was changed (now line 248)
  
  (12) Line 244: Strike 'indeed' and the 'the' before AMN1 and ACE2.
  
  Done
  
  (13) Line 252: Suggest some ecological conditions under which quiescence exit is likely, such as boom and bust or moving from rotting fruit to rotting fruit.
  
  Done
  
  (14) Line 267: Are you suggesting that the specific genes AMN1 and ACE2 had particular effects on actual organisms in the past, or that it represents a broad pattern of evolution in which multicellularity could be more broadly related to exit from quiescence? I believe it is the latter, and I think that should be clearer.
  
  Modified as suggested
  
  (15) Line 280: In this paragraph, I think that the point being made could be slightly clearer - if I am not mistaken, you are making the distinction between the appearance of multicellularity and its refinement under selection, and that the former may be more common than previously believed, given this proof of concept. I think this can be made clearer. Furthermore, it is worth noting that all experiments that show effects of the multicellular phenotype are in mutant backgrounds, and explaining why this is still relevant to wild organisms. It might be taken by some as indicating that the multicellular phenotypes are not relevant to a wild population, but the connection to known RB mutations in known multicellular lineages and the fact that it is connected to a very key aspect of cell cycle regulation, I think, overcomes this issue, and this should be made clear.
  
  Our study reveals a genetic link between multicellularity and Whi5 and Cln3, two important G1/S cell cycle regulators. Similar genetic interactions have been observed in phylogenetically distant species, reinforcing the idea that the interplay between cell cycle regulation and multicellularity is a general feature and not a mere artifact of mutant background.
  
  The neutral fitness effect of multicellularity in wild-type backgrounds is particularly of interest. By being maintained as a side effect of selection on fundamental cellular processes, the neutral effect of multicellularity may have provided “an evolutionary scheme” for its repeated emergence throughout the tree of life. As such, the "passenger selection" hypothesis fits well with the observations of phenotypic reversibility and facultative multicellularity, despite varying and specific selective pressures. Our work thus gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.
  
  (16) Line 314: What promoters are they driven by?
  
  Specified
  
  (17) Line 336: What was the culture volume, and the volume transferred?
  
  Specified
  
  (18) Line 362: How was the proportion of blue-stained cells scored? Manually, or with an imaging software cutoff?
  
  Specified
  
  (19) Figure 1: I think that the full genotypes of each strain should be specified, either in the legend or the key of the figure, rather than always specifying the ACE2 genotype and other mutations separately.
  
  Done as requested by reviewer #1
  
  (20) Figure 2E, 2F: Same as Figure 1, regarding genotypes.
  
  Done
  
  AuthorResponse
Tags: AuthorResponse, Review 1, Summary

Annotators: Public_Reviews

URL: biorxiv.org/content/10.1101/2025.08.20.671217v3
www.biorxiv.org

Paternal over- and under-nutrition program fetal and placental development in a sex-specific manner in mice

4
1. Public_Reviews 29 May 2026
 
 in eLife
 
 eLife Assessment
 
 This important study demonstrates that paternal diet influences not only testicular morphology but also placental and fetal development, supporting a role for paternal contributions to offspring health. The study also considers potential links between the microbiome and male reproductive health. By combining transcriptomic and histological analyses across multiple tissues, the evidence supporting the central conclusions of the study is convincing.
 
 Summary
2. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules, and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction, and placental insufficiency, which were partly ameliorated by MD. The paternal diets changed the placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight into how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.
 
 Strengths:
 
 The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints including of the fathers, the early placenta and late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.
 
 Comments on revised version:
 
 The authors have done a great job addressing my concerns. The description of the data analysis and the figures are now much clearer. The inclusion of the potential links between the microbiome and male reproductive fitness is informative and improves the flow of the discussion.
 
 Review 1
3. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors investigated the effects of a low-protein diet (LPD) and a high-sugar, high-fat diet (Western diet, WD) on paternal metabolic and reproductive parameters and on feto-placental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences for fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.
 
 Strengths:
 
 This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.
 
 The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.
 
 Comments on revised version:
 
 The authors have adequately addressed all my previous comments.
 
 Review 2
4. Public_Reviews 29 May 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules, and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction, and placental insufficiency, which were partly ameliorated by MD. The paternal diets changed the placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight into how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.
 
 Strengths:
 
 The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints, including the fathers, the early placenta, and the late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful, non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.
 
 Weaknesses:
 
 The data are overall consistent with the conclusions of the authors. The paternal and pregnancy data are discussed separately, instead of linking the paternal phenotype to offspring outcomes. Some clarifications regarding the methods and the model would improve the interpretation of the findings.
 
 (1) The authors insufficiently discuss their rationale for studying methyl-donors and carriers as micronutrient supplementation in their mouse model. The impact of the findings would be better disseminated if their role were explained in more detail.
 
 We acknowledge the Reviewer’s comments regarding the amount of detail in support of the inclusion of methyl carriers and donors within our diet. Therefore, we will revise the manuscript to include more justification, especially within the Introduction section, for their inclusion. Please see lines 111-120.
 
 (2) It is unclear from the methods exactly how long the male mice were kept on their respective diets at the time of mating and culling. Male mice were kept on the diet between 8 and 24 weeks before mating, which is a large window in which the males undergo a considerable change in body weight (Figure 1A). If males were mated at 8 weeks but phenotyped at 24 weeks, or if there were differences between groups, this complicates the interpretation of the findings and the extrapolation of the paternal phenotype to changes seen in the fetoplacental unit. The same applies to paternal age, which is an important known factor affecting male fertility and offspring outcomes.
 
 We thank the Reviewer for their comments regarding the ages of the males analysed. As we had 5 treatment groups, and intended to generate a minimum of 8 litters of offspring per treatment group, this resulted in over 40 litters in total. In order to dissect these litters appropriately, and in a timely fashion, we had to stagger their generation over time. As such, this resulted in utilising our males at different ages/durations on the diet. However, in all our statistical analysis, we factored in the duration of time on the diet, which also acted as a proxy measure of paternal age. We also ensured that we staggered the generation of litters in each diet group so that any age effects were experienced across all paternal regimens.
 
 We have revised the manuscript to acknowledge this fact and to highlight that the duration of time on any diet was factored into the statistical analysis.
 
 (3) The male mice exhibited lower body weights when fed experimental diets compared to the control diet, even when placed on the hypercaloric Western Diet. As paternal body weight is an important contributor to offspring health, this is an important confounder that needs to be addressed. This may also have translational implications; in humans, consumption of a Western-style diet is often associated with weight gain. The cause of the weight discrepancy is also unaddressed. It is mentioned that the isocaloric LPD was fed ad libitum, while it is unclear whether the WD was also fed ad libitum, or whether males under- or over-ate on each experimental diet.
 
 We agree with the Reviewer that the general trend towards a lighter body weight for our experimental animals is unexpected. We can confirm that all diets were fed ad libitum. However, as males were group housed, we were unable to measure food consumption for individual males. We also observed that males fed the high-fat diets often shredded significant quantities of their diet rather than eating it, thereby preventing accurate measurement of food intake.
 
 We also agree with the Reviewer that body weight can be a significant confounder for many paternal and offspring parameters. However, while the experimental males did become lighter, there were no statistical differences between groups in mean body weight. As such, body weight was not included as a variable within our statistical analysis.
 
 (4) The description and presentation of certain statistical analyses could be improved.
 
 (i) It is unclear what statistical analysis has been performed on the time-course data in Figure 1A (if any). If one-way ANOVA was performed at each timepoint (as the methods and legend suggest), this is an inaccurate method to analyse time-course data.
 
 (ii) It is unclear what methods were used to test the relative abundance of microbiome species at the family level (Figure 2L), whether correction was applied for multiple testing, and what the stars represent in the figure.
 
 (iii) It would improve transparency to state whether siblings were used in any analyses and, if so, whether statistical correction was applied to control for confounding by the father.
 
 We apologise for the lack of clarity regarding the statistical analyses. Going forward, we will revise the manuscript and include a more detailed description of the different analyses, inclusion of siblings and correction for multiple testing.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors investigated the effects of a low-protein diet (LPD) and a high sugar- and fat-rich diet (Western diet, WD) on paternal metabolic and reproductive parameters and fetoplacental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences on fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.
 
 Strengths:
 
 This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.
 
 The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.
 
 Weaknesses:
 
 Overall, this manuscript presents a rich and comprehensive dataset; however, this has resulted in the analysis of paternal gut dysbiosis remaining largely descriptive. While still valuable, this raises questions regarding why supplementation with methyl donors was unable to restore gut microbial balance in animals receiving the modified diets.
 
 We thank the Reviewer for their considered thoughts on the gut dysbiosis induced in our models and the minimal impact of the methyl donors and carriers. We will include additional text within the Discussion to acknowledge this. However, at this point in time, we are unsure as to why the methyl donors had minimal impact. It could be that the macronutrients (i.e. protein, fat, carbohydrates) have more of an influence on gut bacterial profiles than micronutrients. Alternatively, due to the prolonged nature of our feeding regimens, any initial influences of the methyl donors may become diluted out over time. We will amend the text to reflect these potential factors.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 The authors have done an immense amount of work, which should be commended. In addition to the public review, I have a few suggestions for improvement.
 
 (1) To further explore the weight discrepancy between the males subjected to diet alteration and those on the control diet, further details about the intake and provision of the diets would be beneficial. Seeing as the fat mass was increased in males fed a WD, do you have information on where the weight 'loss' originated from?
 
 We thank the Reviewer for their insight into the changes in male body weight. We agree that the difference in total body weight versus the amount of adipose tissue is intriguing. Unfortunately, we were unable to monitor the food intake of our animals for two main reasons. The first was that, for animal welfare reasons, all our males were initially group housed prior to mating. This meant that males were typically housed in groups of 4 during the initial feeding (pre-mating) period. Males were housed singly only once they were used for mating. As such, it was not possible to obtain food consumption data for individual males.
 
 A second limitation arose from the high extent to which males fed the Western Diet effectively shredded the diet. This meant that it was not possible to weigh the food to obtain a crude idea of how much they were consuming. The reason for this shredding is not clear to us. All mice received environmental enrichment, and we did not observe this behaviour in our control or low protein diet fed males.
 
 With regard to the weight of the other organs, we did not observe any significant overall changes in organ weight, or in organ weight relative to body weight. Unfortunately, we did not have access to, or conduct, any whole-body scanning, such as DEXA, which would have given more insight into the body composition of our mice.
 
 (2) The testicular abnormalities and gene expression findings are linked nicely to the offspring's story. This is not as compelling for other findings, including the gut microbiome changes, which are not discussed in the context of the fetoplacental outcomes. More discussion of the potential impact of paternal changes on fetal outcomes would strengthen claims that these findings are impactful.
 
 We thank the Reviewer for their comments and suggestion. Our caution with connecting the gut microbiota to offspring development is that, to the best of our understanding, there is little data with regard to its effect on post-fertilisation development. While there is data showing that the microbiome can produce compounds and metabolites that can affect sperm quality and metabolism, lipid composition and testicular morphology, the connection with post-fertilisation development is limited. Additionally, as we saw no difference in fundamental fertility, as measured by changes in litter size, we propose that there were no overall changes in the ability of the sperm from our experimental males to reach, fertilise and support development.
 
 However, we acknowledge the Reviewer's comments on strengthening the manuscript and so have included some additional text within the Discussion to highlight the links between the microbiome and male reproductive fitness. Please see lines 337-348.
 
 (3) It is clarified in the methods that n=8 males were used in the study, but different n numbers are shown for some parameters. It would improve transparency for the reader if it were clarified whether these differences result from missing data or from the removal of statistical outliers.
 
 The Reviewer is correct that while 8 males were initially placed on their respective diets, for some of the analyses, the n-number is less than 8. In some instances, for example the analysis of total body fat (Fig. 1D), data was unfortunately not collected during an initial round of dissections. As such, the n number here is only 6 in each group. Additionally, due to the high cost associated with sequencing the microbiome for 5 groups, we decided to only sequence 6 samples per group. However, we do not feel that this impacts significantly on the overall focus of the data presented.
 
 (4) Despite this, you may have been underpowered to detect differences in some parameters, for example, the placental stereology. Alternative approaches, such as immunostaining with whole-section quantification, may be more sensitive to detect subtle changes. Alternatively, have you considered using smaller grids for improved sensitivity of the stereological analysis?
 
 We thank the Reviewer for their insight into the data and their suggestion for immunostaining. We agree with the Reviewer that a greater number of samples would have strengthened our analyses. However, we do not possess further samples that have been processed in the correct manner for additional stereological analysis. We are hoping to conduct further placental analyses based on our RNA-Seq data, but this will require the generation of new samples.
 
 (5) It would be easier to interpret the figures if it were clear which datasets were analysed using non-parametric tests. Are Figures 2F, 2G, 6A, 6E, and 6I shown differently for that reason, perhaps? It would improve transparency if non-normally distributed data were shown as medians, as that is what is being compared in a non-parametric test.
 
 We apologise for any confusion regarding the analysis of our data. The Reviewer is correct that the data in 2F and 2G were analysed using a non-parametric test. We have now made this clearer in the legend to the figure, highlighting which data sets were analysed by ANOVA or Kruskal–Wallis test. We have also done this for the other figure legends where appropriate. With regard to Figure 6, the data presented in Panels A, E and I were intended to show the range of data extending above and below the 90th and 10th centiles of the CD fetuses. As such, we felt that violin plots were the most appropriate way to display these data.
 
 (6) Supplemental Figure 1 seems to be missing.
 
 We apologise sincerely for the omission of Supplemental Figure 1. We will ensure that it is included in our resubmission.
 
 (7) Line 523 states that samples with RIN < 7 were used for microarray analysis. Do the authors mean RIN > 7?
 
 We thank the Reviewer for identifying our mistake. The Reviewer is correct that this should have been a RIN >7. We have now corrected this.
 
 (8) It is mentioned in lines 603-604 that paraffin shrinkage was accounted for. It could be useful to describe how this was done.
 
 We have revised the text within the Materials and Methods to provide additional clarity on how we compensated for the shrinkage due to the paraffin processing.
 
 In the revised Methods we have added a brief “Shrinkage correction” subsection describing how paraffin-embedding shrinkage was quantified for each placenta individually. Specifically, we now state that post-embedding placental volume was estimated using the Cavalieri Principle on systematic and uniformly-random sampled H&E sections, and a per-placenta volume shrinkage coefficient (kV = Vpost/Vpre) was calculated.
 
 We have also added the equations showing how this coefficient was used to correct compartment volumes and the derived surface area estimates (surface area calculated from Sv and the corresponding shrinkage-corrected placenta volume). Please see lines 618-644.
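
The correction described above can be illustrated with a minimal Python sketch. It is not the authors' code: the response gives only the coefficient kV = Vpost/Vpre and says that surface area is derived from Sv and the shrinkage-corrected volume. The Cavalieri estimator is written in its standard point-counting form (section spacing × area per point × total points). Dividing a measured volume by kV is an assumption about how the coefficient is applied, and all function names, units and numbers are hypothetical.

```python
def cavalieri_volume(point_counts, area_per_point, section_spacing):
    """Cavalieri estimate: spacing x area per grid point x total points counted."""
    return section_spacing * area_per_point * sum(point_counts)


def shrinkage_coefficient(v_post, v_pre):
    """Per-placenta volume shrinkage coefficient kV = Vpost / Vpre."""
    if v_pre <= 0:
        raise ValueError("pre-embedding volume must be positive")
    return v_post / v_pre


def correct_volume(v_measured, k_v):
    """ASSUMPTION: a volume measured on shrunken tissue is rescaled by 1 / kV."""
    return v_measured / k_v


def surface_area(s_v, v_corrected):
    """Surface area from surface density Sv and the shrinkage-corrected volume."""
    return s_v * v_corrected


# Hypothetical numbers, for illustration only (mm and mm^3).
v_post = cavalieri_volume([10, 20, 30], area_per_point=0.5, section_spacing=2.0)
k_v = shrinkage_coefficient(v_post, v_pre=80.0)
labyrinth_volume = correct_volume(30.0, k_v)
labyrinth_surface = surface_area(2.0, labyrinth_volume)
```

Computing kV separately for each placenta, as the response describes, means every sample is rescaled by its own shrinkage rather than by a single group-wide factor.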
 
 (9) This may be due to the generation of the reviewer PDF, but Figure 4E and 4H are illegible in our version of the manuscript.
 
 We apologise for the low resolution of these figures and the difficulty in seeing the information presented. We have created revised versions of these figures which we hope are of higher quality and clarity.
 
 (10) What do the stars represent in Figure 6A, E, I - compared to what, controls?
 
 The Reviewer is correct that the asterisks in Figures 6A, E and I represent differences in the proportion of fetuses either above or below the 90th and 10th centile of the CD fetuses respectively. As such, in panel A, for both the LPD and MD-LPD groups, there are significantly more fetuses who are below the 10th centile of the CD group. Similarly, in panel E, there are significantly more placentas in the LPD group that have a weight above the 90th centile of the CD group. We have revised the graphs to make these differences, and their comparisons clearer.
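
The comparison described above (the share of fetuses falling below the 10th or above the 90th centile of the control group) can be sketched in Python as follows. This is an illustrative sketch only: the response does not state which centile convention or which test of proportions was used, so linear interpolation between order statistics is an assumption, and the weights are made-up values.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between order statistics (an assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def fraction_outside_control_centiles(control, treated, low=10, high=90):
    """Return the fractions of treated values below / above the control centile cut-offs."""
    low_cut = percentile(control, low)
    high_cut = percentile(control, high)
    below = sum(v < low_cut for v in treated) / len(treated)
    above = sum(v > high_cut for v in treated) / len(treated)
    return below, above


# Hypothetical fetal weights (arbitrary units), for illustration only.
cd_weights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lpd_weights = [1, 1.5, 5, 6, 11]
below, above = fraction_outside_control_centiles(cd_weights, lpd_weights)
```

The resulting fractions would then be compared between groups with a test of proportions. Which test the authors used is not stated in the response.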
 
 Reviewer #2 (Recommendations for the authors):
 
 Some Recommendations for improving the writing and presentation, and minor corrections to the text and figures:
 
 (1) Please describe Wnt signaling in the Abstract.
 
 The Abstract has been amended to provide some additional text regarding Wnt signalling. Please see lines 60-63.
 
 (2) Page 6, line 134: A brief explanation of why measuring the inhibin beta-A chain should be included.
 
 The text within this section has been amended to include a brief description of the role of Inhibin β-A chain on testicular function. Please see lines 135-139.
 
 (3) The methodology used for Tnf determination is missing and should be described.
 
 We apologise for the lack of detail regarding our analysis of serum Tnf in our males. This has now been included. Please see lines 479-480.
 
 (4) It is important to mention that free fatty acid levels in the MD-WD group were similar to those in the CD group, although they remained comparable to the WD group.
 
 We agree with the Reviewer and have amended the text to indicate that there was no difference in the FFA profile of the MD-WD males to either the CD or WD males. Please see lines 147-148.
 
 (5) Figure 2 presents both metabolic parameters and bacterial profile analyses. Although the authors appear to relate these outcomes, clarity would be improved by presenting them in separate figures.
 
 As requested, we have now presented these data as two separate Figures.
 
 (6) Figure 3H: The data suggest that the decrease in the number of spermatogonia (PLZF⁺) observed in the LPD and WD groups was prevented when the diets were supplemented with methyl donors.
 
 (7) However, the description and interpretation of this result (or of a neutral effect) are missing.
 
 We agree with the Reviewer in their interpretation of the PLZF+ data. We have indicated this in the text within the Results and Discussion sections. Please see lines 177-178 and lines.
 
 (8) Line 284: Please check the abbreviation for MD-LPD.
 
 We thank the Reviewer for identifying this typographical mistake. This has now been corrected to state MD-LPD and not MDL.
 
 (9) Line 285: Please check the lettering in the text and in Figure 6H-K.
 
 We thank the Reviewer for identifying this typographical mistake. This has now been corrected to state the panels are Figure 9H-K, as we have split the original Figure 2 into two figures.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.14.688439v2
www.biorxiv.org

The titin N2A-MARP signalosome constrains muscle longitudinal hypertrophy in response to stretch

4
1. Public_Reviews 29 May 2026
 
 in eLife
 
 eLife Assessment
 
 The work by van der Pijl presents important findings on the role of titin-associated muscle ankyrin repeat proteins (MARPs) on hypertrophy via mTOR signalling. The study presents rigorous data using in vivo loss-of-function and pharmacological approaches to investigate effects on hypertrophy. While the evidence supporting the role of MARPs on hypertrophy is solid, there are limitations. For example, rapamycin inhibits only some aspects of mTORC1 signalling, and the study is limited to analysis of the diaphragm, so it is not clear whether the mechanisms are conserved across other muscle types.
 
 Summary
2. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]
 
 Summary:
 
 In this manuscript, the authors employ diaphragm denervation in rats and mice to study titin-based mechanosensing and longitudinal muscle hypertrophy. By integrating bulk RNA-seq, proteomics, and phosphoproteomics, they map the stretch-responsive signalling landscape, uncovering robust induction of the muscle-ankyrin-repeat proteins (MARP1-3) together with enhanced phosphorylation of titin's N2A element.
 
 Genetic ablation of MARPs in mice amplifies longitudinal fibre growth and is accompanied by activation of the mTOR pathway, whereas systemic rapamycin treatment suppresses the hypertrophic response, highlighting mTORC1 as a key downstream effector of titin/MARP signalling.
 
 Strengths:
 
 The authors address a clear biological question: "how titin-associated factors translate mechanical stretch into longitudinal fibre growth" using a unique and clinically relevant animal model of diaphragm denervation. Using a comprehensive multiomics approach, the authors identify MARPs as potential mediators of these effects and use a genetic mouse model to provide compelling evidence supporting causality. Additionally, connecting these findings to rapamycin, a drug widely used clinically, further increases the relevance and potential impact of the study.
 
 Review 1
3. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Muscle hypertrophy is a major regulator of human health and performance. Here, van der Pijl and colleagues assess the role of the giant elastic protein, titin, in regulating the longitudinal hypertrophy of diaphragm muscles following denervation. Interestingly, the authors find an early hypertrophic response, with 30% new serial sarcomeres added within 6 days, followed by subsequent muscle atrophy. Using RBM20 mutant mice, which express a more compliant titin, the authors discovered that this longitudinal hypertrophy is mediated via titin mechanosensing. Through an omics approach, it is suggested that the muscle ankyrin repeat proteins (MARPs) may regulate this process. Genetic ablation of MARPs 1-3 blocks the hypertrophic response, although single knockouts are more variable, suggesting extensive complementation between these titin binding proteins. Finally, it is found through the administration of rapamycin that the mTOR signalling pathway plays a role in longitudinal hypertrophic growth.
 
 Strengths:
 
 This paper is well written and uses an impressive suite of genetic mouse models to address this interesting question of what drives longitudinal muscle growth.
 
 Weaknesses:
 
 While the findings are of interest, they lack sufficient mechanistic detail in the current state to separate cross-sectional versus longitudinal hypertrophy. The authors have excellent tools such as the RBM20 model to functionally dissect mTOR signalling to these processes. It is also unclear if this process is unique to the diaphragm or is conserved across other muscle groups during eccentric contractions.
 
 Review 2
4. Public_Reviews 29 May 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 eLife Assessment
 
 The study presents important insights into the regulation of muscle hypertrophy, regulated by Muscle Ankyrin Repeat Proteins (MARPs) and mTOR. The methods are overall solid and complementary, with only minor limitations. Overall, the findings will be of interest for both muscle-biology specialists and the broader mechanobiology community.
 
 We thank the editors for their interest in our manuscript. Below we respond to the reviewers' comments. Based on these comments, we made extensive textual revisions throughout the manuscript and added further analyses to the revised results.
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In this manuscript, the authors employ diaphragm denervation in rats and mice to study titin‑based mechanosensing and longitudinal muscle hypertrophy. By integrating bulk RNA‑seq, proteomics, and phosphoproteomics, they map the stretch‑responsive signalling landscape, uncovering robust induction of the muscle‑ankyrin‑repeat proteins (MARP1‑3) together with enhanced phosphorylation of titin's N2A element. Genetic ablation of MARPs in mice amplifies longitudinal fibre growth and is accompanied by activation of the mTOR pathway, whereas systemic rapamycin treatment suppresses the hypertrophic response, highlighting mTORC1 as a key downstream effector of titin/MARP signalling.
 
 Strengths:
 
 The authors address a clear biological question: "how titin‑associated factors translate mechanical stretch into longitudinal fibre growth" using a unique and clinically relevant animal model of diaphragm denervation. Using a comprehensive multiomics approach, the authors identify MARPs as potential mediators of these effects and use a genetic mouse model to provide compelling evidence supporting causality. Additionally, connecting these findings to rapamycin, a drug widely used clinically, further increases the relevance and potential impact of the study.
 
 We thank the reviewer for their kind words and critical review of our manuscript. The roles of the MARP proteins are diverse and form an intriguing target for further study.
 
 Weaknesses:
 
 There are several areas where the manuscript could be substantially improved.
 
 (1) The statistical analysis of multi-omics data needs clarification. Typically, analyses across multiple experimental groups require controlling the false discovery rate (FDR) simultaneously to avoid reporting false-positive findings. It would be very helpful if the authors could specify whether adjusted p-values were calculated using a multi-factorial statistical model (e.g., ~group) or through separate pairwise contrasts.
 
 We agree with the reviewer that the description of the statistical analysis could be improved. We report q-values in the supplemental data tables to control for false positives; the p-values reflect pairwise comparisons. Statistical testing was performed on whole proteomes or phospho-proteomes, making for very stringent testing (please also see the reply to reviewer 2, response 5). Unbiased quantitative proteomics functions primarily as a screen: in-solution digestion of muscle proteins yields comparatively few peptides, which makes population-adjusted p-value calculation very stringent and suggests no or few differences in expression. Hence, we compared RNA-seq to proteome data to isolate consistently differential proteins. We have revised the Methods section (lines 745-746) to clarify the FDR analysis.
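
For readers unfamiliar with how p-values from many pairwise tests are converted into FDR-adjusted q-values, a minimal Python sketch of the Benjamini-Hochberg step-up adjustment is given below. The response does not say which FDR procedure the authors used, so Benjamini-Hochberg is an assumption chosen because it is the most common one, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four proteins, for illustration only.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because each adjustment scales with the number of tests m, testing an entire proteome at once makes the q-values conservative, which is consistent with the stringency described in the response.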
 
 (2) (A) There are three separate points regarding MARP3 that could be improved. First, the authors report that MARP3-KO mice exhibit smaller increases in muscle mass after diaphragm denervation compared to wild-type mice (a -13% difference), indicating MARP3 likely promotes rather than attenuates hypertrophy. However, the manuscript currently states the opposite (lines 215-216); this interpretation should be revisited. (B) Second, it would be valuable if the authors could provide data showing whether MARP3 transcript or protein levels change in response to denervation. If they do not, discussing mechanisms behind the observed phenotype would help clarify the findings. (C) Finally, given that some MARP-KO mice already exhibit baseline differences, employing and reporting the full two-way ANOVA (including genotype × treatment interaction) would allow a direct statistical assessment of whether MARP deficiency modifies the muscle's response to stretch. This analysis would help clearly resolve any existing ambiguity.
 
 (A) Compared to wildtype mice, MARP3 KO mice exhibit baseline diaphragm hypertrophy. This suggests that MARP3 may normally restrain hypertrophy under basal conditions. However, in response to UDD, MARP3 KO mice display an attenuated hypertrophic response, which could be interpreted as MARP3 promoting hypertrophy under stress conditions, as noted by the reviewer. The relationship between MARP3 and metabolism remains incompletely understood, but prior studies indicate that loss of MARP3 enhances glucose tolerance and insulin sensitivity (PMID: 12456686), suggesting that MARP3 may act as a negative regulator of metabolic signaling. Both glucose and insulin can activate the PI3K pathway to promote hypertrophy (PMID: 16679293), which may contribute to the baseline hypertrophy observed in MARP3 KO diaphragms. In addition, MARP3 deficiency has been associated with activation of AMPK signaling (PMID: 26398569). AMPK is a key regulator of metabolic pathways and a well-established inhibitor of hypertrophic signaling, in part through suppression of mTOR activity, and is also responsive to mechanical stimuli (PMID: 18556591). Thus, increased AMPK activity in MARP3 KO mice may limit hypertrophy in response to UDD. Supporting this, our phospho-proteomics data indicate increased activation of the AMPK β-subunit following UDD, suggesting a potential role for AMPK signaling in stretch-induced hypertrophy. Based on these considerations, we have removed the statement that MARP3 attenuates hypertrophy and instead incorporated the potential role of AMPK signaling into the Discussion (lines 354–355). While the present study focuses on the triple MARP KO model, future work will examine the specific contributions of individual MARP proteins to muscle hypertrophy.
 
 (B) MARP3 (Ankrd23) upregulation at the RNA level was detected by RNA-seq in rat diaphragm following both UDD and BDD (Supplemental Tables 1 and 2). This is consistent with our prior findings in mice, where western blot analysis showed increased MARP3 protein expression following UDD (PMID: 29978560). We note that reliable detection of MARP3 protein remains technically challenging due to limited availability of specific antibodies.
 
 (C) We agree with the reviewer and have added the results of the two-way ANOVA to the figures (see updated Figure 4). The three MARP proteins exhibit differential effects on diaphragm hypertrophy, supporting their role as modulators of stretch-induced hypertrophy.
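 The genotype × treatment interaction requested in point (C) can be read as a difference of differences: the hypertrophic response of knockout animals minus that of wild-type animals. The sketch below illustrates only that point estimate. The diaphragm masses are hypothetical values chosen for illustration and are not data from the study, and the function names are assumptions. A full two-way ANOVA would additionally test whether this contrast differs from zero.

```python
from statistics import mean

# Hypothetical diaphragm masses (mg). Illustrative values only, NOT study data.
masses = {
    ("WT", "sham"): [60.0, 62.0, 58.0],
    ("WT", "UDD"): [90.0, 93.0, 87.0],
    ("KO", "sham"): [70.0, 72.0, 68.0],
    ("KO", "UDD"): [91.0, 94.0, 88.0],
}


def cell_means(data):
    """Mean mass for each (genotype, treatment) cell."""
    return {cell: mean(values) for cell, values in data.items()}


def interaction_contrast(means):
    """Difference of differences: (KO UDD - KO sham) - (WT UDD - WT sham)."""
    ko_response = means[("KO", "UDD")] - means[("KO", "sham")]
    wt_response = means[("WT", "UDD")] - means[("WT", "sham")]
    return ko_response - wt_response


means = cell_means(masses)
contrast = interaction_contrast(means)
print(f"KO response minus WT response: {contrast:.1f} mg")
```

 In this made-up example the knockout starts heavier at baseline (70 vs 60 mg) yet gains less after UDD, so the contrast is negative. This is the pattern that comparing final masses alone would miss and that the interaction term is designed to capture.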
 
 (3) The current presentation of multi-omics data is somewhat difficult to follow, making it challenging to determine whether observed changes occur at the transcript or protein level due to inconsistent gene/protein naming and capitalization (e.g., proper forms are mTOR, p70 S6K, 4E-BP1). Clearly organizing and presenting transcript and protein-level changes side-by-side, especially for key molecules discussed in later experiments, would make the data more accessible and provide clearer insights into the biology of titin-mediated mechanosensing.
 
 We agree with the reviewer that naming conventions for genes and proteins can be hard to follow. We kept the names for titin-associated proteins because some have multiple protein names, and the most common name is shown here. However, we made the suggested changes for the mTOR-related proteins (for example, see Figure 5).
 
 (4) The current analysis relies on total protein measurements downstream of mTOR, yet mTOR's primary mode of action is to change phosphorylation status. Because the authors have already generated a phosphoproteomic dataset, it would be very helpful to report - or at least comment on - whether known mTOR target phosphosites were detected and how they respond to denervation and rapamycin. Including even a brief summary of canonical sites such as S6K1 Thr389 or 4E-BP1 Thr37/46 would make the link between mTOR activity and hypertrophy much clearer.
 
 We agree with the reviewer that the mTOR data require more work to ascertain mTOR’s function in regulating hypertrophy following UDD. We investigated S6K1 Thr389 and 4E-BP1 Thr37/46 both in the phosphoproteomic dataset and by western blot. These sites were not detected by phosphoproteome mass spectrometry (Supplemental data Table 13), and 4E-BP1 Thr37/46 was unchanged by western blot (not shown). The S6K1 Thr389 antibody was nonspecific in our hands, but Norrby et al. (PMID: 22657251) saw increased levels after 6 days of UDD. Hence the mTOR aspect of this study is quite complex: our data suggest that mTOR plays a major role in UDD hypertrophy, but potentially through an activation pathway that differs from the one classically described for muscle hypertrophy. We are investigating the mTOR mechanism further, focusing on mTOR’s role in regulating longitudinal hypertrophy and its potential connection to titin signaling, and hope to publish this work in the next few years. We revised the Discussion to include canonical mTOR activation in hypertrophy; please see lines 388-392.
 
 (5) Finally, since rapamycin blocks only a subset of mTOR signalling, a brief discussion that distinguishes rapamycin‑sensitive from rapamycin‑insensitive pathways would be valuable. Clarifying whether diaphragm stretch relies exclusively on the sensitive branch or also engages the resistant branch would place the results in a broader mTOR context and deepen the mechanistic narrative.
 
 We agree with the reviewer that distinguishing between rapamycin-sensitive and -insensitive mTOR signaling adds useful context to the interpretation of stretch-induced hypertrophy. Rapamycin primarily inhibits mTORC1, whereas mTORC2 is generally considered rapamycin-insensitive, although prolonged or high-dose exposure can also affect mTORC2 activity. Our data indicate that UDD induces a form of hypertrophy that is sensitive to rapamycin, supporting a prominent role for mTORC1 in this process. However, we cannot exclude the possibility that rapamycin-insensitive pathways, including mTORC2 signaling, also contribute. Notably, denervation itself may influence mTORC2 activity, which could complicate the distinction between stretch- and denervation-mediated signaling. Given these considerations, we have added a brief discussion to acknowledge potential contributions of rapamycin-insensitive mTOR signaling (lines 379-384). A more comprehensive dissection of mTORC1 versus mTORC2 signaling in this context will require targeted approaches and falls beyond the scope of the present study.
 
 Reviewer #1 (Recommendations for the authors):
 
 Minor comments:
 
 (6) The manuscript notes that KEGG analysis "confirmed" the GO‑term findings. Because KEGG pathways and GO terms describe different types of biological information, it might be clearer simply to present them as complementary lines of evidence rather than one validating the other.
 
 We agree and modified the text accordingly. “Concurrently, KEGG PATHWAY database searches (Supplemental data Table 6) indicated that the DEG’s are involved in muscle remodeling.” See lines 166-169.
 
 (7) Figure 2's legend mentions a two‑way ANOVA, but the specific factors tested are not specified. Listing those two factors would help readers interpret the statistics more easily.
 
 The two-way ANOVA refers to the violin plot in Figure 2E and tests the differences for the two surgical comparisons, sham vs UDD and sham vs BDD. Sham groups were combined in the graphs for easy comparison. We clarified the text of the Figure 2 legend.
 
 (8) The Methods briefly describe phosphopeptide enrichment, but additional details on the criteria for site identification - such as the localisation algorithm, probability cut‑off, and FDR thresholds - would make the phosphoproteomics section more transparent and reproducible.
 
 Please see the updated Methods section, lines 756-765.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Muscle hypertrophy is a major regulator of human health and performance. Here, van der Pijl and colleagues assess the role of the giant elastic protein, titin, in regulating the longitudinal hypertrophy of diaphragm muscles following denervation. Interestingly, the authors find an early hypertrophic response, with 30% new serial sarcomeres added within 6 days, followed by subsequent muscle atrophy. Using RBM20 mutant mice, which express a more compliant titin, the authors discovered that this longitudinal hypertrophy is mediated via titin mechanosensing. Through an omics approach, it is suggested that the muscle ankyrin repeat proteins (MARPs) may regulate this process. Genetic ablation of MARPs 1-3 blocks the hypertrophic response, although single knockouts are more variable, suggesting extensive complementation between these titin-binding proteins. Finally, it is found through the administration of rapamycin that the mTOR signalling pathway plays a role in longitudinal hypertrophic growth.
 
 Strengths:
 
 This paper is well written and uses an impressive suite of genetic mouse models to address this interesting question of what drives longitudinal muscle growth.
 
 We appreciate the reviewer’s kind words on our manuscript and their critical review of our work. A potential separate mechanism governing cross-sectional versus longitudinal hypertrophy is of great interest and something we aim to address in future manuscripts.
 
 Weaknesses:
 
 While the findings are of interest, they lack sufficient mechanistic detail in the current state to separate cross-sectional versus longitudinal hypertrophy. The authors have excellent tools such as the RBM20 model to functionally dissect mTOR signalling to these processes. It is also unclear if this process is unique to the diaphragm or is conserved across other muscle groups during eccentric contractions.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) Cross-sectional hypertrophy characterization: The paper emphasizes longitudinal hypertrophy but does not quantify the contribution of radial (cross-sectional) hypertrophy to the total mass increase. Given that the denervated costal diaphragm shows ~50% increase in mass (Figure 1B) but there is only ~30% fiber lengthening, it is important to determine the proportion attributable to fiber diameter changes. Histological analysis of muscle fiber cross-sectional area would clarify the relative contributions of longitudinal versus radial hypertrophy to the overall mass phenotype.
 
 We agree with the reviewer that radial hypertrophy is an important mechanism for muscle weight gain in UDD. In previous work we characterized both the radial and longitudinal hypertrophy responses to 6-day UDD and found that ~20% of the mass gain seen in UDD is due to radial hypertrophy (PMID: 29978560). We reference this paper in the Discussion section, lines 277-278. A full histological work-up of the UDD diaphragm would be interesting but falls outside the scope of this manuscript. Our focus was to characterize longitudinal hypertrophy by addition of sarcomeres in series and to provide insight into titin’s role in regulating longitudinal hypertrophy. We hope that the reviewer agrees with this approach.
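 The reviewer's point about mass versus length can be made concrete with a back-of-envelope calculation. The sketch below assumes that muscle mass scales as density × fiber length × cross-sectional area, with constant density and uniform fiber geometry. That is a simplifying assumption made here for illustration, not the analysis used in the manuscript or in the cited prior work, and the function name is hypothetical.

```python
def radial_fold_change(mass_fold, length_fold):
    """Cross-sectional fold change implied by mass = density * length * area.

    Assumes constant density and uniform fibers (a simplification).
    """
    return mass_fold / length_fold


mass_fold = 1.50    # ~50% mass increase cited by the reviewer (Figure 1B)
length_fold = 1.30  # ~30% fiber lengthening cited by the reviewer
area_fold = radial_fold_change(mass_fold, length_fold)
print(f"Implied cross-sectional area change: {100 * (area_fold - 1):.0f}%")
```

 Under these assumptions, the residual mass not explained by lengthening corresponds to roughly a 15% increase in cross-sectional area, which is why the reviewer asks for a direct histological measurement.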
 
 (2) Titin isoform expression analysis: At line 103, the authors propose that longitudinal hypertrophy reduces strain on titin by decreasing fractional sarcomere extension. However, this hypothesis does not exclude the possibility of isoform switching to a less elastic titin variant, which may compensate for changes in mechanical stress. The RNA-sequencing data should be analyzed for titin exon usage patterns between sham and UDD to determine whether changes in isoform composition (e.g., PEVK region splicing) accompany longitudinal hypertrophy. If isoform switching occurs, this represents an alternative or complementary mechanism to sarcomere addition.
 
 We analyzed titin exon usage in rat following both UDD and BDD. The increase in sarcomeres in series associated with UDD is accompanied by modest changes in titin exon usage, though these are not significant by population-adjusted p-values. The denervation effect of BDD did produce changes in splicing, with lower inclusion of PEVK-encoding exons, suggesting a stiffening of the titin molecules. Stiffening of titin molecules might be protective for the fully paralyzed diaphragm and preserve muscle mass. This would align with our prior publication (PMID: 29978560), which showed that stiffer titin generated more radial hypertrophy in response to UDD. In response to the reviewer’s comment, we added the splicing data to the supplemental data as new figure 2 and briefly address titin splicing in the Results section; see lines 121-125.
 
 (3) The comparison of 3-day unilateral diaphragm denervation (UDD) and bilateral diaphragm denervation (BDD) in rats (Figure 1D-E) is used to argue that hypertrophic signaling is stretch-dependent rather than denervation-dependent. However, this interpretation requires clarification. In mice, hypertrophy is detectable as early as 1 day post-UDD, whereas the 3-day BDD protocol may drive an accelerated hypertrophic-to-atrophic remodelling process given the severity of the model. Moreover, longitudinal and global muscle hypertrophy may operate through distinct mechanisms: denervation could suppress longitudinal hypertrophy through a separate pathway while promoting or delaying cross-sectional hypertrophy. The authors should acknowledge that the current evidence does not fully exclude denervation-dependent mechanisms and should consider extended BDD time points or additional mechanistic studies to clarify this distinction.
 
 UDD and BDD are both denervation models, and hypertrophy occurs in the denervated costal diaphragm of UDD-operated animals. Stretch is thus the mechanical difference between UDD and BDD and therefore the trigger for hypertrophy signaling. Denervation signaling should in principle be comparable in the two models and is unlikely to play a different role in UDD than in BDD, except that UDD also induces a more potent hypertrophy signaling profile on top of the atrophy program. That said, BDD is a more severe model, and respiration rate is depressed in BDD whereas it is elevated in UDD. BDD rats also engage in abdominal breathing, which mildly stretches the diaphragm. Hypoxia is likely to play a stronger role in BDD than in UDD and could thus further enhance the atrophy profile of BDD. We agree with the reviewer that more work is needed to elucidate the BDD remodeling response; however, UDD-induced stretch is the main driver of longitudinal hypertrophy. In response to the reviewer’s comment, we have added clarifying text to the Discussion, lines 286-292.
 
 The potential for there being two independent mechanisms for both radial and longitudinal hypertrophy is of great interest to us. We foresee that dissecting out these differences will require a cell culture-based approach and will aid in avoiding the complexity of overlapping denervation and hypertrophy signals as seen in this manuscript.
 
 (4) Characterization of RBM20 models: The RBM20 experiments rely on the assumption that increased titin compliance reduces stretch sensitivity. However, the paper provides minimal baseline characterization of the diaphragms. Specifically: (a) What are the sarcomere lengths in RBM20-deficient diaphragms at rest and under stretch? (b) How does the passive force-length relationship differ between wildtype and RBM20-deficient diaphragm muscles? and (c) Would RBM20-deficient muscles, despite having longer sarcomeres at baseline, actually experience sufficient strain to activate mechanosensing? These data are necessary to interpret why RBM20-deficient mice show attenuated mass gain rather than none (as in BDD) during UDD (Supplemental Figure 2A-C). Additionally, what would the authors hypothesize would happen if rapamycin were used in RBM20 UDD models? It appears to be an attractive experimental approach to separate potential mTOR contributions to longitudinal versus cross-sectional hypertrophy.
 
 We agree with the reviewer that more work is needed on Rbm20-deficient mice and rats to elucidate their response to stretch. Part of this characterization has been published previously (PMID: 29978560): Rbm20 splice-deficient mice have reduced passive stiffness in the diaphragm and show a robust mechanosensing response to UDD. They also show a similar increase in longitudinal hypertrophy, but blunted radial hypertrophy, in response to 6 days of UDD. The main reason for not characterizing these mice and rats further was the added complexity that Rbm20 splices multiple targets that could affect hypertrophy signaling; for example, LDB3 (ZASP) and FLNC (Filamin C) are both associated with hypertrophic cardiomyopathy. Hence, for the purpose of this manuscript, we showed that mice and rats have a similar hypertrophic response to UDD and that titin stiffness (reduced in Rbm20-deficient animals) affects hypertrophy at the level of diaphragm mass.
 
 Testing rapamycin on Rbm20-deficient animals could be interesting; however, the complexity introduced by altered splicing of non-titin targets would make interpretation of mTOR signaling difficult. An alternative approach would be to generate a mouse model with more compliant titin (e.g. by increasing the size of the PEVK segment), a model we are considering for future studies. TtnΔ112-158 mice, in which a large portion of the PEVK region is deleted (PMID: 30565562), show increases in sarcomere number. We would therefore expect a model with a longer PEVK segment to show a reduction in the number of sarcomeres in series. We discuss the role of titin stiffness and how it ties to longitudinal hypertrophy in the Discussion; please see lines 302-314.
 
 (5) Statistical analysis and multiple hypothesis correction: The proteomic analyses appear to employ a nominal p-value threshold (p < 0.05) without correction for multiple comparisons or false discovery rate (FDR) control. This is particularly concerning given the large number of comparisons. For example, the authors report 142 titin phosphorylation sites significantly different between sham and UDD at p < 0.05 (approximately 20% of ~700 identified sites). However, with proper FDR correction (adjusted p < 0.05), only 14 sites remain significant - a 90% reduction. This discrepancy is critical for the discussion on titin N2A phosphorylation sites pS9459 and pS9520, where only pS9520 achieves statistical significance after FDR adjustment. The authors should justify their choice of statistical thresholds and reanalyze key findings using FDR-corrected p-values. Additionally, the phosphoproteomics dataset should be screened for duplicate phosphosite identifications to ensure each site is counted only once.
 
 Reviewer 1 has voiced similar concerns, and we have thus expanded the Methods to explain the statistical tests used to analyze the data and the process of establishing Z-scores of isobaric peptides for the same phosphosites (see lines 756-765). Our statistical analysis covers all detected peptides. When we analyze only the titin peptides, pS9459 is significant only by t-test, likely due to large variation among isobaric peptides, whereas pS9520 is significant by both independent t-test and FDR. We changed Figure 3D to show the fold change instead of the previous Z-score for more intuitive interpretation.
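 The gap between nominal and FDR-adjusted significance raised in point (5) can be illustrated with the Benjamini-Hochberg procedure. This is a minimal sketch: the source does not state which FDR method the authors used, so Benjamini-Hochberg is an assumption here, and the p-values are hypothetical numbers chosen for illustration rather than values from the phosphoproteomics dataset.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p-values for five phosphosites (illustrative only).
nominal = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(nominal)

significant_nominal = sum(p < 0.05 for p in nominal)
significant_fdr = sum(q < 0.05 for q in adjusted)
print(f"nominal p < 0.05: {significant_nominal}, adjusted p < 0.05: {significant_fdr}")
```

 In this toy set, four sites pass the nominal threshold but only two survive adjustment. The reviewer describes the same kind of attrition at larger scale (142 sites nominally significant versus 14 after correction), which is why sites near the threshold, such as pS9459, can lose significance.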
 
 Minor comments:
 
 (6) Line 52: "thesarcomeres" should read "the sarcomeres".
 
 A space has been added, please see line 52.
 
 (7) Line 52: "half-sarcomer" should read "half-sarcomere"
 
 Spelling has been corrected, please see line 52.
 
 (8) Figure clarity: Figure 1 (B-C) presents mouse data, while Figure 1 (D-E) presents rat data. This distinction should be clearly labeled in the figure legend or on the axes to prevent misinterpretation, particularly for readers unfamiliar with the experimental design.
 
 We added the species to the y-axis of revised figure 1B-E and added additional clarification in the figure legend.
 
 (9) Supplementary tables: When reporting statistical comparisons in the supplementary tables, please consider including the directionality of the statistical tests (e.g., which group was higher or lower) alongside p-values. This will facilitate interpretation without requiring reference to the main text figures.
 
 We agree with the reviewer and added statistical direction as a new column next to the p-values, please see the revised supplemental tables.
 
 (10) Given the interesting divergent findings in MARPtKO versus single knockouts, it would be interesting to assess by immunofluorescence the association of each MARP with the N2A region of titin following UDD.
 
 We agree with the reviewer that localization is important. Miller et al (PMID: 14583192) previously localized MARP1-3 to the N2A segment by immuno-EM and our work previously localized MARP1 to N2A using SR-SIM (PMID: 29978560). We will further investigate MARPs binding to the N2A region in an upcoming study that we intend to publish soon.
 
URL

biorxiv.org/content/10.1101/2025.06.19.660595v2
www.biorxiv.org www.biorxiv.org

Photo-downregulation of SIRT4 mitigates aging in mice by enhancing H3K9ac via fatty acid metabolism

4
1. Public_Reviews 29 May 2026
 
 in eLife
 
 eLife Assessment
 
 This potentially valuable study investigates the anti-senescence effects of red light exposure, proposing that reduced SIRT4 levels enhance fatty acid metabolism and H3K9ac, thereby attenuating ageing-related phenotypes. The authors use multiple approaches, including cultured cells, animal models, and molecular analyses, to support their conclusions. However, the evidence remains incomplete, as additional controls and stronger mechanistic data are needed to fully support the proposed pathway, particularly how red light exposure reduces SIRT4 levels.
 
 Summary
2. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Deng and colleagues pursue the possibility that red light exposure can provide some benefits and anti-senescence effects in aged mouse models. In addition, they show how red light influences metabolism in cultured keratinocytes. The authors provide a long dissection of the potential paths involved in the changes promoted by red light exposure, identifying CytC oxidase, SIRT4, PPARa and MCD as key players.
 
 Strengths:
 
 The authors did a thorough exploration of the multiple potential avenues by which red light exposure influences metabolism. The in vitro and in vivo evidence nicely complement each other.
 
 Weaknesses:
 
 This is a challenging hypothesis that would require some additional experimental controls. The pathway dissection, while extensive, is sometimes approached in unconvincing ways, and the results are not always evident to judge or interpret. Technically, the western blots and transcriptomic analyses require notable improvements.
 
 Review 1
3. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This work identifies a previously unknown way that red light can slow ageing. The authors show that red light lowers the level of a protein called SIRT4 in skin cells. Reducing SIRT4 boosts fatty acid use and increases a type of histone modification that keeps genes active. These changes help cells clear away signs of ageing, reduce inflammation, and restore normal metabolism. The findings open the possibility of developing new treatments that target SIRT4 to reverse age‑related decline.
 
 Strengths:
 
 The evidence is solid because the authors use several complementary methods. They test red light in both cultured cells and naturally aged mice, and they confirm the key role of SIRT4 by silencing its gene. Measurements of metabolism, protein changes, and ageing markers all point in the same direction. However, the exact way red light lowers SIRT4 levels is not fully explained, which leaves a minor gap. Overall, the conclusions are well supported and convincing.
 
 Weaknesses:
 
 The paper does not go on to use its mechanistic discoveries to help our community identify the mechanism of photobiomodulation, which remains unknown.
 
 I would like to draw attention to a recently published paper by Herrera et al. (FEBS Letters 2025, doi:10.1002/1873-3468.70195), which shows that red light (660 nm) stimulates mitochondrial fatty acid oxidation in keratinocytes via AMPK‑dependent phosphorylation of ACC, without altering expression of electron transport chain complexes. I believe this paper is highly complementary to the current study.
 
 Herrera et al. demonstrate that red light increases basal, ATP‑linked, and maximal oxygen consumption rates in keratinocytes specifically through enhanced fatty acid oxidation (inhibited by etomoxir). This independently validates the central finding of the current manuscript, i.e., red light boosts lipid metabolism, strengthening the robustness of this concept.
 
 While the current manuscript focuses on the SIRT4‑MCD axis, Herrera et al. identify AMPK phosphorylation and ACC inhibition as key effectors. The authors can integrate and expand their discussion, since SIRT4 downregulation may converge on AMPK activation, or they may represent parallel, reinforcing mechanisms. This would enrich the mechanistic model and open new hypotheses.
 
 The mechanism of photobiomodulation: Herrera et al. explicitly challenge the prevailing paradigm that red light acts solely via cytochrome c oxidase (by showing long-lasting effects, unchanged OXPHOS protein levels, and no difference in permeabilised cells). The current finding (red light acts through SIRT4 downregulation, i.e., not direct enzymatic activation) aligns perfectly with Herrera et al.'s critique.
 
 Long‑term metabolic effects - Herrera et al. show that a single red light exposure elevates oxygen consumption for up to 2 days. The current study focuses on changes at 12‑24 h. Their data extend the time window and suggest that the metabolic reprogramming you describe may persist longer than currently discussed, which is clinically relevant.
 
 Discussing Herrera et al.'s results would not only acknowledge independent, corroborating evidence but would also allow the authors to position their SIRT4‑centric mechanism within a broader, emerging understanding of red‑light photobiomodulation.
 
 Review 2
4. Public_Reviews 29 May 2026
 
 in eLife
 
 Author response:
 
 Reviewer #1 (Public review):
 
 Weaknesses:
 
 This is a challenging hypothesis that would require some additional experimental controls. The pathway dissection, while extensive, is sometimes approached in unconvincing ways, and the results are not always evident to judge or interpret. Technically, the western blots and transcriptomic analyses require notable improvements.
 
 We would like to thank the reviewer for the careful and patient examination of the issues identified in our manuscript. The poor quality of some of the Western blot bands in Figure 4 may have been caused by inappropriate electrophoresis conditions during the Western blot experiments. In the revised manuscript, we will optimize the electrophoresis conditions to obtain higher-quality protein bands and update the quantitative data. Regarding the quantification format, we believe that heatmaps provide a more intuitive representation of trends in protein expression across different treatment groups. This approach more accurately reflects the results of our biological replicates than simply analyzing the significance of differences in the grayscale values of protein bands. For the analysis of transcriptomic data, we will conduct a more detailed analysis of signal pathway enrichment and the identified differentially expressed genes to ensure that predicted genes are excluded from our current results and redundant data presentation is removed.
 
 Regarding additional experimental controls, such as using blue light treatment as a control for red light: while exploring the optimal red light irradiation dose at the cellular level, we simultaneously tested the effects of blue light irradiation at the same doses on keratinocyte activity. As the blue light irradiation dose increased (0–160 J/cm²), keratinocyte activity declined in a dose-dependent manner, indicating that blue light is phototoxic to keratinocytes. These results have already been published in our previous study (Communications Biology 2024, doi: 10.1038/s42003-024-06973-1). Taken together with the data from the current study, this demonstrates that the anti-aging effects reported in the current manuscript are indeed driven by red light.
 
 Reviewer #2 (Public review):
 
 Weaknesses:
 
 The paper does not go on to use its mechanistic discoveries to help our community identify the mechanism of photobiomodulation, which remains unknown.
 
 I would like to draw attention to a recently published paper by Herrera et al. (FEBS Letters 2025, doi:10.1002/1873-3468.70195), which shows that red light (660 nm) stimulates mitochondrial fatty acid oxidation in keratinocytes via AMPK‑dependent phosphorylation of ACC, without altering expression of electron transport chain complexes. I believe this paper is highly complementary to the current study.
 
 Herrera et al. demonstrate that red light increases basal, ATP-linked, and maximal oxygen consumption rates in keratinocytes specifically through enhanced fatty acid oxidation (inhibited by etomoxir). This independently validates the central finding of the current manuscript, i.e., red light boosts lipid metabolism, strengthening the robustness of this concept.
 
 While the current manuscript focuses on the SIRT4-MCD axis, Herrera et al. identify AMPK phosphorylation and ACC inhibition as key effectors. The authors can integrate and expand their discussion, since SIRT4 downregulation may converge on AMPK activation, or they may represent parallel, reinforcing mechanisms. This would enrich the mechanistic model and open new hypotheses.
 
 The mechanism of photobiomodulation: Herrera et al. explicitly challenge the prevailing paradigm that red light acts solely via cytochrome c oxidase (by showing long-lasting effects, unchanged OXPHOS protein levels, and no difference in permeabilised cells). The current finding (red light acts through SIRT4 downregulation, i.e., not direct enzymatic activation) aligns perfectly with Herrera et al.'s critique.
 
 Long-term metabolic effects: Herrera et al. show that a single red light exposure elevates oxygen consumption for up to 2 days. The current study focuses on changes at 12-24 h. Their data extend the time window and suggest that the metabolic reprogramming you describe may persist longer than currently discussed, which is clinically relevant.
 
 Discussing Herrera et al.'s results would not only acknowledge independent, corroborating evidence but would also allow the authors to position their SIRT4-centric mechanism within a broader, emerging understanding of red-light photobiomodulation.
 
 We would like to thank the reviewer for providing us with constructive suggestions for discussion. Our results showed that under red light conditions, both glycolipid and lipid metabolism were activated in keratinocytes, and cellular metabolic flux increased. The activation of lipid metabolism directly led to an increase in metabolism-associated H3K9ac and drove the upregulation of anti-aging-related genes; we believe this is key to the anti-aging effects of red light. Mechanistic analysis combining proteomics and acetylation proteomics revealed that red light significantly downregulated SIRT4 expression and increased the acetylation of MCD, a protein regulated by SIRT4 that governs cellular fatty acid oxidation rates. Through validation using cell-level knockdown and inhibitors, we confirmed that SIRT4 inhibition exerts anti-aging effects in vitro and that inhibiting MCD function under red light conditions suppresses H3K9ac. These results establish the role of the SIRT4-MCD signalling axis in mediating the anti-aging effects of red light.
 
 The study by Herrera et al. included a substantial body of validation data confirming the role of red light in promoting fatty acid oxidation, providing robust empirical support for our research. Herrera et al. also revealed that red light-induced fatty acid oxidation depends on AMPK and ACC phosphorylation. This mechanism of red-light photobiomodulation may refute the notion that its bio-regulatory effects rely solely on the action of mitochondrial cytochrome c oxidase. Together with our finding that red light exerts anti-aging photobiomodulatory effects via the SIRT4-MCD signalling axis, these results independently confirm that red light regulates cellular fatty acid oxidation and point to a pivotal role for activated fatty acid oxidation in the bio-regulatory effects of red light. In the revised manuscript, we will include a discussion of the potential link between the red light-driven downregulation of SIRT4 and the phosphorylation of AMPK/ACC. This will help to elucidate how SIRT4 exerts its anti-aging effects by regulating lipid metabolism, and to explain the possible mechanisms by which red light downregulates SIRT4.
 
URL

biorxiv.org/content/10.64898/2026.04.07.717004v1
www.biorxiv.org www.biorxiv.org

ARHGEF6-dependent cytoskeletal regulation underlies a conserved program of forebrain interneuron development

5
1. Public_Reviews 29 May 2026
  
  in eLife
  
  eLife Assessment
  
  The study presents valuable findings regarding the impact of ARHGEF6 deletion, a RhoGTPase regulator linked to X-linked intellectual disability (XLID46), in the development of interneurons. The evidence supporting the observed cellular and developmental phenotypes collected in both mouse and human iPSC models is convincing, although further work would strengthen the mechanistic interpretation and clarify the specificity of the findings. This work offers new insights into ARHGEF6 function and the potential contribution of its dysfunction to neurodevelopmental disorders.
  
  Summary
2. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript has several strengths, including a technically comprehensive approach that combines mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models, providing a rich and multifaceted dataset. Cross-species validation through the parallel use of mouse and human systems strengthens the generality of the observed phenotypes and increases relevance to human neurodevelopment.
  
  Consistent phenotypic observations across systems show that ARHGEF6 loss affects migration, neurite morphology, growth cone structure, and neuronal survival, supporting a coherent role in cytoskeletal regulation.
  
  There is clear evidence for developmental defects, including reduced interneuron numbers, increased apoptosis in the ganglionic eminences, and migration deficits, all well supported by quantitative analyses. Also, there is a high-quality electrophysiological characterization that demonstrates reduced firing in interneurons, providing a well-controlled functional phenotype.
  
  Strengths:
  
  The manuscript has several strengths, including a technically comprehensive approach that combines mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models, providing a rich and multifaceted dataset. Cross-species validation through the parallel use of mouse and human systems strengthens the generality of the observed phenotypes and increases relevance to human neurodevelopment.
  
  Consistent phenotypic observations across systems show that ARHGEF6 loss affects migration, neurite morphology, growth cone structure, and neuronal survival, supporting a coherent role in cytoskeletal regulation.
  
  There is clear evidence for developmental defects, including reduced interneuron numbers, increased apoptosis in the ganglionic eminences, and migration deficits, all well supported by quantitative analyses. Also, there is a high-quality electrophysiological characterization that demonstrates reduced firing in interneurons, providing a well-controlled functional phenotype.
  
  Weaknesses:
  
  Despite the strengths mentioned above, the study has some conceptual and experimental weaknesses that reduce its impact. The mechanistic insight is limited, as the research does not directly establish how ARHGEF6 regulates downstream signaling pathways.
  
  Also, there is insufficient evidence for interneuron specificity; although the central claim is that ARHGEF6 plays a selective role in interneurons, the data do not adequately exclude the possibility that the observed effects reflect broader neuronal defects. The study lacks critical controls across cell types, as several phenotypes observed in organoids and progenitors, including apoptosis, reduced neuronal output, and altered morphology, could also affect multiple neuronal populations without being directly tested. Furthermore, the data are predominantly descriptive, with many results remaining correlative and failing to establish causal relationships.
  
  Some more comments:
  
  (1) Given that ARHGEF6 is a guanine nucleotide exchange factor for Rac1 and Cdc42, the absence of direct measurements of GTPase activity or downstream signaling represents a significant gap. The interpretation that the observed phenotypes are mediated through specific cytoskeletal pathways, therefore, remains inferential.
  
  (2) The manuscript repeatedly interprets the findings as interneuron-specific. However, several key observations are not demonstrated to be restricted to IN. Without direct comparison to excitatory neurons or other cell types, it is difficult to conclude that ARHGEF6 plays a selective role in interneurons rather than a more general role in neuronal development. The well-done analysis of the transcriptomic dataset is not sufficient to claim IN specificity. This issue is particularly important for the interpretation of the human organoid experiments, where reductions in SOX2⁺ progenitors and NEUN⁺ neurons, as well as increased apoptosis, could reflect global developmental defects. Similarly, in the mouse experiments, the reduction in GAD67⁺ cells is compelling, but it is not shown whether other neuronal populations are also affected.
  
  (3) The study provides a strong phenotypic description but limited causal resolution. For example, migration defects, altered growth cone morphology, and reduced branching are all consistent with impaired cytoskeletal regulation, but the links between these phenotypes are not directly established. Likewise, while the electrophysiological data convincingly show reduced firing in interneurons, the connection between altered cytoskeletal dynamics and intrinsic excitability is not explored.
  
  (4) Several aspects of data presentation could be improved. In multiple figures (e.g., Figure 1A, D; Figure 4 and Video S1, 2), the images are difficult to interpret due to high cellular density, limited magnification, or lack of clear annotation. In some cases, it is not fully clear how quantifications were performed or which regions were analyzed. Improving the visual clarity with arrows, boxes, and high-magnification insets would strengthen confidence in the conclusions.
  
  Review 1
3. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  The authors investigate the impact of the deletion of the small GTPase regulator ARHGEF6 on the development and physiology of interneurons. Using public databases, they first show that ARHGEF6 is enriched in interneurons or in areas that give rise to them, both in development and adulthood, in humans and mice. Using a previously reported complete KO mouse and a GAD67-GFP reporter mouse line, they show that in the adult mouse cortex and hippocampus, there is a notable reduction in GFP+ cells. These mice show increased apoptotic cells at different timepoints and areas of the brain during development. In the developing cortex of ARHGEF6-KO mice, there are fewer IN in all layers, and cells present processes that are not correctly oriented. IN from the hippocampus in culture show reduced excitability and impaired neurite branching. The authors then established isogenic hiPSC lines to study ARHGEF6 deletion in human cells and differentiated ventral forebrain neurons, to find interneuron-related and non-related phenotypes. Most importantly, human interneurons grown in organoids show reduced branching and altered growth cone morphology. The authors claim that the novel interneuron phenotypes found in these models can explain, in part, the human intellectual disabilities associated with mutations in this protein. The study is well conducted and opens new avenues of research not only for the role of small GTPase regulation in early nervous system development, but also for how interneuron deficiencies impact a wider range of intellectual disability syndromes found in humans.
  
  However, most conclusions of the present version would be strengthened after considering the following comments:
  
  Major comments
  
  (1) The reported biological processes evaluated at different developmental stages may be directly or indirectly related to ARHGEF6 function itself. As a model of a hereditary disease, full organism gene deletion is valid, since the human patients suffer from that condition as well. However, to investigate the roles of a protein, complete deletions may not be very accurate since they can give rise to phenotypes that are only indirectly related to the protein function itself. Most conclusions of the present manuscript should either be discussed in this regard or be supported by added evidence for a direct role of the protein. Such evidence is typically obtained with acute knockdowns in culture, or in developing brains by in utero electroporation. For example, Figure 1C shows that the principal excitatory neurons in the hippocampus do not express ARHGEF6. However, most electrophysiological and behavioral evidence of defects in ARHGEF6-KO mice arises from evaluating these cells (Ramakers et al., 2012). I am not suggesting that either the previous or the current evidence is wrong. But I believe readers would benefit from a clear distinction (or added caution notes) between a functional consequence of the deletion (that can be months away and in other cells than the actual molecular defect) and a true cell biological function of the protein under study. In favor of the authors, this is a concern with most conclusions derived from KO organisms.
  
  (2) Figure 1E-I. All conclusions are made with a GAD67-GFP reporter, which is a very powerful and reliable tool for large-scale screening. All the conclusions of the paper would be strengthened if immunohistochemical staining of specific interneuron markers in the same areas were added as complementary supporting evidence.
  
  (3) Cell death in development: It is surprising that the high amount of TUNEL staining during development does not translate into gross histological changes in the adult brain (studied elsewhere). Can authors discuss possible explanations?
  
  (4) Section 4 (Figures 2F-J) - The authors present this staining as an analysis of migration. Normally, migration studies are performed with a "pulse-chase" paradigm, where a single cohort is labeled and then followed over time (normally by in utero electroporation of a fluorescent protein). Tissue is then fixed at different time points, and migration can be followed. On the contrary, the evidence is from a single point, in an experimental setting in which all Gad67 IN are stained, and hence, one cannot imply a defect in migration. The differences between WT and ARHGEF6-KO are obvious and interesting; it is just that they cannot be solely attributed to a problem in migration.
  
  Also, a true migration phenotype in the current setting should have shown that the cells that failed to migrate accumulate in deeper layers. My impression is that the changes in IN per layer are more easily explained by total cell number than by migration. Perhaps evaluating earlier timepoints could clarify this.
  
  (5) It is known that ARHGEF6 deletion produces severe F-actin phenotypes in neurons. Have the authors confirmed that GAD67 cells in their hippocampal cultures also have these phenotypes (stress fibers in somas, growth cones, and actin patches along neurites)?
  
  (6) Section 4. The authors present data for deficient migration of the GFP-labeled interneurons. Is it possible to assess, in the same sections, whether other cell types are also affected? Although the hypothesis that ARHGEF6 deletion will have an impact on IN is well rooted in expression data, by assessing other cell types, one can even include a positive control or evidence for a cell-autonomous phenotype.
  
  (7) ARHGEF6 deletion has an important impact on organoid development (size, shape, etc.). Have the authors analysed whether these organoids produced fewer interneurons?
  
  (8) In assembloids, the differences in migration parameters are very small between WT and ARHGEF6-KO, which reinforces that perhaps what is observed in the different layers of cortex during mouse development is likely not entirely due to migration, as concluded.
  
  (9) To properly weigh the present evidence (interneuron deficits) using the ARHGEF6-KO model, the authors should include a deeper discussion in light of the extensive work that has been done using these mice. How does the finding of a diminished IN population in the brain of these mice explain the large amount of electrophysiological and behavioral evidence produced before with these animals? Perhaps the most important work to discuss in this respect is the initial ARHGEF6-KO report by Ramakers and colleagues (2012), but there are others.
  
  Minor comments
  
  (1) Figure 1A. It looks clear that the GE shows the highest expression of ARHGEF6; however, the reader needs the reference levels where the log2 expression is calculated. What are the reference levels?
  
  (2) Have the authors compared the number of GAD67-eGFP cells in the hippocampal cultures between WT and ARHGEF6-KO mice?
  
  (3) Section 3: as a cautionary note, the authors should mention that it is not possible to know from the evidence provided which cells are dying.
  
  (4) In the dorsal-ventral assembloids, it is expected that the ventral organoid would contain lots of GFP expression compared to the dorsal, but in the image shown (Figure 5A) both parts of the assembloid seem to have the same amount and distribution of GFP. How is that possible?
  
  Review 2
4. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  ARHGEF6 is a RAC1/CDC42 guanine nucleotide exchange factor that has been proposed to be associated with X-linked intellectual disability, but its relevance to the pathology is not well established. ARHGEF6 has been assigned a role in spine density and plasticity of hippocampal pyramidal neurons, but nothing is known about its role in interneuron development. Here, the authors show that ARHGEF6 is expressed early in development in the inhibitory lineage during the peak of interneuron generation and migration. The aim of the study is therefore to investigate whether, in addition to its role in pyramidal neurons, ARHGEF6 could play a role in inhibitory neuron development. Using both ARHGEF6-KO mice and organoids from ARHGEF6-KO hiPSCs, the authors show that ARHGEF6 plays a critical role in interneuron development and function.
  
  Strengths:
  
  The major strength of the paper is the very detailed analysis of the role of ARHGEF6 using two different systems: ARHGEF6-KO mice and deletion of ARHGEF6 in human iPSC-derived organoids. Strikingly, deletion of ARHGEF6 in both systems induces similar defects such as an increase in apoptosis, reduced neuronal output, impaired neuronal morphology, and disrupted migratory dynamics. This compelling evidence demonstrates that ARHGEF6, in addition to its already well-described role in spine formation and plasticity, is playing a crucial role during embryonic development through its function in interneurons.
  
  Weaknesses:
  
  (1) In Figure 1, the authors show that ARHGEF6 is expressed in different regions of the brain, including the interneuron lineage, and that depletion of ARHGEF6 reduces the number of GABAergic neurons in the adult cortex and hippocampus. To better characterize this defect, the authors investigate in Figure 2 whether deletion of ARHGEF6 affects interneuron migration and survival during embryonic development. To do so, ARHGEF6-KO mice were crossed with the GAD67-eGFP reporter line to follow the inhibitory lineage. The authors analyse apoptosis using TUNEL staining, and show that it is significantly increased in the ganglionic eminence of ARHGEF6-KO E14.5 embryos. The authors claim that this is not the case in the cortex. However, the image shown in Figure 2A suggests that staining is increased there as well. Which part of the neocortex is analysed for quantification? This should be clarified.
  
  (2) In Figure 2F-J, the authors investigate the migration of interneurons by analysing the GAD67-eGFP staining, and clearly show that the migratory abilities of the depleted neurons are reduced. However, the authors do not discuss the fact that, because depletion of ARHGEF6 increases apoptosis, there are fewer neurons available for migration. This is important for the interpretation of the data. This point should be clarified.
  
  (3) In Supplementary Figure S2, the authors describe the establishment of the ARHGEF6-KO human iPSC line and test the ability of these cells to undergo correct development, especially for the generation of neural progenitor cells. I was wondering why the authors do not present the data of both control and ARHGEF6-KO cells.
  
  (4) At the molecular level, an explanation of how ARHGEF6 depletion could affect neuronal survival is missing. In addition, as ARHGEF6 is one of several GEFs for RAC1 and Cdc42, I would have expected the authors to test how RAC1 (and Cdc42) activity is affected in ARHGEF6-depleted brains and in ARHGEF6-KO organoids. The measure of phalloidin staining and the anisotropy index are not really meaningful.
  
  (5) The authors show that ARHGEF6-KO forebrain organoids were markedly smaller compared to their isogenic controls, and their study suggests that ARHGEF6 expression impacts progenitor maintenance and neurogenesis. Although interneurons represent only a minority of the total neuronal population, I was wondering whether ARHGEF6-KO mice present brain morphology defects such as microcephaly.
  
  Review 3
5. Public_Reviews 29 May 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript has several strengths, including a technically comprehensive approach that combines mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models, providing a rich and multifaceted dataset. Cross-species validation through the parallel use of mouse and human systems strengthens the generality of the observed phenotypes and increases relevance to human neurodevelopment.
  
  Consistent phenotypic observations across systems show that ARHGEF6 loss affects migration, neurite morphology, growth cone structure, and neuronal survival, supporting a coherent role in cytoskeletal regulation.
  
  There is clear evidence for developmental defects, including reduced interneuron numbers, increased apoptosis in the ganglionic eminences, and migration deficits, all well supported by quantitative analyses. Also, there is a high-quality electrophysiological characterization that demonstrates reduced firing in interneurons, providing a well-controlled functional phenotype.
  
  Strengths:
  
  The manuscript has several strengths, including a technically comprehensive approach that combines mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models, providing a rich and multifaceted dataset. Cross-species validation through the parallel use of mouse and human systems strengthens the generality of the observed phenotypes and increases relevance to human neurodevelopment.
  
  Consistent phenotypic observations across systems show that ARHGEF6 loss affects migration, neurite morphology, growth cone structure, and neuronal survival, supporting a coherent role in cytoskeletal regulation.
  
  There is clear evidence for developmental defects, including reduced interneuron numbers, increased apoptosis in the ganglionic eminences, and migration deficits, all well supported by quantitative analyses. Also, there is a high-quality electrophysiological characterization that demonstrates reduced firing in interneurons, providing a well-controlled functional phenotype.
  
  We thank the reviewer for their positive and thoughtful assessment of our manuscript. We appreciate their recognition of the technical breadth of the study, including the integration of mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models. We are also grateful that the reviewer highlights the value of our cross-species approach, as a major goal of the study was to determine whether ARHGEF6 loss produces convergent developmental and cellular phenotypes in both mouse and human systems.
  
  Weaknesses:
  
  Despite the strengths mentioned above, the study has some conceptual and experimental weaknesses that reduce its impact. The mechanistic insight is limited, as the research does not directly establish how ARHGEF6 regulates downstream signaling pathways.
  
  We appreciate the reviewer’s constructive comment. We agree that, although our data establish a phenotypic link between ARHGEF6 loss and interneuron development, they do not directly dissect the molecular mechanisms underlying the observed defects. Our interpretation that the mutant phenotype involves dysregulation of cytoskeletal dynamics is based on the directly observed defects in actin polymerization and organization in neural progenitor cells and neuronal growth cones respectively, and is consistent with the abnormalities observed in neurite morphology and neuronal migration. This interpretation is further supported by the established role of Arhgef6 as a regulator of the small Rho GTPases Rac1 and Cdc42. Previous evidence shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Moreover, spine abnormalities in Arhgef6-knockdown ex vivo slice cultures can be rescued by expressing the active form of Pak3, a downstream effector of Rac1 and Cdc42 (Node-Langlois et al., 2006). Together, these findings support a model in which the loss of the protein affects development through cytoskeletal dysregulation, likely involving altered Rho GTPase signalling. We nevertheless agree that further experiments would be required to establish a direct causal relationship between ARHGEF6 loss, Rho GTPase activity, cytoskeletal dysregulation, and the interneuron phenotypes described here. We will therefore revise the manuscript to clarify that this mechanistic link remains an interpretation supported by our data and the literature, rather than a direct demonstration within the present study.
  
  Also, there is insufficient evidence for interneuron specificity; although the central claim is that ARHGEF6 plays a selective role in interneurons, the data do not adequately exclude the possibility that the observed effects reflect broader neuronal defects. The study lacks critical controls across cell types, as several phenotypes observed in organoids and progenitors, including apoptosis, reduced neuronal output, and altered morphology, could also affect multiple neuronal populations without being directly tested.
  
  We agree that the current data do not exclude the possibility of alterations in other neuronal lineages, specifically the excitatory lineage. With regard to this, we would like to emphasize that the investigation of excitatory cell phenotypes was beyond the scope of the present study, as this aspect has previously been examined by Ramakers et al., 2012 and Node-Langlois et al., 2006, particularly in the context of hippocampal pyramidal cells, which are among the few cell types showing consistent expression of the gene in the adult mouse brain (Allen Brain Atlas; Yao et al., 2021). In this context, it is interesting to note that, in Ramakers et al., 2012 (Figure S1), MAP2 immunostaining of hippocampal formations revealed comparable distribution and intensity of neuronal cell bodies and dendrites throughout the hippocampus of both wild-type and Arhgef6-KO animals. With regard to morphological maturation of excitatory cells, whereas we observe a simplification of interneuron morphology in both mouse and human models, Ramakers et al., 2012 reported increased dendritic arborization complexity in hippocampal pyramidal cells. With regard to migration, a direct comparison with excitatory neurons would be intrinsically difficult, as excitatory and inhibitory neurons undergo highly distinct migratory processes and are therefore not directly comparable. We greatly appreciate the reviewer’s comment, as it gives us the opportunity to better discuss the relationship between our findings and previous studies in the Discussion. We will revise the manuscript and avoid implying that the phenotype observed is exclusive to interneurons.
  
  Furthermore, the data are predominantly descriptive, with many results remaining correlative and failing to establish causal relationships.
  
  We agree that our study primarily establishes a phenotypic framework and does not fully resolve the causal hierarchy among altered survival, migration, cytoskeletal morphology, and intrinsic excitability. We will revise the manuscript to make this limitation explicit, avoiding statements that imply direct causality beyond the data presented.
  
  Some more comments:
  
  (1) Given that ARHGEF6 is a guanine nucleotide exchange factor for Rac1 and Cdc42, the absence of direct measurements of GTPase activity or downstream signaling represents a significant gap. The interpretation that the observed phenotypes are mediated through specific cytoskeletal pathways, therefore, remains inferential.
  
  We appreciate the comment. The interpretation that our phenotype involves dysregulated cytoskeletal dynamics is based on the observed defects in actin polymerization and F-actin organization in neuronal growth cones and is consistent with the abnormalities in neurite morphology and neuronal migration. We will explicitly state in the Discussion that, because we did not directly measure Rac1 and Cdc42 activity levels in our models, our hypothesis that this molecular pathway contributes to the observed phenotype remains inferential, although it is supported by the current literature.
  
  (2) The manuscript repeatedly interprets the findings as interneuron-specific. However, several key observations are not demonstrated to be restricted to IN. Without direct comparison to excitatory neurons or other cell types, it is difficult to conclude that ARHGEF6 plays a selective role in interneurons rather than a more general role in neuronal development. The well-done analysis of the transcriptomic dataset is not sufficient to claim IN specificity. This issue is particularly important for the interpretation of the human organoid experiments, where reductions in SOX2⁺ progenitors and NEUN⁺ neurons, as well as increased apoptosis, could reflect global developmental defects. Similarly, in the mouse experiments, the reduction in GAD67⁺ cells is compelling, but it is not shown whether other neuronal populations are also affected.
  
  As previously mentioned, we understand the reviewer’s concern regarding the specificity of the observed phenotypes in interneurons and agree that the claims should be tempered. However, it is important to note that the interpretation of the human organoid experiments should be reconsidered. The use of specifically ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of defects such as the reduction in inhibitory progenitors’ neuronal output, the increased apoptosis, and the morphological abnormalities of inhibitory neurons. We will acknowledge in the Discussion the limitations of the study with regard to assessing the cell-autonomous nature of the observed migration defects.
  
  (3) The study provides a strong phenotypic description but limited causal resolution. For example, migration defects, altered growth cone morphology, and reduced branching are all consistent with impaired cytoskeletal regulation, but the links between these phenotypes are not directly established. Likewise, while the electrophysiological data convincingly show reduced firing in interneurons, the connection between altered cytoskeletal dynamics and intrinsic excitability is not explored.
  
  The observed migration defects, altered growth-cone morphology, and reduced branching are consistent with impaired cytoskeletal regulation. However, we acknowledge that the mechanistic links among these phenotypes remain to be directly demonstrated. Similarly, although our electrophysiological data show reduced firing in ARHGEF6-KO interneurons, the present study does not provide direct evidence linking impaired excitability to altered cytoskeletal dynamics. In the latter case, we think that the underlying mechanisms should be further investigated at the subcellular level, particularly with respect to cytoskeleton-mediated intracellular trafficking and localization and distribution of ion channels. One limitation of the present study, which may have masked electrophysiological alterations associated with differences in membrane composition (current Figure S1D–H), is that different interneuron subtypes with distinct intrinsic properties were pooled together in the analysis. We will expand the Discussion to address these limitations.
  
  (4) Several aspects of data presentation could be improved. In multiple figures (e.g., Figure 1A, D; Figure 4 and Video S1, 2), the images are difficult to interpret due to high cellular density, limited magnification, or lack of clear annotation. In some cases, it is not fully clear how quantifications were performed or which regions were analyzed. Improving the visual clarity with arrows, boxes, and high-magnification insets would strengthen confidence in the conclusions.
  
  We would like to thank the reviewer for pointing this out. We agree that some images and videos would benefit from clearer annotation. In the revised manuscript, we will add high-magnification insets, arrows or boxes highlighting the relevant regions/cells, and clearer descriptions of the quantified regions. We will also improve legends and video labels to indicate genotype, region, and tracked cells.
  
  Reviewer #2 (Public review):
  
  The authors investigate the impact of the deletion of the small GTPase regulator ARHGEF6 on the development and physiology of interneurons. Using public databases, they first show that ARHGEF6 is enriched in interneurons or in areas that give rise to them, both in development and adulthood, in humans and mice. Using a previously reported complete KO mouse and a GAD67-GFP reporter mouse line, they show that in the adult mouse cortex and hippocampus, there is a notable reduction in GFP+ cells. These mice show increased apoptotic cells at different timepoints and areas of the brain during development. In the developing cortex of ARHGEF6-KO mice, there are fewer IN in all layers, and cells present processes that are not correctly oriented. IN from the hippocampus in culture show reduced excitability and impaired neurite branching. The authors then established isogenic hiPSC lines to study ARHGEF6 deletion in human cells and differentiated ventral forebrain neurons, to find interneuron-related and non-related phenotypes. Most importantly, human interneurons grown in organoids show reduced branching and altered growth cone morphology. The authors claim that the novel interneuron phenotypes found in these models can explain, in part, the human intellectual disabilities associated with mutations in this protein. The study is well conducted and opens new avenues of research not only for the role of small GTPase regulation in early nervous system development, but also for how interneuron deficiencies impact a wider range of intellectual disability syndromes found in humans.
  
  We appreciate the reviewer’s positive evaluation of our manuscript and their recognition of this work’s potential to expand the focus of intellectual disability research on the development and function of the inhibitory system. We are particularly encouraged that the reviewer highlights the strength of our combined mouse and human cellular models, as well as the relevance of the interneuron-related phenotypes we identify across systems.
  
  However, most conclusions of the present version would be strengthened after considering the following comments:
  
  Major comments:
  
  (1) The reported biological processes evaluated at different developmental stages may be directly or indirectly related to ARHGEF6 function itself. As a model of a hereditary disease, full organism gene deletion is valid, since the human patients suffer from that condition as well. However, to investigate the roles of a protein, complete deletions may not be very accurate since they can give rise to phenotypes that are only indirectly related to the protein function itself. Most conclusions of the present manuscript should either be discussed in this regard or be supported by added evidence for a direct role of the protein. Such evidence is typically obtained with acute knockdowns in culture, or in developing brains by in utero electroporation. For example, Figure 1C shows that the principal excitatory neurons in the hippocampus do not express ARHGEF6. However, most electrophysiological and behavioral evidence of defects in ARHGEF6-KO mice arises from evaluating these cells (Ramakers et al., 2012). I am not suggesting that either the previous or the current evidence is wrong. But I believe readers would benefit from a clear distinction (or added caution notes) between a functional consequence of the deletion (that can be months away and in other cells than the actual molecular defect) and a true cell biological function of the protein under study. In favor of the authors, this is a concern with most conclusions derived from KO organisms.
  
  We agree with the reviewer that phenotypes observed in constitutive knockout models may, in some contexts, reflect indirect or compensatory consequences of long-term gene loss. Conditional and/or inducible knockout or knockdown approaches can certainly help dissect the nature of the observed defects and better define the effects of gene ablation at different developmental stages or in specific cell types. However, in the context of our study, it is important to note that the experiments performed in ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of very early developmental defects in the inhibitory lineage, in isolation from other cell types. These defects include reduced neuronal output from inhibitory progenitors, increased apoptosis, and morphological abnormalities in inhibitory neurons. Therefore, the phenotypes reported here are less likely to reflect effects originating in, or indirectly caused by, cell types that do not express Arhgef6.
  
  With regard to Figure 1C, we state in the Results that “among excitatory populations, only CA3 pyramidal neurons and mossy cells exhibited expression levels comparable to those observed in inhibitory clusters (Figure 1D, Table S2),” thereby not neglecting the potential effect of the lack of a functional protein in these populations.
  
  (2) Figure 1E-I. All conclusions are made with a GAD67-GFP reporter, which is a very powerful and reliable tool for large-scale screening. All the conclusions of the paper would be strengthened if immunohistochemical staining for specific interneuron markers in the same areas were added as complementary supporting evidence.
  
  We appreciate the insightful comment of the reviewer. Additional validation using established interneuronal markers will further strengthen the GAD67-eGFP analysis. We will perform complementary stainings (e.g., PVALB and CCK) and quantifications and include these data as a Supplementary Figure.
  
  (3) Cell death in development: It is surprising that the high amount of TUNEL staining during development does not translate into gross histological changes in the adult brain (studied elsewhere). Can authors discuss possible explanations?
  
  We appreciate the thoughtful consideration of our findings. We think that possible explanations include partial compensatory mechanisms during development, which may mitigate the long-term anatomical consequences of increased cell death. In addition, the phenotype may be restricted to specific neuronal populations or developmental windows, thereby producing functional alterations without necessarily resulting in overt macroanatomical defects. Thus, although increased developmental cell death may contribute to altered circuit assembly and neuronal output, it may not be sufficient to produce gross histological changes detectable at the adult brain level.
  
  (4) Section 4 (Figures 2F-J) - The authors present this staining as an analysis of migration. Normally, migration studies are performed with a "pulse-chase" paradigm, where a single cohort is labeled and then followed over time (normally by in utero electroporation of a fluorescent protein). Tissue is then fixed at different time points, and migration can be followed. On the contrary, the evidence is from a single point, in an experimental setting in which all Gad67 IN are stained, and hence, one cannot imply a defect in migration. The differences between WT and ARHGEF6-KO are obvious and interesting; it is just that they cannot be solely attributed to a problem in migration.
  
  Also, with a true migration phenotype in the current setting, the cells that failed to migrate should have accumulated in deeper layers. My impression is that the changes in IN per layer are more easily explained by total cell number than by migration. Perhaps evaluating earlier timepoints could clarify this.
  
  We appreciate the reviewer’s suggestion to implement an additional time point in the in vivo migration analysis. Since an earlier in vivo time point would most likely not reveal migration-related defects, as most cells would still be confined to the ganglionic eminence (Liaci et al., 2022), we will include analyses performed at a later developmental time point as supplementary evidence. We will also revise the wording to clarify that the fixed-tissue data show altered distribution and orientation of GAD67-eGFP-positive interneurons, which are consistent with impaired migratory behavior when considered together with the in vitro live-imaging data. At the same time, we will acknowledge that reduced interneuron survival and/or neuronal output may also contribute to the observed phenotype.
  
  (5) It is known that ARHGEF6 deletion produces severe F-actin phenotypes in neurons. Have the authors confirmed that the GAD67 cells in their hippocampal cultures also have these phenotypes (stress fibers in somas, growth cones, and actin patches along neurites)?
  
  We did not directly assess F-actin organization in GAD67-eGFP murine primary cultures. Direct analyses of F-actin organization, growth-cone morphology, and cytoskeletal organization were performed only in the human system. To further assess this phenotype, we will perform phalloidin staining on GAD67-eGFP brain sections to evaluate F-actin organization in interneurons in vivo.
  
  (6) Section 4. The authors present data for deficient migration of the GFP-labeled interneurons. Is it possible to assess, in the same sections, whether other cell types are also affected? Although the hypothesis that ARHGEF6 deletion will have an impact on INs is well rooted in expression data, assessing other cell types could provide a positive control or evidence for a cell-autonomous phenotype.
  
  We thank the reviewer for their thoughtful suggestions. We agree that extending the analysis to additional cell types would provide further insight into the specificity of the phenotype; however, a comprehensive evaluation of all neuronal populations falls beyond the scope of this research. The use of ventralized MGE-like organoids enabled us to examine whether key defects were cell-autonomous, including the reduced neuronal output of inhibitory progenitors, increased apoptosis, and abnormal inhibitory-neuron morphology.
  
  (7) ARHGEF6 deletion has an important impact on organoid development (size, shape, etc). Have the authors analysed whether these organoids produced fewer interneurons?
  
  We would like to clarify that the organoids analyzed in the study are ventral MGE-like organoids and therefore the reduction in neuronal output (current Figure 4K) primarily reflects the ventral/interneuron lineage in this model.
  
  (8) In assembloids, the differences in migration parameters are very small between WT and ARHGEF6-KO, which reinforces that perhaps what is observed in the different layers of cortex during mouse development is likely not entirely due to migration, as concluded.
  
  We agree that the migration parameters in assembloids should not be interpreted in isolation. We will revise the text to emphasize that the reduction in the number of interneurons observed in the adult brains is part of a broader pattern that also includes altered neuronal output and reduced viability.
  
  (9) To properly weigh the present evidence -interneuron deficits- using the ARHGEF6-KO model, authors should include a deeper discussion in light of much work that has been done using these mice. How does the finding of a diminished IN population in the brain of these mice explain the large amount of electrophysiological and behavioral evidence produced before with these animals? Perhaps the most important work to discuss these aspects is the initial ARHGEF6-KO report by Ramakers and colleagues (2012), but there are others.
  
  We appreciate the reviewer’s emphasis on the importance of framing our findings within the broader context of the existing literature. We will expand the Discussion to better integrate previous work on ARHGEF6-KO mice. Specifically, we will discuss how reduced interneuron number and altered interneuronal function may contribute to previously reported electrophysiological and behavioral phenotypes, acting in concert with previously described alterations in excitatory neurons and synaptic plasticity (Ramakers et al., 2012).
  
  Minor comments:
  
  (1) Figure 1A. It looks clear that the GE shows the highest expression of ARHGEF6; however, the reader needs the reference levels where the log2 expression is calculated. What are the reference levels?
  
  We would like to thank the reviewer for pointing this out. We will clarify in the caption that the log2(RPKM+1) expression values are shown as absolute values and are not relative to a reference condition.
  
  (2) Have the authors compared the number of GAD67-eGFP cells in the hippocampal cultures between WT and ARHGEF6-KO mice?
  
  We did not rely on total GAD67-eGFP counts in dissociated hippocampal cultures because differences could reflect initial plating composition, survival, and maturation. In our experience, the MGE-like organoid system provides a more controlled in vitro context to assess neuronal output in the ventral lineage.
  
  (3) Section 3, as a caution note, authors should mention that it is not possible to know from the evidence provided which cells are dying.
  
  We agree with the reviewer and will add a cautionary statement noting that TUNEL staining alone does not identify the precise dying cell type. We will clarify that increased cell death in the ganglionic eminence and MGE-like organoids is consistent with a prominent involvement of the ventral/inhibitory lineage, while acknowledging the limits of the assay.
  
  (4) In the dorsal-ventral assembloids, it is expected that the ventral organoid would contain lots of GFP expression compared to the dorsal, but in the image shown (Figure 5A) both parts of the assembloid seem to have the same amount and distribution of GFP. How is that possible?
  
  We appreciate the thoughtful comment of the reviewer. After two weeks of fusion, a considerable number of interneurons are expected to have migrated from the ventral to the dorsal compartment of the assembloid (Birey et al., 2017; Sloan et al., 2018). In terms of distribution, we think that current Figure 5A shows a gradient of eGFP-positive cells within the dorsal compartment, with the number of labeled cells decreasing as the distance from the fusion interface between the two organoids increases. By contrast, a comparable gradient is not evident in the ventral compartment, where several labeled neurons remain present even in regions distal to the fusion site.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  ARHGEF6 is a RAC1/CDC42 guanine nucleotide exchange factor that has been proposed to be associated with X-linked intellectual disability, but its relevance to the pathology is not well established. ARHGEF6 has been assigned a role in spine density and plasticity of hippocampal pyramidal neurons, but nothing is known about its role in interneuron development. Here, the authors show that ARHGEF6 is expressed early in development in the inhibitory lineage during the peak of interneuron generation and migration. The aim of the study is therefore to investigate whether, in addition to its role in pyramidal neurons, ARHGEF6 could play a role in inhibitory neuron development. Using both ARHGEF6-KO mice and organoids from ARHGEF6-KO hiPSCs, the authors show that ARHGEF6 plays a critical role in interneuron development and function.
  
  Strengths:
  
  The major strength of the paper is the very detailed analysis of the role of ARHGEF6 using two different systems: ARHGEF6-KO mice and deletion of ARHGEF6 in human iPSC-derived organoids. Strikingly, deletion of ARHGEF6 in both systems induces similar defects such as an increase in apoptosis, reduced neuronal output, impaired neuronal morphology, and disrupted migratory dynamics. This compelling evidence demonstrates that ARHGEF6, in addition to its already well-described role in spine formation and plasticity, is playing a crucial role during embryonic development through its function in interneurons.
  
  We thank the reviewer for this positive assessment of our work and for highlighting the strength of our combined in vivo and human iPSC-derived organoid approaches. We are pleased that the reviewer recognizes the consistency of the phenotypes observed across both systems and acknowledges that our findings support a crucial role, during early stages of embryonic development, for a protein previously thought to be relevant primarily in the synaptic context.
  
  Weaknesses:
  
  (1) In Figure 1, the authors show that ARHGEF6 is expressed in different regions of the brain, including the interneuron lineage, and that depletion of ARHGEF6 reduces the number of GABAergic neurons in the adult cortex and hippocampus. To better characterize this defect, the authors investigate in Figure 2 whether deletion of ARHGEF6 affects interneuron migration and survival during embryonic development. To do so, ARHGEF6-KO mice were crossed with the GAD67-eGFP reporter line to follow the inhibitory lineage. The authors analyse apoptosis using TUNEL staining, and show that it is significantly increased in the ganglionic eminence of ARHGEF6-KO E14.5 embryos. The authors claim that this is not the case in the cortex. However, the image shown in Figure 2A strongly suggests that staining is increased there as well. Which part of the neocortex is analysed for quantification? This should be clarified.
  
  We would like to thank the reviewer for pointing this out. The region analyzed was the same as that used to assess GAD67-eGFP-positive cells in Figure 2F. We will clarify the exact neocortical region used for TUNEL quantification and revise the figure and legend to make the analyzed area explicit. We will also analyze additional animals to improve the accuracy of the analysis.
  
  (2) In Figure 2F-J, the authors investigate the migration of interneurons by analysing the GAD67-eGFP staining, and clearly show that the migratory abilities of the depleted neurons are reduced. However, the authors do not discuss the fact that, because depletion of ARHGEF6 increases apoptosis, there are fewer neurons available for migration. This is important for the interpretation of the data. This point should be clarified.
  
  We appreciate this comment and believe that it is particularly relevant to the interpretation of the data shown in Figure 2F–G. We will clarify the limited interpretation of this specific analysis in the Results section. The altered directionality observed in vivo, together with evidence of impaired migratory behavior obtained through in vitro live imaging, supports the possibility that altered migratory dynamics contribute to the phenotype, although increased apoptosis and reduced neuronal output may also contribute.
  
  (3) In Supplementary Figure S2, the authors describe the establishment of the ARHGEF6-KO human iPSC line and test the ability of these cells to undergo correct development, especially for the generation of neural progenitor cells. I was wondering why the authors do not present the data of both control and ARHGEF6-KO cells.
  
  We thank the reviewer for pointing this out. All staining reported in the organoids and assembloids in this paper shows that the WT ATCC-DYS0100 cell line, as well as the mutant, efficiently differentiates into neuronal tissue. The Supplementary Figure was intended to validate the impact of the mutation on the ability of the iPSC line to retain its differentiation capacity as a preliminary step before proceeding with organoid differentiation. We will integrate stainings for NPC markers on the WT line in the Supplementary Figure.
  
  (4) At the molecular level, an explanation of how ARHGEF6 depletion could affect neuronal survival is missing. In addition, as ARHGEF6 is one of several GEFs for RAC1 and Cdc42, I would have expected the authors to test how RAC1 (and Cdc42) activity is affected in ARHGEF6-depleted brains and in ARHGEF6-KO organoids. The measure of phalloidin staining and the anisotropy index are not really meaningful.
  
  We appreciate the thoughtful comment of the reviewer. Previous evidence already shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Regarding organoids, we agree that direct RAC1/CDC42 activity measurements would have strengthened the molecular mechanism. We will revise the manuscript to avoid implying that our phalloidin-based measurements alone establish the underlying dysregulated molecular pathway.
  
  (5) The authors show that ARHGEF6-KO forebrain organoids were markedly smaller compared to their isogenic controls, and their study suggests that ARHGEF6 expression impacts progenitor maintenance and neurogenesis. Although interneurons represent only a minority of the total neuronal population, I was wondering whether ARHGEF6-KO mice present brain morphology defects such as microcephaly.
  
  We appreciate the comment. We did not perform a morphometric analysis for microcephaly in the present study. We will add this limitation to the Discussion and note that gross brain morphology changes were not reported in the previously published ARHGEF6-KO mouse characterization (Ramakers et al., 2012). We will also clarify that the smaller organoid phenotype may reflect developmental defects that are not fully compensated in a reductionist in vitro model and therefore do not necessarily imply overt microcephaly in vivo.
  
  References
  
  Allen Institute for Brain Science. Allen Mouse Brain Atlas: Arhgef6 ISH data. Available from: Allen Brain Map.
  
  Birey, F., Andersen, J., Makinson, C. D., Islam, S., Wei, W., Huber, N., Fan, H. C., Metzler, K. R. C., Panagiotakos, G., Thom, N., O’Rourke, N. A., Steinmetz, L. M., Bernstein, J. A., Hallmayer, J., Huguenard, J. R., & Pașca, S. P. (2017). Assembly of functionally integrated human forebrain spheroids. Nature, 545(7652), 54–59. https://doi.org/10.1038/nature22330
  
  Liaci, C., Camera, M., Zamboni, V., Sarò, G., Ammoni, A., Parmigiani, E., Ponzoni, L., Hidisoglu, E., Chiantia, G., Marcantoni, A., Giustetto, M., Tomagra, G., Carabelli, V., Torelli, F., Sala, M., Yanagawa, Y., Obata, K., Hirsch, E., & Merlo, G. R. (2022). Loss of ARHGAP15 affects the directional control of migrating interneurons in the embryonic cortex and increases susceptibility to epilepsy. Frontiers in Cell and Developmental Biology, 10, 875468. https://doi.org/10.3389/fcell.2022.875468
  
  Nodé-Langlois, R., Muller, D., & Boda, B. (2006). Sequential implication of the mental retardation proteins ARHGEF6 and PAK3 in spine morphogenesis. Journal of Cell Science, 119(23), 4986–4993. https://doi.org/10.1242/jcs.03273
  
  Pelkey, K. A., Chittajallu, R., Craig, M. T., Tricoire, L., Wester, J. C., & McBain, C. J. (2017). Hippocampal GABAergic inhibitory interneurons. Physiological Reviews, 97(4), 1619–1747. https://doi.org/10.1152/physrev.00007.2017
  
  Ramakers, G. J. A., Wolfer, D., Rosenberger, G., Kuchenbecker, K., Kreienkamp, H.-J., Prange-Kiel, J., Rune, G., Richter, K., Langnaese, K., Masneuf, S., Bösl, M. R., Fischer, K.-D., Krugers, H. J., Lipp, H.-P., van Galen, E., & Kutsche, K. (2012). Dysregulation of Rho GTPases in the αPix/Arhgef6 mouse model of X-linked intellectual disability is paralleled by impaired structural and synaptic plasticity and cognitive deficits. Human Molecular Genetics, 21(2), 268–286. https://doi.org/10.1093/hmg/ddr457
  
  Sloan, S. A., Andersen, J., Pașca, A. M., Birey, F., & Pașca, S. P. (2018). Generation and assembly of human brain region-specific three-dimensional cultures. Nature Protocols, 13(9), 2062–2085. https://doi.org/10.1038/s41596-018-0032-7
  
  Yao, Z., Nguyen, T. N., van Velthoven, C. T. J., Goldy, J., Sedeno-Cortes, A. E., Baftizadeh, F., Bertagnolli, D., Casper, T., Chiang, M., Crichton, K., Ding, S.-L., Fong, O., Garren, E., Glandon, A., Gouwens, N. W., Gray, J., Graybuck, L. T., Hawrylycz, M. J., Hirschstein, D., … Zeng, H. (2021). A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell, 184(12), 3222–3241.e26. https://doi.org/10.1016/j.cell.2021.04.021
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.09.710568v2
www.biorxiv.org www.biorxiv.org

Examining Alzheimer's Disease modifiable risk factors: Impact of physical activity and diet on neuroanatomy and behaviour in mouse models

4
1. Public_Reviews 29 May 2026
  
  in eLife
  
  eLife Assessment
  
  This important study examines the effects of diet and exercise on brain structure and behaviour in the 3xTg mouse model of Alzheimer's disease. The authors show that combined access to a low-fat diet and exercise improves regional brain volume and behaviour in transgenic and wild-type control mice in a sex-specific manner, with analyses linking functional improvements to glucose homeostasis. Although some claims are well supported, the overall strength of the evidence is incomplete and hampered by a lack of clarity regarding the statistical analyses chosen. The work may be of interest to researchers studying neurodegenerative disease, particularly in preclinical contexts.
  
  Summary
2. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  A triple-transgenic (3xTgAD) mouse model of Alzheimer's disease was exposed to a high-fat diet and assigned to one of three interventions: voluntary physical activity, a low-fat diet, and their combination. A high-fat diet significantly increased body weight and induced widespread neuroanatomical changes, with effects modulated by sex and genotype. The combined intervention led to significant weight loss in males of both genotypes. Neuroanatomical analyses revealed that a high-fat diet significantly reduced hippocampal and cerebellar volumes in wild-type mice but had a less pronounced effect on 3xTgAD mice; nevertheless, interventions, particularly the combined approach, increased localized brain volumes in these regions regardless of genotype. Spatial gene enrichment analysis of this pattern identified glucose homeostasis. Overall, these findings suggest that voluntary physical activity and a low-fat diet can modulate brain structure and behaviour, partially counteracting the effects of a high-fat diet, and potentially recruiting biological processes that may support brain health.
  
  The authors describe studies of the 3xTg mouse model of Alzheimer's disease (AD). They set out to study the interactions of diet and exercise on three outcomes: weight gain, MRI, and either the novel object recognition or Morris water maze tasks of memory.
  
  They conclude there are sex and genotype effects on hippocampal volume.
  
  There are several strengths to the study. First, they start out with a large number of mice. Once the mice are divided into groups, however, the sample sizes are not always large. It would be good to know that the groups were sufficiently powered.
  
  The data are also interesting. Mice were placed on several different diets during the study, which will be of interest to many who question the role of diet in outcomes. They also add exercise as an intervention, and study not only diet but also the combined effect of diet and exercise. This is relevant to those interested in controlling dementia by diet and exercise. Finally, they perform some very interesting analyses to study the data.
  
  That said, the study also has several limitations. For example, it is quite complex. Mice had a standard diet until 2 months of age, then were switched to either a low-fat or a high-fat diet. Some mice had both a different diet and exercise. MRI was performed at 2, 4, and 6 months, when behavior was tested. A drawback of this design is that no assessment of outcomes relevant to this animal model, such as amyloid-beta or tau phosphorylation, was conducted. Also, they used the novel object recognition task, despite stating in the Discussion that this task does not show impairments until well after 6 months of age. They added exercise, but it is not clear whether the animals used the exercise apparatus equally. Also, the animals were housed "communally", so adding an exercise wheel may have made the cage crowded, adding stress to the study. The diets were not simply low- or high-fat because many constituents besides fat content also changed. Regarding fat, the type of fat also changed between diets. Therefore, the gut microbiome was probably affected differently by factors other than fat intake. There was no measurement of food consumption, so some mice may not have eaten as much of the new diet as they did of the old diet they were used to.
  
  Regarding the data, only the outcomes of complex analyses are shown. One would first want to see the changes in body weight and perhaps later how it is analyzed in a more complex way. For behavior, one would first want to see outcomes as typically presented. For example, learning, recall, platform test results from the Morris water maze, and discrimination indices for object recognition. Note that, at one point, I believe the authors note that some groups did not explore thoroughly, which would make novel object recognition hard to interpret. If there was any difficulty with ambulation, both tasks would be hard to interpret.
  
  Regarding MRI, from what can be seen, structures cannot be distinguished clearly. At least some raw data should be shown to demonstrate this and to determine what the data show. The raw data suggest that some of the larger structures can be distinguished, and we should see the data for these areas, even if all areas can't be assessed.
  
  Lifestyle interventions can mitigate the effects of diet-induced obesity on body weight, behaviour, and brain anatomy in mouse models. Using a longitudinal design, wild-type and triple-transgenic (3xTgAD) mouse models of Alzheimer's disease were exposed to a high-fat diet and assigned to one of three interventions: voluntary physical activity, a low-fat diet, and their combination. A high-fat diet significantly increased body weight and induced widespread neuroanatomical changes, with effects modulated by sex and genotype. The combined intervention led to significant weight loss in males of both genotypes. Neuroanatomical analyses revealed that a high-fat diet significantly reduced hippocampal and cerebellar volumes in wild-type mice but had a less pronounced effect on 3xTgAD mice; nevertheless, interventions, particularly the combined approach, increased localized brain volumes in these regions regardless of genotype. Multivariate integration of behavioural and neuroanatomical measures identified a brain pattern linking hippocampal and cerebellar volumes to intervention and behavioural performance. Spatial gene-enrichment analysis of this pattern identified biological processes, including glucose homeostasis, as potential biological mechanisms underlying intervention effects. Overall, these findings suggest that voluntary physical activity and a low-fat diet can modulate brain structure and behaviour, partially counteracting the effects of a high-fat diet, and potentially recruiting biological processes that may support brain health.
  
  In the end, the authors focus primarily on the hippocampus and discuss the cerebellum, but it seems that changes occur throughout the brain. The choice to focus on the hippocampus and cerebellum needs to be supported.
  
  To gain further insight, the authors analyze genes across different brain regions using the Allen Brain Atlas. Although this seems reasonable in theory, once one realizes how many genes are shared across diverse brain regions, one wonders how such an analysis was conducted. More understanding of this approach, as well as how it was validated, is important. In the end, the authors conclude that the glucose homeostatic pathways were primarily altered, and one would like to understand whether that is indeed true and whether it is the only set of pathways that were changed.
  
  This raises another point: what occurs in a normal wild-type mouse on the standard diet during the first 6 months of life? Do the glucose homeostatic pathways change simply due to age? Sex? It may be that, with age, the mice become more sedentary, which could explain any such change. Once that is resolved, what occurs on the standard diet for the 3xTg mice? Perhaps they are more active or more sedentary, regardless of diet or exercise? Thus, the studies end up raising more questions than answers.
  
  Given so much work has already been done, it seems best to simply reorganize the presentation with raw data first, followed by the analysis. For the second section, the implicit assumptions of the analyses should be very clear so that the analyzed data are understood and believable. Limitations of the assumptions, pooling some groups, etc., need to be clear.
  
  Figures. In Figure 1, the weekly measurements are not shown. The points are connected, so an unbroken line is shown. Around the line are lighter lines indicating errors, but with all the lines and colours, one does not know what standard errors surround the values for any given group. This makes the data hard to interpret. In later figures, significant differences are indicated with asterisks, but this seems to be done inconsistently.
  
  In the text, more caution is needed for some assertions. For example, it is not clear that a 2- to 6-month-old mouse is an adolescent. Opinions about the ages of mice that correspond to human life stages have always been debated. Another example is the suggestion that male mice gain weight differently than females as if this were an outcome of diet or exercise. This difference arises because male rodents continue to gain weight in adulthood, whereas females stabilize because estrogen limits appetite. Additionally, females may not show group differences because they are more variable. This can relate to their estrous cycle. If stressed or housed without males nearby, they may not have a regular estrous cycle, which can then affect their outcomes. This may be particularly true for behavior, as they may have been tested during different estrous cycle phases, if they had estrous cycles at all.
  
  Review 1
3. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This manuscript describes an investigation into the effect of diet and exercise interventions in WT and transgenic (male and female) mice that are exposed to either a high-fat or a low-fat diet. The outcome variables include MRI volume and brain morphology, as well as memory performance. First, this study measured the impact of genotype (WT vs 3xTgAD mice), then examined the impact of a high-fat or low-fat diet in each group, and finally examined the impact of a low-fat diet, exercise, or a combined low-fat diet and exercise intervention. This is an important study as it allows us to better understand how changes to lifestyle can affect neurocognitive function and potentially change a person's AD risk.
  
  Strengths:
  
  (1) The study uses a well-controlled longitudinal design, allowing the authors to track how diet and exercise interventions influence brain and behaviour over time.
  
  (2) The integration of multiple levels of analysis (brain imaging, behaviour, and multivariate modelling) provides a rich and comprehensive assessment of intervention effects.
  
  (3) The inclusion of both genotype and sex as key variables strengthens the relevance and interpretability of the findings, given known differences in risk and response across groups.
  
  Weaknesses:
  
  There are a lot of analyses in this paper, and I had a little bit of trouble distilling the major take-home messages. For example, I was left wondering:
  
  (1) If the effect of genotype and the effect of the high-fat diet were consistent in the current study compared to the authors' previous work (e.g. Rollins et al., 2019). A more direct report on the consistency of these findings (maybe even an overlap map, if possible) would benefit the reader.
  
  (2) How consistent/different are the volumetric and morphometric (DBM) results from each other? Especially in the regions of interest (hippocampus and cerebellum), are increases in volumes always related to "expansion" of a given region using DBM? Some of the similarities are reported in the results, but for transparency, a side-by-side table comparing the results across techniques for each effect of interest might provide more clarity.
  
  (3) I was interested in the Partial Least Squares approach that the authors used to investigate how patterns of brain measures relate to the behavioral variables. Because they are presented mostly in the supplement (except for Figure 6E), it's difficult to map the LVs described onto the univariate contrasts in Figures 2-5. In general, greater clarity is needed regarding how the PLS-derived latent variables relate to the univariate findings, and whether the emphasis on LV3 reflects a principled selection or post hoc interpretation.
  
  (4) If I understand the results correctly, there were only modest differences in behavior reported, and the patterns were somewhat inconsistent across sex and genotype. In fact, the authors report that the high-fat diet alone did not impair memory on the Morris Water maze (line 323). The discrepancy between robust neuroanatomical effects and relatively modest behavioural changes raises important questions about the functional significance of the observed structural alterations.
  
  (5) On line 507, the authors state, "Notably, 3xTgAD mice already show smaller brain volumes at baseline, which may constrain the detectable impact of the diet." Is this true for the entire brain or just the hippocampus and cerebellum? Would a global reduction in brain volume due to the 3xTgAD AD model affect the interpretation of the intervention effects?
  
  Review 2
4. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors sought to determine the individual and combined effects of exercise and low-fat diet consumption on regional brain volume and cognitive function in triple-transgenic Alzheimer's disease mice and wild-type controls.
  
  Strengths:
  
  (1) A strength of this study is its longitudinal design, which captures regional changes in brain volume across the interventions tested.
  
  (2) Its comprehensive design includes 10 groups and is well-powered to isolate genotype-, sex-, diet- and exercise-related effects (and interactions).
  
  (3) The analyses of volumetric and voxel-based measures are comprehensive.
  
  Weaknesses:
  
  (1) Use of automated tracking for NOR data reduces confidence in the behavioural data.
  
  (2) No measures of Aβ or tau pathology appear to have been performed.
  
  (3) Mice from the critical 'combined' intervention groups are not included in the PLS regression model that integrates behavioural and brain data.
  
  (4) Analyses of behavioural data include a large number of variables without adequate justification.
  
  Review 3

Tags

Review 3

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.02.26.640411v2
www.biorxiv.org

A miniaturized MR1 metabolite display system with native-like protein features

4
1. Public_Reviews 29 May 2026
  
  in eLife
  
  eLife Assessment
  
  The manuscript by Rotsides et al. reports the design and validation of SMART-MR1, a miniaturized MR1 metabolite-display platform in which the α1/α2 ligand-binding domain is stabilized by a synthetic helical domain in place of the α3 domain and β2-microglobulin. Supported by biochemical, biophysical, and structural approaches, including ITC, NMR, and cryo-EM, the work provides solid evidence that SMART-MR1 retains native-like ligand binding and A-F7 TCR recognition while enabling experimental approaches for ligand screening that are difficult with conventional MR1 constructs. The study is valuable for the MR1 and MAIT-cell fields, particularly as a tool for ligand screening and mechanistic studies of MR1-restricted antigen presentation. There are several suggestions to further strengthen the study's impact, including clearer benchmarking against existing MR1 platforms, broader validation across ligands and TCRs, and functional evidence from MAIT-cell staining or activation assays.
  
  Summary
2. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study presents an important tool for the study of MR1 antigen binding, opening new possibilities for cutting-edge techniques. The evidence supporting the claims of the authors is solid, although functional experiments using primary T cells would provide a more complete physiological evaluation. The work will be of interest to T cell immunologists in general, especially those studying unconventional T cells.
  
  Strengths:
  
  In this study, the authors developed a single-chain MR1-derived protein by exchanging the α3 domain and β2-microglobulin for a helical stabilizing domain that they had previously developed. The aim was to generate a more compact structure that would still fold properly, without the risk of losing β2-microglobulin. This overall more robust structure would facilitate ligand exploration using various cutting-edge biophysical techniques.
  
  The authors successfully demonstrated that their construct folds similarly to native MR1 and retains the ability to bind MAIT TCR in solution, as shown by cryo-EM experiments. Its melting temperature was equivalent to that of the native protein. Importantly, the construct enables the use of differential scanning fluorometry and transverse relaxation-optimized spectroscopy, which represent the main strengths of this work. These approaches should greatly facilitate the screening of additional unknown ligands and enable interaction mapping.
  
  Weaknesses:
  
  One possible area for improvement would be to extend the validation to additional known ligands, particularly weaker binders. Furthermore, although the cryo-EM data are highly convincing, including either MAIT cell staining or MAIT activation assays with the generated construct would provide stronger functional validation of its equivalence to the wild-type protein with respect to ligand-binding properties.
  
  Overall, this work is of great interest to the field, as several groups worldwide are seeking to identify endogenous/tumour-derived MR1 ligands. In addition, some pathogens lacking the capacity to produce 5-OP-RU have been shown to activate MAIT cells, raising the possibility that unknown pathogen-derived ligands may also exist.
  
  Review 1
3. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors develop a miniaturized MR1 construct (SMART-MR1) in which the α1/α2 platform is stabilized by a synthetic domain, and show that it can bind ligands, engage a cognate TCR, and recapitulate native-like recognition by cryo-EM.
  
  Strengths:
  
  The work is well-written, technically strong and carefully executed. The authors combine biochemical, biophysical and structural approaches, including ITC, NMR and cryo-EM, to show that SMART-MR1 behaves in a manner closely resembling native MR1. The reduction in size and the demonstration of solution NMR are clear practical advantages for certain types of mechanistic studies.
  
  Weaknesses:
  
  The main limitation is that the manuscript does not clearly establish a practical advantage over existing MR1 formats, such as single-chain MR1-β2M or previously described stabilized constructs. The comparison is largely framed against native MR1, which risks overstating the problem, and on the basis of the data presented, it is unlikely that other researchers will adopt this system. In addition, the choice of the A-F7 TCR as a validation reagent may overestimate the generality of the approach, as this receptor is known to exhibit relatively broad ligand tolerance, including recognition of MR1 presenting vitamin B6 metabolites (PDB 9CGR) and structurally diverse synthetic ligands. The extent to which SMART-MR1 supports recognition by a broader range of MR1-restricted TCRs is not addressed.
  
  Review 2
4. Public_Reviews 29 May 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This manuscript describes the engineering, production and validation of an MR1 variant with enhanced suitability for screening of ligands and biophysical and structural analysis. The authors utilize a previous advance from their laboratory on a classical MHC (HLA-A2) whereby the α3 and β2m domains are replaced by a helical stabilizing domain.
  
  Strengths:
  
  This variant has a smaller molecular weight than the native MR1, can be produced easily through refolding and is thus much more suitable for NMR analysis. The authors provide data demonstrating that many of the parameters typically evaluated in protein biochemistry/biophysics are similar to reported values between this engineered variant and the wild-type protein. Overall, this is a significant advance to the MR1 field and more broadly to MR1 relevance in immunology and cancer biology, as this will accelerate high-throughput screening and discovery of disease-relevant ligands for MR1, which have been overshadowed by the misguided fixation on 5-OP-RU.
  
  Weaknesses:
  
  Minor concerns about the lack of comparison with the native MR1 extracellular domain construct in the validation of this engineered construct.
  
  Review 3

Tags

Review 3

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.04.13.718121v1
www.biorxiv.org

Uncovering genetic mechanisms underlying trait variation in switchgrass using explainable artificial intelligence

3
1. Public_Reviews 29 May 2026
 
 in eLife
 
 eLife Assessment
 
 The study by Izquierdo and colleagues provides important insights into the field of genomic and transcriptomic prediction of traits across multiple environments. The rationale and analyses conducted to integrate the two types of ~omics datasets across two environments are solid. However, some clarification would be appreciated in the presentation of the results, and adding some statistical control to clarify how the predictors were selected, or assessing their importance using the SHAP framework, would further consolidate the findings.
 
 Summary
2. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 P. Izquierdo et al. investigated the genetic determinism of various traits of interest in switchgrass using large-scale genomic and transcriptomic data. More specifically, they worked on a diversity panel comprising 426 genotypes evaluated in common-garden experiments at two locations (Michigan and Texas). The phenotypic and genomic data were already published. In this work, they produced transcriptomic data for each of the 426 genotypes at each site, and they carried out phenotype predictions using genomic and transcriptomic data separately or together. While the two data types were moderately correlated at each location, they appeared to be complementary for phenotype prediction. To further exploit the fact that they have data across two locations, they computed differences for phenotypes and transcripts between locations as indicators of trait and transcript plasticity, respectively. They built predictive models of trait plasticity using genomic information and transcript plasticity, which proved to be quite accurate for traits affected by GxE. Finally, they made use of SHAP values from predictive models of flowering time and biomass at each location, as well as for their plasticity, to gain insight into their genetic determinism. These SHAP values provide the importance of the predictive features (SNPs and/or transcripts) for trait prediction. This allowed them to confirm some candidate genes and to propose new candidates for both traits.
 
 Strengths:
 
 I found this study interesting and rich. I think the sample size (426 genotypes) is large enough to support the findings. The use of a modern machine-learning approach (XGBoost) together with SHAP indices to find interesting features and get insights into the biological mechanisms underlying flowering time and biomass production is quite original. The methodology employed is globally sound. I also like the fact that the authors accounted implicitly for the population structure by providing a baseline prediction using the first 5 PCs.
 
 Weaknesses:
 
 While the methodology is globally sound, I sometimes had difficulties following exactly what was done. This is partly due to the fact that the authors used 2 omics (SNPs and transcripts) to predict phenotypes, and sometimes, in the results, it is not clear which of the 2 is the focus. This was especially the case for the importance of the features and the interpretability of the models, where I found it sometimes hard to tell whether the analysis was done on SNPs or transcripts.
 
 Also, regarding the methodology, I did not understand why the authors needed to perform a feature selection approach. Maybe it was required to perform the interaction analysis, which could not be deployed on all the features? But regarding the importance of the features, I do not get the added value of the selection over the direct use of SHAP indices when using all features. Maybe this is because I am not a specialist in this kind of approach, but maybe the authors could add more details to explain the rationale behind the feature selection.
 
 Review 1
3. Public_Reviews 29 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors aimed to evaluate whether integrating genomic (SNP) and transcriptomic information with machine learning can improve phenotypic prediction of polygenic traits across environments. The manuscript explored not only the predictability across models and predictor feature sets, but also attempted to identify meaningful genes and interactions underlying trait variation.
 
 Strengths:
 
 The main strength of the manuscript is its integration of SNP, transcriptomic, and phenotype datasets for 426 switchgrass genotypes grown in Texas and Michigan. It provides a systematic comparison of predictor types (SNP versus transcript abundance) and of model strategies to integrate them.
 
 Weaknesses:
 
 (1) Experimental Design
 
 The experimental design raises several concerns that should be clarified before strong biological conclusions are drawn from the transcriptomic analyses.
 
 First, the transcriptomic sampling is not well aligned with the developmental stages most relevant to the phenotypes being modeled. Leaf tissue was collected at a single time point in each environment, whereas traits such as flowering time, biomass, tiller count, and panicle height arise from developmental processes occurring over extended and potentially distinct temporal windows. Consequently, the measured expression profiles are likely to reflect physiological states specific to the sampling dates (May 5-6 in Texas and June 22-24 in Michigan) rather than the regulatory processes underlying the target phenotypes.
 
 Second, the phrase "haphazardly randomized" is questionable for a field experiment. It is unclear whether the design included formal randomization, blocking, row/column structure, or spatial correction. Without explicit accounting for spatial field heterogeneity, environmental variation within sites may confound genotype and transcriptomic effects.
 
 Third, the Methods do not clearly describe biological replication for RNA-seq. If each genotype-by-environment combination were represented by a single transcriptomic sample, then within-genotype expression variance cannot be estimated. This is important because transcript abundance is highly sensitive to microenvironment, sampling time, tissue status, developmental stage, and technical variation. The absence of replication significantly weakens confidence in gene-level feature importance and gene-gene interaction claims.
 
 Fourth, the analysis of expression differences across environments is based on a simple subtraction (TX - MI) followed by correlation with genetic similarity. This approach is not standard in transcriptomic analysis and does not account for variability, replication, or statistical uncertainty. Conventional methods for assessing differential expression and genotype-by-environment interactions rely on model-based frameworks that explicitly estimate variance components and test for interaction effects. Without such modeling, the observed expression differences may reflect noise or confounding factors rather than genotype-driven responses.
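 The concern about unreplicated subtraction can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the authors' pipeline: the noise level, the environmental shift, and the number of genotypes are hypothetical values chosen for illustration. Every simulated genotype responds identically to the environment (no true GxE), yet the TX - MI differences still vary, with a variance of about twice the per-sample noise variance.

```python
import random
import statistics

# Hypothetical settings, chosen only for illustration.
rng = random.Random(1)
n_genotypes = 5000
noise_sd = 1.0    # assumed per-sample measurement/microenvironment noise
env_shift = 0.5   # identical for every genotype, so there is no true GxE

diffs = []
for _ in range(n_genotypes):
    baseline = rng.gauss(5.0, 2.0)  # genotype-specific expression level
    tx = baseline + env_shift + rng.gauss(0.0, noise_sd)  # one TX sample
    mi = baseline + rng.gauss(0.0, noise_sd)              # one MI sample
    diffs.append(tx - mi)

# Spread of the differences comes from noise alone: Var(TX - MI) = 2 * noise_sd**2.
diff_var = statistics.variance(diffs)
diff_mean = statistics.mean(diffs)
print(f"mean difference = {diff_mean:.2f}, variance of differences = {diff_var:.2f}")
```

 With a single sample per genotype and environment, this noise-driven spread cannot be separated from genuine genotype-specific responses, which is why replication or an explicit variance model is needed.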
 
 (2) SHAP contribution values
 
 Although SHAP is a well-established framework for decomposing model predictions into feature-level contributions, its use in this manuscript raises several concerns regarding interpretation, statistical validity, and biological inference.
 
 First, SHAP values quantify the contribution of features within the fitted model, conditional on the joint distribution of inputs and the model structure. They do not represent causal effects or direct biological importance. There is also a difference in scale: SHAP values are often expressed in log-odds, whereas the regression model uses absolute units. Without a fair evaluation of model fit, SHAP values need to be interpreted cautiously, because a feature can show very high SHAP values even when the model fits poorly.
 
 In genomic data, where features are highly correlated due to linkage disequilibrium and co-expression, SHAP values can distribute contribution values across correlated variables in ways that are not uniquely identifiable. As a result, features highlighted as "important" may reflect correlation structure rather than true functional relevance.
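 The non-identifiability described above can be sketched with a toy example. The code below is an illustrative assumption, not the SHAP library or the authors' XGBoost models: it computes exact Shapley values by enumerating feature orderings for two hypothetical linear models on two perfectly correlated features (for example, two SNPs in complete linkage disequilibrium). The names `shapley`, `model_a`, and `model_b` are invented for this sketch.

```python
from itertools import permutations

def shapley(model, x, background):
    """Exact Shapley values for one sample x (feasible only for a few features)."""
    n = len(x)

    def value(subset):
        # Mean model output when features in `subset` are fixed to x's values
        # and the remaining features are taken from each background row.
        total = 0.0
        for b in background:
            z = [x[i] if i in subset else b[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    orders = list(permutations(range(n)))
    phi = [0.0] * n
    for order in orders:
        seen = set()
        for i in order:
            before = value(seen)
            seen.add(i)
            phi[i] += (value(seen) - before) / len(orders)
    return phi

# Two perfectly correlated features: the second column copies the first.
data = [[v, v] for v in (0.0, 1.0, 2.0, 3.0)]

def model_a(z):
    return 2.0 * z[0]        # puts all weight on feature 0

def model_b(z):
    return z[0] + z[1]       # splits the weight across both features

x = [3.0, 3.0]
phi_a = shapley(model_a, x, data)   # [3.0, 0.0]
phi_b = shapley(model_b, x, data)   # [1.5, 1.5]
print(model_a(x), model_b(x), phi_a, phi_b)
```

 Both models give identical predictions on every row of the data, so the data cannot distinguish them, yet they assign different importance to the two features.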
 
 This correlation structure can be exacerbated in this manuscript by the use of TPM-normalized transcript abundances as predictor variables without biological replicates. Even assuming the estimates of transcript abundance are robust, TPM values are compositional: the constant-sum constraint creates dependencies among all genes and induces negative correlations. This issue is particularly relevant for the interpretation of gene importance and interaction effects, where correlated predictors can lead to unstable and non-unique attributions. The biological interpretation of transcript-based features therefore remains uncertain.
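 The effect of the constant-sum constraint can be sketched with simulated data. The example below is a hedged illustration, not the authors' data or normalization code: it draws independent abundances for a hypothetical set of four genes, rescales each sample to sum to one million (as TPM does), and compares the mean pairwise correlation before and after rescaling. The gene count, sample count, and distribution are assumptions.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

def mean_pairwise_corr(rows):
    """Average correlation over all pairs of columns (genes)."""
    cols = list(zip(*rows))
    pairs = [(i, j) for i in range(len(cols)) for j in range(i + 1, len(cols))]
    return sum(pearson(cols[i], cols[j]) for i, j in pairs) / len(pairs)

rng = random.Random(0)
n_samples, n_genes = 2000, 4   # hypothetical sizes for illustration

# Independent abundances: no gene is truly related to any other.
raw = [[rng.lognormvariate(0.0, 0.5) for _ in range(n_genes)]
       for _ in range(n_samples)]

# TPM-like closure: every sample is rescaled to sum to one million.
tpm = [[1e6 * v / sum(row) for v in row] for row in raw]

raw_corr = mean_pairwise_corr(raw)
tpm_corr = mean_pairwise_corr(tpm)
print(f"raw: {raw_corr:.3f}, after closure: {tpm_corr:.3f}")
```

 For identically distributed genes, the expected correlation after closure is -1/(G - 1), where G is the number of genes, so the average effect shrinks as G grows; it can nonetheless remain substantial for highly expressed genes that dominate the total.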
 
 (3) Result interpretation
 
 For example, the statement on page 11 that "plasticity SNP- and transcriptomic-based models generally outperformed single-environment models for traits with low cross-environment correlation, such as green-up (Fig. 2c, r = -0.13, p < 8.3 × 10⁻³) and tiller count (Fig. 2f, r = -0.08, p = 0.1) (Supplementary Fig. S1)" is too broad. For green-up, the Diff model appears much better than MI, but not clearly better than TX.
 
 Also on page 11, the statement that "...Diffexp was more predictive than SNPs for trait plasticity in biomass, flowering time, and tiller count..." holds true only for biomass, not for flowering time or tiller count.
 
 The claim of "complementary information" between SNP and transcriptomic models on page 12 is stronger than what Figure 2 supports. Figure 2 shows different predictive performance, but it does not by itself demonstrate complementarity. Establishing complementarity requires evidence that combining SNP+T improves prediction consistently or captures distinct, non-overlapping signals. Yet the preceding section says SNP+T outperformed either single data type in only 15% of cases, with modest gains, which is confusing. Also, Figure 2 contains no G+T model; the label used there is SNP+T.
 
 Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.06.710154v1
chinaxiv.org

ChinaXiv.org: Chinese Academy of Sciences preprint platform for scientific papers

5
1. Public_Reviews 29 May 2026
  
  in eLife (unscoped)
  
  eLife Assessment
  
  This Review Article provides an overview of circadian findings obtained using the zebrafish model and will be of particular interest to researchers working with zebrafish in chronobiology and behavioural neuroscience. The article would benefit from a broader conceptual framework that more clearly positions zebrafish within the wider landscape of animal models used in circadian biology, including comparisons with other extensively studied systems. In addition, several citation inaccuracies and interpretational issues identified during peer review should be carefully addressed to strengthen the accuracy and impact of the review.
  
  Summary
2. Public_Reviews 28 May 2026
  
  in eLife (unscoped)
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Wang Liao and colleagues aim to provide a comprehensive synthesis of zebrafish circadian research, with particular emphasis on the decentralized photoreceptive architecture that distinguishes teleosts from mammals, and to outline future research directions leveraging emerging technologies for translational applications. The authors frame zebrafish as occupying a "crucial evolutionary and experimental niche" and argue that the model system is uniquely suited to address open questions in chronobiology.
  
  Strengths:
  
  The review is broad in scope and up to date in its citation of recent primary literature. The coverage of physiological outputs - spanning cardiovascular rhythmicity, hepatic metabolism, immune function, reproduction, and gut homeostasis - is more comprehensive than many existing reviews in this area, and researchers seeking an entry point into any of these subfields will find a useful orientation. The figures are well-designed and effectively summarise complex regulatory relationships. The section on immune rhythmicity is a particular strength, providing mechanistic detail on how specific clock components (Clock1a, Per1b, Per2, Cry1a) differentially regulate neutrophil behaviour, bacterial killing, and cytokine expression; this level of molecular specificity distinguishes it from comparable sections in the review. The brief discussion of non-canonical clock gene functions (CLOCK in neuronal connectivity, BMAL1 in stem cell state, vascular calcification) raises genuinely interesting points that are underexplored in the field and might deserve more prominence.
  
  The future perspectives section makes a conceptually interesting move in suggesting that the zebrafish decentralized architecture could reframe a central question in chronobiology - from how a master clock imposes order on passive peripheral oscillators, to how semi-autonomous oscillators achieve coherence. This is the most original conceptual contribution in the manuscript, and it would benefit from much further development.
  
  Weaknesses:
  
  The core limitation of this review is that it functions primarily as an annotated bibliography rather than a critical synthesis. Section after section follows the same pattern: a physiological system is introduced, several findings from recent papers are described in sequence, and the section ends. Missing throughout is an evaluative voice - where does the field agree, where does it disagree, which findings have been replicated versus remain preliminary, and which conceptual questions are genuinely unresolved versus merely unstudied? Readers with expertise in the field will find little that reframes their understanding; readers new to the field will receive information but not the interpretive scaffolding needed to assess its significance.
  
  The framing of zebrafish as occupying a "crucial evolutionary and experimental niche" is asserted but not substantiated. The experimental advantages of zebrafish - optical transparency, external development, genetic tractability - are real, but they apply primarily to larval stages, typically the first two weeks of development. The review does not adequately address whether the key features it highlights, particularly peripheral photosensitivity and autonomous peripheral oscillators, have been demonstrated in adult animals, where optical transparency is lost. Many of the physiological findings described (sleep-wake cycles, cardiovascular function, reproduction, and immune function) are most relevant in adult or juvenile fish, yet the mechanistic underpinnings often come from larval studies. Whether the mechanisms generalise across developmental stages is not discussed, and this is an important gap that the review could acknowledge explicitly.
  
  The claim that zebrafish bridge invertebrate and mammalian models is a conventional framing that appears in most zebrafish review articles; its repetition here adds little. More interesting - and underexplored - is the comparative question of how the decentralised clock architecture of teleosts compares with that of other non-mammalian vertebrates, or indeed with invertebrate systems such as Drosophila, where peripheral tissue clocks and non-visual photoreception have also been studied. The review does not engage with this comparative dimension, which would be the natural intellectual context for the claims being made.
  
  The future perspectives section identifies several promising directions - optogenetic circuit mapping, whole-body longitudinal imaging, inter-organ communication, network modeling - but these are described at a high level of generality. Most are not specific to the questions raised by the zebrafish decentralized clock architecture; they would appear in any forward-looking review of circadian biology. The one conceptually distinctive idea - that zebrafish could be used to ask how distributed oscillators achieve coordinated coherence without hierarchical control - is identified but not developed into concrete experimental questions or testable predictions. The discussion of non-canonical clock gene functions in the Future Perspectives section would benefit from being more directly connected to what zebrafish specifically can offer: given that teleost genome duplication has produced additional paralogues of clock genes, there is a concrete opportunity to dissect canonical from non-canonical functions through comparative analysis of paralogues with diverged expression patterns. This point is hinted at but not made explicitly.
  
  Appraisal of conclusions:
  
  The conclusions are broadly consistent with the evidence cited, and the authors are appropriately cautious in noting that many signalling cascades and inter-tissue communication mechanisms remain incompletely characterised. The conclusion that zebrafish represents a valuable and underexploited model for circadian-disease translational research is well-supported. However, the review would be significantly strengthened if the authors distinguished more clearly between what is firmly established, what is supported by preliminary or single-study evidence, and what remains genuinely speculative.
  
  Likely impact and utility:
  
  This review will be useful as an orientation document for researchers new to zebrafish circadian biology, and the comprehensive treatment of physiological outputs across organ systems is a genuine service to the field. Its impact as an intellectual contribution is limited by the descriptive approach and the absence of original synthesis or conceptual reframing. The most interesting ideas in the manuscript - the reframing of the central/peripheral clock hierarchy question, and the potential of clock gene paralogues for probing non-canonical functions - could be further developed and, if pursued, could form the basis of a more distinctive and impactful contribution.
  
  Review 1
3. Public_Reviews 28 May 2026
  
  in eLife (unscoped)
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This review is valuable in principle because circadian rhythms in zebrafish are relatively unexplored. However, there are a number of significant weaknesses that should be addressed for it to have an impact. First, while the review covers a broad range of topics in chronobiology, it does not put them in context. Placing zebrafish work in the context of other, better-understood model organisms and other fish species would broaden the appeal. The review could also expand to a discussion of sleep, where the understanding in zebrafish is much more advanced. Critically, providing a novel framework and identifying new areas of opportunity and limitations of the system would expand the interest to non-zebrafish research groups. In addition, there are a number of misstatements and mis-citations that are critical to correct. Therefore, I find this review potentially impactful, but its current form is likely to limit its impact.
  
  Strengths:
  
  Focusing on decentralized photo sensing is a strength because it is relatively unique to zebrafish.
  
  The breadth of discussion in zebrafish is a strength.
  
  Weaknesses:
  
  It might be helpful to reorganize the review so that it opens with an introduction on what is known to be highly conserved in other, better-studied systems, and then focuses on the components of the zebrafish clock that are discussed here.
  
  A weakness is the lack of integration with other model organisms and other fish systems. Therefore, the narrow focus on zebrafish is unlikely to appeal to broader audiences.
  
  It's surprising that there is not more discussion of sleep, which has been studied in detail, and its relationship to the clock.
  
  Discussions of limitations of the model, including adult vs larval analysis and challenges performing long-term behavioral analysis in fish, would be valuable.
  
  Review 2
4. Public_Reviews 28 May 2026
  
  in eLife (unscoped)
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Over the past 3 or 4 decades, our understanding of the molecular mechanism underlying the circadian clock has increased substantially. This is in large part due to successful forward and reverse genetics approaches applied to a broad range of genetic model systems, notably Drosophila, Neurospora, mouse, Arabidopsis and cyanobacteria. Although the clock components in these species are diverse, the basic operating principles are highly conserved, allowing us to build a general view of clock mechanisms. Looking forward, there are still many unanswered questions regarding how clocks are organized at the systems level and, in turn, how they are coupled to key aspects of physiology. Each model species has its own set of advantages and disadvantages for tackling particular questions. As this timely review aims to illustrate, the zebrafish has become a particularly valuable model for exploring circadian clock biology. This is in part due to its technical advantages, accessibility of early developmental stages and its directly light-entrainable peripheral clocks. This provides unparalleled opportunities for studying the circadian clock hierarchy and its links with physiology.
  
  Strengths:
  
  This review does a good job of integrating the many lines of circadian clock research where the zebrafish has been used as a model and provides an overview of many future challenges it is well-suited to tackle.
  
  Weaknesses:
  
  There are citation errors, as well as inaccurate and misleading statements that must be remedied in a revised version.
  
  Review 3
5. Public_Reviews 28 May 2026
  
  in eLife (unscoped)
  
  Author response:
  
  We sincerely thank the reviewers and editors for the thorough, constructive, and insightful comments, which have greatly helped us improve the accuracy, clarity, and rigor of the manuscript. We acknowledge that the current version has several limitations, including insufficient contextualization with other model systems and lack of critical synthesis. These important weaknesses will be comprehensively addressed in a future revised version of the review.
  
  For the present revision, we have focused exclusively on correcting objective errors, factual inaccuracies, and citation mistakes as pointed out by the reviewers. All specific factual and reference issues raised by Reviewer 2 and Reviewer 3 have been carefully corrected in the revised manuscript, including inaccurate statements, incorrect citations, missing references, and inconsistent descriptions of zebrafish clock genes, photoreception, and physiological functions.
  
  We appreciate the reviewers’ thoughtful suggestions regarding the conceptual depth, comparative context, critical synthesis, and expanded discussion of sleep and model limitations. While we fully agree that these aspects would significantly strengthen the review, we plan to systematically incorporate these broader conceptual improvements in a future, more substantial revision.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

chinaxiv.org/abs/202604.00029v1
www.biorxiv.org

Contractile perinuclear actomyosin network promotes peripheral and polar chromosome interaction with the mitotic spindle

4
1. Public_Reviews 28 May 2026
 
 in eLife
 
 eLife Assessment
 
 This important study demonstrates that a perinuclear actomyosin network, present in some types of human cells, facilitates kinetochore-spindle attachment of chromosomes in unfavorable locations, thereby reducing their missegregation rate. This actomyosin network and its general role have been studied previously, but this study convincingly clarifies the underlying mechanism and expands the investigation to additional cell lines. The results are relevant to understanding chromosome missegregation in cancer cells.
 
 [Editors' note: this paper was reviewed by Review Commons.]
 
 Summary
2. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.
 
 Comments on revised version:
 
 In the revised manuscript, organizational issues have been largely resolved. In addition, the inclusion of new experiments in additional cell lines, along with an expanded discussion that places actomyosin contractility in the broader conceptual context of other mechanisms governing chromosome movement, has significantly strengthened the manuscript.
 
 Review 1
3. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Sheidaei et al. report how chromosomes are favourably positioned to facilitate kinetochore-microtubule interactions during early mitosis. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying types of kinetochore movements, carefully marking kinetochore positions in early mitosis, and linking these to map their fate/next positions over time. The work is an excellent addition to the chromosome segregation field, as most of the literature has thus far focused on tracking kinetochores at slightly later stages of mitosis. The authors show that PANEM facilitates chromosome positioning toward the interior of the newly forming spindle, which in turn promotes chromosome congression. In the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, a step that precedes the segregation process.
 
 Comments on revised version:
 
 The authors' revisions have brought clarity to the description of movements in many of the figures. The manuscript ties a fundamental process to differences in cancer cell lines.
 
 The work extends their published discovery that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. The current manuscript explains how this network facilitates chromosome capture and congression by tracking the motions of individual kinetochores during early mitosis. The findings are broadly useful for the cell division and cytoskeletal fields.
 
 Review 2
4. Public_Reviews 28 May 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Reviewer #1 (Evidence, reproducibility and clarity):
 
 Summary
 
 Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.
 
 In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.
 
 Major Comments
 
 A. Structural overhaul and figure reorganization
 
 The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.
 
 Move methodological and descriptive details (e.g., especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.
 
 In these parts, we define four phases of kinetochore motion in early mitosis. Without such a description in the main text, readers would be confused about subsequent analyses. Figure 2 is also important to show examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.
 
 Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).
 
 As suggested, we have removed the statement for the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statement for the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).
 
 Figure 4I: This panel is currently unclear and should be drastically simplified.
 
 Following this suggestion, we simplified Figure 4I by removing the column of ‘Start’, which is easily deduced from the ‘Duration’ results and therefore does not provide much new information.
 
 I recommend reorganizing the figures as follows:
 
 Figure 1: Keep as a single figure but simplify. Figures 1D and 1E could be combined; move unnormalized CSV to supplementary materials. The same goes for 1F.
 
 We have reorganized Figure 1, as suggested, and moved unnormalized data to supplemental materials.
 
 New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules which increases speed of congression initiation.
 
 If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new Figure would mostly consist of only graphs. Without examples of images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4, as they are (except for making Figure 4I simpler; see above).
 
 New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.
 
 If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figure 5 and 6 as they are, although we respect this suggestion from the reviewer.
 
 New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.
 
 We have conducted new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We have combined the new results with the original Figure S7 to create Figure 8 in line with this suggestion.
 
 On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.
 
 B. Specificity and redundancy of actin perturbation
 
 To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:
 
 Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one the authors used arise from the use of different cell types.
 
 We did experiments along this line, using a dominant-negative LINC construct, in our previous study (Booth et al eLife 2019). LINC-DN should more specifically remove/reduce PANEM than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); i.e. in this regard, the outcome was similar to azBB treatment in the current study. One can expect that global actin inhibitors would also inhibit the PANEM formation and show effects similar to LINC-DN. By contrast, the indicated references reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as the reviewer noted. One possibility is that such differences may have arisen from different cell types – this could be important, especially given that some cells form the PANEM and others do not (Figure 8A). A second possibility is that cytokinesis, mitotic rounding and PANEM formation may rely on actin polymerization to different extents. For example, the same concentration of global actin polymerization inhibitors may affect cytokinesis, but may still allow PANEM formation to proceed without observable effects on early chromosome movements. As suggested, we discussed this topic in the Discussion (page 16, third paragraph).
 
 Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for the effects on congression.
 
 As the reviewer implies, we cannot rule out that we failed to detect actin associated with the spindle or centrosomes because of differences in methods or cell lines between the current study and the literature mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 14, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 14, second paragraph). Regarding Myosin-10, azBB (a blebbistatin variant) should have negligible effects on class-X myosin, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB that we observed in the current study are due to the inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 14, second paragraph).
 
 C. Expansion of PANEM functional analysis
 
 To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]
 
 Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.
 
 As suggested, we have studied the effect of PANEM contraction in cell lines other than U2OS. We have found that when PANEM contraction was inhibited, the reduction in chromosome scattering was diminished in RPE1 cells (new Figure 8B, C). Moreover, we have found that inhibition of PANEM contraction increased polar chromosomes during prometaphase/metaphase in RPE1 and HCT116 cells (which form PANEM), but not in HeLa cells (which do not form PANEM) (new Figure 8D, E). These results suggest that the effects of PANEM contraction, originally observed in U2OS cells, are also present in other cell lines (RPE1 and HCT116) that form PANEM.
 
 Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.
 
 This is an interesting suggestion, but such a study would take considerable time and goes beyond the scope of this paper.
 
 Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.
 
 This is an interesting suggestion, but such a study would take considerable time and goes beyond the scope of this paper.
 
 Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).
 
 In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not Phase 2 or Phase 4. Mild microtubule perturbation itself could affect chromosome motions in all four Phases. We do not think it would be so informative to study what additional effects the reduced PANEM contraction shows when combined with mild microtubule perturbation.
 
 Evaluate the role of PANEM contraction in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.
 
 Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.
 
 Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.
 
 It is tricky to count the number of chromosomes because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. those outside of the nucleus in prophase). We therefore quantified the chromosome volume at polar regions in azBB-treated cells (Figure 6C).
 
 D. Conceptual integration in Introduction and Discussion
 
 The manuscript should better situate its findings within the context of early mitotic chromosome movements:
 
 Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.
 
 It has been a widely accepted view in the field that chromosome congression precedes biorientation since the publication of Kapoor et al (Science 2006). Very recently, this view has been challenged by a new publication (Vukušić & Tolić, Nat Commun 2025), as indicated by this reviewer. We have mentioned this new model and discussed the new interpretation of our results based on it in the Discussion (page 15; ‘It has been a widely accepted view…’).
 
 To explain the new interpretation of our results more clearly, we have added a new diagram as a supplemental figure (Figure 9 – figure supplement 1) in the revised manuscript.
 
 Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).
 
 We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.
 
 Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.
 
 Following this suggestion, we discussed three possible mechanisms that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression, based on previous literature (Page 17): 1) the enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently to promote congression towards the spindle midplane, and 3) the balance between the activities of CENP-E, Dynein and chromokinesins may shift towards greater chromosome-arm ejection forces directed at the spindle midplane.
 
 Minor Comments
 
 These issues are more easily addressable but will significantly improve clarity and presentation.
 
 Introduction
 
 Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.
 
 As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of Results to Introduction (Page 4) and cited Figure 1A and 1B in Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers’ understanding of our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.
 
 Results (by subheading)
 
 First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).
 
 As suggested, we cited these references at the indicated part of the first section of the Results (page 5).
 
 Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).
 
 Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) have already been cited in the second section of the Results in the original manuscript. We agree that all other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).
 
 Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.
 
 We interpret that these are background signals in the cytoplasm, which do not come from kinetochores, because 1) before NEBD, they were outside of the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions such as those towards a spindle pole (Phase 2) and the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.
 
 Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).
 
 Relevant to this comment, there was an error regarding the congression speed of central kinetochores (original Figure 4H). The congression speed of peripheral kinetochores was shown correctly, but that of central kinetochores was plotted incorrectly in µm per time interval (30 s) rather than µm per minute. We have corrected this error in the revised manuscript (new Figure 4H). Based on the corrected data, the speed of congression is similar between peripheral and central kinetochores. The original Figure 3G (the speed of poleward motion for central kinetochores) had a similar error, which we have also corrected in the revised manuscript. We apologize for these errors and any confusion they may have caused.
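 
 As an illustration only (not part of the authors' analysis), the unit correction described above amounts to rescaling a displacement measured per imaging interval to a speed per minute. The sketch below assumes a fixed 30 s interval, as stated in the response; the function name and the example value are hypothetical.
 
```python
def per_interval_to_per_minute(displacement_um: float, interval_s: float = 30.0) -> float:
    """Convert a displacement per imaging interval (in µm) to a speed in µm/min.

    `interval_s` is the imaging interval in seconds. The 30 s default follows the
    interval mentioned in the response; the function itself is a hypothetical helper.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return displacement_um * 60.0 / interval_s


# Hypothetical value: 0.5 µm travelled within one 30 s interval
speed = per_interval_to_per_minute(0.5)
```
 
 With a 30 s interval, values plotted per interval are half of the corresponding µm/min values, which is consistent with the corrected speeds becoming comparable to those of peripheral kinetochores.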
 
 Regarding this comment, if biorientation is achieved more rapidly for central kinetochores, Phase 3 (rather than congression speed) would be shorter for central kinetochores. Indeed, Phase 3 is slightly shorter for central kinetochores (control) than for peripheral kinetochores (control) (Figure 4C), but the difference is not statistically significant (t-test; p = 0.21).
 
 Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.
 
 These two references have already been cited and discussed in this Results section of our original manuscript. However, considering this suggestion, we have discussed the polar chromosome movements reported by Koprivec et al in more detail (page 11). Meanwhile, the reviewer is correct about Figure 5F, and we have clarified this point in the Figure 5F legend.
 
 Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.
 
 As suggested, we have moved the final paragraph of the Discussion in the original manuscript to make a new final section in the Results in the revised manuscript. Moreover, as suggested, we have studied the outcome of inhibiting PANEM contraction in cell lines other than U2OS (Figure 8 B–E), and have described the new results in the new final section of the Results.
 
 Discussion
 
 When discussing cortical actin, cite key reviews on its presence and function during mitosis: Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).
 
 As suggested, we have cited all these review papers in the Discussion (page 17), and mentioned the role of the cortical actin on the spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring on cytokinesis (Pollard & O'Shaughnessy, 2019).
 
 Significance
 
 Advance
 
 This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]
 
 We have addressed the underlined criticisms as detailed above.
 
 Audience
 
 The primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.
 
 Expertise
 
 My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.
 
 Reviewer #2 (Evidence, reproducibility and clarity):
 
 In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., eLife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.
 
 Significance
 
 While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]
 
 The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is.
 
 Reviewer #3:
 
 Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression. In the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, which precedes the segregation process.
 
 Major points
 
 (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (e.g. Phase 3-like). It is unclear whether all kinetochores go through Phases 1, 2, 3 and 4 sequentially or whether a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.
 
 To respond to this comment, we would like to first clarify how we selected kinetochores for our analysis. We selected kinetochores that can be individually tracked. If kinetochore tracking was difficult (before the start of Phase 4 in control and azBB-treated cells or before observing the extended Phase 3 in azBB-treated cells) because of kinetochore crowding, we did not choose such kinetochores. For example, related to the next comment of this Reviewer, we did not include kinetochores close to spindle poles (within 4 µm) at NEBD in our analysis for the following two reasons: First, these kinetochores often did not show clear and rapid movements towards a spindle pole, which we used to define Phase 2. Second, although we referred to kinetochore co-localization with a microtubule signal for the start of Phase 2, this was difficult for kinetochores close to spindle poles because of a high density of microtubules. As requested, we have added this comment to the Method section (page 25).
 
 With the above selection, all selected kinetochores without azBB treatment (control) showed the poleward motion (Phase 2) and congression (Phase 4) in this order, though their extents were varied among kinetochores. All selected kinetochores with azBB treatment also showed the poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Then, Phase 1 and Phase 3 were defined as intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we judged that Phase 3 continued till the end of tracking. We have added this comment to the Method section (page 25-26).
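
The phase definitions in this response can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `phase_intervals`, the use of minutes after NEBD as the time unit, and the representation of each phase as a (start, end) pair are all hypothetical. Phase 2 and Phase 4 are taken as already identified from the tracks; Phase 1 and Phase 3 are then derived as the intervals in between, and Phase 3 runs to the end of tracking when no Phase 4 is observed.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start, end), assumed to be minutes after NEBD


def phase_intervals(
    nebd: float,
    phase2: Interval,
    phase4: Optional[Interval],
    tracking_end: float,
) -> Dict[str, Interval]:
    """Derive Phase 1 and Phase 3 from observed Phase 2 / Phase 4 intervals.

    Hypothetical helper: names and units are assumptions for illustration.
    """
    p2_start, p2_end = phase2
    if not nebd <= p2_start <= p2_end <= tracking_end:
        raise ValueError("Phase 2 must lie between NEBD and the end of tracking")

    phases: Dict[str, Interval] = {
        "phase1": (nebd, p2_start),  # interval between NEBD and Phase 2
        "phase2": (p2_start, p2_end),
    }

    if phase4 is None:
        # No congression observed: Phase 3 continues until tracking ends.
        phases["phase3"] = (p2_end, tracking_end)
    else:
        p4_start, p4_end = phase4
        if not p2_end <= p4_start <= p4_end <= tracking_end:
            raise ValueError("Phase 4 must follow Phase 2 within the track")
        phases["phase3"] = (p2_end, p4_start)  # between Phase 2 and Phase 4
        phases["phase4"] = (p4_start, p4_end)

    return phases
```

Calling `phase_intervals(0.0, (2.0, 3.0), None, 10.0)` models a kinetochore that never congresses during tracking, so its Phase 3 spans from the end of Phase 2 to the end of the track.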
 
 (2) Would peripheral kinetochore close to poles behave differently compared to peripheral kinetochore close to the midplane (figure S4)? In figure 3D, are they separated? If not, would it look different?
 
 Since we did not include kinetochores close to spindle poles (at NEBD), for which it was difficult to define Phase 2 (see our response to the above major point 1), in our analysis, the suggested comparison is not feasible.
 
 (3) Uncongressed polar chromosomes (eg., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.
 
 In contrast to CENPE-inhibited cells, azBB-treated cells did not show much tumbling of the spindle, though both cells showed uncongressed polar chromosomes. The reason for this difference may be fewer uncongressed polar chromosomes in azBB-treated cells. There were still modest spindle motions in azBB-treated cells. However, because kinetochore motions were assessed relative to a spindle pole (and other reference points on the spindle) in our study (Figure 2A, C), the modest spindle motions were offset in our analyses of kinetochore motions. We have clarified the underlined part in the Method section (page 24).
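
The offsetting argument can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that positions are 3D coordinates in micrometres and that only a single spindle pole is used as the reference; the response notes that other reference points on the spindle were also used, which this sketch does not model. The function name `distance_to_pole` is hypothetical.

```python
import math
from typing import Sequence


def distance_to_pole(kinetochore: Sequence[float], pole: Sequence[float]) -> float:
    """Distance between a kinetochore and a spindle pole (same units as input).

    Because the measure depends only on the difference between the two
    positions, a drift applied to the whole spindle cancels out.
    """
    return math.dist(kinetochore, pole)


# Assumed example coordinates (micrometres) before and after a spindle drift.
drift = (1.0, 2.0, 3.0)
kt_before, pole_before = (3.0, 4.0, 0.0), (0.0, 0.0, 0.0)
kt_after = tuple(k + d for k, d in zip(kt_before, drift))
pole_after = tuple(p + d for p, d in zip(pole_before, drift))
```

Here `distance_to_pole(kt_before, pole_before)` and `distance_to_pole(kt_after, pole_after)` are equal, so a spindle that drifts as a whole does not register as kinetochore motion.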
 
 (4) The work has high quality manual tracking of objects in early mitosis- if this would be made available to the field, it can help build AI models for tracking. The authors could consider depositing the tracking data and increasing the impact of their work.
 
 As suggested, we have included kinetochore tracking data as supplemental data in the revised manuscript (Figure 3 – source data 1–4; Figure 5 – source data 1, 2).
 
 Minor points
 
 (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.
 
 As suggested, we have now mentioned the number of cells, where the kinetochore motions were analyzed, in the legends for Figures 3, 4, 5, and supplemental figures.
 
 (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.
 
 Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.
 
 (3) Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of the centrosome directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?
 
 We understand that the Reviewer refers to the kinetochore pivoting mechanism around a spindle pole, which was recently reported by the Tolic group (Koprivec et al., 2026). Such a pivoting mechanism would work only when the spindle elongates (i.e. the distance between spindle poles is enlarged) after NEBD. Therefore, to address this Reviewer’s question, we tried to assess how PANEM contraction contributes to relocating polar chromosomes when the spindle elongates before or after NEBD in asynchronous U2OS cells (i.e. in the situation where the kinetochore pivoting mechanism is applied or not), as we noted above in response to Point 2. However, spindle elongation after NEBD was rare and mild, and we were unable to address this issue (see our response to Point 2). We discussed this matter in the Discussion section.
 
 (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines.
 
 Because blebbistatin could kill cells by inhibiting cytokinesis, the blebbistatin sensitivity of cell growth may not necessarily reflect how essential the PANEM contraction is for chromosome congression.
 
 Instead, we addressed more directly how essential the PANEM contraction is for chromosome congression. We analyzed chromosome congression in RPE1 and HCT116 cells (both are N-CIN-) in the presence and absence of pnBB, the inhibitor of PANEM contraction (new Figure 8D, E). With pnBB, these cells showed congression defects, suggesting that the PANEM contraction is essential for chromosome congression in these N-CIN- cells.
 
 (5) Are congression times delayed in lines that naturally lack PANEM?
 
 For example, it takes 10-20 min for HeLa cells (lacking PANEM) to complete chromosome congression after the NEBD (Bancroft et al 2025: https://doi.org/10.1242/jcs.163659). This is not significantly different from the time (8-18 min) for chromosome congression we observed in U2OS cells (which form PANEM). We assume that cells lacking PANEM have developed a compensatory mechanism for efficient chromosome congression – we have discussed possible compensatory mechanisms in the last paragraph of the Discussion (page 17).
 
 (6) Page 23 "we first identified the end of congression" how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?
 
 The start of kinetochore oscillation was defined as the end of Phase 4 if we could track the kinetochore until that point. In some cases where the kinetochore became close to the midplane (< 2.5 µm), it was not possible to track it further due to kinetochore crowding around the spindle mid-plane – in such cases, the end of Phase 4 was assigned as the end of tracking. These definitions were not necessarily clear in the original manuscript. Moreover, in the original manuscript, it was not clearly stated that the end of Phase 4 was defined in the same way for both non-polar and polar kinetochores. We have now clarified these points in the Method section (page 25).
 
 (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?
 
 In Figure S2E (Figure 1 – figure supplement 6 in the revised manuscript), we did not observe a significant difference in the spindle-pole distance (the spindle size) between control and azBB-treated cells at any individual time points. The smallest p-value was 0.094 at 6.0 min. As suggested, we have explained this in the legend for this supplementary figure. Completion of Phase 4 is highly variable across different kinetochores within the same cell; thus, a general comment on its completion timing in cells is not feasible.
 
 Significance:
 
 The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.19.677380v6
www.biorxiv.org www.biorxiv.org

Bacterial ancestry of the mitochondrial ATP exporter

5
1. Public_Reviews 28 May 2026
  
  in eLife
  
  eLife Assessment
  
  This potentially useful paper presents an intriguing hypothesis about the evolutionary origins of the SLC25 family of mitochondrial carrier proteins common to all eukaryotic life, proposing that all members originated from the ADP/ATP carrier (AAC) and that AAC itself may have emerged from bacterial homologs such as CysZ and YihY. While the phylogenetic analyses and structural searches are reasonable methodologies to explore ancient evolutionary events, the evidence provided here is deemed to provide incomplete support for the conclusion that the mitochondrial ATP transporter is related to CysZ and YihY.
  
  Summary
2. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper tries to address an important outstanding issue, which is the evolutionary origin of the SLC25 family of mitochondrial carrier proteins, which are common to all eukaryotic life, with few exceptions. The authors have carried out phylogenetic analyses and DALI searches of AlphaFold databases of bacterial and archaeal membrane proteins. They identify two bacterial proteins, CysZ and YihY, and they propose that they are progenitors of SLC25 family members. Whilst the paper addresses an interesting topic, the conclusions are not supported by the data and are not presented in an unbiased manner, as they highlight only features that provide some tentative support for the hypothesis. They do not address the large number of sequence and structural properties that refute the hypothesis, such as the asymmetric vs three-fold pseudo-symmetric features, hexamer vs monomer, and the complete lack of any conserved motifs with similar functions. Any resemblances between CysZ/YihY and mitochondrial carriers thus seem to be superficial and could well be coincidental, as they represent generic properties of membrane proteins rather than specific ones indicative of an evolutionary relationship.
  
  Strengths:
  
  This paper explores the evolutionary origins of the SLC25 family of mitochondrial carrier proteins, which are found across nearly all eukaryotic organisms. They were likely to be present in the last common ancestor of all eukaryotes, around two billion years ago. The question is whether they are of bacterial, archaeal or eukaryotic origin. The authors propose that two bacterial proteins, CysZ and YihY, may represent ancestral forms of these carriers, based on structural comparisons of models, a sequence motif, and phylogenetic analyses. While the research addresses an important and longstanding question, the presented evidence does not convincingly support their hypothesis.
  
  Weaknesses:
  
  A central concern is the reliance on structural similarity searches using predicted protein models, since these models are often built using known protein structures as templates, and thus these searches may produce misleading matches. The reported similarities between CysZ, YihY, and mitochondrial carriers are weak and fall within ranges expected for unrelated membrane proteins, which commonly share general structural features, such as helical bundles. Quantitative measures of similarity are low and do not support a shared evolutionary origin. The case for YihY is extremely poor as neither structure nor sequence features support the claim. Importantly, the opening of YihY is towards the membrane rather than the water phase, as is the case for carriers, indicating that it has a very different structure and function. The case for CysZ is somewhat better, as it is a helical bundle with two short helices somewhat resembling the matrix helices of mitochondrial carriers, and a short sequence PXDXXK that is part of one of the known sequence motifs of mitochondrial carriers, but this is where the similarities end.
  
  Mitochondrial carriers have a distinctive threefold pseudo-symmetrical structure and a highly complex transport mechanism involving six structural elements. This paper's hypothesis does not explain how such a high level of threefold pseudo-symmetry could have evolved from entirely asymmetric proteins. To complicate matters further, CysZ is not functional as a monomer but forms a functional hexamer, which also explains why it has two half helices rather than two transmembrane helices. Thus, the hypothesis is that CysZ, which is an asymmetric protomer of a functional hexamer, has evolved into a three-fold pseudo-symmetric protein, which is functional as a monomer. A more convincing explanation is that the threefold pseudo-symmetrical structure arose from gene triplication and fusions, with later mutations introducing asymmetry to support diverse substrate binding. In support of this notion, mitochondrial carriers transporting large molecules, such as ATP, show more asymmetry, whereas those for small molecules remain nearly symmetrical. In general, the vast majority of transport proteins arose from gene duplications and fusions of the domains.
  
  Although mitochondrial carriers have a similar sequence motif as found in CysZ (PXDXXK), their roles are very different. In mitochondrial carriers, this motif is located roughly in the middle of transmembrane helices H1, H3, and H5, where proline creates a pronounced kink, bringing the charged residues inward to form a salt-bridge network in the central water-filled cavity. The formation and disruption of this network is essential for the transport mechanism when switching between inward- and outward-open states. In CysZ, the motif is found at the end of a helix and in the following loop at the end of the transporter, with residues pointing outward toward the water phase. These residues are typical of membrane-water interface regions, where proline acts as a helix breaker and charged residues interact with the water phase. Thus, this motif in CysZ does not match the position or function seen in mitochondrial carriers, and its presence is likely to be coincidental, because these residues often occur in the water-membrane region. Importantly, none of the other important conserved three-fold symmetrical motifs of mitochondrial carriers is found in these bacterial proteins, such as the cytoplasmic network [YF][DE]xx[RK], cardiolipin binding sites, ER-links, and sequences of small amino acids, which are critical for its dynamic mechanism.
  
  The phylogenetic relationship is also overstated, as there is no sequence similarity between these proteins other than that occurring because of similar biophysical properties, such as transmembrane helices. The authors suggest that a specific mitochondrial carrier represents the ancestral member of the family, but this conclusion appears to be inferred rather than rigorously demonstrated. Key aspects, such as tree rooting and taxon sampling, are not sufficiently addressed, weakening confidence in the evolutionary claims. Further, the selection of only a few bacterial and archaeal proteomes for analysis limits the study's scope. Broader searches would be necessary to support claims about conservation and ancestry. Independent sequence searches indicate that CysZ and YihY are not widely conserved in the bacterial groups most closely related to mitochondria, undermining the argument that they are plausible ancestors.
  
  Overall, the presented similarities are superficial and can be explained by general features of membrane proteins rather than by specific adaptations to function. The hypothesis that CysZ and YihY are evolutionary precursors of mitochondrial carriers is not supported by the presented data.
  
  Review 1
3. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Here, the authors performed a phylogenetic analysis of mitochondrial ATP/ADP carrier (AAC) proteins. They also performed a structure-based screen for remote homologs, seeking to reveal their evolutionary origins. The authors claim that AACs are found at the root of their family tree, and through a structure-based homolog search protocol, identify putative prokaryotic homologs.
  
  The proposed evolutionary history of AACs is bold and complicated, but the phylogenetic methodology and the way in which the tree is interpreted are incomplete and unconvincing. Further, the structure-based search strategy uses very relaxed cutoffs for fold similarity, which may be fine, but it does not clearly justify this decision. This is potentially very problematic, as I did not find the quantitative or qualitative assessments of fold similarity particularly compelling.
  
  In summary, the authors have presented a bold and extremely interesting hypothesis for the evolution of these proteins, but there is insufficient support for their claims.
  
  Strengths:
  
  (1) The authors are presenting a very interesting hypothesis about the birth of these proteins, including that they may have undergone a radical rearrangement in their sequence at some point in evolution.
  
  (2) The paper makes use of appropriate tools for structure-based homolog identification.
  
  (3) Identification of a conserved sequence motif in these twilight zone proteins would be a rare and interesting occurrence, and could be consistent with their proposed homology.
  
  Weaknesses:
  
  (1) The phylogenetic analysis and its interpretations are incomplete. The authors regularly refer to the root of the tree, and its placement is given central importance. However, the methodology by which they selected the root is unexplained. This is notable, as the proposed root is curious and quite confusing. It implies that (at least) yeast and Paramecium AACs are independently paraphyletic. While certainly not impossible, this evokes quite a complicated evolutionary history. The taxonomy of this gene family, when rooted this way, does not seem to echo the phylogeny of species, suggesting an extremely complex history of duplication/loss and horizontal gene transfer, none of which the authors discuss in detail. Perhaps more clearly and specifically: I'm very surprised by the branching order at the root, where there are three independent branches of fungal proteins, followed by the excavate proteins in a monophyletic clade, followed by several independent branches of the Paramecium proteins. I very much expect incomplete lineage sorting at this evolutionary depth, but this seems extreme to the point that I question if it is accurately placed. More directly: this very much looks like an unrooted tree, presented radially.
  
  (2) The Bayesian and ML trees seem quite incongruent, but this is not discussed. In fact, the text states that they "exhibit a similar tree topology." This is admittedly very difficult to assess without very carefully going over the tree, branch by branch, but there are nevertheless differences, the most obvious being paraphyly vs monophyly of taxon-specific AAC clades. Do the authors have any comments on this, and can they show some sort of consensus tree? How does this affect their interpretation?
  
  (3) Presenting branch support as similarly-sized points makes it nearly impossible to actually judge the strength of support.
  
  (4) The use of structure for remote homology detection is becoming increasingly popular, and in my opinion, is very powerful. But it is still much too early to be taken for granted. The methodology must be justified. Most importantly, the authors have not clearly described why they chose these quantitative cutoffs (I'm mostly thinking of the Dali Z-score cutoff, which here seems very low for a transmembrane protein of this size, as the Z-score is very dependent on alignment length). The authors reference categories defined by tool authors, but why a Z-score of 3, specifically? The same goes for TM scores. There are not yet any defined best practices, to my knowledge, so the authors should independently validate/justify their approach in some way and/or cite and discuss relevant literature (there have been a growing number of these screens using similar approaches in recent years).
  
  (5) The proposed homologs have very little quantitative structural similarity to the query structure, or to each other, as shown in Figure 3 (and hence my concerns about the methodology). Also, I did not find the structural alignments in the supplement or Figure 4 to be qualitatively compelling. They simply appear too different, and I cannot discard this qualitative assessment because the quantitative similarities are likewise very weak. It's not clear to me if this is because the folds are in fact different, or if my view of them is a presentation issue (perhaps it could be improved by visualizing more angles, or more carefully cartooning the similarities and differences).
  
  (6) The authors point out that the alpha-helices are ordered differently in YihY and CysZ, and that their membrane orientation is flipped. Taken at face value, I would view this as evidence against homology. This could perhaps be more reasonably explained as convergent global fold similarity resulting from different underlying structures. However, the authors imply that this may be the result of the transposition of the sequences encoding these alpha helices, yet there is no convincing description or argument concerning when and how this could have occurred. I think this would be a deeply interesting phenomenon, but there is insufficient evidence and discussion to seriously consider whether or not it is homology or convergence.
  
  (7) Following up on comment #5, the authors did perform a very interesting in silico experiment by transposing sequences to reorder the helices. They then note that structural similarity improved. This is very, very interesting, but without other evidence of homology between the transposed alpha helices, I do not think this disproves alternative hypotheses. Does any such evidence exist?
  
  (8) The authors show in Figure 5E-F that sequence transposition flips the membrane orientation, such that YihY and CysZ have extracellular termini (which you would expect from homologs, I suppose). But it is just cartooned and not discussed. Is this computationally or experimentally supported?
  
  (9) The putative presence of a conserved motif would be a very compelling piece of evidence consistent with homology. However, it is not clear to me in the text which proteins actually have the repeats - is it truly just CysZ? What does this mean for YihY? Further, what specifically is being proposed to be homologous? Is SLC25 repeat 2 proposed to be homologous to CysZ repeat 2 (and the same for 3 to 3)? If so, this would seem to have implications for the transposition hypothesis. The helix nomenclature (e.g., H1-6) suggests homology across the proteins (i.e, H1 is homologous to H1); however, wouldn't the presence of these conserved domains instead, for example, suggest homology between SLC H3 and CysZ H2? The authors' conclusions are not clear, and it is difficult to interpret what the implications are for assessing homology.
  
  (10) The sequence retrieval methods are incomplete, so it is impossible to reproduce the searches or to judge their accuracy and scope. What were the E-value cutoffs and other settings used in the searches?
  
  (11) The phylogenetic methods are incomplete. What substitution models were used, and how were they chosen? What branch support method was used? What were the stop conditions of the Bayesian analysis (e.g. did the authors monitor for convergence, and how)? How much of the Bayesian analysis was considered burn-in, if any? And echoing points 1 & 2 above, how were these phylogenies rooted?
  
  (12) Throughout, there is a distinct lack of careful, evolutionarily informative language.
  
  (i) In reference to the phylogeny, the authors frequently refer to "grouping," but it's not entirely clear what this means. Referring to clades and their branching order would be more informative.
  
  (ii) The authors refer to the excavate branch as the "most ancient." Whether or not excavates most closely resemble LECA is somewhat irrelevant, because the branch itself is not the most ancient - it is equally as ancient as its sister branch, which may be all other eukaryotes.
  
  (iii) Likewise, the authors refer to bacterial proteins as "the evolutionary ancestor of mitochondrial AACs," and state that "AAC emerged from the conserved sulfat transporter CysZ." But extant bacteria are not the ancestors of mitochondria - nor are extant proteins descended from other extant proteins. They are, perhaps more accurately, cousins.
  
  (iv) The authors refer to AACs as "evolutionarily founder member of the SLC25 carrier family," but I'm not sure that has a clear evolutionary meaning, unless the authors mean to say that the common ancestor was more AAC-like than anything-else-like. Even if the rooting is accurate, a basal branch does not necessarily reflect the ancestral state.
  
  Review 2
4. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The most important weakness is that the authors have avoided the direct structural comparison of experimentally determined x-ray structures of AAC and CysZ. Instead, the comparisons are made through predicted membrane topologies and predicted structural models of protein homologs, which give rise to misleading results. Direct comparison of the X-ray structures of the ADP/ATP carrier and CysZ clearly shows that these proteins have very different folds. Therefore, flaws in the methods produce results that lead to the wrong conclusions, and the authors have not achieved their aims.
  
  Weaknesses:
  
  (1) Figure 2. There is something very strange about how the tree is drawn, given that S. cerevisiae AAC1, AAC2 and AAC3 share about 76-83% sequence identity but appear to be very diversified in the tree. The phylogenetic trees are only based on the sequences of three species. The authors should explain in much more detail how they made the phylogenetic trees to support their statement that all mitochondrial carriers have come from an ancient AAC.
  
  (2) There are at least three and seven X-ray structures of CysZ (with about 43% sequence identity to the E. coli homolog) and AAC, respectively, deposited in the Protein Data Bank. Therefore, there is no need for the approach using predicted structures as described in the manuscript. It is clear from direct comparison of the CysZ and AAC structures that they have very different folds, i.e. lengths of the transmembrane helices, their orientation and packing. CysZ has been suggested to form dimers or trimers of dimers (eLife 2018;7:e27829), with each protomer formed by two long transmembrane helices and four short helices that do not cross the membrane totally. Thus, CysZ has a different membrane topology and oligomeric state than AAC (monomer with six transmembrane helices). CysZ is therefore rightfully classified in a separate 3D domain fold from mitochondrial carriers in various protein family and domain databases.
  
  (3) In the 3D structures of CysZ, the conserved QYXDYPXDNHK motif is involved in a network of hydrogen bonds and salt bridges thought to hold the helical bundle together (eLife 2018;7:e27829). This motif is similar to PX[DE]XX[KR], a part of the signature motif, typical of mitochondrial carriers, which is repeated three times in the sequences and forms a three-fold pseudo-symmetrical salt bridge network of the so-called matrix gate that opens and closes during the transport cycle. Therefore, although this single motif in CysZ is similar to those of mitochondrial carriers, it is not found in a similar structural context to those in AAC structures.
  
  (4) It appears odd that the sulfate transporter CysZ should be more similar to nucleotide-transporting AAC than any of the other mitochondrial carriers, of which some transport sulfate.
  
  (5) The AlphaFold model of YihY is not very similar to either the crystal structures of CysZ or AAC.
  
  (6) The authors are relying too much on the TM-score results. The values of 0.5-0.6 between AAC and CysZ or YihY probably reflect that they contain six main helices. However, as noted in point 2, they have very different folds.
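
The motif comparison in point 3 can be made concrete with a short Python sketch that scans a sequence for the pattern PX[DE]XX[KR]. This is an illustration only, not a method from the manuscript or the reviews: the function name `find_motif` and the example strings are assumptions, and `X` in the CysZ motif as written by the reviewer is treated here as a literal placeholder character. A hit shows only that the residues occur in that order; it says nothing about the structural context, which is the reviewer's actual objection.

```python
import re
from typing import List, Tuple

# PX[DE]XX[KR]: proline, any residue, Asp/Glu, two residues, Lys/Arg.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(P.[DE]..[KR]))")


def find_motif(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# The CysZ motif as quoted in point 3, with X kept as a placeholder.
cysz_hits = find_motif("QYXDYPXDNHK")
```

Running this on the quoted CysZ motif returns a single hit starting at the proline, which agrees with the reviewer's statement that the motif resembles PX[DE]XX[KR] at the sequence level.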
  
  Review 3
5. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  Thank you for your decision letter with the public review and the recommendations. While we are delighted that the referees feel the work is addressing an outstanding and important issue, they have raised concerns regarding the strength of the support. We will address all the concerns in full in a revised manuscript in due course. Please find below a couple of general points regarding the referees’ concerns and a proposal as to how we plan to address them.
  
  (1) The idea of the manuscript is to present a plausible solution for a long-standing question in the field of mitochondrial biology and evolution. The fact that the identified solution to the origin of AAC transporters is a remote structural homolog (as our later detailed response will show, it is better than any other sequence/structure available to date) is to be expected. If the actual similarities were any better than what we have identified (with a special case of circular permutation), they could have been identified by other simpler structural homology search methodologies.
  
  (2) A recurrent and strong disagreement of the reviewers with the findings presented in this manuscript is rooted in the fact that the structural and sequence relatedness between AAC and CysZ detected in this work is so weak that it could be coincidental rather than an actual evolutionary link. In light of this, we have now searched carefully in all available structural databases such as SCOP, CATH, ECOD etc. to see whether the above fold link has been noted by others independently. We note that in the ECOD (Evolutionary Classification of Protein Domains) database only AAC and CysZ are grouped together under a single Possible homology group (X) called ‘Mitochondrial ADP/ATP carrier-like’. The ECOD database contains a hierarchical classification of protein domains organized according to their evolutionary relationships, and the server is maintained by Prof. Nick Grishin at The University of Texas Southwestern Medical Center.
  
  Link to ECOD database: http://prodata.swmed.edu/ecod/index_af2_pdb.php
  
  Reference: Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, et al. (2014) ECOD: An Evolutionary Classification of Protein Domains. PLOS Computational Biology 10(12): e1003926. https://doi.org/10.1371/journal.pcbi.1003926
  
  Therefore, our study and the independent findings of the ECOD database team together offer greater confidence in the proposed remote evolutionary relationship between AAC and CysZ, and indicate that the structural and sequence similarities we report in the manuscript are not a mere coincidence. We will also incorporate the details of the possible evolutionary relationship between AAC and CysZ identified in the ECOD database in the revised version of the manuscript.
  
  (3) One point we would like to stress is that, considering all the similarities identified, this relationship is very unlikely to fall into the class of ‘convergent evolution’. We will make this point explicit in the revised version.
  
  (4) Lastly, while we totally agree that the similarities are in the twilight zone, considering the importance of the problem, we feel that our work would induce researchers from the field of protein design to attempt possible interconversion of the two distantly related transporters thus providing an experimental rationale for the evolution of these transporters.
  
  AuthorResponse

URL

biorxiv.org/content/10.64898/2026.03.31.715626v1
www.biorxiv.org www.biorxiv.org

Disentangling Cephalopod Chromatophores Motor Units with Computer Vision

5
1. Public_Reviews 28 May 2026
 
 in eLife
 
 eLife Assessment
 
 This valuable study uses a computer vision pipeline to infer the motor control of cephalopod skin, revealing that individual chromatophores exhibit anisotropic deformations and can be associated with multiple putative motor units. The evidence supporting these claims is convincing, and the authors present some limited electrophysiological validation of the findings from their computational analysis. This work will be of significant interest to biologists studying cephalopod behavior and motor control.
 
 Summary
2. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Renard, Ukrow et al. applied their recently published computational pipeline (CHROMAS) to the skin of Euprymna berryi and Sepia officinalis to track the dynamics of cephalopod chromatophore expansion. By segmenting each chromatophore into radial slices, and analyzing the co-expansion of slices across regions of the skin, they inferred the motor control underlying chromatophore groups.
 
 Strengths:
 
 - The authors demonstrate that most motor units of cephalopod skin include a subregion of multiple chromatophores, creating "virtual chromatophores" between fixed chromatophores. This is an interesting concept that challenges prevailing models of chromatophore organization, and raises interesting possibilities for how chromatophore arrays may be patterned during development.
 
 - This study introduces new analytical approaches of cephalopod skin that will be valuable for the quantitative study of cephalopod behavior.
 
 Weaknesses:
 
 - The authors use patch-clamp experiments in E. berryi to test their approach for inferring motor units. The stimulations indeed evoke expansions of sub-regions of each chromatophore, creating "virtual chromatophores". However, they were not able to predict these motor units from behavioral analysis before confirming them with patch-clamp, limiting the strength of this validation.
 
 - In S. officinalis, chromatophores are far more numerous than in E. berryi and exhibit frequent spontaneous activity, making it more challenging to distinguish shared motor drive. Patch-clamp experiments in this species would provide important validation and strengthen confidence in the method for inferring motor units.
 
 - Although multiple experimental conditions were tested (e.g., age, size, behavioral context, sedation, head-fixation, lighting), data is only shown from a small subset of experiments. Analyzing pooled data across conditions would allow for more generalizable conclusions.
 
 - Different clustering algorithms were used for the two species (HDBSCAN for E. berryi and Affinity Propagation for S. officinalis). Since Affinity Propagation appeared to better capture correlation structure in S. officinalis, it would be informative to reanalyze the E. berryi data using the same method to assess potential algorithm-dependent biases.
 
 Conclusion:
 
 The CHROMAS tool is likely to be valuable to the field, given the need for quantitative frameworks in cephalopod biology. The predictions outlined here provide a useful foundation for future experimental investigation.
 
 Review 1
3. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Overall, this is an excellent paper, making use of a newly developed system for monitoring the behaviour of chromatophores in the skin of (mostly) free swimming bobtail squid and European cuttlefish. The manuscript is very well written, clearly presented and very well structured. The central finding, that individual chromatophores are connected to multiple motor neurones, is not new. Novelty instead comes from the ability to measure the actuation of chromatophore sections across wide areas of skin in free-swimming animals, showing the diversity of local motor units and reinforcing the notion that individual chromatophores are not necessarily the individual units of colour change, but rather local motor units that cover multiple neighbour and near neighbour chromatophore muscles. This is an excellent finding and one that will shape our understanding of the neural control of cephalopod skin colour. I have a number of minor points below that the authors will need to address before acceptance.
 
 Strengths:
 
 The methodological approach to collecting large amounts of data about local variations in the expansion of sections of chromatophores is exciting, and the analysis pipeline for clustering sections of chromatophores whose spontaneous activity correlated over time is powerful and exciting.
 
 Comments on revisions:
 
 All concerns have been addressed in the revised version of the manuscript.
 
 Review 2
4. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This study uses high-resolution videography and a custom computer-vision pipeline to dissect the motor control of cephalopod chromatophores in Euprymna berryi and Sepia officinalis. By quantifying anisotropic chromatophore deformations and applying dimensionality reduction methods, the authors infer that individual chromatophores can be a part of multiple motor units. Clustering analyses reveal putative motor units that often span multiple chromatophores, with diverse and overlapping geometries. Chromatophore expansion dynamics are faster and more stereotyped than relaxation, consistent with active neural contraction followed by passive recoil. Together, the results show that chromatophores function not as uniform pixels but as fractionated, coordinately controlled elements that enable flexible pattern generation.
 
 Strengths:
 
 The authors present compelling, direct evidence that (a) chromatophore deformations are anisotropic, and indirect evidence that (b) individual chromatophores can be split across multiple putative motor units. This evidence is provided through data collected over large spatial scales, but also at a sub-chromatophore resolution. This combination of scale and resolution is not possible using traditional neuroanatomical and physiological approaches alone.
 
 The authors also develop a new non-invasive, image analysis approach to extract information about chromatophore deformation across large spatial scales on the organism's body. In principle this approach is applicable across species and may allow for further comparative characterization of chromatophore motor control. It is therefore a promising new tool and useful resource for the community.
 
 Weaknesses:
 
 An important weakness of the work is that the methods the authors develop can only be applied during resting, spontaneous 'flickering' activity of chromatophores to yield interpretable results at the motor unit level. This is because common presynaptic input would confound the identification of individual motor units. Thus, there remains a large difficulty in linking insights about single motor unit organization to the circuit and behavioral levels.
 
 Another weakness of this paper is the rather limited electrophysiological validation of the computational findings. The authors present only one electrophysiology experiment in E. berryi, the species that they used only for 'methodological development' and not for detailed characterization. A complementary electrophysiological experiment in S. officinalis, or some visualization of neuron morphology confirming that motor neurons do indeed project to multiple chromatophores, would strengthen the generalizability of their computational analysis. This would be particularly pertinent to validate the authors' claim that some motor units contain chromatophores that are quite distant from one another on the animal.
 
 Overall, the authors' technical contributions and method development are an important advance. This work serves as an excellent proof of concept that their method can extract useful information about chromatophore motor control. Further validation of their method is needed to fully trust the fine-scale conclusions drawn about the distribution and composition of multi-innervated chromatophores. Furthermore, the authors raise many interesting ideas about developmental constraints on circuit wiring and potential adaptive significance of multi-innervated chromatophores for certain features of camouflage patterning. Their method may be able to help resolve some of these questions in the future if it is refined and applied across developmental stages, regions on the animal, and across species.
 
 Comments on revisions:
 
 Thank you for clarifying my major point of confusion regarding how one might connect these results to behaviorally relevant camouflage. I now have a better understanding of the authors' rationale in studying resting activity of motor units and believe that the clarifications added to the manuscript will help other readers who encounter similar confusion.
 
 Review 3
5. Public_Reviews 28 May 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Renard, Ukrow et al. applied their recently published computational pipeline (CHROMAS) to the skin of Euprymna berryi and Sepia officinalis to track the dynamics of cephalopod chromatophore expansion. By segmenting each chromatophore into radial slices and analyzing the co-expansion of slices across regions of the skin, they inferred the motor control underlying chromatophore groups.
 
 Strengths:
 
 The authors demonstrate that most motor units of cephalopod skin include a subregion of multiple chromatophores, creating "virtual chromatophores" in between the fixed chromatophores. This is an interesting concept that challenges prevailing models of chromatophore organization, and raises interesting possibilities for how chromatophore arrays may be patterned during development.
 
 This study introduces new analyses of cephalopod skin that will be valuable for the quantitative study of cephalopod behavior.
 
 Weaknesses:
 
 The authors chose to image spontaneous skin changes in sedated animals, rather than visually-evoked skin changes in awake, freely-moving animals. Spontaneous chromatophore changes tend to be small shimmers of expansion and contraction, rather than obvious, sizable expansions. This may make it more challenging to distinguish truly co-occurring expansions from background activity. The authors don't provide any raw data (videos) of the skin, so it is difficult to independently assess the robustness of the inferred chromatophore groupings.
 
 The patch-clamp experiments in E. berryi are used to test the validity of their approach for inferring motor units. The stimulations evoke expansions of sub-regions of each chromatophore, creating "virtual chromatophores" as predicted from the behavioral analysis. However, the authors were not able to predict these specific motor units from behavioral analysis before confirming them with patch-clamp, limiting the strength of the validation. It would be informative to quantify the results of the patch-clamp experiments - are the inferred motor units of similar sizes to those predicted from behavior?
 
 The authors report testing multiple experimental conditions (e.g., age, size, behavioral stimuli, sedation, head-fixation, and lighting), but only a small subset of these data are presented. It is difficult to determine which conditions were used for which experiments, and the manuscript would benefit from pooling data from multiple experiments to draw general conclusions about the motor control of cephalopod skin.
 
 The authors use a different clustering algorithm for E. berryi and S. officinalis, but do not discuss why different clustering approaches were required for the two species.
 
 Impact:
 
 The authors use their computational pipeline to generate a number of interesting predictions about chromatophore control, including motor unit size, their spatial distribution within the skin, and the independent control of subregions within individual chromatophores by putatively distinct motor neurons. While these observations are interesting, the current data do not yet fully support them.
 
 The CHROMAS tool is likely to be valuable to the field, given the need for quantitative frameworks in cephalopod biology. The predictions outlined here provide a useful foundation for future experimental investigation.
 
 We thank the reviewer for the thoughtful and detailed evaluation of our work and for recognizing the potential of the CHROMAS pipeline for studying chromatophore control.
 
 We agree that some aspects of the manuscript required clarification and additional explanation, and we have revised the text accordingly. We also now provide access to representative raw video recordings in the Data Availability section. In the E. berryi patch-clamp experiments, single motor neurons evoked expansions of sub-regions of chromatophores, consistent with the “virtual chromatophore” concept. We have now quantified the size of motor units across patch-clamp sessions, and the results show that the inferred motor-unit sizes broadly match those predicted from behavioral recordings, supporting the validity of our approach.
 
 We agree that pooling data across individuals would provide valuable insight into variability across animals. In practice, we recorded chromatophore activity from several animals (14 Euprymna berryi and 12 Sepia officinalis) under different experimental conditions during development of the experimental pipeline. However, acquiring long, stable, artifact-free recordings suitable for motor unit analysis is technically challenging. We now clarify this point in the manuscript. Specifically, we explain that multiple animals were recorded during pipeline development, while the analyses presented focus on recordings with the highest signal quality. We anticipate that the framework introduced here will enable future studies to collect larger datasets and compare motor unit organization across individuals, developmental stages, and species.
 
 HDBSCAN was used for E. berryi during initial exploratory analyses, and Affinity Propagation was adopted for S. officinalis because it better captured the correlation structure of those recordings. We did not re-analyze the E. berryi data with Affinity Propagation, and the implications of algorithm choice are now discussed in the Discussion.
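
One way to gauge the algorithm-dependent bias discussed here is to cluster the same slices with both algorithms and score how well the two label assignments agree, for instance with the adjusted Rand index (ARI). The following is a minimal sketch under stated assumptions, not code from the manuscript or from CHROMAS: the function name and the toy label lists are hypothetical, and treating any noise label as an ordinary cluster is a simplification.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    # Contingency counts: items that fall in cluster i of A and cluster j of B.
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both clusterings put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels for eight chromatophore slices from two algorithms.
labels_hdbscan = [0, 0, 0, 1, 1, 1, 2, 2]
labels_affinity = [5, 5, 5, 7, 7, 7, 9, 9]  # same partition, different names
labels_other = [0, 0, 1, 1, 2, 2, 0, 1]     # largely unrelated partition

print(adjusted_rand_index(labels_hdbscan, labels_affinity))
print(adjusted_rand_index(labels_hdbscan, labels_other))
```

An ARI near 1 would indicate that the inferred motor units are largely insensitive to the choice of algorithm, whereas values near 0 would indicate agreement no better than chance.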
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Overall, this is an excellent paper, making use of a newly developed system for monitoring the behaviour of chromatophores in the skin of (mostly) free-swimming bobtail squid and European cuttlefish. The manuscript is very well-written, clearly presented and very well-structured. The central finding, that individual chromatophores are connected to multiple motor neurones, is not new. Novelty instead comes from the ability to measure the actuation of chromatophore sections across wide areas of skin in free-swimming animals, showing the diversity of local motor units and reinforcing the notion that individual chromatophores are not necessarily the individual units of colour change, but rather local motor units that cover multiple neighbour and near-neighbour chromatophore muscles. This is an excellent finding and one that will shape our understanding of the neural control of cephalopod skin colour.
 
 Strengths:
 
 The methodological approach to collecting large amounts of data about local variations in the expansion of sections of chromatophores is exciting, and the analysis pipeline for clustering sections of chromatophores whose spontaneous activity correlated over time is powerful and exciting.
 
 Weaknesses:
 
 Some minor edits and typographical errors need correcting. I also had some concerns about whether the preparation for the electrophysiological section of the manuscript complies with the journal's ethical requirements, so I would urge that this be carefully checked.
 
 We thank the reviewer for the positive evaluation of our work and for recognizing the value of the methodological approach and the clarity of the manuscript.
 
 We have carefully reviewed the manuscript and corrected minor typographical errors.
 
 Regarding the ethical considerations raised for the electrophysiological experiments, we have carefully verified that the experimental procedures comply with the journal's ethical requirements and relevant institutional guidelines.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This study uses high-resolution videography and a custom computer-vision pipeline to dissect the motor control of cephalopod chromatophores in Euprymna berryi and Sepia officinalis. By quantifying anisotropic chromatophore deformations and applying dimensionality reduction methods, the authors infer that individual chromatophores can be a part of multiple motor units. Clustering analyses reveal putative motor units that often span multiple chromatophores, with diverse and overlapping geometries. Chromatophore expansion dynamics are faster and more stereotyped than relaxation, consistent with active neural contraction followed by passive recoil. Together, the results show that chromatophores function not as uniform pixels but as fractionated, coordinately controlled elements that enable flexible pattern generation.
 
 Strengths:
 
 The authors present compelling, direct evidence that (a) chromatophore deformations are anisotropic, and indirect evidence that (b) individual chromatophores can be split across multiple putative motor units. This evidence is provided through data collected over large spatial scales, but also at a sub-chromatophore resolution. This combination of scale and resolution is not possible using traditional neuroanatomical and physiological approaches alone.
 
 The authors also develop a new non-invasive, image analysis approach to extract information about chromatophore deformation across large spatial scales on the organism's body. In principle, this approach is applicable across species and may allow for further comparative characterization of chromatophore motor control. It is therefore a promising new tool and useful resource for the community.
 
 Weaknesses:
 
 An important weakness of the work is that the methods the authors develop can only be applied during resting, spontaneous 'flickering' activity of chromatophores. The inability to reliably apply their technique during any kind of realistic camouflage is a large limitation, as it means this method cannot be used to study the dynamics of motor control during realistic camouflage behaviors.
 
 Another weakness of this paper is the rather limited electrophysiological validation of the computational findings. The authors present only one electrophysiology experiment in E. berryi, the species that they used only for 'methodological development' and not for detailed characterization. A complementary electrophysiological experiment in S. officinalis, or some visualization of neuron morphology confirming that motor neurons do indeed project to multiple chromatophores, would strengthen the generalizability of their computational analysis. This would be particularly pertinent to validate the authors' claim that some motor units contain chromatophores that are quite distant from one another on the animal.
 
 Overall, the authors' technical contributions and method development are an important advance. This work serves as an excellent proof of concept that their method can extract useful information about chromatophore motor control. Further validation of their method is needed to fully trust the fine-scale conclusions drawn about the distribution and composition of multi-innervated chromatophores. Furthermore, the authors raise many interesting ideas about developmental constraints on circuit wiring and potential adaptive significance of multi-innervated chromatophores for certain features of camouflage patterning. Their method may be able to help resolve some of these questions in the future if it is refined and applied across developmental stages, regions of the animal, and across species.
 
 We thank the reviewer for their thoughtful evaluation and for recognizing the potential of the computational approach introduced in this study.
 
 Regarding the focus on spontaneous chromatophore activity, we have clarified earlier in the Results section why these events are necessary to isolate individual muscle activations. While large camouflage patterns are visually striking, they involve the coordinated activation of many groups of chromatophores by premotor circuits simultaneously, making the identification of individual motor units, our goal here, impossible. Our approach can, however, also be applied during active behavior, including camouflage; the questions addressed there would be different, focusing on how multiple motor units are coordinated to generate the resulting skin patterns, rather than resolving the structure of single motor units. This could be challenging if the patterns of premotor control are highly variable, thus making the detection of meaningful or interpretable motion correlations difficult. This remains to be tested.
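
The confound described in this response can be illustrated with a toy simulation: two motor units that are independent at rest become strongly correlated once a shared premotor command is added to both. This is only a sketch under simple assumptions (Gaussian activity, additive common drive, arbitrary variances); it is not the authors' model, and all variable names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
n_frames = 2000

# Two independent motor units, each driving its own chromatophore slice.
unit_1 = [rng.gauss(0, 1) for _ in range(n_frames)]
unit_2 = [rng.gauss(0, 1) for _ in range(n_frames)]

# Resting condition: each slice reflects only its own unit.
r_rest = pearson(unit_1, unit_2)

# Active behavior: a shared premotor command is added to both units.
common = [rng.gauss(0, 1) for _ in range(n_frames)]
driven_1 = [u + c for u, c in zip(unit_1, common)]
driven_2 = [u + c for u, c in zip(unit_2, common)]
r_driven = pearson(driven_1, driven_2)

print(f"correlation at rest:          {r_rest:.2f}")
print(f"correlation with common drive: {r_driven:.2f}")
```

Under these assumptions, a correlation-based clustering would merge the two units during driven activity even though they are distinct, which is the reason given above for restricting motor unit inference to spontaneous activity.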
 
 We also acknowledge that electrophysiological validation remains limited. Patch-clamp experiments were performed in Euprymna berryi to test predictions generated by the computational analysis, and these experiments confirmed that activation of single motor neurons can produce anisotropic expansion of chromatophore subregions. We now provide the associated datasets in the Data Availability section. We agree that complementary electrophysiological or anatomical experiments in Sepia officinalis would further strengthen the conclusions. Such experiments represent an important direction for future work.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 General points:
 
 (1) Given all the experimental conditions and animals tested, the manuscript would be much stronger if the figures represented pooled data from many animals and experiments (e.g. Figure 1C).
 
 We agree that pooling data from multiple animals would strengthen the manuscript. In practice, we tested these experimental conditions across several animals (14 Euprymna berryi and 12 Sepia officinalis), but we selected the segments shown in the figures for their minimal artifacts and errors. Acquiring high-quality, stable recordings of this type is extremely challenging, and the presented data represents the clearest examples suitable for analysis and visualization. We hope that in the future these methods will enable not only the collection of a larger, high-quality dataset, but also comparisons across individuals, ages, species, and different regions of the mantle.
 
 (2) It's very unclear what animals were used for each experiment:
 
 (a) E. berryi: L677 states that 14 animals were filmed, and L684 implies that non-sedated individuals were used in addition to sedated animals, but it appears all the data is from a single E. berryi with sedation?
 
 The original wording was unclear, so we modified the sentence for clarity. The Methods now specify that 14 animals were filmed to refine the experimental pipeline and explore different conditions, while the data presented in the Results are from a single lightly sedated individual chosen for quality and stability of chromatophore activity.
 
 (b) S. officinalis: L692 onwards states that lots of different conditions and animals were explored, but only minimal data from a couple of animals is described in the figures. L156 states that all (?) the data comes from one head-fixed animal and one sedated and head-fixed animal. L549: The conclusion states that the pipeline was used in freely moving animals, but it appears that all of the S. officinalis were head-fixed? This is very confusing. Rather than describing the conditions of every experiment ever performed, the manuscript would benefit from explicitly stating the experimental conditions used for each figure.
 
 The original text was unclear. We have clarified in the manuscript which animals and experimental conditions were used for the analyses in each figure. To clarify, E. berryi was recorded without head fixation, whereas S. officinalis data were obtained under head-fixed conditions. We did film 11 S. officinalis without head fixation, and data can in principle be extracted from these recordings. Head fixation was used both to minimize visual artifacts and to enable longer, stable recordings, which was important for capturing the highest level of apparent noise in motor unit activation; this information is critical for our analyses of motor-unit organization, though not necessary for studies of broader camouflage patterns. Our intended point was that our computational pipeline enables large-scale analyses that would be very difficult or impossible with traditional electrophysiology, not that all data were acquired from freely behaving animals. While fully unconstrained recordings remain technically challenging due to optical and logistical constraints, we maintain that our approach provides a valid framework for analyzing freely behaving animals.
 
 (c) Additionally, there is a claim that the sedated condition represents the unsedated one (e.g. L151 and L643), but no data is shown to support this. L173 references Figure 6d as evidence, but 6d doesn't exist. Only L210 provides sedation/no sedation statistics for the number of components per motor unit. However, in L643 it says "and motor unit organization remained unchanged". This data needs to be shown to include that statement.
 
 The reference to the nonexistent Figure 6d was removed. L170 provides statistics for the number of principal components per chromatophore, and L210 provides statistics for the number of components per MU. We do not think a sub-figure is necessary. We agree, however, that “motor unit organization” at L643 is potentially misleading, as we only compared the number of chromatophores belonging to a single MU and not the MU shape or distribution. We have changed “organization” to “size (in chromatophores)”.
 
 (3) The text needs considerable revision. There are many typos (including multiple instances of "refs" instead of the actual references being inserted). These issues make the manuscript much more difficult to evaluate.
 
 Our apologies. We have now added the missing references.
 
 (4) It is not clear how convincing the chromatophore groups are. For instance, Figure 4h could alternatively be interpreted as a group of 5 chromatophores in a motor group that happen to co-vary with a sixth one at a great distance. Without seeing some of the raw data (videos), it's difficult to assess how convincing it is that these chromatophores belong to the same group. I recommend analyzing: when multiple chromatophores expand together, what is the likelihood that other chromatophores also happen to expand at the same time (given the frequency that they're all changing shape spontaneously)?
 
 We appreciate the reviewer’s concern. Chromatophores are assigned to the same cluster because their activity, or that of their slices, covaries consistently over time. It is, of course, possible that what appears as a single motor unit may reflect two or more motor neurons acting simultaneously during the recording. Longer video segments increase confidence in the integrity of inferred motor units, but in the absence of a ground truth for motor unit spatial organization in this species at this age, it is difficult to quantify the likelihood that two motor units are being conflated. Raw video data is provided in the Data Availability section. We note, however, that most of the time motor units cannot be readily discerned by eye, because individual chromatophores and their constituent slices fluctuate continuously, and motor-unit correlations are subtle and distributed across multiple chromatophores.
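
The chance-coincidence question raised in this exchange could, for example, be approached with a circular-shift surrogate test: one trace is shifted in time by a random lag, which preserves its own event rate and autocorrelation but breaks its alignment with the other trace, and the observed correlation is compared with the resulting null distribution. The sketch below uses synthetic traces and hypothetical names; it is an illustration of the idea, not the analysis performed in the manuscript.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circular_shift_p_value(x, y, n_surrogates=500, min_shift=50, seed=0):
    """One-sided p-value: fraction of circularly shifted copies of y whose
    correlation with x is at least as large as the observed correlation."""
    n = len(y)
    if n <= 2 * min_shift:
        raise ValueError("trace too short for the requested minimum shift")
    rng = random.Random(seed)
    observed = pearson(x, y)
    hits = 0
    for _ in range(n_surrogates):
        shift = rng.randrange(min_shift, n - min_shift)
        shifted = y[shift:] + y[:shift]  # same values, misaligned in time
        if pearson(x, shifted) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_surrogates + 1)


# Synthetic example: two slices share brief expansion events, a third does not.
rng = random.Random(1)
n_frames = 600
events = [1.0 if rng.random() < 0.05 else 0.0 for _ in range(n_frames)]
slice_a = [e + rng.gauss(0, 0.3) for e in events]
slice_b = [e + rng.gauss(0, 0.3) for e in events]
slice_c = [rng.gauss(0, 0.3) for _ in range(n_frames)]

r_ab, p_ab = circular_shift_p_value(slice_a, slice_b)
r_ac, p_ac = circular_shift_p_value(slice_a, slice_c)
print(f"shared events: r={r_ab:.2f}, p={p_ab:.3f}")
print(f"independent:   r={r_ac:.2f}, p={p_ac:.3f}")
```

Applied to real recordings, a test of this kind would quantify how often a distant chromatophore appears to co-vary with a cluster purely because all chromatophores fluctuate frequently, although it cannot distinguish a single motor unit from two units that are consistently co-active.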
 
 (5) The rationale for focusing on spontaneous activity is introduced relatively late in the manuscript and would benefit from being stated earlier. Examples should be provided of what this looks like (as opposed to regular chromatophore expansion). It would be valuable to see measurements across many experiments of how expanded the chromatophores are - what is the change in surface area? And what is the frequency of expansion for each chromatophore?
 
 Thank you for the remark. This is true. We have added a paragraph at the beginning of the Results section to clarify the rationale for focusing on spontaneous activity.
 
 This section now reads:
 
 “Because our primary aim was to describe the composition and coordination of chromatophore motor units, it was important to examine animals in the absence of the descending commands that occur during active behavior. Spontaneous activity, typically mild and “noisy”, was thus ideal to enable measurements of the motion correlations between chromatophores that reflected shared motor neuron drive, rather than shared correlations due to upstream motor neuron groupings by premotor circuits.”
 
 We added an example of video recording of spontaneous activity in our Data Availability section.
 
 While quantifying expansion magnitude and frequency across experiments would indeed be valuable, these questions fall outside the primary focus of the present study, which centers on resolving motor unit organization. In the section “Dynamics of chromatophore expansion and contraction,” we analyze the speed of expansion and contraction to demonstrate that such kinetic features can be reliably detected with the temporal resolution of our video imaging approach. By isolating single muscle activations, we establish a methodological framework that can be used in future work to quantify expansion amplitude, rate of change and frequency across preparations.
 
 (6) Chromatophore expansion was only measured in anesthetized E. berryi, and L679 states that chromatophore expansion was triggered by shining light on the skin. However, light-mediated chromatophore expansion may be mediated by a different mechanism, so chromatophore correlations do not necessarily reflect the underlying motor control.
 
 We agree that there is, in principle, a theoretical risk of direct light-mediated activation of chromatophores. Yet the kinetics of this light-mediated activation are very different and are the subject of a separate, ongoing investigation by our groups. In our experiments, the illumination was applied to the whole animal rather than locally to the skin, ensuring that all chromatophores and the eyes were exposed to the same light source. By transitioning from darkness to light, we created a window in which chromatophores were partially expanded; both fully contracted and fully expanded states would show little to no decorrelation. Within this window, we observed spontaneous fluctuations in chromatophore activity, which formed the basis for our correlation analyses. To our knowledge, direct light-mediated expansion of chromatophores has not been reported in E. berryi, although it may exist there. Finally, the size, shape, and orientation of the inferred motor units align with electrophysiological evidence, supporting the validity of our motor unit inferences.
 
 (7) Some figures might be better suited for the supplement. For instance, it's not clear what the significance of Figure 5 is (it's not currently sufficiently justified in the text).
 
 We have clarified the purpose of Fig. 5 in both the Results and Discussion sections. In the Results, we now explain that events are separated by amplitude to show that expansion–contraction kinetics can be reliably measured across a full range of chromatophore events, validating the precision of our videographic approach. In the Discussion, we highlight that this precision allows measurement of radial muscle speeds and opens avenues to study chromatophore biomechanics, including the contributions of intertwined forces such as radial muscles, elastic pigment sacs, and intercellular coupling.
 
 (8) Multiple chromatophores can belong to multiple clusters - this study reveals that this is because subsections of a chromatophore are controlled separately. But do the same sections (slices) of chromatophores ever belong to multiple clusters?
 
 Yes, it is possible. Dubas (1985) used videographic recordings to show that the same chromatophore muscle fibers could be activated by stimulation of different nerve bundles, supporting Florey’s (1969) electrophysiological evidence for polyneuronal excitatory innervation. From Dubas: "Usually, different muscle fibres were recruited by each nerve but sometimes a single muscle fibre responded to stimulation of each nerve. Variations of the stimulus voltage also produced gradation of the amplitude of shortening of individual muscle fibres. This supports the evidence above for multiple innervation of single muscle fibres."
 
 The petal-like distribution of motor-neuron influence shows overlapping territories, suggesting that some chromatophore sections may be influenced by multiple neurons. However, this overlap could arise from polyinnervation of individual muscles, the presence of gap junctions between muscles, or passive mechanical coupling due to the elastic properties of the pigment sac.
 
 With the present approach, it is not possible to disentangle the relative contributions of these mechanisms, which will require targeted physiological or anatomical experiments. For this reason, we adopted a hard clustering approach for individual chromatophore slices.
 
 (9) All time should be labeled in seconds, not in frames, and all distances should be measured in um or mm, not in pixels.
 
 We chose to present figures in pixels and frames to reflect the native units of our recordings and analyses, which preserves fidelity and reproducibility of the computational pipeline. For biological interpretation, corresponding values are converted to µm in the main text, providing the relevant real-world scale. A scale for conversion is provided in the figure legend.
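The conversion from native units to physical units can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the scale (`UM_PER_PIXEL`) and frame rate (`FRAMES_PER_SECOND`) are hypothetical placeholder values, and the real ones would come from the figure legend and the Methods.

```python
# Hypothetical calibration values, for illustration only.
UM_PER_PIXEL = 10.0        # assumed spatial scale (micrometres per pixel)
FRAMES_PER_SECOND = 25.0   # assumed video frame rate


def pixels_to_um(pixels, um_per_pixel=UM_PER_PIXEL):
    """Convert a length measured in pixels to micrometres."""
    return pixels * um_per_pixel


def frames_to_seconds(frames, fps=FRAMES_PER_SECOND):
    """Convert a duration measured in frames to seconds."""
    return frames / fps


def speed_um_per_s(pixels_per_frame, um_per_pixel=UM_PER_PIXEL, fps=FRAMES_PER_SECOND):
    """Convert an expansion speed from pixels/frame to micrometres/second."""
    return pixels_per_frame * um_per_pixel * fps


radius_um = pixels_to_um(8)          # an 8-pixel amplitude
duration_s = frames_to_seconds(50)   # a 50-frame event
speed = speed_um_per_s(0.5)          # 0.5 pixels per frame
print(radius_um, duration_s, speed)
```

Keeping the calibration constants in one place means that figures can stay in pixels and frames while any value quoted in the text is derived from the same two numbers.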
 
 Specific comments:
 
 (1) L36: I'm not sure the description of virtual chromatophores here is clear enough to make sense to a more general audience.
 
 Addressed. We retained the concept of ‘virtual chromatophores’ in the abstract and added a brief clarifying phrase to indicate that these are functional groupings of adjacent chromatophore territories that act as single units.
 
 (2) L50: "Rimmed by" - consider rephrasing.
 
 Addressed. Replaced with “surrounded”.
 
 (3) L64: "refs" - actual references aren't inserted. There are multiple other examples of this.
 
 Addressed. Added missing references.
 
 (4) L100: This section could use rewriting. Some of the text reads more like a figure legend.
 
 Addressed. We have streamlined the main text to reduce redundancy with the figure legend.
 
 (5) L101: Consider the opening sentence/s providing a more general introduction to the question and approach.
 
 Addressed.
 
 (6) L104: This implies that the data presented are from 14 animals of many ages. This is only relevant if the pooled data is analyzed and presented.
 
 We agree that the original phrasing was ambiguous. We have modified the sentence for clarity, and explain in the Methods that 14 animals were filmed to refine the pipeline and explore experimental conditions, while the analyses shown are from a single animal.
 
 (7) L111: HDBSCAN should be defined.
 
 Addressed. The acronym has been expanded.
 
 (8) L173: Figure 6D doesn't exist.
 
 Addressed. The reference to the nonexistent Figure 6d was removed.
 
 (9) L193: "excluding negative (contraction) phases" This phrase requires clarification.
 
 Addressed. Added “see Methods” in the legend and added clarification on the reasoning in Methods.
 
 (10) L204: Should explain why the switch to affinity-propagation clustering was made when a different method was used for E. berryi.
 
 Addressed in discussion.
 
 (11) Figure 3: I recommend including a diagram or image of a whole cuttlefish and showing what the corresponding imaging area was in relation to the animal so the reader gets an intuitive sense of scale.
 
 Thank you. We have added a supplementary figure to give the reader a sense of scale.
 
 (12) L221/Fig 3b: These colors are supposed to represent clusters of 3 to 5 chromatophores? The clusters look much bigger.
 
 The figure shows clusters of 3 to 5 chromatophores, but many adjacent clusters were assigned the same color. We have changed the colors to remove this ambiguity.
 
 (13) Figure 3c: This would be more powerful if it represented the combined data of many experiments to draw a general conclusion. Also, shouldn't these cluster sizes match those in 2e, e.g. they get as big as 40?
 
 We assume the reviewer is referring to a comparison between Figures 3c and 2e. For visualization purposes, the graph in 3c was truncated to display over 90% of the data, which explains why the largest clusters appear smaller than in 2e. We modified the legend accordingly. We agree that the results would be strengthened by pooling data from additional experiments; however, acquiring high-quality, artifact-free recordings suitable for motor unit analysis is extremely challenging. We hope that our framework will enable future studies to extend this analysis.
 
 (14) Figure 4: I would show some of these examples earlier, to give the reader an intuitive sense of the data and claims (though it doesn't need its own figure - provide a couple of examples, and the diagram of how much of the mantle you're sampling) then put the rest in the supplement, and include some videos too.
 
 We agree that providing spatial context is important for readers to develop an intuitive understanding of the dataset. However, introducing examples of motor units earlier in the manuscript would, in our view, interrupt the logical progression of the Results, where motor unit identification builds on prior analyses. To address the reviewer’s concern, we have added a new supplementary figure (Fig. S1) illustrating the size and location of the sampled mantle region. In addition, we now provide representative videos in the Data Availability section to give readers direct visual access to the underlying dynamics.
 
 (15) Figure 4f: Is the location of the split color in each dot accurate? It's surprising that each one is split down the middle, and the pink side is always on the right - this is unintuitive given where the motor neuron is likely to be located.
 
 The dots and half dots represent the membership of a chromatophore in a particular cluster.
 
 (16) Figure 5: I didn't find this figure sufficiently justified in the text. I would move this to the supplement.
 
 Addressed in General point #7.
 
 (17) L350: States that 12 animals were patched, but the data isn't shown. It's important to show all of this data (some of which can be in the supplement).
 
 Addressed. We provided the data in the Data Availability Section.
 
 (18) Figure 5: I would quantify how many chromatophores were in each motor group across all the recording sessions, and compare this to the equivalent behavioral analysis.
 
 We assume the reviewer means Fig. 6. We calculated and stated the size of motor units across patching sessions.
 
 (19) Figure 5c: I recommend labeling each panel with a different number so you can refer to specific data.
 
 We assume the reviewer means Fig. 6c. We consider the figure layout clear enough to allow readers to follow the data without additional panel numbers.
 
 (20) L379: Typo: repeat of "quantitative"
 
 Addressed.
 
 (21) L576: Salinity should be 33-36 ppt, not %
 
 Addressed.
 
 (22) L877: The salinity units are sg? That should be stated. Though I would use the same units for salinity throughout.
 
 Addressed.
 
 Overall, this work introduces a potentially valuable quantitative framework for studying chromatophore dynamics. Addressing the points above would substantially strengthen the manuscript and clarify the scope and support for its conclusions.
 
 We thank the reviewer for these many helpful comments.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) Line 64 - missing references for chromatophore colour with age.
 
 Addressed. Added missing refs.
 
 (2) Line 64-65 - would be good to have a little more detail about what is meant by 'migrating through the skin'. Is this a lateral process, or depth in the skin?
 
 Addressed. Replaced “migrating in the thickness..” with “through the thickness..” to emphasize verticality.
 
 (3) Line 72 - typo, should read '...individual and groups...'
 
 Addressed.
 
 (4) Remove 'In Fig 1, ...' from line 104.
 
 Addressed.
 
 (5) Figure 1 - It's unclear why some chromatophores are uncoloured with a red dot in the centre. Are these chromatophores that do not share a cluster with neighbours? If so, wouldn't it make more sense to colour the chromatophore with a unique colour of its own? Or, at the very least, make a note in the caption to indicate that all white chromatophores are not clustered with neighbours.
 
 Segmented chromatophores are shown in white, with coloured slices highlighting cluster membership. Uncoloured slices represent outliers. Addressed in the figure legend.
 
 (6) Line 119 - the concept of a 'closed virtual chromatophore' needs a few more words of explanation. The way I interpret the text as it is, is that the motor units driving colour change are not necessarily the individual chromatophores, but a motor region containing a mixture of whole and partial chromatophores innervated by the same motor neuron. If this is the case, a few extra words of description would help here to remove any ambiguity as I think this is an important concept for the paper.
 
 Addressed. We added a sentence clarifying the concept.
 
 (7) Line 173 - Figure 6d doesn't exist in the paper. Was a different panel intended? If so, please make sure to number the figures in order of appearance in the manuscript.
 
 The reference to the nonexistent Figure 6d was removed.
 
 (8) Figure 3b is very difficult to see. Perhaps consider lightening the background image. Please also indicate whether the individual colours refer to individual clusters. If this is the case, then some of these clusters look much larger than the 3-5 suggested in the caption.
 
 This issue has been corrected.
 
 (9) Line 210 - remove the bold type.
 
 Addressed.
 
 (10) Line 211 - please specify which 'two groups' you are referring to here. Presumably, this is anaesthetised and non-anaesthetised.
 
 Addressed.
 
 (11) I think that the text is missing any indication of the pixel sizes involved in extracting slice metrics, particularly from the S. officinalis data. It would be great to include some data on how many pixels span the radius of an expanded chromatophore. There is some small indication of this in Figure 2a, but a panel or two with details about the pixel size of S. officinalis chromatophores and their slices would be welcome. This would help with the judgment of the robustness of the resolution of the analysis. Looking at the y-axis in Figure 5a, there is some indication that the chromatophore radius is only 1 to 8 pixels. Is this the case?
 
 Figure 5a does not show chromatophore radius; it shows the relative change in peak amplitude during an expansion event. At the peak, the chromatophore radius is likely larger, since it is the sum of the baseline radius and the size of the peak.
 
 (12) Line 246-7 - reword this sentence to avoid referring to Figure 3d in the narrative. Include it in parentheses instead.
 
 Addressed.
 
 (13) Lines 408 and 409 - missing references.
 
 Addressed.
 
 (14) Line 576 - salinity should be reported in parts per thousand, not per cent.
 
 Addressed.
 
 (15) Line 593 - how were animals <50mm fed?
 
 Animals smaller than 50 mm were fed Neomysis spp. or small Palaemonetes spp., as noted a few lines above the description for animals larger than 50 mm.
 
 (16) Line 847 - typo - '...putative motor units' ramifications...'
 
 Addressed.
 
 (17) Line 854 - better to write out the [chrom_id, label] info as narrative text rather than using the variable names.
 
 Addressed.
 
 (18) Line 876 - two typos '...were reared in an artificial...'
 
 Addressed.
 
 (19) Line 877 - please use the same salinity metric as used in the earlier part of the methods.
 
 Addressed.
 
 (20) Section 898-910 - equipment details would ideally include the location of the company. E.g. (BX51W1, Olympus, Tokyo, Japan).
 
 Addressed.
 
 Reviewer #3 (Recommendations for the authors):
 
 I am left with a number of questions that arise from the authors' work, some of which the authors themselves briefly mention in the technical limitations section.
 
 (1) In relation to the first weakness, do the authors know if the recruitment patterns they identify are likely to be the same when octopi perform visually-mediated camouflage to their environment?
 
 Thank you for this comment. We assume the reviewer is referring to S. officinalis. There seems to be a misunderstanding: our approach is designed to reveal the smallest independent functional units—motor units—that together generate skin patterns. The technique is fully applicable to an animal displaying camouflage, but the results would necessarily differ. Camouflage patterns are composed of relatively large shapes compared to individual motor units and arise from the coordinated activation of multiple units. Disentangling motor units requires decorrelated activity, whereas visually-evoked camouflage inherently drives correlated motor-unit activation by premotor control. To use an analogy, if our goal were to map the distribution and wiring of pixels on a screen, it would be more informative to broadcast a noise signal rather than display coherent images, as the noise produces decorrelated activity that allows the underlying structure to be resolved. We have clarified this important point in the early results section.
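The screen-and-noise analogy above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: two hypothetical motor units each drive chromatophore slices, every slice adds a little private noise, and all names and parameter values (`simulate`, `noise`, `n_frames`) are invented for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate(shared_drive, n_frames=2000, noise=0.2, seed=0):
    """Return (within-unit, across-unit) slice correlations.

    shared_drive=True mimics a coherent pattern in which both motor units
    receive the same premotor command; False mimics decorrelated activity.
    """
    rng = random.Random(seed)
    a1, a2, b1 = [], [], []
    for _ in range(n_frames):
        drive_a = rng.gauss(0, 1)
        drive_b = drive_a if shared_drive else rng.gauss(0, 1)
        a1.append(drive_a + rng.gauss(0, noise))  # slice 1 of unit A
        a2.append(drive_a + rng.gauss(0, noise))  # slice 2 of unit A
        b1.append(drive_b + rng.gauss(0, noise))  # slice 1 of unit B
    return pearson(a1, a2), pearson(a1, b1)


within_ind, across_ind = simulate(shared_drive=False)
within_shared, across_shared = simulate(shared_drive=True)
print("decorrelated:", round(within_ind, 2), round(across_ind, 2))
print("shared drive:", round(within_shared, 2), round(across_shared, 2))
```

In the decorrelated condition, slices of the same unit stay strongly correlated while slices of different units do not, so the units can be separated. Under a shared drive, both correlations are high and the two units become indistinguishable from correlations alone.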
 
 (2) The authors provide indirect evidence that motor neurons innervate multiple chromatophores. Can sets of radial muscles within a chromatophore be innervated by multiple motor neurons? Is there neuroanatomical evidence or experiments that could perhaps shed light on this?
 
 Addressed above. Same question as #1(8).
 
 (3) Are multi-innervated chromatophores evenly distributed across the octopus's body? For instance, could the authors compare chromatophore recruitment over multiple patches on the animal from multiple regions?
 
 At present, we do not have sufficient data to quantitatively compare motor-unit structure or the distribution of multi-innervated chromatophores across different body regions of cuttlefish. However, we would not necessarily expect uniformity across the skin, as distinct body regions are associated with characteristic pattern elements (e.g., the white square on the central mantle or the thicker zebra stripes along the sides). It is therefore plausible that different motor-unit geometries and densities are differentially represented across regions to support these region-specific patterns. Future recordings spanning multiple patches and body locations will be required to test this question directly.
 
 (4) Relatedly, is there any idea of whether chromatophore size or age corresponds with the number of motor units within a single chromatophore?
 
 At present, our analyses are limited to single developmental time points, and we therefore cannot directly assess whether chromatophore size or age correlates with the number of motor neurons innervating an individual chromatophore. However, this is a question that our analysis framework is explicitly designed to address. Our custom pipeline, CHROMAS (Ukrow, Renard et al., 2025), includes tools for longitudinal image alignment that allow chromatophores to be tracked within the same animal across development. Applying these scripts to developmental datasets will enable future analyses linking chromatophore growth or age to changes in the motor innervation of single chromatophores.
 
 I understand that a full resolution to the issues raised above may require substantial additional experiments. At a minimum, further discussion of these points with integration of existing literature would elevate the paper.
 
 AuthorResponse
URL

biorxiv.org/content/10.64898/2025.11.30.691401v2
www.biorxiv.org www.biorxiv.org

When word order matters: human brains represent sentence meaning differently from large language models

4
1. Public_Reviews 28 May 2026
  
  in eLife
  
  eLife Assessment
  
  The paper presents a valuable finding that the human brain and models that incorporate sentence structures can capture sentence-level semantics beyond word meaning, while large language models behave differently. The evidence supporting the authors' claims is solid, though the stimuli are highly controlled and some analyses could be more thorough. This work will be of interest to researchers in language neuroscience and those developing language models.
  
  Summary
2. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper investigates whether transformer-based models can represent sentence-level semantics in a human-like way. The authors designed a set of 108 sentences specifically to dissociate lexical semantics from sentence-level information and collected 7T fMRI data from 30 participants reading these sentences. They conducted representational similarity analysis (RSA) comparing brain data and model representations, as well as human behavioral ratings. They found that transformer-based models match brain representations better than a static word-embedding baseline that ignores word order, but fall short of models that encode the structural relations between words. The main contributions of this paper are:
  
  (1) The construction of a sentence set that disentangles sentence structure from word meaning.
  
  (2) A comprehensive comparison of neural sentence representations (via fMRI), human behavior, and multiple computational models at the sentence level.
  
  Strengths:
  
  (1) The paper evaluates a wide variety of models, including layer-wise analysis for transformers and region-wise analysis in the human brain.
  
  (2) The stimulus design allows precise dissociation between lexical and sentence-level semantics. The RSA-based approach is empirically sound and intuitive.
  
  (3) The constructed sentences, along with the fMRI and behavioral data, represent a valuable resource for studying sentence representation.
  
  Weaknesses:
  
  (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.
  
  (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.
  
  (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.
  
  (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.
  
  (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.
  
  (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.
  
  (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.
  
  Comments on revised version:
  
  The new version of the paper has addressed my main concerns, including:
  
  (1) clarification about the methodology of Transformer embeddings
  
  (2) discussion about the purely syntactic models
  
  (3) discussion about the low correlation between behavioural ratings and brain activations
  
  (4) better structure of the paper
  
  (5) clarification about pre-registration
  
  I believe the paper has been substantially improved after revision.
  
  Review 1
3. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Large Language Models have revolutionized Artificial Intelligence and can now match or surpass human language abilities on many tasks. This has fuelled interest in cognitive neuroscience in exposing representational similarities between Language Models and brain recordings of language comprehension. The current study breaks from this mold by: (1) Systematically identifying sentence structures for which brain and Large Language Model representations diverge. (2) Accounting for such sentence structures using a model structured by semantic roles. As such, the study may now fuel interest in characterizing how Large Language Model and brain representations differ, which may prompt new, more brain-like language models.
  
  Strengths:
  
  * This study presents a bold challenge to a literature trend that has touted similarities between Transformer models and human cognition based on representational correlations with brain activity. This challenge is substantiated by identifying sentences for which brain and model representations of sentences diverge.
  
  * This study conducts a rigorous pre-registered analysis of a comprehensive selection of the state-of-the-art Large Language Models, on a controlled sentence comprehension fMRI dataset. The analysis is conducted within a Representation Similarity framework to support similarity comparisons between graph structures and brain activity without needing to vectorize graphs. Transformer models are predicted and shown to diverge from brain representations on subsets of sentences with similar word-level content but different sentence structures.
  
  * The study introduces a 7T fMRI sentence comprehension dataset and accompanying human sentence similarity ratings which may be a fruitful resource for developing more human-like language models. Unlike other model-based sentence datasets, the relation between grammatical structure and word-level content is controlled, and subsets of sentences for which models and brains diverge are identified.
  
  Weaknesses:
  
  * The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models which are pinpointed here, in the general case (some) Transformers are more human-like than the other models considered.
  
  * There may be confounds between the critical sentence structure manipulations and visual processing. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with low-level representations of sentence surface features encoded in visual cortex. Although the study commendably controls for confounds associated with sentence length, correlations with the key sentence structure models are most salient in visual cortex and diminish in other brain networks when V1-V4 activation is controlled for.
  
  * Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences because different similarity metrics applied to the same model produce positive or negative correlations with brain data and repeating analyses with a different representational dissimilarity measure seems to produce some anomalous results.
  
  Review 2
4. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.
  
  The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We apologize for the confusion. We have clarified this on page 3:
  
  “Results for the ‘Transformers’ model are computed by computing correlations separately for five different transformer models and then taking a simple average of these correlations. Results for each individual transformer are presented in Supplementary Information Figure S2.”
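The distinction between averaging embeddings and averaging correlations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy embeddings, the model names, the cosine dissimilarity, and the use of Pearson correlation are all placeholders, and the manuscript's actual measures may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def rdm_vector(embeddings):
    """Dissimilarities for all sentence pairs (upper triangle of the RDM)."""
    return [cosine_distance(embeddings[i], embeddings[j])
            for i, j in combinations(range(len(embeddings)), 2)]


def mean_model_brain_correlation(model_embeddings, brain_rdm):
    """Correlate each model's RDM with the brain RDM, then average the correlations."""
    per_model = {name: pearson(rdm_vector(emb), brain_rdm)
                 for name, emb in model_embeddings.items()}
    return sum(per_model.values()) / len(per_model), per_model


# Toy data: four sentences, two hypothetical models with different embedding sizes.
model_embeddings = {
    "model_a": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    "model_b": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
}
# For the toy example, the "brain" RDM is simply set equal to model_a's RDM.
brain_rdm = rdm_vector(model_embeddings["model_a"])

mean_r, per_model = mean_model_brain_correlation(model_embeddings, brain_rdm)
print(per_model, mean_r)
```

Because each model is compared with the brain data in its own representational space, the embedding spaces never need to be aligned or even share a dimensionality; only the resulting correlation values are combined.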
  
  (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.
  
  Following the suggestion, we have implemented two syntactic models and discuss the results on page 10:
  
  “We also found that purely syntactic models based on constituency parses (see Benepar and CFG) show poor correlations with brain activity (see Supplementary Information Figure S2). Examining the corresponding RSA matrices (see Figure S1), this seems to be due to such models being overly sensitive to syntactic form, and relatively insensitive to which words are assigned to different nodes within the syntactic tree. This is most evident for the edit-distance similarity metric, and to a lesser extent also for the subtree similarity metric. This finding highlights the value of hybrid approaches designed to appropriately balance sensitivity to lexical, syntactic, and compositional information in representing semantic information at the sentence level.”
  
  (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.
  
  While the behavioural judgements are made by different participants and involve a different task than the neuroimaging results, nonetheless we agree the difference is surprising and warrants more detailed consideration. We have included a more detailed discussion of this issue on page 11:
  
  “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task, participants read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”
  
  (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.
  
  While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. Sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.
  
  (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.
  
  We agree that the placement of figures was not ideal in the previous draft. We have reworked the manuscript so that all figures appear closer to their mention in the text, and the figure (now Figure 3) appears in the correct order. We have also substantially revised the discussion and included subheadings to help guide the reader through the issues we discuss.
  
  (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.
  
  We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.
  
  (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.
  
  We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Consider including a purely syntactic baseline model. For instance, parse each sentence into a constituency tree and compute tree edit distances between pairs of trees. This would allow you to construct a sentence similarity matrix based solely on syntactic structure, and may clarify the role of syntax in sentence representations.
  
  See our response to Public Review comment 2.
  
  (2) Instead of averaging embeddings across different transformer-based models, I recommend reporting RSA results for each model individually. For instance, compare one sentence-level model (e.g., SentBERT or SimCSE) and one general-purpose language model (e.g., GPT-2 or Llama).
  
  See our response to Public Review comment 1.
  
  (3) I suggest revisiting the structure of the Results section to improve the clarity and impact of your key findings. Consider which results are most central to the paper's claims and ensure they are presented in the main text. Less central analyses (e.g., the analysis on the grid-like pattern) might be better suited for the supplementary information. Presenting behavioral results prior to neuroimaging results could also improve logical flow by first validating model similarity estimates behaviorally.
  
  As mentioned in our response to Public Review comment 5, we have revised the ordering of the figures to improve the flow of the main manuscript. We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript. In addition, we believe that presenting the neuroimaging results first is appropriate as this is the primary and most important contribution of our study.
  
  Reviewer #2 (Public review):
  
  (1) The stimuli are not fully controlled for lexical content across conditions. Residual lexical differences between sentences could still influence both brain and model similarity patterns. To more cleanly isolate syntactic effects, it would be useful to systematically vary only a single structural element while keeping all other lexical content constant (e.g., the boy kicked the ball / the ball kicked the boy). It would be better to engage more with the minimal pair paradigm, which is widely used in large language model probing research.
  
  The reviewer rightly argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs; however, this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than traditional minimal pair studies, but which also are tailored to highlight differences which are of particular theoretical interest.
  
  Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies.
  
  We have added the following paragraph on pages 9-10 contrasting our approach to previous minimal-pair studies:
  
  “Another approach that has seen widespread use is the presentation of minimal sentence pairs that differ only in one specified aspect, for example, interchanging subject and object in a sentence (Frankland 2015, Wang 2016, Frankland 2020, Giglio 2024), or altering adjective-noun phrases to influence composition (Graves 2010, Schell 2017, Fyshe 2019, Ciapparelli 2025). Our approach is an extension of these approaches utilising more naturalistic and complex sentences, designed to facilitate comparison of a wider range of structural manipulations (see Table 1). In more completely characterising the representational structure of various computational models in response to different structural contrasts, we can more comprehensively evaluate their adequacy as models of semantic processing in the brain.”
  
  (2) The comparisons are done across fundamentally different model types, including static embeddings, graph-based parsers, and transformers. The inherent differences in dimensionality and training objectives might make the conclusion drawn from RSA inconclusive. Transformer embeddings typically occupy much higher-dimensional, anisotropic representational spaces, and their similarity structure may reflect richer, more heterogeneous information than models explicitly encoding semantic roles. A lower RSA correlation in this study does not necessarily imply that transformers fail to encode syntactic information; rather, they may represent additional aspects of meaning or context that diverge from the narrow structural contrasts probed here.
  
  The reviewer notes that low RSA correlations do not necessarily imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather on what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning.
  
  The reviewer also notes that transformer embeddings are highly anisotropic; however, we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli, as shown by the pattern of results for all models in Figure S2.
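The per-feature normalisation mentioned above can be illustrated with a minimal sketch. It assumes that "normalising each feature" means z-scoring every embedding dimension across sentences before taking cosine similarity; the manuscript's exact procedure (page 14) is not reproduced here, and the function names and toy values are hypothetical.

```python
import math
from statistics import mean, pstdev

def zscore_features(embeddings):
    """Standardise each feature (column) across sentences (assumed normalisation)."""
    columns = list(zip(*embeddings))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against constant features
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in embeddings]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(embeddings):
    """Sentence-by-sentence cosine similarities after per-feature z-scoring."""
    z = zscore_features(embeddings)
    return [[cosine(a, b) for b in z] for a in z]

# Hypothetical toy embeddings: the first feature carries a large shared offset,
# which dominates raw cosine similarity (a simple stand-in for anisotropy).
toy = [[100.0, 1.0, 0.0], [101.0, 0.0, 1.0], [102.0, -1.0, -1.0]]
sims = similarity_matrix(toy)
raw = cosine(toy[0], toy[1])  # close to 1 before normalisation
```

In this toy case the raw cosine between the first two vectors is almost 1 because of the shared offset, whereas the normalised similarity reflects only how the sentences differ from the average sentence.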
  
  (3) The interpretation of the RSA correlation largely depends on the understanding of models. The authors suggest that because hybrid models correlate better than transformers, this implies that transformers are inferior at representing syntax. However, this is not a direct test of syntactic ability. Transformers may encode syntactic information, but it may not be expressed in a way that aligns with the RSA paradigm or the chosen stimuli. RSA does not reveal what the model encodes, and the models might achieve a good correlation for non-syntactic reasons (e.g., length of sentence, orthographic similarity, lexical features).
  
  The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We have clarified this in a modified paragraph on page 11:
  
  “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure (Chang 2024), and probing studies have found that transformers represent information about syntax and word order (Clark 2019, Manning 2020). This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Supplementary Information Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”
  
  We also respectfully disagree with the reviewer’s suggestion that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Model dimensionality: the interpretability of cosine similarity diminishes as the dimensionality increases, and there are some math tricks to work around it. To make a fair comparison among models with different dimensionalities, it would be better to apply some dimensionality-insensitive distance metrics.
  
  We thank the reviewer for this suggestion. We repeated all vector-based similarity calculations using the Dimension Insensitive Euclidean Metric (DIEM). As shown in Figure S9, the results are broadly similar, though with overall somewhat lower brain correlations for most transformers compared to cosine similarity.
  
  (2) Depending on the scope of the current study, if the authors would like to establish whether transformers are inferior to graph-based models in representing syntax, a linear classifier using the model embeddings would be sufficient. I think this would be a more direct assessment of model syntax ability than correlation with brain data.
  
  As we discuss in our previous responses, our objective in this study was not to assess how well transformers can represent syntax. Rather, the goal was to assess whether internal transformer representations have similar geometric properties to patterns of brain activation. Our results indicate that transformers do represent sentence structure, but in a different manner to the human brain.
  
  Reviewer #3 (Public review):
  
  (1) The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models, which are pinpointed here, in the general case, (some) Transformers are more human-like than the other models considered.
  
  The reviewer argues that we overstate some of our conclusions, as several transformers achieve higher brain correlations than the hybrid model when computed over all sentence pairs, as well as on the behavioural data. In response, we first note that our primary interest in this paper is in the block diagonal sentence pairs, as these were specifically designed to interrogate how different models represent sentence structure. The analysis over all sentence pairs is presented for comparison but is not our primary focus in this paper, as also reflected in the pre-registered prediction that our VerbNet-CN hybrid model would show higher brain correlations than transformers over this block diagonal subset.
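The distinction between the block diagonal subset and the set of all sentence pairs can be sketched as follows. This is an illustration only: it assumes the 108 sentences are ordered so that each of the six subsets occupies a contiguous block of 18 items, uses Spearman correlation as the RSA statistic, and omits the controls described in the manuscript. All function names and toy matrices are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation (inputs must not be constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman correlation as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

def pair_indices(n_items, block_size, within_block_only):
    """Upper-triangle sentence pairs; optionally only pairs from the same block."""
    return [(i, j) for i in range(n_items) for j in range(i + 1, n_items)
            if not within_block_only or i // block_size == j // block_size]

def rsa(model_sim, brain_sim, pairs):
    """Correlate model and brain similarities over the chosen sentence pairs."""
    return spearman([model_sim[i][j] for i, j in pairs],
                    [brain_sim[i][j] for i, j in pairs])

# Hypothetical toy matrices: 6 items in 2 blocks of 3.
n = 6
model_sim = [[1 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]
brain_sim = [[1 / (1 + (i - j) ** 2) for j in range(n)] for i in range(n)]
diag_pairs = pair_indices(n, 3, within_block_only=True)
all_pairs = pair_indices(n, 3, within_block_only=False)
diag_rsa = rsa(model_sim, brain_sim, diag_pairs)
all_rsa = rsa(model_sim, brain_sim, all_pairs)
```

Under the equal-block assumption, `pair_indices(108, 18, True)` would select 918 within-block pairs out of the 5,778 possible pairs, which is why the two analyses can lead to different model rankings.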
  
  Second, we have included a new analysis in the revised manuscript (Figure S9) where we compute brain correlations controlling for the pattern of similarities observed in the primary visual cortex (averaged over participants), as a way to control for visual similarity. This added control substantially reduces the brain correlations of the transformers, such that they all have lower correlations than VerbNet-CN and AMR-smatch even over the set of all sentence pairs. We provide interpretation of this result in the discussion.
  
  Third, we would like to note that one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, thereby enabling them to be used to determine which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We have added a short discussion of this issue in the revised manuscript (page 10).
  
  (2) There may be confounds between the critical sentence structure manipulations and visual representations of sentence stimuli. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with visual cortex representations, and computational models tend to reflect the number of words/tokens/elements in sentences. Although the study commendably controls for confounds associated with sentence length, there could still be residual effects that remain. For instance, the Graph model correlates most strongly with the visual cortex despite these sentence length controls.
  
  We agree with the reviewer that this is a potential confound. As noted in the previous response, we have implemented a new control analysis in which we directly control for visual similarities as reflected in participant-averaged similarities of primary visual cortex activations in response to all stimuli. These results are shown in Figures S8-S11 in the SI. We show that transformer correlations are reduced much more than graph and hybrid models with this control. Also, we note that the AMR-smatch graph model shows high correlations with other brain regions even after removing correlations with the visual cortex (Figure S10). This indicates that the model represents a range of sentence features, including both superficial visual or length-related features, as well as semantic features that are represented in common with language and other cortical regions.
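One way to picture this kind of control is a first-order partial correlation between model and brain similarities, with the primary visual cortex similarities as the controlled variable. The sketch below assumes that formulation; the manuscript may implement the control differently, and the function names and toy values are hypothetical. Rank-transforming the three inputs first would give the Spearman variant.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation (inputs must not be constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def partial_corr(model, brain, control):
    """Correlation of model and brain after removing what both share with control."""
    r_mb = pearson(model, brain)
    r_mc = pearson(model, control)
    r_bc = pearson(brain, control)
    return (r_mb - r_mc * r_bc) / ((1 - r_mc ** 2) * (1 - r_bc ** 2)) ** 0.5

# Hypothetical pairwise similarities (one value per sentence pair).
# Model and brain each equal the visual-cortex pattern plus unrelated noise,
# so their raw correlation is driven entirely by the shared visual component.
v1_sim = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model_sim = [2.0, 1.0, 3.0, 4.0, 4.0, 7.0]
brain_sim = [2.0, 0.0, 4.0, 3.0, 7.0, 5.0]

raw = pearson(model_sim, brain_sim)                  # clearly positive
controlled = partial_corr(model_sim, brain_sim, v1_sim)  # approximately zero
```

A model whose brain correlation survives this control is capturing structure beyond what is shared with the visual cortex pattern, which is the logic behind comparing how much each model's correlation drops.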
  
  (3) Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences here because different similarity metrics applied to the same model produce positive or negative correlations with brain data.
  
  The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. In the revised manuscript we have incorporated an entirely new similarity metric for vector-based models (DIEM similarity), as well as an extended discussion of the effect of different similarity metrics for graph and hybrid models.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Compute separate RSAs on each sentence pair type (especially Swapped), to quantify how each sentence type manipulation contributed to the divergence between model and brain. Although the manuscript is already brimming with analyses, I think squeezing this in would be helpful because the results currently rely on qualitative inspection of group-average scatter plots to interpret how sentence pair manipulations contributed to the divergence between Transformers and humans. The Swapped condition would appear to be the centrepiece of the title and manuscript, and potentially the only condition for which confounds associated with the surface form of sentences are controlled for (because sentences should be the same words in different orders). Thus, this analysis might see to the inconvenient visual cortex correlations in Figures 3d/e.
  
  We respectfully disagree that computing a separate RSA for each sentence pair type would be a useful additional analysis. The motivation for the construction of our stimulus set was to provide a range of variants of a given base sentence that alter the semantic meaning and lexical content (somewhat) independently. The purpose of the ‘modified’ sentences, for instance, is to construct sentences with a similar overall meaning but lower lexical similarity due to the inclusion of many modifier words. It is precisely the comparisons across the different pair types that provide information about how each model represents sentence semantics, so restricting an analysis to only a single subset would not be very informative. Another problem with this approach is that it would dramatically reduce the number of sentence pairs analysed, thereby decreasing statistical power. In the revised manuscript we have provided additional details regarding the motivation and rationale for how our stimulus set of 108 sentences was constructed, which should help to clarify this point. The following excerpt is from page 3:
  
  “Within each of the six subsets, we begin with a base sentence such as ‘the cameraman brought the equipment to the director’, which we then systematically modified in various ways to create different combinations of lexical and compositional similarity, in order to dissociate these two aspects of meaning (see Table 1 for further details).”
  
  (2) Explaining the motivation for the sentence stimulus types. I appreciated the careful design of the dataset, but I couldn't immediately work out the motivation for all the different sentence types, and why this selection was ideal to identify divergences with Transformers. For instance, given the goal of (approximately) controlling for lexical similarity whilst varying sentence meaning, I couldn't immediately see why stimulus blocks weren't all built from rearranging the same content words (as in the Swapped condition). The negative RSA correlation with the Mean model also made me stop and think - it seems like the more similar the words in a sentence, the more different their structure, and vice versa, but I wasn't clear that this was a design feature. Thus, a few extra words motivating the conditions could be helpful for the reader, and these might helpfully lead them to anticipate the negative RSA correlation.
  
  As noted in the previous response, in the revised manuscript we have expanded our explanation of the rationale for the construction of our 108 sentences. In particular, Table 1 in the methods section now includes two additional columns which summarise the combinations of lexical and overall sentence similarity that our sentence pairs are intended to satisfy.
  
  (3) Explanation for why different implementations and similarity computations between variants of ostensibly equivalent Graph / Hybrid models yielded widely divergent positive vs negative brain correlations, despite both positively capturing behavioural ratings. This might incorporate a brief intuitive explanation of how Graph model similarities were computed (e.g., what SMATCH and WWLK do). In light of the above, why do different similarity algorithms applied to the Graph model yield positive and negative correlations on the same brain (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Same goes for why Hybrid and Hybrid-AMR yielded positive vs negative correlations (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Acknowledge that the brain results are sensitive to similarity computations in the Discussion.
  
  We appreciate this suggestion. We have added an extended consideration of these issues to the discussion (pages 10-11), as well as some additional details regarding the differences between the Smatch and WWLK metrics in the methods section (page 17).
  
  (4) Acknowledgement and explanation of why the human similarity ratings were poor at explaining brain data in Figure 2a,b (right column diag-pairs). The poor behaviour vs brain match is indirectly implied in the Discussion as "the comparison between behavioural and fMRI data is somewhat difficult owing to the difference in task structure." However, I would suggest being upfront and explicitly mentioning and explaining the poor brain match in Figures 2a and b, because the reader will notice and wonder - especially because the models correlate strongly with the behavioural data without the models doing the human behavioral task (though this could be a possibility, see later).
  
  As suggested, we have included a passing reference to this in the presentation of our main results on page 5, and a lengthier discussion on page 11:
  
  “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task participants (who were not the same as the behavioural task participants) read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”
  
  (5) Brief explanation of why model vs brain correlations tended to be strongest in the visual cortex (Figure 3d,e). Currently, this issue is only mentioned in passing, however, it seems worthy of further comment.
  
  We thank the reviewer for highlighting this issue. We have added discussion of the potential for visual confounds at several points in the revised manuscript, including the ‘Neuroscience of semantics’ subsection on page 11. As noted, we have also added a new analysis in which we compute correlations controlling for the average RSA similarities of the primary visual cortex. We find that this additional control significantly reduces correlations for most transformer models, but produces only a more modest reduction in the correlations for most of the graph and hybrid models, particularly VerbNet-CN (see Figures S8-S11).
  
  (6) Softening/clarifying some statements that could be misconstrued as suggesting Transformers were universally inferior models. Statements made in the Abstract/Discussion initially came over to me as implying that Transformers were universally inferior models when compared to the Graph/Hybrid models - but this appears only to be true when one looks at analyses conducted within block diagonal sentence subsets. Otherwise, when analyses are conducted on all sentences (between and within blocks, Figure 5) Llama 3 L2 provides by far the strongest brain model. Transformers also appear to yield the strongest accounts of the behavioural data, whether tested on block diagonal or all sentence pairs (Figure S3). To remedy this, I would suggest softening some statements in the Abstract/Discussion that could be misconstrued as suggesting that Transformers were universally inferior. I would also suggest explicitly acknowledging that when the entire dataset was analyzed, Transformers were most accurate, and that (some) Transformers best accounted for the behavioural data.
  
  We agree that certain sections of the previous draft lacked precision about the conclusions to be drawn regarding the representational capacities of transformers. We have revised the abstract and conclusion to better reflect our intended message, which is that transformers certainly can represent sentence structure and semantic roles, but that the way in which they do this (through vector representations in their hidden layers) is significantly different to how such features are represented in the human brain. In particular, we have included this new text on page 10:
  
  “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure, and probing studies have found that transformers represent information about syntax and word order. This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”
  
  (7) Given that GPT-4 was already deployed to parse semantic roles for the hybrid model, and GPT-4 should be able to generate reasonable similarity ratings between sentence pairs, it struck me that an interesting addendum could be to use GPT-4 similarities derived from the human behavioral task to interpret both brain and human behavioral data. This might also help support the case for conducting analyses within a similarity-based framework.
  
  We appreciate this suggestion. We have added this model (GPT-4 ratings of sentence similarity) to the revised manuscript (see Figures S1-S3).
  
  Other changes
  
  As noted by reviewer 3, the full set of sentence pairs was missing from the previous draft. They have been added to the SI of the revised manuscript.
  
  We have renamed the Graph and Hybrid models in the manuscript to AMR-Smatch and VerbNet-CN respectively, to make clear which models these terms refer to and to better differentiate them from the newly added constituency parse graph models.
  
  We have thoroughly revised the discussion section, incorporating feedback from all reviewers regarding areas needing additional depth.
  
  We have added subsections to the discussion to help the reader navigate the now lengthier section.
  
  AuthorResponse

URL

biorxiv.org/content/10.1101/2025.07.19.665701v2
www.biorxiv.org www.biorxiv.org

Decoupling AMPK from fatty acid synthesis allows maintenance of fitness late in life

5
1. Public_Reviews 28 May 2026
  
  in eLife
  
  eLife Assessment
  
  This study addresses an important question in aging biology by combining metabolic, genetic, and functional approaches to examine how cytosolic acetyl-CoA metabolism influences late-life fitness in replicatively aging yeast. The evidence supporting the roles of AMPK activation, mitochondrial acetyl-CoA utilization, and fatty acid synthesis in shaping distinct aging-associated phenotypes is convincing overall, with the engineered A2A strain providing a particularly elegant demonstration of coordinated metabolic regulation. However, several conclusions would benefit from clarification or moderation, particularly regarding the relationship between late-life fitness and replicative lifespan, the interpretation of "senescence," the proposed existence of distinct aging subpopulations, and the extent to which the data support mechanistic claims about lipid starvation, acetyl-CoA excess, and chromatin-based aging pathways.
  
  Summary
2. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  This rigorous and creative study uses an elegant combination of metabolomics, transcriptomics, and budding yeast molecular genetics to discover that (i) activating AMPK to maintain mitochondrial respiration fueled by cytosolic acetyl-CoA and (ii) increasing fatty acid synthesis independent of respiration drive independent pathways that increase the fitness of replicatively aged budding yeast cells, albeit without increasing their lifespan. This work will be of interest to scientists in the field of aging and metabolism. Some clarifications in the text would address the following concerns, which would increase the impact of the study:
  
  (1) What does activation of AMPK (via PGDP-Sak1 expression) do to the replicative lifespan? How many bud scars, in general, do the subpopulations that are older - yet have less Tom70 (increased mitochondrial fitness) - have, after the 48 hrs time point that they are examining? How many divisions occurred in this 48hr time period - i.e. is it long enough to have all cells reach the end of their replicative lifespan? This information is important to rule out that a subset of the mutant cells just divided faster and hence had more divisions within 48 hrs (growing faster and living longer are different things). Having identical growth curves doesn't indicate per se that they all divide at the same rate, as there may be a subpopulation that divides faster and a subpopulation that doesn't grow so well.
  
  (2) A2A cells do not have an extended replicative lifespan (RLS) but show an increase in the "low senescence" population (Figure 2). If the cells are not becoming senescent, why don't they have longer RLS? Not having a longer lifespan seems inconsistent with the statement that "bud scar counting confirmed that A2A cells reach a higher age than wild type", which comes back to how many times the cells can divide in the 48hr timepoint studied and their rate of cell division? Also, the lifespan curve shown is plotted against time, not cell division number, which does not take into account different division times of cells within the population (described above). It would be much more useful to show standard lifespan curves showing cell division numbers per lifespan per cell.
  
  (3) Increased "fitness" of the old cells is implied from the increased size of the colonies that the old cells can make. However, this is a measure of the fitness of the daughters per se, not the old mother cells. Are the old mothers just passing on healthier mitochondria and more lipids to the daughters, such that they can divide more times? If the aged cells have an "increased fitness", why don't they divide more times themselves (i.e. live longer?).
  
  (4) The statement is made that "these experiments define two classes of aging cells with distinct metabolic needs, coherent with the model of two aging trajectories previously proposed (referencing Nan Hao's work)". However, the big difference here is that in Nan Hao's work, their two aging trajectories influenced the length of lifespan, but that does not appear to be the case here. That distinction should be made clear. Perhaps the authors could also speculate as to why the A2A yeast stops dividing after presumably the same number of cell divisions, even though they have an activated AMPK and activated fatty acid synthesis pathway.
  
  (5) I am a bit confused by the use of the word "senescence" by this lab here and in their previous growth on galactose studies. If yeast don't senesce, which is usually defined as an irreversible arrest of the cell cycle where cells stop dividing, shouldn't the yeast that do not senesce still be dividing and hence have a longer lifespan? Should a different term be used rather than senescence? Such as "fitness late in life". The authors giving their definition of senescence may help reduce this apparent contradiction.
  
  Review 1
3. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this study, the authors investigate how cytosolic acetyl-CoA metabolism influences replicative aging in budding yeast. They propose that acetyl-CoA regulates aging through three major pathways: (1) mitochondrial transport to support mitochondrial function, (2) fatty acid synthesis, and (3) global protein acetylation. The data show that AMPK activation promotes mitochondrial import of acetyl-CoA and partially mitigates mitochondrial decline in a subset of aging cells.
  
  Furthermore, the engineered A2A strain, which enhances mitochondrial acetyl-CoA utilization while relieving inhibition of fatty acid synthesis, increases the proportion of cells exhibiting a "low senescence" phenotype.
  
  Overall, this is a thoughtful and potentially impactful study that advances our understanding of metabolic control of aging. Addressing the points below, particularly by refining interpretations and, where feasible, incorporating additional analyses, will further strengthen the manuscript and its conclusions.
  
  Strengths:
  
  The study has several notable strengths. It addresses an important question by shifting the focus from lifespan to preservation of late-life fitness, which is highly relevant to aging biology. The work integrates metabolic, genetic, and functional analyses to link cytosolic acetyl-CoA flux with distinct aging outcomes, and the engineering of the A2A strain provides a clear and elegant demonstration of how coordinated pathway modulation can improve cellular fitness.
  
  Weaknesses:
  
  (1) While the manuscript focuses on mitochondrial transport and fatty acid synthesis, cytosolic acetyl-CoA is also a key regulator of histone acetylation and chromatin silencing. It would strengthen the study to consider whether acetyl-CoA depletion contributes to improved fitness through enhanced rDNA silencing. Given the well-established role of rDNA instability in yeast aging, additional experiments examining rDNA silencing and stability would be valuable. For example, monitoring rDNA copy number changes (not necessarily ERCs) under AMPK activation, oleic acid supplementation, and in the A2A strain, similar to approaches used in the authors' prior work, would help clarify whether chromatin regulation contributes to the observed phenotypes.
  
  (2) The current data do not fully distinguish whether AMPK activation and oleic acid supplementation act on distinct subpopulations of aging cells. An alternative explanation is that oleic acid supplementation enhances mitochondrial function and acts additively with AMPK activation, thereby increasing the fraction of cells in the "low senescence" state. Since this distinction is not central to the main conclusions, I suggest softening the language around subpopulation specificity. Emphasizing instead that the A2A strain coordinately modulates multiple branches of acetyl-CoA metabolism to improve late-life fitness would maintain the strength of the central message without overinterpretation.
  
  (3) The manuscript proposes that lipid starvation and excess acetyl-CoA are major drivers of senescence in distinct subpopulations of wild-type aging cells. This conclusion is not yet fully supported by the presented data. Direct measurements of age-dependent divergence in acetyl-CoA and fatty acid levels at the single-cell level would be needed to substantiate this model. Based on the current evidence, a more conservative interpretation would be that aging cells exhibit differential sensitivity to perturbations in acetyl-CoA and lipid metabolism. Accordingly, I recommend revising the statement in the Abstract ("We further implicate lipid starvation and excess acetyl coenzyme A availability as major drivers of senescence...") and the corresponding discussion text to better align with the data.
  
  Review 2
4. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  These findings suggest that PGPD-SAK1 yeast show a subpopulation with lowered TOM70-GFP expression in high bud scar staining aged cells. Deletion of CAT2 or MLS1 reduces this effect. A PGPD-SAK1 acc1S1157A double mutant (called "A2A" here) shows an even larger effect of lowered tom70 expression in high bud scar staining aged cells. Utilization of various additional mutants involved in acetyl-CoA transport, carnitine shuttle, respiration, etc., leads the authors to conclude that these shifts in TOM70-GFP in aged cells are linked to the AMPK-fatty acid metabolic regulatory system.
  
  Strengths:
  
  These extensive and clearly described experiments reveal interesting changes in TOM70-GFP intensity in subsets of aged yeast in several mutants eventually identified as linked to the AMPK-fatty acid metabolic regulatory system.
  
  Weaknesses:
  
  (1) 3 biological replicates for mRNASeq is low.
  
  (2) While "Traditional conceptions of ageing implicate a progressive accumulation of damage leading to systemic degradation in performance until death, with evolutionary pressures acting to maximise early life fitness and fecundity at the expense of ageing health." is tangential perhaps to the data and conclusions of the study, both claims of this sentence are at best controversial, and the manuscript is no weaker for their omission.
  
  (3) The statement that "Here, we determine the basis of senescence and fitness loss in replicatively ageing yeast" is a bit strong as a summary of the present careful work presented here. If the authors had created yeast mutants that retained fitness indefinitely, this would be a more appropriate strength of claim to summarize the work.
  
  Review 3
5. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This rigorous and creative study uses an elegant combination of metabolomics, transcriptomics, and budding yeast molecular genetics to discover that (i) activating AMPK to maintain mitochondrial respiration fueled by cytosolic Acetyl CoA and (ii) increasing fatty acid synthesis independent of respiration drive independent pathways that increase the fitness of replicatively-aged budding yeast cells, albeit without increasing their lifespan. This work will be of interest to scientists in the field of aging and metabolism. Some clarifications in the text would address the following concerns, which would increase the impact of the study:
  
  (1) What does activation of AMPK (via PGPD-SAK1 expression) do to the replicative lifespan? How many bud scars, in general, do the subpopulations that are older - yet have less Tom70 (increased mitochondrial fitness) - have, after the 48 hrs time point that they are examining? How many divisions occurred in this 48hr time period - i.e. is it long enough to have all cells reach the end of their replicative lifespan? This information is important to rule out that a subset of the mutant cells just divided faster and hence had more divisions within 48 hrs (growing faster and living longer are different things). Having identical growth curves doesn't indicate per se that they all divide at the same rate, as there may be a subpopulation that divides faster and a subpopulation that doesn't grow so well.
  
  Increasing AMPK activity increases replicative lifespan [PMID: 25869125], but given our finding that AMPK activation splits the population, such replicative lifespan assays are hard to interpret. Bud scar counts have a similar issue. Hence, we restricted the lifespan and bud scar analyses to wt and A2A, which are more homogeneous (Figures S2B and E). A2A cells at 48h have ~25% more bud scars than wt cells. Yes, by 48h most of the cells have lost viability (Figure 2E). The reviewer is correct that you can't properly compare the lifespan curves if the cells divide at different rates, hence our follow-up test comparing the viability of wt at 48h with that of A2A at 40h, after we had confirmed that these timepoints captured cells at equivalent replicative ages (Figure 2D,E). This shows that the viability of A2A is slightly lower than that of wt at matched age, indicating a slightly shorter lifespan.
  
  (2) A2A cells do not have an extended replicative lifespan (RLS) but show an increase in the "low senescence" population (Figure 2). If the cells are not becoming senescent, why don't they have longer RLS? Not having a longer lifespan seems inconsistent with the statement that "bud scar counting confirmed that A2A cells reach a higher age than wild type", which comes back to how many times the cells can divide in the 48hr timepoint studied and their rate of cell division? Also, the lifespan curve shown is plotted against time, not cell division number, which does not take into account different division times of cells within the population (described above). It would be much more useful to show standard lifespan curves showing cell division numbers per lifespan per cell.
  
  Our observation that cells can reach the end of life without senescing is consistent with other studies that have studied the life course of individual cells by microscopy [PMID: 31291577, 32675375]. These studies always highlight some proportion of the cells that reach the end of life with no or minimal senescence, though this fraction varies with the experimental system. The question of why cells lose viability without senescing is a complete unknown in the field, but reflects a wider lack of consensus as to why yeast lose viability with replicative age.
  
  We are wary about making strong statements on lifespan for exactly the reason the reviewer picks out. In liquid culture we can only assess viability over time, and it is clear from the comparison of liquid and solid media lifespans performed by the Gottschling lab [PMID: 19652178] that the culture system has a huge effect on lifespan, with cells in classical microdissection-based lifespan assays living far longer than they do in liquid. This of course means that classical microdissection assays are not very useful for A2A, so we are left with an unsatisfactory approximation. We have therefore restricted our conclusion on lifespan to simply say that the lifespan of A2A cells is not extended, which our data in Figures 2D, E and S2B do support (see also answer to Q1). Therefore, with the majority of A2A cells showing low senescence marks and high fitness at 48h, we can conclude that lifespan and fitness loss must be separable.
  
  We will note these limitations of lifespan measurements in the manuscript.
  
  (3) Increased "fitness" of the old cells is implied from the increased size of the colonies that the old cells can make. However, this is a measure of the fitness of the daughters per se, not the old mother cells. Are the old mothers just passing on healthier mitochondria and more lipids to the daughters, such that they can divide more times? If the aged cells have an "increased fitness", why don't they divide more times themselves (i.e. live longer?).
  
  Yes, colony growth speed is defined by daughter cell replication, and as long as the daughters and subsequent generations divide at the same rate irrespective of whether they come from young or old mothers, the size of the colony after 24 hours varies based on the time it took the initial mother to produce a daughter. This is what the assay really measures. We note that aged wildtype mothers often do not divide at all in the first 24 hours after being put on an agar plate (hence the tiny reported colony size), even though they do eventually produce a daughter which then forms a colony, whereas A2A cells tend to produce the first daughter rapidly whether young or old. It is known that daughters of aged wildtype mothers also divide slower, which will also contribute to differences in colony size, and this may well result from a lipid and/or mitochondrial contribution, but the primary driver of colony size in 24 hours is the time the mother took to initially divide. We will add this detail to the manuscript.
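
  The dependence of 24-hour colony size on the mother's first division can be illustrated with a minimal sketch. It assumes idealized exponential growth once the first daughter appears; the function name, the 1.5 h doubling time, and the example lag times are illustrative assumptions, not values from the study.

```python
def colony_cells(hours, first_division_lag_h, doubling_time_h=1.5):
    """Idealized cell count in a colony founded by one mother cell.

    Assumptions (hypothetical, for illustration only):
    - the mother produces her first daughter after `first_division_lag_h`;
    - from then on, every cell divides once per `doubling_time_h`,
      regardless of the age of the founding mother.
    """
    if hours < first_division_lag_h:
        # The mother has not divided yet, so the "colony" is a single cell.
        return 1.0
    # Two cells exist at the moment of the first division, then doubling.
    return 2.0 ** (1 + (hours - first_division_lag_h) / doubling_time_h)


# Two mothers that differ only in how long the first division takes.
fast_mother = colony_cells(24, first_division_lag_h=3)
slow_mother = colony_cells(24, first_division_lag_h=12)
fold_difference = fast_mother / slow_mother
```

  Under these assumptions, a 9 h difference in the time to first division corresponds to six extra doublings by 24 hours, so the lag alone produces a large difference in colony size even when all descendants divide at the same rate.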
  
  As noted above, the mechanistic basis of lifespan is unknown, but although senescence can shorten lifespan, our work and that of others shows that lifespan is still limited in the absence of senescence.
  
  (4) The statement is made that "these experiments define two classes of aging cells with distinct metabolic needs, coherent with the model of two aging trajectories previously proposed (referencing Nan Hao's work)". However, the big difference here is that in Nan Hao's work, their two aging trajectories influenced the length of lifespan, but that does not appear to be the case here. That distinction should be made clear. Perhaps the authors could also speculate as to why the A2A yeast stops dividing after presumably the same number of cell divisions, even though they have an activated AMPK and activated fatty acid synthesis pathway.
  
  We will add this distinction. As noted above, we are wary of making strong statements regarding lifespan as the assays we can do in liquid culture are limited. We are therefore similarly wary about speculating about causes for the lack of lifespan difference because in reality all we can do is rule out a big effect. We would love to speculate on why the A2A cells don't have an extended lifespan, but at this point we don't have any good ideas on this point!
  
  (5) I am a bit confused by the use of the word "senescence" by this lab here and in their previous growth on galactose studies. If yeast don't senesce, which is usually defined as an irreversible arrest of the cell cycle where cells stop dividing, shouldn't the yeast that do not senesce still be dividing and hence have a longer lifespan? Should a different term be used rather than senescence? Such as "fitness late in life". The authors giving their definition of senescence may help reduce this apparent contradiction.
  
  We completely agree that this is confusing, and we noted this distinction in the Introduction. Use of the term senescence to mean a loss of fitness late in life in yeast stems from the classical definition of senescence as applied to whole organisms. However, the term senescence as applied to cells has a more specific meaning in terms of the cell cycle, as the reviewer notes. As an individual S. cerevisiae is both a cell and an organism, the terminology clashes. However, the marker we largely employ (Tom70-GFP), which in our hands is a very good proxy for fitness, was originally defined as marking the senescence entry point (SEP), so overall we feel we can't avoid the term.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this study, the authors investigate how cytosolic acetyl-CoA metabolism influences replicative aging in budding yeast. They propose that acetyl-CoA regulates aging through three major pathways: (1) mitochondrial transport to support mitochondrial function, (2) fatty acid synthesis, and (3) global protein acetylation. The data show that AMPK activation promotes mitochondrial import of acetyl-CoA and partially mitigates mitochondrial decline in a subset of aging cells.
  
  Furthermore, the engineered A2A strain, which enhances mitochondrial acetyl-CoA utilization while relieving inhibition of fatty acid synthesis, increases the proportion of cells exhibiting a "low senescence" phenotype.
  
  Overall, this is a thoughtful and potentially impactful study that advances our understanding of metabolic control of aging. Addressing the points below, particularly by refining interpretations and, where feasible, incorporating additional analyses, will further strengthen the manuscript and its conclusions.
  
  Strengths:
  
  The study has several notable strengths. It addresses an important question by shifting the focus from lifespan to preservation of late-life fitness, which is highly relevant to aging biology. The work integrates metabolic, genetic, and functional analyses to link cytosolic acetyl-CoA flux with distinct aging outcomes, and the engineering of the A2A strain provides a clear and elegant demonstration of how coordinated pathway modulation can improve cellular fitness.
  
  Weaknesses:
  
  (1) While the manuscript focuses on mitochondrial transport and fatty acid synthesis, cytosolic acetyl-CoA is also a key regulator of histone acetylation and chromatin silencing. It would strengthen the study to consider whether acetyl-CoA depletion contributes to improved fitness through enhanced rDNA silencing. Given the well-established role of rDNA instability in yeast aging, additional experiments examining rDNA silencing and stability would be valuable. For example, monitoring rDNA copy number changes (not necessarily ERCs) under AMPK activation, oleic acid supplementation, and in the A2A strain, similar to approaches used in the authors' prior work, would help clarify whether chromatin regulation contributes to the observed phenotypes.
  
  We have data addressing this point that we will add to the manuscript. In short, we see no difference in gene expression from Sir2-repressed sub-telomeric regions or MAT loci, but the genome-wide gene expression dysregulation associated with age is partially suppressed in PGPD-SAK1. However, A2A does not suppress this further, so it is not critical for the suppression of senescence in A2A though we are following this up. ERC accumulation is higher in A2A at 48h, consistent with the cells being older, meaning that ERCs are unlinked to senescence onset as we have previously reported. There is a strong upregulation of transcripts from Sir2-repressed rDNA intergenic spacers with age in all genotypes, but we attribute this simply to the copy number increase of these regions on ERCs rather than a defect in silencing. We have previously looked for heritable changes in rDNA copy number arising during ageing and found (to our surprise) absolutely nothing, so we don't expect any changes under these conditions.
  
  (2) The current data do not fully distinguish whether AMPK activation and oleic acid supplementation act on distinct subpopulations of aging cells. An alternative explanation is that oleic acid supplementation enhances mitochondrial function and acts additively with AMPK activation, thereby increasing the fraction of cells in the "low senescence" state. Since this distinction is not central to the main conclusions, I suggest softening the language around subpopulation specificity. Emphasizing instead that the A2A strain coordinately modulates multiple branches of acetyl-CoA metabolism to improve late-life fitness would maintain the strength of the central message without overinterpretation.
  
  We agree that oleic acid and the lipids produced downstream of Acc1 in A2A may improve late life fitness via enhanced mitochondrial function, and in support of this Oxygen Consumption Rate is marginally (though significantly) higher in A2A than PGPD-SAK1. We will add this data to the manuscript. However, we disagree with the interpretation of an additive effect as we report throughout the study that AMPK activation and lipid biosynthesis/supplementation affect different sub-populations of cells. We do not observe populations of intermediate senescence cells, rather by flow cytometry and fitness assays we observe individual cells in binary low senescence or high senescence states.
  
  (3) The manuscript proposes that lipid starvation and excess acetyl-CoA are major drivers of senescence in distinct subpopulations of wild-type aging cells. This conclusion is not yet fully supported by the presented data. Direct measurements of age-dependent divergence in acetyl-CoA and fatty acid levels at the single-cell level would be needed to substantiate this model. Based on the current evidence, a more conservative interpretation would be that aging cells exhibit differential sensitivity to perturbations in acetyl-CoA and lipid metabolism. Accordingly, I recommend revising the statement in the Abstract ("We further implicate lipid starvation and excess acetyl coenzyme A availability as major drivers of senescence...") and the corresponding discussion text to better align with the data.
  
  We agree and will adjust the abstract to make it clearer that the lipid starvation / excess acetyl-CoA interpretation is a model.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  These findings suggest that PGPD-SAK1 yeast show a subpopulation with lowered TOM70-GFP expression in high bud scar staining aged cells. Deletion of CAT2 or MLS1 reduces this effect. A PGPD-SAK1 acc1S1157A double mutant (called "A2A" here) shows an even larger effect of lowered tom70 expression in high bud scar staining aged cells. Utilization of various additional mutants involved in acetyl-CoA transport, carnitine shuttle, respiration, etc., leads the authors to conclude that these shifts in TOM70-GFP in aged cells are linked to the AMPK-fatty acid metabolic regulatory system.
  
  Strengths:
  
  These extensive and clearly described experiments reveal interesting changes in TOM70-GFP intensity in subsets of aged yeast in several mutants eventually identified as linked to the AMPK-fatty acid metabolic regulatory system.
  
  Weaknesses:
  
  (1) 3 biological replicates for mRNASeq is low.
  
  Thank you for pointing this out. We performed another replicate after posting the initial preprint but didn’t update the figure in the eLife-reviewed version. We will add this to the scatter plots and analysis in Figure 1; the findings have not changed.
  
  (2) While "Traditional conceptions of ageing implicate a progressive accumulation of damage leading to systemic degradation in performance until death, with evolutionary pressures acting to maximise early life fitness and fecundity at the expense of ageing health." is tangential perhaps to the data and conclusions of the study, both claims of this sentence are at best controversial, and the manuscript is no weaker for their omission.
  
  We actually feel that this sentence is very important to the message of the manuscript, which is that ageing does not necessarily have to involve a loss of fitness before death. Ageing is often described as the progressive wearing out of components leading to decline and death (with an old car often used as an analogy); in the ageing field this is certainly controversial, but outside the field this remains the normal understanding. We think it is important to state this widely held viewpoint with which our findings are hard to reconcile.
  
  Our interpretation that yeast are bet-hedging as a population growth strategy, and that this drives ageing in the long term, is a classic case of antagonistic pleiotropy. We will add this term (from the citation that is already in the manuscript) and clarify in the discussion to make it obvious why we are introducing this concept in the introduction.
  
  (3) The statement that "Here, we determine the basis of senescence and fitness loss in replicatively ageing yeast" is a bit strong as a summary of the present careful work presented here. If the authors had created yeast mutants that retained fitness indefinitely, this would be a more appropriate strength of claim to summarize the work.
  
  Indeed - we will refine this sentence.
  
  AuthorResponse
URL

biorxiv.org/content/10.1101/2025.03.27.645766v2
www.biorxiv.org www.biorxiv.org

Membrane Binding Controls the ATPase Cycle and Localization of MinD in Bacillus subtilis

4
1. Public_Reviews 28 May 2026
  
  in eLife
  
  eLife Assessment
  
  This important study provides convincing data suggesting that subcellular localization of the spatial regulator of cell division, MinD, is an intrinsic feature of the protein's ability to associate with the membrane as both a dimer and a monomer. These findings distinguish the behavior of MinD in B. subtilis from its counterpart in E. coli and suggest that there is not a need to invoke additional localization factors. The reviewers felt that the revisions, particularly the additional experiments and changes to the text to make the experimental design and conclusions clearer, improve the quality of the manuscript.
  
  Summary
2. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  [Editor's note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]
  
  Summary:
  
  In this work the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole to pole oscillation whereby a time average minimum of the Min proteins at mid cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites, and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing mechanistic insight into the mechanism of dynamic localization.
  
  Strengths:
  
  In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations was nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.
  
  In the revised manuscript, the authors now demonstrate localization and tracking data for minC and minJ deletion strains, which suggest that MinJ impacts MinD membrane cycling, but MinC does not. Additional in vitro work showed that the PDZ domain of MinJ modifies MinD ATP hydrolysis rates, and the authors propose that MinJ may promote MinD dimer formation.
  
  Weaknesses of the revised version: No major weaknesses.
  
  Review 1
3. Public_Reviews 28 May 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Feddersen & Bramkamp determined important characteristics of how MinD protein binds/dissociates to/from the membrane, and dimerizes in relation to its ATPase activity. The presented data clearly show the differences in function of MinD homologs from B. subtilis and E. coli.
  
  Strengths:
  
  The work presents well-executed experiments that lead to interesting conclusions and a new model of how the Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Outstandingly, most of the in vitro data are confirmed by single-molecule localization microscopy.
  
  Review 2
4. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this work the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole to pole oscillation whereby a time average minimum of the Min proteins at mid cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites, and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing mechanistic insight into the mechanism of dynamic localization.
  
  Strengths:
  
  In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations was nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.
  
  In the revised manuscript, the authors now demonstrate localization and tracking data for minC and minJ deletion strains, which suggest that MinJ impacts MinD membrane cycling, but MinC does not. Additional in vitro work showed that the PDZ domain of MinJ modifies MinD ATP hydrolysis rates, and the authors propose that MinJ may promote MinD dimer formation.
  
  Weaknesses of the revised version: No major weaknesses.
  
  We thank this reviewer for the positive evaluation of our manuscript and the precise summary of our findings.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Feddersen & Bramkamp determined important characteristics of how MinD protein binds/dissociates to/from the membrane, and dimerizes in relation to its ATPase activity. The presented data clearly show the differences in function of MinD homologs from B. subtilis and E. coli.
  
  Strengths:
  
  The work presents well-executed experiments that lead to interesting conclusions and a new model of how the Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Outstandingly, most of the in vitro data are confirmed by single-molecule localization microscopy.
  
  Weaknesses:
  
  The authors immobilized liposomes, for which they used E. coli total lipids, to measure ATPase activity and liposome association and dissociation of B. subtilis MinD. For these experiments, it would be more suitable to use B. subtilis total lipids, as more biologically relevant data could be gained.
  
  Although the work is detailed and nicely compares the function of the B. subtilis Min system with the E. coli Min system, it lacks a comparison with Min system function in other rod-shaped Gram-positive bacteria. I would suggest including in the Discussion the complexity of other Min systems. This complexity is especially seen in other rod-shaped spore formers such as Clostridial species, in which one or both of these Min systems are present: an oscillating E. coli-type Min system and a more static one as in B. subtilis.
  
  Comments on revisions:
  
  I'm satisfied with the authors' response to my private recommendation points. However, I thought that they would also respond to the points mentioned in the Weaknesses part of my Public Review, as shown above, and update the revised version accordingly.
  
  We are very grateful to the reviewer for the positive comments and fully agree with the points raised. Due to the overall length of the manuscript, we initially omitted a discussion of the complexity of the Min system in certain Firmicutes. However, we agree that this aspect should be considered. Accordingly, we have now added a dedicated paragraph to the Discussion section addressing this point.
  
  We also agree that investigating different lipid compositions, including native membranes from Bacillus subtilis, represents a logical next step to further elucidate the influence of lipids on the MinD activity cycle. However, we consider this to constitute a separate project and therefore beyond the scope of the present study.
  
  Recommendations for the authors:
  
  Reviewing Editors:
  
  Some minor corrections are requested. In particular, the addition of a bit more detail to the Discussion about the complexity of Min systems in other bacteria, as suggested by Reviewer 2, would be very much appreciated.
  
  We thank the editors for their positive assessment and the clear recommendations. We have now added a dedicated paragraph to the Discussion section addressing the complexity of the Min system in Clostridioides.
  
  Reviewer #1 (Recommendations for the authors):
  
  The following corrections are requested:
  
  Abstract - Line 29 - Remove the word "solely" from this statement of the abstract. It would be wise to not be so rigid for a biological system that is only partially characterized and to allow for the possibility that biological factors, including local concentrations and/or other molecules, may yet be discovered to impact MinD activation under certain conditions.
  
  We agree and have amended the text to avoid a too restrictive statement.
  
  Line 38 - Remove "do not require any unknown protein component" for the reason stated above. Currently, the experiments recapitulate activation, suggesting that membrane binding and release control dynamics without additional factors. This allows for the possibility that biological factors may yet be shown to impact MinD activation under certain conditions.
  
  We agree and have changed the text.
  
  Discussion - Line 526 - Thermus thermophilus is misspelt.
  
  Corrected.
  
  AuthorResponse
Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.08.602513v3
www.biorxiv.org www.biorxiv.org

Dynamic assembly of malate dehydrogenase-citrate synthase multienzyme complex in the mitochondria

5
1. Public_Reviews 28 May 2026
 
 in eLife
 
 eLife Assessment
 
 This important study provides novel information on multi-enzyme complexes, known as metabolons, that form between sequential enzymes in a metabolic pathway. Using an innovative NanoBiT split-luciferase system, the authors present compelling evidence that malate dehydrogenase (MDH1) and citrate synthase (CIT1) dynamically associate under different metabolic conditions in Saccharomyces cerevisiae. The findings suggest the dynamic MDH1-CIT1 interaction facilitates control of TCA pathway flux rate.
 
 Summary
2. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The study by the Obata group characterizes the dynamics of the canonical malate dehydrogenase-citrate synthase metabolon in yeast.
 
 Strengths:
 
 The study is well-written and appears to give clear demonstrations of this phenomenon.
 
 Studies of the dynamics of metabolon formation are rare; if the authors can address the concern detailed below, then they have provided such for one of the canonical metabolons in nature.
 
 Weaknesses:
 
 There is a fundamental issue with the study, which is that the authors do not provide enough support or information concerning the split luciferase system that they use. Is the binding reversible or not? How the data is interpreted is massively influenced by this fact. What are the pros and cons of this method in comparison to, for example, FLIM-FRET? The authors state that the method is semi-quantitative - can they document this? All of the conclusions are based on the quality of this method. I know that it has been used by others, but at least some preliminary documentation to address these questions is required.
 
 Comments on revised version:
 
 I feel that the authors have adequately addressed my prior concerns. I have no further critiques of their work.
 
 Review 1
3. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #2 (Public review):
 
 This study explores the dynamic association between malate dehydrogenase (MDH1) and citrate synthase (CIT1) in Saccharomyces cerevisiae, with the aim of linking this interaction to respiratory metabolism. Utilizing a NanoBiT split-luciferase system, the authors monitor protein-protein interactions in vivo under various metabolic conditions.
 
 Major Concerns:
 
 (1) NanoBiT Signal May Reflect Protein Abundance Rather Than Interaction Strength
 
 In Figure 1C, the authors report increased MDH1-CIT1 interaction under respiratory (acetate) conditions and decreased interaction during fermentation (glucose), as indicated by NanoBiT luminescence. However, this signal appears to correlate strongly with the expression levels of MDH1 and CIT1, raising the possibility that the observed luminescence reflects protein abundance rather than specific interaction dynamics. To resolve this, NanoBiT signals should be normalized to the expression levels of both proteins to distinguish between abundance-driven and interaction-driven changes.
 
 (2) Lack of Causal Evidence
 
 The study presents a series of metabolic perturbation experiments (e.g., arsenite, AOA, antimycin A, malonate) and correlates changes in metabolite levels with NanoBiT signals. However, these data are correlative and do not establish a functional role for the MDH1-CIT1 interaction in metabolic regulation. To demonstrate causality, the authors should implement approaches to specifically disrupt the MDH1-CIT1 interaction. One strategy could involve using a 15-residue peptide (Pept1) derived from the Pro354-Pro366 region of CIT1, previously shown to mediate the interaction, or introducing the cit1Δ3 (Arg362Glu) mutation, which perturbs binding. Metabolic flux analysis using ^13C-labeled glucose and mitochondrial respiration assays (e.g., Seahorse) could then assess functional consequences.
 
 (3) Absence of Protein Expression Controls Under Perturbation Conditions
 
 In experiments involving acetate, arsenite, AOA, antimycin A, and malonate, the authors infer changes in MDH1-CIT1 association based solely on NanoBiT signals. However, no accompanying data are provided on MDH1 and CIT1 protein levels under these conditions. This omission weakens the conclusions, as altered expression rather than interaction strength could underlie the observed luminescence changes. Immunoblotting or quantitative proteomics should be used to confirm constant protein expression across conditions.
 
 Conclusion:
 
 Although the central question is compelling and the use of NanoBiT in live cells is a strength, the manuscript requires additional experimental rigor. Specifically, normalization of interaction signals, introduction of causative perturbations, and validation of protein expression are essential to substantiate the study's claims.
 
 Comments on revised version:
 
 The manuscript is much improved.
 
 Review 2
4. Public_Reviews 28 May 2026
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Metabolons are multisubunit complexes that promote the physical association of sequential enzymes within a metabolic pathway. Such complexes are proposed to increase metabolic flux and efficiency by channeling reaction intermediates between enzymes. The TCA cycle enzymes malate dehydrogenase (MDH1) and citrate synthase (CIT1) have been linked to metabolon formation, yet the conditions under which these enzymes interact, and whether such interactions are dynamic in response to metabolic cues, remain unclear, particularly in the native cellular context. This study uses a nanoBIT protein-protein interaction assay to map the dynamic behavior of the MDH1-CIT1 interaction in response to multiple metabolic stimuli and challenges in yeast. Beyond mapping these interactions in real time, the authors also performed GC-MS metabolomics to map whole-cell metabolite alterations across experimental conditions. Finally, the authors use microscale thermophoresis to determine components that alter the MDH1-CIT1 interaction in vitro. Collectively, the authors synthesize their collected data into a model in which the MDH1-CIT1 metabolon dissociates in conditions of low respiratory flux, and is stimulated during conditions of high respiratory flux. While their data largely support these models, some key exceptions are found that suggest this model is likely oversimplified and will require further work to understand the complexities associated with MDH1-CIT1 interaction dynamics. Nonetheless, the authors put forth an interesting and timely toolkit to begin to understand the interaction kinetics and dynamics of key metabolic enzymes that should serve as a platform to begin disentangling these important yet understudied aspects of metabolic regulation.
 
 Strengths:
 
 - The authors address an important question: how do metabolon-associated protein-protein interactions change across altered metabolic conditions?
 
 - The development and validation of the MDH1-CIT1 nanoBIT assay provides an important tool to allow the quantification of this protein-protein interaction in vivo. Importantly, the authors demonstrate that the assay allows kinetic and real time assessment of these protein interactions, which reveal interesting and dynamic behavior across conditions.
 
 - The use of classic biochemical techniques to confirm that pH and various metabolites can alter the MDH1-CIT1 interaction in vitro is rigorous and supports the model put forth by the authors.
 
 Weaknesses:
 
 The authors have addressed identified weaknesses within the revision of their manuscript.
 
 Review 3
5. Public_Reviews 28 May 2026
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 eLife Assessment
 
 This study reports a dynamic association/dissociation between malate dehydrogenase (MDH1) and citrate synthase (CIT1) in Saccharomyces cerevisiae under different metabolic conditions that control TCA pathway flux rate. The research question is timely, the use of the NanoBiT split-luciferase system to monitor protein-protein interactions is innovative, and the significance of the findings is valuable. However, the strength of evidence needed to support the conclusions was found to be incomplete based on a lack of critical control and mechanistic experiments.
 
 We thank the editor for this thoughtful assessment of our work. We are encouraged that the research question, experimental approach, and overall significance were viewed positively.
 
 To address the concern regarding the strength of evidence, we have implemented additional controls in the revised manuscript. Specifically, we have repeated all MDH1-CIT1 interaction measurements alongside strains expressing full-length NanoLUC fusion proteins to assess MDH1 and CIT1 protein abundance. The resulting data, now included as supplementary figures (Figure 2 – figure supplement 2, Figure 2 – figure supplement 3, Figure 3 – figure supplement 1, Figure 4 – figure supplement 2), demonstrate the reproducibility of the findings and indicate that the observed changes in MDH1-CIT1 interaction are not attributable to protein abundance variations.
 
 We agree that a detailed mechanistic dissection of how the MDH1–CIT1 complex influences metabolic pathway flux is an essential piece of evidence for establishing the functions of the metabolon. However, such analyses require extensive additional investigation beyond the scope of the present study. Accordingly, we have clarified the aims of this work in the revised manuscript to emphasize that our primary objective is to characterize the dynamic behavior of the MDH1–CIT1 interaction under different metabolic conditions and to identify key factors associated with its regulation.
 
 We believe these revisions strengthen the rigor of the study, better define its scope, and provide a solid foundation for future mechanistic investigations.
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The study by the Obata group characterizes the dynamics of the canonical malate dehydrogenase-citrate synthase metabolon in yeast.
 
 Strengths:
 
 The study is well-written and appears to give clear demonstrations of this phenomenon.
 
 Studies of the dynamics of metabolon formation are rare; if the authors can address the concern detailed below, then they have provided such for one of the canonical metabolons in nature.
 
 We sincerely thank the reviewer for their positive assessment and for recognizing the value of our study in characterizing the dynamics of the MDH1-CIT1 metabolon. We appreciate the recognition that studies of metabolon dynamics are rare and that our work provides a clear demonstration of this phenomenon for a canonical metabolon. We have carefully addressed the methodological concerns regarding the NanoBiT system as detailed below to further strengthen the evidence for our findings.
 
 Weaknesses:
 
 There is a fundamental issue with the study, which is that the authors do not provide enough support or information concerning the split luciferase system that they use.
 
 We agree that a detailed description of the NanoBiT system is essential to ensure the reliability of the methodology. As suggested, we have added a dedicated paragraph to the Introduction (Lines 90–103) to clarify these technical aspects, supported by the foundational work of Dixon et al. (2016).
 
 Is the binding reversible or not? How the data is interpreted is massively influenced by this fact.
 
 Yes, the NanoBiT system is specifically designed to be reversible. The intrinsic affinity of the subunits is low (KD = 190 μM), and the association and dissociation rate constants (kon = 500 M⁻¹ s⁻¹, koff = 0.2 s⁻¹) are well outside the range of typical protein-protein interactions (Dixon et al., 2016). These kinetics ensure that the assembly and disassembly of the luminescent complex are dictated solely by the interaction characteristics of the target proteins (MDH1 and CIT1) and not by the tags themselves. This allows for real-time monitoring of both the association and dissociation phases.
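 
 As a rough illustration of why a dissociation rate constant of 0.2 s⁻¹ implies reversibility on the timescale of a plate-reader assay, the sketch below converts koff into a mean lifetime and a half-life for the tag-only complex. It assumes simple first-order dissociation; the function names are hypothetical and are not taken from the manuscript or from Dixon et al. (2016).
 
```python
import math

# Dissociation rate constant quoted in the response (Dixon et al., 2016).
K_OFF_PER_S = 0.2  # s^-1


def mean_lifetime_s(k_off: float) -> float:
    """Mean lifetime of a complex under first-order dissociation (1 / k_off)."""
    return 1.0 / k_off


def half_life_s(k_off: float) -> float:
    """Time for half of the complexes to dissociate (ln 2 / k_off)."""
    return math.log(2) / k_off


# Assumption: the tag pair dissociates as a simple first-order process.
tag_lifetime = mean_lifetime_s(K_OFF_PER_S)
tag_half_life = half_life_s(K_OFF_PER_S)
print(f"mean lifetime: {tag_lifetime:.1f} s, half-life: {tag_half_life:.2f} s")
```
 
 Under this assumption, a tag-only complex persists for only a few seconds, which is short relative to measurements taken over minutes.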
 
 What are the pros and cons of this method in comparison to, for example, FLIM-FRET?
 
 We have now explicitly addressed the pros and cons of our methodology compared to fluorescence-based systems:
 
 Pros: The NanoLUC-based reporter is 150 times brighter than conventional luciferases and has a significantly higher dynamic range (Hall et al., 2012), allowing detection of weak transient interactions. Importantly for this study, fluorescence-based methods such as FLIM-FRET and BRET are difficult to implement in yeast microplate assays due to the high levels of cellular autofluorescence. NanoBiT bypasses this issue, providing a high signal-to-noise ratio.
 
 Cons: Unlike FRET, NanoBiT requires the application of a substrate (furimazine). We did not include this disadvantage in the manuscript because it is not critical in a yeast study. Furimazine can be applied directly to the medium and readily permeates cells.
 
 The authors state that the method is semi-quantitative - can they document this?
 
 The semi-quantitative nature of the system is supported by its high dynamic range and the linear relationship between the luminescence signal and the amount of protein complex formed, as documented in Dixon et al. (2016). By using this system in a microplate setting, we were able to monitor relative increases or decreases in interaction levels over time across multiple metabolic conditions, providing a robust comparative analysis of metabolon dynamics.
 
 All of the conclusions are based on the quality of this method. I know that it has been used by others, but at least some preliminary documentation to address these questions is required.
 
 We acknowledge the reviewer’s concern regarding the reliance on the NanoBiT system. To ensure the reliability of our conclusions, we have included several lines of evidence to validate the method and demonstrate that the observed luminescence signals accurately reflect protein-protein interaction dynamics.
 
 To confirm the NanoBiT results using an independent biochemical approach, we performed an in vivo pull-down assay following glucose addition (Figure 2 – figure supplement 1A). The results demonstrate a reduction in the physical association between MDH1 and CIT1. This biochemical validation directly supports the reduction in interaction observed with the NanoBiT system during the Crabtree effect.
 
 We have provided protein abundance data for both MDH1 and CIT1 across the experimental conditions (Figure 2 – figure supplement 1&3; Figure 3 – figure supplement 1; Figure 4 – figure supplement 2). These results show only minor changes in protein levels, confirming that the fluctuations in the NanoBiT signal are independent of protein expression and represent genuine changes in metabolon assembly.
 
 To ensure the findings are reproducible, we have included MDH1-CIT1 interaction results from repeated independent experiments (Figure 2 – figure supplement 1&3; Figure 3 – figure supplement 1; Figure 4 – figure supplement 1). The consistency of the results across these trials confirms the robustness of the system in monitoring the metabolic regulation of this complex.
 
 We hope that these additional experimental validations, alongside the detailed technical description based on the established properties of the NanoBiT system (Dixon et al., 2016; Hall et al., 2012), provide the necessary documentation to satisfy the reviewer’s concerns regarding the quality and reliability of the method.
 
 Reviewer #2 (Public review):
 
 This study explores the dynamic association between malate dehydrogenase (MDH1) and citrate synthase (CIT1) in Saccharomyces cerevisiae, with the aim of linking this interaction to respiratory metabolism. Utilizing a NanoBiT split-luciferase system, the authors monitor protein-protein interactions in vivo under various metabolic conditions.
 
 Major Concerns:
 
 (1) NanoBiT Signal May Reflect Protein Abundance Rather Than Interaction Strength
 
 In Figure 1C, the authors report increased MDH1-CIT1 interaction under respiratory (acetate) conditions and decreased interaction during fermentation (glucose), as indicated by NanoBiT luminescence. However, this signal appears to correlate strongly with the expression levels of MDH1 and CIT1, raising the possibility that the observed luminescence reflects protein abundance rather than specific interaction dynamics. To resolve this, NanoBiT signals should be normalized to the expression levels of both proteins to distinguish between abundance-driven and interaction-driven changes.
 
 We agree that distinguishing between abundance-driven and interaction-driven changes is vital. To address this, we have included new data showing the relative protein levels of MDH1 and CIT1 across all experimental conditions. The protein levels were assessed using yeast lines expressing these proteins tagged with full-length NanoLUC luciferase (Figure 2 – figure supplement 1&3, Figure 3 – figure supplement 1, Figure 4 – figure supplement 2). Using the luminescence data of these relative protein levels, we have included plots showing a normalized interaction index (Figure 2 – figure supplement 1G & 3D,H,L; Figure 3 – figure supplement 1D,H,L,P; Figure 4 – figure supplement 1D,H,L). This index was calculated by dividing the NanoBiT interaction signal by the product of the relative abundances of both proteins:
 
 Normalized interaction index = NanoBiT / (MDH1 × CIT1)
 
 In this formula, NanoBiT, MDH1, and CIT1 are the relative luminescence levels at each time point. This analysis clarified that the changes in the interaction signal significantly exceeded the fluctuations in protein levels, confirming that the dynamics are interaction-specific and not abundance-driven. To provide the most direct and transparent representation of the experimental measurements, we have chosen to keep the raw RLU data in the main figures and have moved the data related to protein abundance and normalization to figure supplements.
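 
 The normalization described here, dividing the NanoBiT signal by the product of the two relative protein abundances at each time point, can be sketched as follows. This is a minimal illustration and not the authors' analysis code; the function name and the example values are hypothetical.
 
```python
from typing import List, Sequence


def normalized_interaction_index(
    nanobit: Sequence[float],
    mdh1: Sequence[float],
    cit1: Sequence[float],
) -> List[float]:
    """Return NanoBiT / (MDH1 * CIT1) for each time point.

    All three inputs are relative luminescence levels sampled at the
    same time points.
    """
    if not (len(nanobit) == len(mdh1) == len(cit1)):
        raise ValueError("all series must cover the same time points")
    return [n / (m * c) for n, m, c in zip(nanobit, mdh1, cit1)]


# Hypothetical relative luminescence values, for illustration only.
nanobit_signal = [1.0, 0.5, 0.25]
mdh1_level = [1.0, 1.0, 0.8]
cit1_level = [1.0, 0.8, 1.0]

index = normalized_interaction_index(nanobit_signal, mdh1_level, cit1_level)
print(index)
```
 
 In this made-up series the index still falls to about a third of its starting value even though each protein level drops by at most 20%, which is the kind of pattern that points to an interaction-driven rather than an abundance-driven change.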
 
 (2) Lack of Causal Evidence
 
 The study presents a series of metabolic perturbation experiments (e.g., arsenite, AOA, antimycin A, malonate) and correlates changes in metabolite levels with NanoBiT signals. However, these data are correlative and do not establish a functional role for the MDH1-CIT1 interaction in metabolic regulation. To demonstrate causality, the authors should implement approaches to specifically disrupt the MDH1-CIT1 interaction. One strategy could involve using a 15-residue peptide (Pept1) derived from the Pro354-Pro366 region of CIT1, previously shown to mediate the interaction, or introducing the cit1Δ3 (Arg362Glu) mutation, which perturbs binding. Metabolic flux analysis using ^13C-labeled glucose and mitochondrial respiration assays (e.g., Seahorse) could then assess functional consequences.
 
 We agree with the reviewer that the current dataset correlates metabolon assembly with metabolic states rather than establishing a direct causal proof of its functional role in regulating pathway flux.
 
 However, the primary objective of this manuscript was to establish the dynamic nature of the MDH1-CIT1 metabolon and to demonstrate the causal relationship between the changes in cellular conditions and metabolon dynamics through in vitro and in vivo assessments. Demonstrating that this canonical multienzyme complex undergoes reversible assembly and disassembly in vivo represents a major advance, as metabolon dynamics is a critical, yet previously unrevealed, factor involved in metabolic regulation. We aimed to define the specific environmental triggers that govern these dynamics, providing the necessary foundation for defining the functions of metabolons.
 
 We completely agree that establishing causality using interaction-deficient mutants coupled with metabolic flux analysis is another critical experiment to establish the functions of the TCA cycle metabolon. We have, in fact, been conducting these precise metabolic flux analyses on CIT1 mutants with disrupted interaction with MDH1. Because the functional consequences of complex disruption involve wide-reaching metabolic rerouting that requires extensive data presentation and modeling, this work forms a separate, comprehensive follow-up study that is currently in preparation for submission in the near future.
 
 To address this limitation in the current manuscript, we have carefully reviewed and revised the Abstract, Results, Discussion, and Conclusion sections (Lines 19-22; 205; 322-327; 341-342; 458-466). We have removed any language that may have inadvertently implied direct causality. We now explicitly state that our findings indicate the relationship between metabolon dynamics and respiratory conditions, and we have added a clear statement noting that the direct effects of this assembly on metabolic flux are the focus of our forthcoming studies.
 
 (3) Absence of Protein Expression Controls Under Perturbation Conditions
 
 In experiments involving acetate, arsenite, AOA, antimycin A, and malonate, the authors infer changes in MDH1-CIT1 association based solely on NanoBiT signals. However, no accompanying data are provided on MDH1 and CIT1 protein levels under these conditions. This omission weakens the conclusions, as altered expression rather than interaction strength could underlie the observed luminescence changes. Immunoblotting or quantitative proteomics should be used to confirm constant protein expression across conditions.
 
 In response to your first concern, we have now performed protein expression assessments for all experiments, including the perturbation conditions, such as acetate, arsenite, AOA (Figure 3 – figure supplement 1), antimycin A, cyanide, and malonate (Figure 4 – figure supplement 2). The results demonstrate that the protein levels of MDH1 and CIT1 remain relatively stable throughout these treatments and do not correlate with the large changes observed in the interaction signals. This is also demonstrated by the normalized interaction index, which confirms that the shifts in luminescence are driven by the dynamic assembly and disassembly of the MDH1-CIT1 metabolon rather than changes in protein concentrations.
 
 Conclusion:
 
 Although the central question is compelling and the use of NanoBiT in live cells is a strength, the manuscript requires additional experimental rigor. Specifically, normalization of interaction signals, introduction of causative perturbations, and validation of protein expression are essential to substantiate the study's claims.
 
 We sincerely thank the reviewer for recognizing the value of our central question and the strength of the live-cell NanoBiT system, as well as for your rigorous critique that has strengthened this manuscript. To address the concerns regarding experimental rigor, we have now provided extensive validation of MDH1 and CIT1 protein expression across all experimental conditions using yeast lines tagged with the full-length NanoLUC luciferase. These data demonstrate relatively stable protein expression, allowing us to calculate a normalized interaction index that substantiates that the observed luminescence shifts are driven by dynamic metabolon assembly rather than protein concentration. Regarding causative perturbations, we agree that introducing interaction-deficient mutants coupled with isotopic flux analysis is the critical next step to establish functional consequences. Because defining these pathway-wide rerouting events requires extensive modeling, this work will be reported in a follow-up study currently in preparation. Accordingly, we have carefully revised the manuscript to remove language implying direct causality, explicitly framing metabolon dynamics as an integral factor in metabolic regulation closely related to pathway activity and cellular metabolic states. We believe these new quantitative controls, normalizations, and textual clarifications thoroughly address the need for additional rigor and solidly substantiate our findings.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Metabolons are multisubunit complexes that promote the physical association of sequential enzymes within a metabolic pathway. Such complexes are proposed to increase metabolic flux and efficiency by channeling reaction intermediates between enzymes. The TCA cycle enzymes malate dehydrogenase (MDH1) and citrate synthase (CIT1) have been linked to metabolon formation, yet the conditions under which these enzymes interact, and whether such interactions are dynamic in response to metabolic cues, remain unclear, particularly in the native cellular context. This study uses a nanoBIT protein-protein interaction assay to map the dynamic behavior of the MDH1-CIT1 interaction in response to multiple metabolic stimuli and challenges in yeast. Beyond mapping these interactions in real time, the authors also performed GC-MS metabolomics to map whole-cell metabolite alterations across experimental conditions. Finally, the authors use microscale thermophoresis to determine components that alter the MDH1-CIT1 interaction in vitro. Collectively, the authors synthesize their collected data into a model in which the MDH1-CIT1 metabolon dissociates in conditions of low respiratory flux, and is stimulated during conditions of high respiratory flux. While their data largely support these models, some key exceptions are found that suggest this model is likely oversimplified and will require further work to understand the complexities associated with MDH1-CIT1 interaction dynamics. Nonetheless, the authors put forth an interesting and timely toolkit to begin to understand the interaction kinetics and dynamics of key metabolic enzymes that should serve as a platform to begin disentangling these important yet understudied aspects of metabolic regulation.
 
 We thank the reviewer for this thoughtful and constructive summary of our work. We appreciate the recognition of the novelty and utility of our experimental approach and the integrated analysis of MDH1–CIT1 interaction dynamics.
 
 We agree with the reviewer that, although our data largely support a model in which MDH1–CIT1 interaction correlates with respiratory activity, there are conditions that do not fully conform to this simplified framework. In the revised manuscript, we have addressed these apparent inconsistencies by providing detailed interpretations of the counterintuitive observations (e.g., ETC inhibition) and emphasizing that the MDH1–CIT1 interaction is modulated by changes in the mitochondrial matrix microenvironment associated with respiratory activity.
 
 Furthermore, we have revised the Discussion to highlight that the regulation of the MDH1–CIT1 interaction is likely multifactorial, involving the combined effects of pH, metabolites, and other unknown factors, which together enable fine-tuning of metabolic flux in fluctuating environments. This expanded perspective is now stated more clearly.
 
 We agree that identifying the precise molecular determinants of MDH1–CIT1 interaction dynamics will require additional mechanistic studies, such as systematic analyses using yeast mutants. While these experiments are an important next step, they are beyond the scope of the present study. We anticipate that the toolkit and framework established here will facilitate such future investigations.
 
 Strengths:
 
 (1) The authors address an important question: how do metabolon-associated protein-protein interactions change across altered metabolic conditions?
 
 (2) The development and validation of the MDH1-CIT1 nanoBIT assay provides an important tool to allow the quantification of this protein-protein interaction in vivo. Importantly, the authors demonstrate that the assay allows kinetic and real time assessment of these protein interactions, which reveal interesting and dynamic behavior across conditions.
 
 (3) The use of classic biochemical techniques to confirm that pH and various metabolites can alter the MDH1-CIT1 interaction in vitro is rigorous and supports the model put forth by the authors.
 
 We thank the reviewer for these positive and encouraging comments. We are pleased that the importance of the research question, the development of the MDH1–CIT1 NanoBiT assay, and the integration of in vivo and in vitro approaches were recognized. We especially appreciate the acknowledgment of the assay’s ability to capture dynamic and kinetic changes in protein–protein interactions, as well as the support provided by the biochemical analyses. We hope that the experimental framework established in this study will serve as a useful platform for further investigations into metabolon dynamics and metabolic regulation.
 
 Weaknesses:
 
 (1) Some of the data collected seem to be merely reported rather than synthesized and interpreted for the reader.
 
 We agree that explicitly synthesizing these findings is essential for clarity. To improve this, we have revised the Results section to include concise summary statements at the conclusion of each major experimental paragraph (Lines 190-191, 201, 218-219, 229-231, 241-242, 272-274, 282-283; 291-293). These additions interpret the data in relation to our main hypothesis. The discussion section was thoroughly revised to more precisely explain the logic supporting the model (Lines 381-393; 433-443, 458-466). Additionally, to bring together the entire dataset, we introduced a new summary schematic (Figure 6A). This figure visually and conceptually integrates our diverse findings, covering metabolic treatments, pH fluctuations, and complex metabolite profiles, showing how these signals work together to control multienzyme complex assembly.
 
 This is particularly true for data that seem to reflect more complex trends, such as the GCMS experiments that map metabolites across multiple experiments, or treatments that show somewhat counterintuitive results, such as the antimycin A treatment, which promotes rather than disrupts the MDH1-CIT1 interaction.
 
 We agree that our complex datasets, including the metabolomics and the seemingly counterintuitive Antimycin A results, required deeper synthesis. To clarify the broader metabolic trends, we have added Figure 6A to visually map which factors, specifically pH, malate, fumarate, and aspartate, most consistently align with complex assembly. We revised the Discussion (Lines 390-393, 439-443) to explicitly conclude that no single variable predominantly governs the interaction; rather, it is coordinately regulated by multiple microenvironmental cues.
 
 Regarding the Antimycin A (and other ETC inhibitors) discrepancy, where the interaction is enhanced despite suppressed respiration, we have expanded our interpretation (Lines 346–358) to explain this as a transient response that is not directly reflected by steady-state respiratory activity. Specifically, we propose that acute perturbations of the mitochondrial matrix microenvironment, particularly changes in pH, temporarily promote MDH1–CIT1 interaction. Thus, under these conditions, transient microenvironmental changes can dominate over steady-state respiratory output in regulating metabolon assembly.
 
 The discussion paragraph about the imperfect relationship between pH and interaction has been revised to highlight our conclusion that mitochondrial matrix pH can be a contributing factor rather than the primary regulator (Lines 386-393).
 
 (2) Some of the assertions put forth in the manuscript are not substantiated by the data presented, and the authors are at times overly reliant on previous findings from the literature to support their claims. This is particularly notable for claims about "TCA cycle flux"; the authors do not perform flux analysis anywhere in their study and should be cautious when insinuating correlations between their observations and "flux".
 
 We appreciate the reviewer’s careful evaluation of our terminology and fully agree that claims regarding "flux" should be reserved for studies that employ direct isotopic flux measurements. In response to this constructive feedback, we have thoroughly reviewed the manuscript to ensure that our assertions are substantiated by the presented experimental data. We have carefully evaluated the use of the term "flux" throughout the Abstract, Introduction, and Discussion, replacing it with more accurate phrases such as "pathway activity," "respiratory activity," or "mitochondrial respiration" depending on the specific context (Lines 11; 20-21; 50; 111-112; 322-327; 329; 345; 349-350; 442-443; 458-466).
 
 We also removed a paragraph discussing the potential role of the MDH1-CIT1 metabolon in the malate-aspartate shuttle (Line 361). We realized that the paragraph was highly speculative and that our data do not directly support the hypothesis. The influence of the MDH1-CIT1 metabolon on the malate-aspartate shuttle is a major finding of an upcoming manuscript reporting its effects on metabolic network flux. We apologize for mixing up the results of two separate studies.
 
 Furthermore, we have revised our conclusions to avoid over-reliance on prior literature in making causal claims. We now explicitly frame the dynamic assembly of the MDH1-CIT1 metabolon as an integral factor in metabolic regulation, closely related to cellular metabolic states, rather than stating that it controls pathway flux (Lines 454-462). We believe these textual revisions accurately align our claims with our current observations and remove any unsubstantiated assertions.
 
 (3) The manuscript presentation could be improved. For figures, at times, the axes do not have intuitive labels (example, Figure 1A), data points and details about the number of samples analyzed are missing (bar graphs and box plots), and molecular weight markers are not reported on western blots. The authors refer to the figures out of order in the text, which makes the manuscript challenging to navigate as a reader.
 
 We thank the reviewer for these helpful suggestions to improve the clarity and presentation of the manuscript. We have made several revisions accordingly.
 
 First, axis labels have been revised throughout the figures to improve clarity and make them more intuitive. Second, we have added the number of biological replicates to the figure captions and updated bar graphs and box plots to display individual data points. Third, to improve the transparency of the immunoblot data, we have included molecular weight marker position in Figure 1C and corresponding full gel images in a new Figure 1 – figure supplement 2. Other immunoblot images have been moved to Figure 2 – figure supplement 1 since they lack molecular marker images.
 
 In addition, we have reorganized the figure panel labeling and corresponding text to improve the flow of the Results section. Specifically, figure subpanels are now arranged according to the measured parameters rather than treatment conditions, and the relevant sections describing TCA cycle manipulation and ETC inhibition have been revised to follow this updated figure order (Lines 208–231; 251–274). These changes improve the readability and logical progression of the manuscript.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations for the authors):
 
 The grammar of the sentence in the Abstract that states "called metabolon" needs to be fixed.
 
 We thank the reviewer for pointing this out. We have revised the sentence in the Abstract to improve clarity. The revised sentence reads: “The tricarboxylic acid (TCA) cycle enzymes malate dehydrogenase (MDH1) and citrate synthase (CIT1) form a multienzyme complex, referred to as a metabolon, that channels intermediate oxaloacetate between their reaction centers.” (Lines 7-9)
 
 Reviewer #3 (Recommendations for the authors):
 
 Major points:
 
 (1) Much of the data reported in this manuscript reads as a summary of what was found, rather than distilling what the trends in the data mean or how they support the proposed model.
 
 We thank the reviewer for this comment. This concern overlaps with your previous point (Weakness 1), which we have addressed through revisions to improve synthesis and clarity. Specifically, we have added concise summary statements at the end of each major experimental section (Lines 190-191, 201, 218-219, 229-231, 241-242, 272-274, 282-283; 291-293), and we have included a new summary schematic (Figure 6A) that integrates the findings to illustrate how metabolic conditions and mitochondrial microenvironments relate to MDH1–CIT1 interaction. Together, these revisions improve the interpretation and clarify how the results support our model.
 
 For instance, in Figure 3, the authors use one metabolic treatment to activate the TCA cycle and two to inhibit the TCA cycle. In Figure 3M, GC-MS data are reported for select metabolites across these three conditions, as well as a control condition. However, these metabolites don't follow clean "trends" according to the predictions; as one example, malate is down in the TCA active (acetate) and one TCA inhibited condition (arsenite), whereas it is elevated in the second TCA inhibited (aminooxyacetate) condition. As an additional example, glutamate is down in the arsenite (inhibited) condition, slightly down in the acetate (activated) condition, but is unchanged in the AOA (inhibited) condition. Similar variability is seen in Figure 4M. What do these discrepancies mean? How do they support the model? As written, these data bring forth more questions than they answer.
 
 We appreciate the reviewer’s careful analysis of the metabolomics data in Figures 2E, 3M, and 4M. The reviewer notes that the levels of certain metabolites show complex patterns that do not simply reflect overall TCA cycle activity. We have acknowledged that our metabolomics dataset is a valuable resource for the research community and have added a brief paragraph to emphasize the complex metabolic phenotypes resulting from chemical treatments (Lines 422-431).
 
 As mentioned in that paragraph, this complexity is biologically expected. It likely arises from the distinct primary targets of each inhibitor, such as arsenite affecting redox-sensitive enzymes and AOA disrupting the malate-aspartate shuttle, as well as from off-target effects and the adaptive reorganization of intersecting metabolic networks to bypass local blockades. Rather than viewing these diverse metabolic phenotypes as discrepancies, we leveraged them to uncouple general respiratory suppression from specific metabolite pools, allowing us to independently assess their relationship with metabolon assembly.
 
 Furthermore, we note that our GC-MS analysis measures whole-cell metabolite levels, which represent the sum of multiple subcellular compartments and may not precisely reflect localized concentrations within the mitochondrial matrix that is directly affected by the TCA cycle. The description of this limitation of whole-cell metabolomics has been revised in Lines 417-420.
 
 (2) Why do the authors propose that antimycin A increases the interaction between MDH1 and CIT1 despite decreasing respiratory activity? Given the generalities proposed in Figure 6, this is important to address.
 
 We thank the reviewer for this comment. This point overlaps with Weakness 1, where we have addressed the apparent discrepancy associated with antimycin A (and other ETC inhibitors). Briefly, we have expanded our interpretation (Lines 349–360) to explain this effect as a transient response that is not directly aligned with steady-state respiratory activity. We propose that acute perturbations of the mitochondrial matrix microenvironment, particularly changes in pH, temporarily promote MDH1–CIT1 interaction. In addition, we have revised the Discussion (Lines 386–404) to clarify that mitochondrial matrix pH acts as a contributing factor rather than the primary regulator of the interaction. Together, these revisions reconcile the ETC inhibition by antimycin A with the overall model presented in Figure 6.
 
 (3) The authors use acetate to "activate" the TCA cycle; do other non-fermentable carbon sources also promote the MDH1-CIT1 interaction?
 
 We thank the reviewer for this insightful question. We have tested additional nonfermentable carbon sources and found that they did not significantly affect MDH1–CIT1 interaction (Figure 3—figure supplement 1). We note that raffinose present in the medium likely provides a baseline carbon source supporting oxidative metabolism, which may limit the observable effects of these treatments (Lines 149-150).
 
 In addition, we performed a new experiment using ethanol. While ethanol treatment enhanced the MDH1–CIT1 interaction signal, it also increased the abundance of MDH1 and CIT1, resulting in a reduced interaction index. Because ethanol induces protein accumulation under our experimental conditions, this result is not straightforward to interpret. We have included this observation and its interpretation in the revised manuscript (Lines 208–211).
 
 (4) The authors show that the MDH1-CIT1 interaction is sensitive to pH. Is the MDH1-CIT1 interaction affected by uncouplers in vivo?
 
 We thank the reviewer for suggesting a meaningful experiment. We performed a new experiment examining the effect of the uncoupler CCCP on MDH1–CIT1 interaction in vivo (Figure 4—figure supplement 4). We found that CCCP treatment increased the interaction signal, consistent with the idea that acidification of the mitochondrial matrix promotes MDH1–CIT1 association.
 
 However, we observed that CCCP treatment also decreased the luciferase signals from MDH1 and CIT1 fused to full-length NanoLUC in an abnormal way, making the interaction index harder to interpret. Therefore, although these results support a possible role for pH in regulating the interaction, they should be viewed with caution and are included as a figure supplement. This experiment and its interpretation have been added to the revised manuscript (Lines 276–283).
 
 (5) NADH is a potent suppressor of many enzymes within the TCA cycle, including MDH1 and CIT1. Can the authors modulate mitochondrial NADH through genetic manipulation of Ndi1, or through overexpression of mito-Lb-NOX (PMID: 27124460)?
 
 We thank the reviewer for this insightful suggestion. We agree that mitochondrial NADH is a potential regulator of the MDH1-CIT1 interaction, as it is a potent suppressor of many TCA cycle enzymes; indeed, we have previously shown that NADH inhibits the MDH-CS interaction in vitro (Omini et al., 2021; PMID: 34548590). For this reason, we investigated the mitochondrial matrix redox state, which is related to NADH levels, in the current study. The reviewer’s proposed strategy of using targeted genetic tools such as mito-Lb-NOX or Ndi1 manipulation to specifically influence the NADH level is an elegant approach to isolate this variable. However, implementing this system requires generating, optimizing, and validating new yeast strains that harbor the targeted NADH-modulating constructs alongside the NanoBiT and full-length NanoLUC sensor systems. Because this extensive strain engineering and subsequent live-cell validation fall outside a feasible timeframe for the current manuscript revision, we must respectfully defer these experiments. We view the precise manipulation of the mitochondrial redox state via tools such as mito-Lb-NOX as a complementary approach for our future work to systematically pinpoint the individual regulatory factors. We have expanded our Discussion (Lines 417-420; 462-465) to highlight the targeted genetic manipulation of possible regulatory factors, including the NADH pool, as a critical future direction for dissecting these dynamics.
 
 (6) The authors should correct their figures:
 
 (a) Axes should be easy to interpret on graphs.
 
 (b) Individual datapoints should be shown on bar graphs and box plots. Minimally, the number of samples evaluated should be reported.
 
 (c) Molecular weight markers should be reported on blots.
 
 We thank the reviewer for these helpful suggestions. Points (a) and (b) overlap with Weakness 3, which we have addressed through revisions to improve figure clarity and data presentation. Specifically, axis labels have been revised to be more intuitive, the number of samples is now reported in the figure captions, and bar and box plots have been updated to include individual data points. For time-course data, we retained point-line plots, as alternative formats (e.g., bar or box plots) would reduce clarity due to the density of time points.
 
 For point (c), we have added molecular weight markers to the immunoblot data where available (Figure 1C). In the time-course experiment in the original Figure 2, molecular weight markers were absent from the gel images. Although we are confident in the identity of the detected signals, we have moved these data to a figure supplement (Figure 2—figure supplement 1C) to reflect this limitation. Similarly, the corresponding Co-IP data are now presented as a figure supplement (Figure 2—figure supplement 1A).
 
 Minor points:
 
 (1) In the last paragraph before the results, the authors refer to "the fluorescent biosensors", but start the paragraph discussing the nanoBIT PPI. After reading the manuscript, these seem to be distinct experimental setups, but that was not evident in the first read through of the paper.
 
 We thank the reviewer for pointing out this source of confusion. We apologize for the lack of clarity in distinguishing between the experimental approaches. In this study, the NanoBiT system was used to measure MDH1–CIT1 interaction, whereas fluorescent biosensors were used to assess mitochondrial matrix pH, redox state, and ATP levels. We have revised the paragraph to more clearly distinguish these methodologies and their respective roles in the study (Lines 105–112).
 
 (2) As mentioned above, referring to multiple figures out of order within the manuscript is very jarring for the reader. The authors should consider reworking the narrative or figures to be presented in order.
 
 We thank the reviewer for this comment. This concern overlaps with the previous comment regarding figure organization, which we have addressed by revising both the figure labeling and the corresponding text. Specifically, figure subpanels have been reorganized to follow the measured parameters rather than treatment conditions, and the Results sections describing TCA cycle manipulation and ETC inhibition have been revised to follow the updated figure order (Lines 208–231; 251–274). These changes improve the logical flow and readability of the manuscript.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.06.16.659985v2

Public_Reviews

Annotations: 10,000

Joined: March 17, 2021
