Jun 2025
www.biorxiv.org

The increase in cell volume and nuclear number of the koji-fungus Aspergillus oryzae contributes to its high enzyme productivity

1
1. Public_Reviews 11 Jun 2025
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it with different production capacities. They used microfluidic devices to correlate production directly with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase in ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them with the capability of increasing their nuclear numbers.
  
  The methods used in the paper range from high-quality cell biology, Raman spectroscopy, to atomic force and electron microscopy, and from laser microdissection to the use of microfluidic devices to study individual hyphae.
  
  This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology. I have only minor suggestions for improvement.
  
  We sincerely appreciate your fair and positive evaluation of our work. Thank you for your suggestions for improvement. We respond to each of them appropriately.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In the study presented by Itani and colleagues, it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data that glycosyltransferases, calcium channels, and the TOR regulatory cascade are involved in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology.
  
  Strengths:
  
  The study is very comprehensive and involves the application of diverse state-of-the-art cell biological, biochemical, and genetic methods. Overall, the data are properly controlled and analyzed, figures and movies are of excellent quality.
  
  The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and their number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.
  
  Weaknesses:
  
  There are only a few open questions concerning the activity of the many nuclei in production strains (active versus inactive), their number of chromosomes (haploid/diploid), and whether hyper-branching always leads to propagation of nuclei.
  
  We are very grateful for your recognition of our findings, the proposed model, and their significance for future applications. We are grateful for the questions, which contribute to a more accurate understanding.
  
  Our responses to each are provided below. Necessary experiments are in progress.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis.
  
  Strengths:
  
  The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences.
  
  Weaknesses:
  
  There are aspects of the methods that would benefit from the inclusion of more detail on how experiments were performed and data interpreted.
  
  Overall, the authors have achieved their aims in that they are able to clearly document the presence of two distinct hyphal forms in A. oryzae and other Aspergillus species, and to correlate the presence of the thicker, rapidly growing form with enhanced enzyme secretion. The image analysis is convincing. The discovery that the addition of yeast extract and specific amino acids can stimulate the formation of the novel hyphal form is also notable. Although the conclusions are generally supported by the results, this is perhaps less so for the genetic analysis as it remains unclear how direct the role of RseA and the calcium transporters might be in supporting the formation of the thicker hyphae.
  
  The results presented here will impact the field. The complexity of hyphal morphology and how it affects secretion is not well understood despite the importance of these processes for the fungal lifestyle. In addition, the description of approaches that can be used to facilitate the study of these different hyphal forms (i.e., stimulation using yeast extract or specific amino acids) will benefit future efforts to understand the molecular basis of their formation.
  
  We are very grateful for your fair and thoughtful evaluation of our work. We agree that the genetic analysis in the latter part is relatively weaker compared to the imaging analysis in the first half. Rather than a single mutation causing a dramatic phenotypic change, we believe that the accumulation of various mutations through breeding leads to the observed phenotype, making it difficult to clearly demonstrate causality. Since transcriptome and SNP analyses have revealed key pathways and phenotypes, it would be gratifying if these insights could contribute to future applications utilizing filamentous fungi.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.06.647446v1
www.biorxiv.org

CCDC32 stabilizes clathrin-coated pits and drives their invagination

4
1. Public_Reviews 11 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 The manuscript presents a valuable finding that CCDC32, beyond its reported role in AP2 assembly, follows AP2 to the plasma membrane and regulates clathrin-coated pit assembly and dynamics. The authors further suggest that the alpha-helical region of CCDC32 interacts with AP2 via the alpha appendage domain to mediate this function. While live-cell and ultrastructural imaging data are solid, future biochemical studies will be needed to confirm the proposed CCDC32-AP2 interaction.
 
 [Editors' note: this paper was reviewed by Review Commons.]
 
 Summary
2. Public_Reviews 11 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 This is a revision of a manuscript previously submitted to Review Commons. The authors have partially addressed my comments, mainly by expanding the introduction and discussion sections. Sandy Schmid, a leading expert on the AP2 adaptor and CME, has been added as a co-corresponding author. The main message of the manuscript remains unchanged. Through overexpression of fluorescently tagged CCDC32, the authors propose that, in addition to its established role in AP2 assembly, CCDC32 also follows AP2 to the plasma membrane and regulates CCP maturation. The manuscript presents some interesting ideas, but there are still concerns regarding data inconsistencies and gaps in the evidence.
 
 (1) eGFP-CCDC32 was expressed at 5-10 times higher levels than endogenous CCDC32. This high expression can artificially drive CCDC32 to the cell surface via binding to the alpha appendage domain (AD), an interaction that may not occur under physiological conditions.
 
 (2) Which region of CCDC32 mediates alpha AD binding? Strangely, the only mutant tested in this work, Δ78-98, still binds AP2, but shifts to binding only mu and beta. If the authors claim that CCDC32 is recruited to mature AP2 via the alpha AD, then a mutant deficient in alpha AD binding should not bind AP2 at all. Such a mutant is critical for establishing the model proposed in this work.
 
 (3) The concept of hemicomplexes is introduced abruptly. What is the evidence that such hemicomplexes exist? If CCDC32 binds to hemicomplexes, this must occur in the cytosol, as only mature AP2 tetramers are recruited to the plasma membrane. The authors state that CCDC32 binds the AD of alpha but not beta, so how can the Δ78-98 mutant bind mu and beta?
 
 (4) The reported ability of CCDC32 to pull down AP2 beta is puzzling. Beta is not found in the CCDC32 interactome in two independent studies using 293 and HCT116 cells (BioPlex). In addition, clathrin is also absent in the interactome of CCDC32, which is difficult to reconcile with a proposed role in CCPs. Can the authors detect CCDC32 binding to clathrin?
 
 (5) Figure 5B appears unusual; is this a chimera? Figure 5C likely reflects a mixture of immature and mature AP2 adaptor complexes.
 
 (6) CCDC32 is reduced by about half in siRNA knockdown. Why not use CRISPR to completely eliminate CCDC32 expression?
 
 Review 1
3. Public_Reviews 11 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Yang et al. describe CCDC32 as a new clathrin-mediated endocytosis (CME) accessory protein. The authors show that CCDC32 binds directly to AP2 via a small alpha-helical region and that cells depleted of this protein show defective CME. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facial-neuro-developmental syndrome (CFNDS) disrupt the interaction of this protein with the AP2 complex. The results presented suggest that CCDC32 may act as both a chaperone (as recently published) and a structural component of the AP2 complex.
 
 Strengths: The conclusions presented are generally well supported by experimental data and the authors carefully point out the differences between their results and the results by Wan et al. (PNAS 2024).
 
 Weaknesses: The experiments regarding the role of CCDC32 in CFNDS still require some clarification to make them clearer to scientists working on this disease. The authors fail to mention that the CCDC32 isoform they use in their studies is different from the one used when CFNDS patient mutations were described. This may create some confusion. Also, the authors did not discuss that the frame-shift mutations in patients may lead to nonsense-mediated decay.
 
 Review 2
4. Public_Reviews 11 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live-cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits, and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding of CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull-down experiments. While interaction between CCDC32 and the alpha appendage domain of AP2 is clearly described, a discussion of potential association with other AP2 domains would be beneficial to understand the impact of CCDC32 in endocytosis.
 
 Together, these experiments allow deriving a phenotype of CCDC32 knock-down and CCDC32 mutants within endocytosis, which is a very robust system, in which defects are not so easily detected. A mutation of CCDC32, mimicking CFNDS mutations, is also addressed in this study and shown to have endocytic defects.
 
 In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin-mediated endocytosis and its binding to AP2.
 
 Review 3

Tags

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.06.26.600785v3
www.biorxiv.org

Theory of non-dilute binding and surface phase separation applied to membrane-binding proteins

4
1. Public_Reviews 11 Jun 2025
  
  in eLife
  
  Author Response:
  
  We sincerely thank the reviewers and the editorial team for their thoughtful and constructive evaluation of our manuscript. We are very pleased that both reviewers and the Reviewing Editor found the work to be compelling and of interest to the community studying membrane-associated condensates. Below we outline our planned revisions in response to the public reviews.
  
  Reviewer #1
  
  We appreciate Reviewer #1’s positive evaluation of the study’s significance and the utility of our theoretical framework.
  
  Understandably, the authors used one system to test their theory (ZO-1). However, to establish a theoretical framework, this is sufficient.
  
  Response: We acknowledge this limitation. While we agree that additional systems would strengthen the generality of our theory, we note that the focus of this work is to introduce and validate a theoretical framework. As the reviewer notes, this is sufficient for establishing the framework. Nonetheless, we are open to further collaborations or future studies to test the model with other systems.
  
  Reviewer #2
  
  We are grateful for Reviewer #2’s detailed comments and will address each of the points as follows:
  
  In the theoretical section, what has previously been known, compared to which equations are new, should be made more clear.
  
  Response: We will revise the theory section to clearly distinguish previously established formulations from novel contributions.
  
  Some assumptions in the model are made purely for convenience and without sufficient accompanying physical justification. E.g., the authors should justify, on physical grounds, why binding rate effects are/could be larger than the other fluxes.
  
  Response: We will expand the discussion to provide key physical justification, especially to explain why binding rate effects are/could be larger than the other fluxes.
  
  I feel that further mechanistic explanation as to why bulk phase separation widens the regime of surface phase separation is warranted.
  
  Response: We will elaborate on the mechanism underlying this coupling.
  
  The major advantage of the non-dilute theory as compared with a best-parameterized dilute (or homogeneous) theory requires further clarification/evidence with respect to capturing the experimental data.
  
  Response: We will clarify this comparison more explicitly and highlight how the non-dilute model captures key nonlinear behaviors and concentration-dependent adsorption phenomena that the dilute model fails to reproduce.
  
  Discrete (particle-based) molecular modelling could help to delineate the quantitative improvements that the non-dilute theory has over the previous state-of-the-art. Also, this could help test theoretical statements regarding the roles of bulk-phase separation, which were not explored experimentally.
  
  Response: We appreciate the suggestion and agree that such modeling would be valuable. However, this is beyond the scope of the current study. We will add a discussion on how discrete simulations could be used to further test our theory in future work.
  
  Discussion of the caveats and limitations of the theory and modelling is missing from the text.
  
  Response: We will add a paragraph outlining caveats and limitations of the modelling.
  
  We believe these changes will significantly improve the clarity and impact of our manuscript, and we thank the reviewers again for their valuable input.
  
  AuthorResponse
2. Public_Reviews 10 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This important study presents a compelling theoretical framework for understanding phase separation of membrane-bound proteins, with a focus on the organization of tight junction components. By incorporating non-dilute binding effects into thermodynamic models and validating the model's predictions with in vitro experiments on the tight junction protein ZO-1, the authors provide a quantitative tool that will be of interest for biologists interested in membrane-associated condensates. While further clarification of model assumptions and broader mechanistic context would strengthen the work even further, the combination of theory and experiment here is robust and a key advancement in the field.
  
  Summary
3. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Biomolecular condensates are an essential part of cellular homeostatic regulation. In this manuscript, the authors develop a theoretical framework for the phase separation of membrane-bound proteins. They show the effect of non-dilute surface binding and phase separation on tight junction protein organization.
  
  Strengths:
  
  It is an important study, considering that the phase separation of membrane-bound molecules is taking the center stage of signaling, spanning from immune signaling to cell-cell adhesion. A theoretical framework will help biologists to quantitatively interpret their findings.
  
  Weaknesses:
  
  Understandably, the authors used one system to test their theory (ZO-1). However, to establish a theoretical framework, this is sufficient.
  
  Review 1
4. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors present a clear expansion of biophysical (thermodynamic) theory regarding the binding of proteins to membrane-bound receptors, accounting for higher local concentration effects of the protein. To partially test the expanded theory, the authors perform in vitro experiments on the binding of ZO1 proteins to Claudin2 C-terminal receptors anchored to a supported lipid bilayer, and capture the effects that surface phase separation of ZO1 has on its adsorption to the membrane.
  
  Strengths:
  
  (1) The derived theoretical framework is consistent and largely well-explained.
  
  (2) The experimental and numerical methodologies are transparent.
  
  (3) The best-parameterized non-dilute theory is in reasonable agreement with experiments.
  
  Weaknesses:
  
  (1) In the theoretical section, what has previously been known, compared to which equations are new, should be made more clear.
  
  (2) Some assumptions in the model are made purely for convenience and without sufficient accompanying physical justification. E.g., the authors should justify, on physical grounds, why binding rate effects are/could be larger than the other fluxes.
  
  (3) I feel that further mechanistic explanation as to why bulk phase separation widens the regime of surface phase separation is warranted.
  
  (4) The major advantage of the non-dilute theory as compared with a best-parameterized dilute (or homogeneous) theory requires further clarification/evidence with respect to capturing the experimental data.
  
  (5) Discrete (particle-based) molecular modelling could help to delineate the quantitative improvements that the non-dilute theory has over the previous state-of-the-art. Also, this could help test theoretical statements regarding the roles of bulk-phase separation, which were not explored experimentally.
  
  (6) Discussion of the caveats and limitations of the theory and modelling is missing from the text.
  
  Review 2

Tags

Review 1

AuthorResponse

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.31.630850v2
www.biorxiv.org

New submission 02/02/2024, 08:58:15

1
1. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Author response:
  
  We thank the reviewers for their thoughtful and constructive feedback. As the reviewers noted, dissecting the contributions of Gtr1/2 and Pib2 to TORC1 signaling across diverse nutrient states is a technically and conceptually challenging problem. Indeed, many of the issues raised—including the interpretation of non-canonical TORC1 readouts (e.g., Rps6, Par32), the influence of strain auxotrophy and media composition, and the limitations of phosphoproteomic analysis performed under a single growth condition—underscore the challenges of working with the TORC1 signaling system.
  
  In response to the reviewers’ comments, we have undertaken a broader and more systematic analysis of TORC1 regulation across defined nitrogen transitions, building directly on the signaling framework established in Figures 6 and 8 of this manuscript. This work, which includes expanded phosphoproteomic profiling and the use of refined genetic tools, supports and extends the key conclusions of Cecil et al. Specifically, it reinforces the existence of a Pib2-dependent TORC1 output under nitrogen-limited conditions and further clarifies the physiological relevance of the intermediate TORC1 activity state. Due to the scope and depth of this expanded work, we are reporting those findings in a separate publication. Nonetheless, we view the data presented here as a key foundational step in establishing a non-redundant framework for Gtr1/2- and Pib2-dependent control of TORC1.
  
  We have therefore made minor changes to the manuscript to clarify our use of different growth media and to temper our conclusions where appropriate. These changes, together with the context of ongoing work, should reinforce the value of Cecil et al. in advancing our understanding of TORC1 and nutrient signaling in eukaryotes.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.12.06.570342v1
www.biorxiv.org

Micro-Scale Control of Oligodendrocyte Morphology and Myelination by the Intellectual Disability-Linked Protein Acyltransferase ZDHHC9

3
1. Public_Reviews 10 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This study provides an in-depth exploration of the impact of X-linked ZDHHC9 gene mutations on cognitive deficits and epilepsy, with a particular focus on the expression and function of ZDHHC9 in myelin-forming oligodendrocytes (OLs). These valuable findings offer insights into ZDHHC9-related X-linked intellectual disability (XLID) and shed light on the regulatory mechanisms of palmitoylation in myelination. The experimental design and analysis of results are solid, providing a reference for further research in this field.
 
 Summary
2. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary: Having shown that acyltransferase ZDHHC9 expression is far higher in myelinating oligodendrocytes (OLs) than in other CNS cell types, Jeong and colleagues focus on exploring the role of ZDHHC9 in myelinating OLs, in particular in the palmitoylation of several myelin proteins. This study is relevant in the context of X-linked intellectual disability as it suggests a more relevant role for myelinating glia than previously thought. It also provides useful insights into the mechanisms of ZDHHC9-associated XLID and into the palmitoylation-dependent control of myelination.
 
 Strengths: Well-written paper. In general, good data quality. Use of transgenic strategies (in addition to the ZDHHC9 KO) strengthens the data and claims.
 
 Weaknesses: A few claims might have needed better experimental support, but new data and revised discussion sections addressed some of these weaknesses.
 
 Review 1
3. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Public Reviews:
 
 Reviewer #1 (Public Review):
 
 In this work Jeong and colleagues focus on exploring the role of the acyltransferase ZDHHC9 in myelinating OLs, in particular in the palmitoylation of several myelin proteins. After confirming the specific enrichment of the Zdhhc9 transcript in mouse and human OLs, the authors examined the subcellular localization of the protein in vitro and observed that, in comparison with other isoforms, ZDHHC9 localizes at OL cell bodies and at discrete puncta in the processes. These observations (Figures 1 and 2) led the authors to hypothesize that ZDHHC9 plays an important role in myelination. No gross changes were detected in OL development in Zdhhc9 KO mice, and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3).
 
 However, and given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), early time points to examine the differentiated pools of OLs and their capacity to extend processes/contact axons need to be considered.
 
 We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be most optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.
 
 Maturation of OLs in Zdhhc9 KO mice was examined by crossing Zdhhc9 KO with Pdgfra-CreER;R26-EGFP mice and tracking the newly EGFP-labelled OPCs following tamoxifen administration. No changes in the numbers of EGFP+ OLs were detected. The authors concluded that the loss of ZDHHC9 does not alter oligodendrogenesis in either the young or mature CNS. The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?
 
 This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice. However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.
 
 The authors report that Zdhhc9 KO primary and secondary branches in OLs were longer, some contained spheroid-like swellings, and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin-forming cells in culture. How to reconcile this?
 
 We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion.
 
 Reviewer #2 (Public Review):
 
 This study provides an in-depth exploration of the impact of X-linked ZDHHC9 gene mutations on cognitive deficits and epilepsy, with a particular focus on the expression and function of ZDHHC9 in myelin-forming oligodendrocytes (OLs). These findings offer crucial insights into understanding ZDHHC9-related X-linked intellectual disability (XLID) and shed light on the regulatory mechanisms of palmitoylation in myelination. The experimental design and analysis of results are convincing, providing a valuable reference for further research in this field. However, upon careful review, I believe the article still needs further improvement and supplementation in the following aspects:
 
 (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OLs, it is currently limited to OLs cultured in vitro, lacking validation in OLs or myelin sheaths in vivo. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation of substrate proteins.
 
 This is an important point but is technically challenging to address in vivo as it would likely require delivery of AAV to express ZDHHC9wt and XLID mutants specifically in OLs, preferably in the absence of endogenous ZDHHC9. We hope the reviewers would agree that this experiment is beyond the scope of the current study. However, we did compare the ability of ZDHHC9wt and XLID mutants to palmitoylate MBP, and to autopalmitoylate (sometimes used as a surrogate measure of PAT activity) in transfected heterologous cells. Although we recognize that this over-expression system is less physiological than a native OL, it has the benefit of being able to readily compare transfected wt vs mutant forms of ZDHHC9 with minimal contribution from endogenous ZDHHC9. Intriguingly, using this system, we found that autopalmitoylation activity of the XLID ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 (new panels 8E-G) to show these additional experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.
 
 (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.
 
 We appreciate this point from the reviewer but, regrettably, we did not maintain the PdgfraCreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.
 
 (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.
 
 We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. anti-MBP splice form-specific antibodies that are compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.
 
 We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 KO mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.
 
 (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.
 
 We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings in our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.
 
 (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.
 
 We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which loss of palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.
 
 (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.
 
 We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.
 
 (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.
 
 This is another important point – our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.
 
 In summary, it is recommended that the authors address the above issues through additional experiments and improved discussions to further strengthen the credibility and clinical relevance of the article.
 
 Recommendations for the authors:
 
 Reviewer #1 (Recommendations For The Authors):
 
 No gross changes were detected in OL development in Zdhhc9 KO mice and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3). However, and given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), ***early time points to examine the differentiated pools of OLs and their capacity to extend processes/contact axons need to be considered***.
 
 We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.
 
 The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?
 
 This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice.
 
 However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.
 
 The authors report that Zdhhc9 KO primary and secondary branches in OL were longer, some contained spheroid-like swellings and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin forming cells in culture. How do they reconcile these different findings?
 
 We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion.
 
 Page 7: "The OL processes in this culture condition correspond to large lipid-rich membranous sheets that form spiral membrane expansion on axons in vivo (49)." Which stage are the authors referring to? OL processes are extended in culture before membrane formation and this is not clear here. In a 3-day differentiation culture, most OLs have not yet formed a myelin sheath (e.g., Figure 2 in Zuchero et al., 2015, Dev Cell).
 
 We appreciate the reviewer highlighting this point. We first note that our oligodendrocyte (OL) culture conditions differ from the immunopanning method used by Zuchero et al., 2015 (original reference (Emery and Dugas, 2013)), which may affect the time course and progression of OL process elaboration and/or myelin sheath formation. We further note that in our cultures most EGFP+ processes are also MBP+ at the time point examined (strictly 3 days plus 9 hours post-differentiation). It thus seems likely that these MBP+ structures largely correspond to the MBP+ wrapping sheaths that occur in vivo, so we have retained our original statement but have added this further explanation.
 
 Minor: Figure 6 (Legend): Time points should be indicated throughout the panels.
 
 We have added this information as requested.
 
 Reviewer #2 (Recommendations For The Authors):
 
 (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OL, it is currently limited to in vitro cultured OL, lacking validation in vivo OL or myelin sheath. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation modification of substrate proteins.
 
 We thank the reviewer for raising this point. New data in our revised Figure 8 compares autopalmitoylation (sometimes used as a surrogate measure of PAT activity) of ZDHHC9wt and XLID mutants, and their ability to palmitoylate MBP in transfected cells. Intriguingly, we found that autopalmitoylation activity of the ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 to show these new experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.
 
 (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.
 
 We appreciate this point from the reviewer but, regrettably, we did not maintain the PdgfraCreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.
 
 (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.
 
 We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. an anti-MBP splice form-specific antibody that is compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.
 
 We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 KO mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.
 
 (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.
 
 We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings in our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.
 
 (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.
 
 We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which loss of palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.
 
 (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.
 
 We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.
 
 (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.
 
 This is another important point – our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.
 
 AuthorResponse
Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.19.558291v3
www.biorxiv.org www.biorxiv.org

PA28γ promotes the malignant progression of tumor by elevating mitochondrial function via C1QBP

3
1. Public_Reviews 10 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This manuscript determines how PA28g, a proteasome regulator that is overexpressed in tumors, and C1QBP, a mitochondrial protein for maintaining oxidative phosphorylation that plays a role in tumor progression, interact in tumor cells to promote their growth, migration and invasion. Additional experiments and analyses that supported the theoretical models for the interaction have been performed in response to the reviews. The overall findings and conceptual framework are important and the evidence is solid. A logical extrapolation of this work is to test the C1QBP mutants using functional assays to determine whether the mutations can decrease the protein stability mediated by the interaction with PA28g.
  
  Summary
2. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors tried to determine how PA28g functions in oral squamous cell carcinoma (OSCC) cells. They hypothesized it may act through metabolic reprogramming in the mitochondria.
  
  Strengths:
  
  They found that the genes of PA28g and C1QBP are in an overlapping interaction network after an analysis of a genome database. They also found that the two proteins interact in coimmunoprecipitation and pull-down assays using the lysate from OSCC cells with or without expression of the exogenous genes. They used truncated C1QBP proteins to map the interaction site to the N-terminal 167 residues of C1QBP protein. They observed the levels of the two proteins are positively correlated in the cells. They provided evidence for the colocalization of the two proteins in the mitochondria and the effect on mitochondrial form and function in vitro and in vivo OSCC models, and the correlation of the protein expression with the prognosis of cancer patients.
  
  Comments on revision:
  
  The third revision added data from two point mutations of C1QBP that would disrupt a hydrogen bond network with PA28g protein. As one would expect from the structural models obtained with AlphaFold, the interaction between the two proteins as detected by co-immunoprecipitation of cell lysate was reduced by both mutations. Therefore, the theoretical models for the interaction were supported by the experimental data. Moving forward, the home run experiments would be to test the C1QBP mutants in functional assays to determine whether the mutations can decrease the protein stability afforded by the interaction with PA28g, which in turn decrease the effect of PA28g on mitochondria and tumor cells via C1QBP. Success of these experiments will conclude this manuscript that presents a novel finding for tumor cell biology which could be a launch pad for therapeutic intervention of tumor development.
  
  Review 1
3. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #2 (Public review):
  
  This manuscript determines how PA28g, a proteasome regulator that is overexpressed in tumors, and C1QBP, a mitochondrial protein for maintaining oxidative phosphorylation that plays a role in tumor progression, interact in tumor cells to promote their growth, migration and invasion. Evidence for the interaction and its impact on mitochondrial form and function was provided although it is not particularly strong.
  
  The revised manuscript corrected mislabeled data in figures and provides more details in figure legends. Misleading sentences and typos were corrected. However, key experiments that were suggested in previous reviews were not done, such as making point mutations to disrupt the protein interactions and assess the consequence on protein stability and function. Results from these experiments are critical to determine whether the major conclusions are fully supported by the data.
  
  The second revision of the manuscript included the proximity ligation data to support the PA28g-C1QBP interaction in cells. However, the method and data were not described in sufficient detail for readers to understand. The revision also includes the structural models of the PA28g-C1QBP complex predicted by AlphaFold. However, the method and data were not described with details for readers to understand how this structural modeling was done, what is the quality of the resulting models, and the physical nature of the protein-protein interaction such as what kind of the non-covalent interactions exist in the interface of the protein complexes. Furthermore, while the interactions mediated by the protein fragments were tested by pull-down experiments, the interactions mediated by the three residues were not tested by mutagenesis and pull-down experiments. In summary, the revision was improved, but further improvement is needed.
  
  Thank you very much for your comments.
  
  (1) Based on your suggestion, we predicted the possible interaction sites using AlphaFold 3 and found that mutations in amino acids 76 and 78 of C1QBP affect the interaction with PA28γ (Revised Appendix Figure 1J). A subsequent pulldown experiment also showed that, after mutating the amino acids at these two sites (T76A, G78N), the amount of C1QBP bound to PA28γ decreased (Revised Figure 1J). These results confirm that PA28γ interacts with C1QBP in a manner dependent on the N-terminus of C1QBP. These findings are now included in the results section of the revised manuscript: “In addition, we employed AlphaFold 3 to perform energy minimization and predict hydrogen bonds between the C1QBP N-terminus (amino acids 1-167) and the PA28γ protein interaction region. The results suggest that the T76 and G78 residues of C1QBP may be key contributors to the interaction. Consistently, coimmunoprecipitation analysis demonstrated that mutations at these sites (C1QBP T76A and C1QBP G78N) significantly reduced the binding ability to PA28γ (Fig. 1J and Appendix Fig. 1J)”. We believe this additional validation strengthens the robustness of our findings.
  
  (2) According to your suggestion, we have added a description of the results of PLA in the figure legend (Revised Figure 1C) and the method of PLA in the appendix file (Revised Appendix file, Part “Proximity Ligation Assay”). The revised text reads as follows: (C) PLA image of UM1 cells shows the interaction between C1QBP and PA28γ in both cytoplasm and nucleus (red fluorescence).
  
  (3) In the light of your suggestion, we have enriched the description of AlphaFold 3 analysis in the appendix file (Revised Appendix file, Page 10-11). The revised text reads as follows:
  
  “Prediction and Analysis of Protein Interactions
  
  Protein Sequence Retrieval and Structure Prediction
  
  The protein sequences of C1QBP and PA28γ were obtained from the AlphaFold Protein Structure Database. Structural predictions of the protein-protein interaction between C1QBP and PA28γ were conducted using AlphaFold 3. The pLDDT (predicted local distance difference test) values were utilized to assess the confidence of the predicted models. Models with a pLDDT score above 70 were considered confident, while those with a score above 90 were categorized as very high confidence. These values were annotated in the figures to indicate the reliability of the structural predictions.”
  
  “Protein Preparation and Structure Optimization
  
  The best-scored model for the C1QBP-PA28γ interaction predicted by AlphaFold 3 was selected for further analysis. The model was imported into MOE 2022 (Molecular Operating Environment) software for protein preparation. This process included the removal of water molecules and other heteroatoms, followed by the addition of hydrogen atoms to the structure. This step was essential for optimizing the protein’s 3D conformation and ensuring the correctness of the protonation states at physiological pH.”
  
  “Energy Minimization and Hydrogen Bond Prediction
  
  The protein structure was subjected to energy minimization using the Amber10:EHT (Extended Hückel Theory) force field, with R-field 1:80 settings to refine the model’s geometry. The minimization process was performed to optimize the protein’s internal energy and ensure stable conformation, followed by calculation of hydrogen bond interactions. The interaction energies and hydrogen bonds were analyzed to identify potential binding sites and stabilize the predicted protein-protein complex.”
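
  As an illustration only, the pLDDT cut-offs quoted in the methods text above (above 70 for "confident", above 90 for "very high confidence") can be written as a small helper. This is a minimal sketch: the function name, the label for scores at or below 70, and the example scores are assumptions made for this example and do not come from the manuscript or from AlphaFold itself.

```python
def plddt_confidence(score: float) -> str:
    """Map a pLDDT score (0-100) to the confidence labels quoted above.

    Hypothetical helper: the thresholds follow the quoted text
    (strictly above 70 and strictly above 90); the label used for
    lower scores is an assumption.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT is defined on the range 0-100")
    if score > 90:
        return "very high confidence"
    if score > 70:
        return "confident"
    return "below the confident threshold"


if __name__ == "__main__":
    # Made-up example scores, not values from the predicted models.
    for example_score in (95.2, 82.0, 64.5):
        print(example_score, plddt_confidence(example_score))
```

  Because the quoted text says "above", a score of exactly 70 or 90 falls into the lower category in this sketch.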
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.23.604769v7
www.biorxiv.org www.biorxiv.org

Mother-child dyadic interactions shape children's social brain and theory of mind

4
1. Public_Reviews 10 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This important study reports solid evidence for the significant role of mother-child neural synchronization and relationship quality in the development of Theory of Mind (ToM) and social cognition. The findings effectively bridge brain development with children's behavior and parenting practices, and will be of interest to researchers studying brain development and social cognition, as well as the general public.
 
 Summary
2. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 The authors have undertaken a significant revision of the manuscript and addressed the vast majority of our original comments. The manuscript is significantly improved as a result and will make a nice contribution to the literature. The new framing is especially impactful.
 
 We have a few remaining comments for improving the manuscript:
 
 Q1: The authors clarified the multiple comparison correction appropriately, and included a comprehensive discussion of the study limitations related to causality and SEM. We think there could be a few further improvements to the manuscript to fully address our initial comment.
 
 Under the results section where the authors describe the use of structural equation modeling, we think that it would be helpful to readers to further emphasize that the current design doesn't allow for delineation of temporal sequences in development and cannot reflect true mediation. These are important caveats that the authors describe beautifully in their response.
 
 In addition to thinking about the mediating variables, can the authors conduct a sensitivity analysis that re-orders the IV, mediator, and DV? That way, a formal comparison can be made between model fits. It would provide an empirical basis for how to temper the discussion of these findings.
 
 Q7: We think that this analysis (lack of significant correlations between ISS, child age, and neural maturity) and the corresponding discussion by the authors would be very interesting for readers. It does not appear that they have added this information to the text (even a supplementary file would suffice), but we think it would strengthen their conclusions about context-specific neural dynamics.
 
 Review 1
3. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary: This study investigates the impact of mother-child neural synchronization and the quality of parent-child relationships on the development of Theory of Mind (ToM) and social cognition. Utilizing a naturalistic fMRI movie-viewing paradigm, the authors analyzed inter-subject neural synchronization in mother-child dyads and explored the connections between neural maturity, parental caregiving, and social cognitive outcomes. The findings indicate age-related maturation in ToM and social pain networks, emphasizing the importance of dyadic interactions in shaping ToM performance and social skills, thereby enhancing our understanding of the environmental and intrinsic influences on social cognition.
 
 Strengths: This research addresses a significant question in developmental neuroscience, by linking social brain development with children's behaviors and parenting. It also uses a robust methodology by incorporating neural synchrony measures, naturalistic stimuli, and a substantial sample of mother-child dyads to enhance its ecological validity. Furthermore, the SEM approach provides a nuanced understanding of the developmental pathways associated with Theory of Mind (ToM). The manuscript also addressed many concerns raised in the initial review. The adoption of the neuroconstructivist framework effectively frames neural and cognitive development as reciprocal, addressing prior concerns about causality. The justification for methodological choices, such as omitting resting-state baselines due to scanning challenges in children and using unit-weighted scoring for ToM tasks, further strengthens the study's credibility.
 
 Weaknesses: (1) The revised introduction has improved, particularly in framing the first goal (developmental changes in ToM and SPM networks) as a "developmental anchor" for goals 2 and 3. However, given prior research on age-related changes in these networks (e.g., Richardson et al., 2018), the authors should clarify whether this goal seeks to replicate prior findings or to extend them to new contexts. Specifying how this part differs from existing work and articulating specific hypotheses would enhance the focus. (2) I still have some reservations about retaining the slightly causal term "shape" in the title. While the manuscript now carefully avoids causal claims, the title may still be interpreted as implying directionality, especially by non-specialist audiences. (3) One more question about Figure 2A and 2B: adults and children showed highly similar response curves for video frames, yet some peaks (e.g., T02, T05, T06) are identified as ToM or SPM events only in adults. Do the statistical methods account for these differences, or do the corresponding video frames contain subtle social cues that only adults can process?
 
 Review 2
4. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary: The article explores the role of mother-child interactions in the development of children's social cognition, focusing on Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Using a naturalistic fMRI paradigm involving movie viewing, the study examines relationships among children's neural development, mother-child neural synchronization, and interaction quality. The authors identified a developmental pattern in these networks, showing that they become more functionally distinct with age. Additionally, they found stronger neural synchronization between child-mother pairs compared to child-stranger pairs, with this synchronization and neural maturation of the networks associated with the mother-child relationship and parenting quality.
 
 Strengths: This is a well-written paper, and using dyadic fMRI and naturalistic stimuli enhances its ecological validity, providing valuable insights into the dynamic interplay between brain development and social interactions.
 
 Weaknesses: The current sample size (N = 34 dyads) is a limitation, particularly given the use of SEM, which generally requires larger samples for stable results. Although the model fit appears adequate, this does not guarantee reliability with the current sample size.
 
 Review 3

Tags

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.23.614623v2
www.biorxiv.org www.biorxiv.org

Deletion of Neuroligins from Astrocytes Does Not Detectably Alter Synapse Numbers or Astrocyte Cytoarchitecture by Maturity

4
1. Public_Reviews 10 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This fundamental study examines whether synaptic cell adhesion molecules neuroligin 1-3 resident on astrocytes, rather than neurons, exert effects on synaptic structure and function. With compelling evidence, including rigorous validation of neuroligin deletion efficiency in astrocytes and independent confirmation using human neuron-mouse glia co-cultures, the authors report that deletion of neuroligins 1-3 specifically in astrocytes does not alter synapse formation or astrocyte morphology in the hippocampus or visual cortex. This study provides definitive evidence highlighting the specific role of neuronal neuroligins rather than their astrocytic counterparts in synaptogenesis.
 
 Summary
2. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.
 
 Comments on revisions:
 
 My previous comments have been addressed. I have no additional points to make and congratulate the authors.
 
 Review 1
3. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin (Nlgn) family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Nlgn1-3 specifically from astrocytes in mice, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. To further extend this finding, the authors additionally analyzed human neurons co-cultured with mouse glia lacking expression of Nlgn1-4. No difference in excitatory synaptic transmission was observed between neurons cultured in the presence of wildtype vs. Nlgn1-4 conditional knockout glia. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.
 
 Overall, this is a strong study that addresses a fundamental and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, Nlgn1-3 are efficiently deleted from astrocytes in vivo, and that this deletion does not lead to major alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes. The authors have conducted an elegant and compelling analysis demonstrating efficient deletion of astrocytic Nlgn1-3, with deletion rates of 83-96% for Nlgn2 and Nlgn3, and 65-72% for Nlgn1. While the co-culture experiments provide additional support, they are not essential as the in vivo data on astrocytic Nlgn1-3 deletion are compelling on their own. Together, the data from this study provide compelling and important evidence that, whatever the role of astrocytic Neuroligins may be, they do not contribute substantially to synapse formation or function under the conditions investigated.
 
 Comments on revisions:
 
 All of my concerns have been satisfactorily addressed. The authors have fully addressed my concerns, and have in particular conducted a very elegant and compelling analysis of the degree of deletion of astrocytic Nlgn1-3/4 in their models. This greatly strengthens the main claims of their study and the fundamental nature of their conclusions for the field of synapse biology. Regarding the co-culture experiments, while I was initially concerned about the lack of controls demonstrating that glia affect synapse formation in human neurons, the authors have appropriately addressed this by clarifying the missing references and explaining that their culture system has been extensively validated in previous studies. Since the data on astrocytic Nlgn1-3 deletion in vivo are compelling on their own, the co-culture experiment provides useful additional support for the main conclusions. The authors have also added the mouse strain background information to the methods section as requested, which is important for interpreting potential differences with other studies.
 
 Review 2
4. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the previous reviews
 
 Public Reviews:
 
 Reviewer #1 (Public Review):
 
 Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.
 
 Comments on revisions:
 
 My previous comments have been addressed. I have no additional points to make and congratulate the authors.
 
 Thank you for your acceptance.
 
 Reviewer #2 (Public Review):
 
 In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin (Nlgn) family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Nlgn1-3 specifically from astrocytes in mice, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. To further extend this finding, the authors additionally analyzed human neurons co-cultured with mouse glia lacking expression of Nlgn1-4. No difference in excitatory synaptic transmission was observed between neurons cultured in the presence of wildtype vs. Nlgn1-4 conditional knockout glia. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.
 
 Overall, this is a strong study that addresses a fundamental and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, Nlgn1-3 are efficiently deleted from astrocytes in vivo, and that this deletion does not lead to major alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes. While the co-culture experiments are somewhat more difficult to interpret due to lack of a control for the effect of wildtype mouse astrocytes on human neurons, they are also consistent with the notion that deletion of Nlgn1-4 from astrocytes has no consequences for the function of excitatory synapses. Together, the data from this study provide compelling and important evidence that, whatever the role of astrocytic Neuroligins may be, they do not contribute substantially to synapse formation or function under the conditions investigated.
 
 Recommendations for the authors:
 
 Reviewer #2 (Recommendations for the authors):
 
 The authors have fully addressed my concerns, and have in particular conducted a very elegant and compelling analysis of the degree of deletion of astrocytic Nlgn1-3/4 in their models. This greatly strengthens the main claims of their study and the fundamental nature of their conclusions for the field of synapse biology.
 
 I am somewhat less convinced by the newly added experiment to investigate deletion of Nlgns1-4 from glia in glia-neuron co-cultures. The authors provide no evidence to show that either WT or cKO glia have any effect on synapse formation or function in human neurons, and therefore, the current lack of a difference could equally result from the fact that both WT and cKO glia were non-functional altogether. The authors cite two studies to state that human neurons do not form synapses in the absence of astrocytes, Zhang et al. 2013 and Huang et al. 2017, but neither seem to be listed in the references (unless Zhang et al. 2014 was meant), making it difficult to assess the relevance of these data. However, since the data on astrocytic Nlgn1-3 deletion in vivo are compelling on their own, I do not see the co-culture experiment as essential for the main conclusions of the study.
 
 Minor comment:
 
 Please add the information on the strain background of the mice to the methods section of the manuscript. Strain background can have a significant impact on many aspects of neuronal function, and this information is therefore essential for the interpretation of potential differences to other studies.
 
 We deeply apologize for forgetting to include the two important references mentioned by the reviewer in the reference list. We understand that the reviewer as a result could not assess the validity of our statement that co-culture of glia is required for efficient synapse formation by human neurons that are induced from ES or iPS cells. Note that this conclusion does not postulate that all synapse formation requires glia, since the cited papers demonstrate that human neurons induced by our protocol still form scarce synapses without glia. This observation has been confirmed in many different experiments that were performed after the data presented in the cited papers. As a result of this extensive prior documentation that human neurons produced by forced expression of Ngn2 require coculture of glia for efficient synapse formation, we do not feel that we need to repeat this basic characterization of our culture system again to validate multiple previous papers and hope the reviewer will concur. We have additionally added the relevant mouse strain information to the methods section.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.10.536254v3
www.biorxiv.org www.biorxiv.org

TRPV4 overactivation enhances cellular contractility and drives ocular hypertension in TGFβ2 overexpressing eyes

3
1. Public_Reviews 10 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This fundamental work extends our understanding of the role of TGFβ2 as a modulator of mechanosensing in the eye and identifies the TRPV4 ion channel as a common regulator of Trabecular Meshwork (TM) contractility and pathological OHT. The data and evidence provided are convincing. This work will clearly be of interest to researchers investigating the role of mechanosensors in the TM and may underpin future research into treatments that aim to lower intraocular pressure. This work will additionally be of interest to the growing field of researchers investigating the regulation of force sensing via ion channels and their roles in health and disease, in particular the ion channel TRPV4.
  
  Summary
2. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #1 (public review):
  
  Summary:
  
  This comprehensive study employed molecular, optical, electrophysiological and tonometric strategies to establish the role of TGFβ2 in transcription and functional expression of mechanosensitive channel isoforms alongside studies of TM contractility in biomimetic hydrogels, and intraocular pressure regulation in a mouse model of TGFβ2-induced ocular hypertension. TGFβ2 upregulated expression of TRPV4 and PIEZO1 transcripts and time-dependently augmented functional TRPV4 activation. TRPV4 activation induced TM contractility whereas pharmacological inhibition suppressed TGFβ2-induced hypercontractility and abrogated ocular hypertension in eyes overexpressing TGFβ2. Trpv4-/- mice resisted TGFβ2-driven increases in IOP. These data establish a fundamental role of TGFβ as a modulator of mechanosensing and identify the TRPV4 channel as a common mechanism for TM contractility and pathological ocular hypertension.
  
  The manuscript is very well written and details the important function of TRPV4 in TM cell function. These data provide novel therapeutic targets and potential for disease-altering therapeutics.
  
  Review 1
3. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #2 (public review):
  
  The manuscript by Christopher N. Rudzitis et al. describes the role of TGFβ2 in the transcription and functional expression of mechanosensitive channel isoforms, alongside studies on TM contractility in biomimetic hydrogels and intraocular pressure. Overall, it is a very interesting study, nicely designed, and will contribute to the available literature on TRPV4 sensitivity to mechanical forces.
  
  Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.11.05.622187v2
www.biorxiv.org www.biorxiv.org

Systematic analysis of transcription factor combinatorial binding uncovers TEAD1 as an antagonist of tissue-specific transcription factors in human organogenesis

5
1. Public_Reviews 10 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This important study presents a pipeline for discovering cooperative transcription factor (TF) interactions that regulate development, and applies this pipeline in a systematic investigation of TF co-regulation in 11 human embryonic tissues. The authors provide overall solid bioinformatics and experimental support for their findings: although they make a convincing argument for the role of TEAD factors as co-repressors of regulatory activity with tissue-specific TFs, other aspects of the study would benefit from additional validation. This work would be of interest to cell biologists focused on development or on discovery of TF relationships.
  
  Summary
2. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, the authors present a pipeline for the identification of transcription factor (TF) co-occurrence in regulatory regions. This pipeline aims to generate a catalogue of combinations of TFs working together, and the authors apply this during human embryonic development. In particular, they identified co-occurrences of TFs starting from H3K27ac ChIP-seq and RNA-seq input data to select active enhancers and transcribed TFs. The pipeline is applied to explore TF motif co-occurrence at tissue-specific developmental enhancers across 11 human embryonic tissues. The application of the pipeline suggests the presence of regulatory patterns in different human developmental tissue-specific enhancers in association with ubiquitous TFs. The authors further explore the role of TEAD1 (a ubiquitously expressed TF) as a repressor. They test the role of TEAD1 as a co-repressor using a luciferase assay and tissue-specific enhancers, either alone or combined with a YAP coactivator. Overall, this paper presents an important aspect in mammalian gene regulation, the cooperative binding of TFs, and provides an important resource for TF pairs.
  
  Strengths:
  
  I appreciated the number of datasets analysed and the validation of a subset of enhancers.
  
  Weaknesses:
  
  Not many, but probably validation at more enhancers could have made the paper stronger.
  
  Review 1
3. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Garcia-Mora et al. presented a two-step bioinformatics pipeline using H3K27ac ChIP-seq and RNA-seq data from 11 human embryonic tissues published by the same groups of senior authors. "First Search" identifies motifs for TFs that are both tissue-restricted in expression and enriched in tissue-specific enhancers. "Second Search" then looks for additional motifs that co-occur near each "First Search" motif. The authors here went further than previous motif co-occurrence/co-enrichment analyses by identifying TEAD motifs as (1) representing a ubiquitously expressed family and (2) showing high co-occurrence with tissue-specific motifs at tissue-specific enhancers. They then elaborate on this finding and speculate that "TEAD, in concert with cardiac-restricted transcriptional regulators, may contribute to the recruitment of CHD4 and may play a role in attenuating the activity of enhancers involved in cardiomyocyte differentiation." They also discussed validation experiments using the luciferase assay.
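
  To make the "Second Search" idea described above concrete, the following toy sketch counts which motifs fall near an anchor ("First Search") motif across a set of enhancers. It is not the authors' pipeline: the function name, the 50 bp window, the data layout, and the motif positions are all assumptions for illustration, and a real analysis would also test enrichment against background sequences rather than report raw counts.

```python
from collections import Counter


def cooccurring_motifs(enhancers, anchor_motif, window=50):
    """Count enhancers in which each other motif lies within `window` bp
    of at least one occurrence of `anchor_motif`.

    `enhancers` is a list of enhancers; each enhancer is a list of
    (motif_name, position) tuples. Hypothetical layout, for illustration.
    """
    counts = Counter()
    for hits in enhancers:
        anchor_positions = [pos for name, pos in hits if name == anchor_motif]
        if not anchor_positions:
            continue  # enhancer lacks the anchor motif
        # Use a set so each motif is counted at most once per enhancer.
        nearby = {
            name
            for name, pos in hits
            if name != anchor_motif
            and any(abs(pos - a) <= window for a in anchor_positions)
        }
        counts.update(nearby)
    return counts


# Made-up motif hits in three toy enhancers.
toy_enhancers = [
    [("GATA4", 100), ("TEAD1", 130), ("MEF2", 400)],
    [("GATA4", 50), ("TEAD1", 60)],
    [("MEF2", 10), ("TEAD1", 20)],
]

if __name__ == "__main__":
    print(cooccurring_motifs(toy_enhancers, "GATA4", window=50))
```

  Counting each motif once per enhancer keeps a single motif-dense region from dominating the tally; whether the published pipeline makes the same choice is not stated in the review.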
  
  Strengths:
  
  The manuscript is well-written and easy to follow for the most part.
  
  Weaknesses:
  
  My main concerns and criticisms are about the sensitivity of the method and the validation of experiment designs and conclusions. Some examples where validation could be improved are as follows:
  
  (1) The authors propose a mechanism of a TF trio (TEAD - CHD4 - tissue-specific TFs). However, only one validation experiment checked CHD4. CHD4 binding was not mentioned at all in the other cases.
  
  (2) The authors integrated E12.5 TEAD binding with E11.5 acetylation data, and it would be important to show that this experimental approach is valid or otherwise qualify its limitations.
  
  (3) Motif co-occurrence analysis was extended to claiming TF interactions without further validation.
  
  Review 2
4. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Mora et al employ published ChIP-seq and RNA-seq from embryonic tissues to nominate transcription factors that work combinatorially during development. This manuscript addresses an important gap in knowledge regarding the complexities of gene regulation. However, as written, the manuscript is focused on confirming mostly known associations and does not unveil principles that can be broadly applied, given multiple technical caveats that are outlined below.
  
  Strengths:
  
  (1) Instead of focusing on a single transcription factor motif enriched within peaks, the authors search the flanking regions of enriched motifs to nominate additional transcription factors that may work cooperatively to provide organ specificity. This type of analysis is a crucial next step in the gene regulation field, as transcription factors rarely work independently.
  
  (2) Figure 6 is a good demonstration of the preliminary experiments that can be done to test the activity of co-occurring motifs.
  
  (3) This is a really nice resource of organ-specific motif associations that can be used to generate many testable hypotheses.
  
  (4) The rationale and writing are very clear and easy to read.
  
  Weaknesses:
  
  (1) Much of this manuscript focuses on confirming transcription factor relationships that have been reported previously. For example, it is well known that GATA4 interacts with MEF2 in the ventricle. There are limited new or unexpected associations discussed and tested.
  
  (2) Embryonic tissues are highly heterogeneous, limiting the utility of the bulk ChIP-seq employed in these analyses. Does the cellular heterogeneity explain the discrepancy between TEAD binding and histone acetylation? Similarly, how does conservation between species affect the TF predictions?
  
  (3) Some of the interpretations should also be fleshed out a bit more to clarify the advantage of the analyses presented here. For example, if Gata4 and Foxa2 transcripts are expressed during different stages of development, then it's likely that (as stated by the authors) these motifs are not used during the same stage of development. But examining the flanking regions wasn't necessary to make that statement. This type of conclusion seems tangential to the benefit of this analysis, which is to understand which TFs work together in a single organ at a single time point.
  
  (4) This manuscript hinges on luciferase assays whose results can be difficult to translate to complex gene regulation networks. Many motifs are often clustered together, which makes designing experiments at endogenous loci important in studies such as this one.
  
  Review 3
5. Public_Reviews 10 Jun 2025
  
  in eLife
  
  Author response:
  
  Reviewer #1:
  
  Point 1
  
  Not many weaknesses, but probably validation at more enhancers could have made the paper stronger.
  
  We experimentally validated two sets of enhancers from two distinct tissues and observed similar effects. While this supports the idea that the TEAD-tissue-specific TF interaction we observe is not restricted to a single tissue, we agree that testing additional enhancers from a third tissue would strengthen our conclusions. We will acknowledge in the discussion that including a third tissue could provide additional support for the generality of our findings.
  
  Reviewer #2:
  
  Point 1
  
  The authors propose a mechanism of a TF trio (TEAD - CHD4 - tissue-specific TFs). However, only one validation experiment checked CHD4. CHD4 binding was not mentioned at all in the other cases.
  
  Indeed, CHD4 binding was experimentally validated at only one enhancer. This was a deliberate decision based on two key considerations:
  
  (1) Consistent functional response across enhancers: We tested multiple enhancers (n = 8) for functional response to the TEAD+YAP and GATA4/6 combination. All enhancers tested exhibited the same trend: attenuation of GATA-mediated activation upon co-expression of TEAD or TEAD/YAP. This consistent pattern supports a shared mechanism across these elements.
  
  (2) Substantial prior evidence supporting CHD4 recruitment by both GATA4 and YAP: Specifically, CHD4 recruitment by GATA4 has been described in the context of cardiovascular development[1], and CHD4 can also be recruited by the TEAD coactivator YAP[2]. Furthermore, published genomic occupancy data from embryonic heart tissue show widespread co-binding of GATA4, TEAD, and CHD4[1,3], including at most of the cardiac enhancers we functionally tested (4 out of 5).
  
  Given the consistent enhancer responses and the supporting literature and genomic data indicating TEAD-CHD4 co-occupancy, we chose to validate CHD4 binding at a representative enhancer as a proof of concept.
  
  We will clarify this rationale in the revised manuscript to better address this concern.
  
  Reviewer #2:
  
  Point 2
  
  The authors integrated E12.5 TEAD binding with E11.5 acetylation data, and it would be important to show that this experimental approach is valid or otherwise qualify its limitations.
  
  We will provide additional evidence in support of this approach in the revised manuscript or alternatively acknowledge its limitations.
  
  Reviewer #2:
  
  Point 3
  
  Motif co-occurrence analysis was extended to claiming TF interactions without further validation.
  
  We thank the reviewer for pointing out this important distinction. We reviewed the manuscript and identified seven instances where TF interactions were mentioned. Four of these correctly refer to previously established protein-protein interactions. For the remaining instances, we will adjust the wording to reflect the level of evidence, e.g. describe combinatorial binding based on motif co-occurrence, rather than implying direct interaction.
  
  Reviewer #3:
  
  Point 1
  
  Much of this manuscript focuses on confirming transcription factor relationships that have been reported previously. For example, it is well known that GATA4 interacts with MEF2 in the ventricle. There are limited new or unexpected associations discussed and tested.
  
  We thank the reviewer for this important observation and see the recurrence of known interactions, such as GATA4-MEF2, not as a drawback, but as an important validation of our methodology.
  
  The identification of novel TF-TF combinations was geared toward uncovering shared regulatory principles across diverse human developmental tissues. While analysing 13 heterogeneous embryonic tissues introduced limitations, such as cellular complexity that may obscure rare interactions, it also allowed the identification of robust, recurrent patterns across tissues. Indeed, using this approach, we identified the widespread combinatorial effect of TEAD in partnership with lineage-specific TFs, which is explored more in depth in the manuscript.
  
  Another main goal of the study was to develop and demonstrate a generalizable strategy for identifying combinatorial TF binding patterns that underlie tissue-specific gene regulation. Given the inherent heterogeneity of the embryonic organs analysed, the approach is naturally biased toward recovering the most prevalent, and often well-characterized, TF combinations. While we fully acknowledge this limitation, we believe that the ability to robustly recover well-established TF partnerships across multiple organs provides a valuable proof of concept. The next step will be to apply this strategy to single-cell RNA datasets, in order to define TF relationships at higher resolution, for example, resolving associations down to specific family members that cooperate within distinct lineages or cell types, and identifying less frequent or underrepresented TF-TF relationships.
  
  In this context, we believe that our strategy has successfully highlighted shared enhancer logic and offers a framework for future high-resolution dissection of TF cooperativity at the single-cell level. The rationale for analysing heterogeneous tissues, along with its limitations, will be addressed in the revised version.
  
  Reviewer #3:
  
  Point 2
  
  Embryonic tissues are highly heterogeneous, limiting the utility of the bulk ChIP-seq employed in these analyses. Does the cellular heterogeneity explain the discrepancy between TEAD binding and histone acetylation? Similarly, how does conservation between species affect the TF predictions?
  
  We thank the reviewer for raising these important points. We acknowledge the limitations of using bulk ChIP-seq data in the context of complex embryonic tissues (see also previous point). We cannot exclude that the discrepancy between TEAD binding and histone acetylation is an effect of cellular heterogeneity. Indeed, we mention in the results “Our ventricle-specific enhancers were sampled at a single time point and likely represent enhancers that are selectively active in different cell types and developmental stages, given the heterogeneity of cell types in the ventricle”. The limitation of bulk ChIP-seq will be addressed in the discussion. In the specific case of the enhancers selected for validation, the binding site sequences are conserved between species, suggesting that the cis-regulatory activity is likely to be similar in both.
  
  Reviewer #3:
  
  Point 3
  
  Some of the interpretations should also be fleshed out a bit more to clarify the advantage of the analyses presented here. For example, if Gata4 and Foxa2 transcripts are expressed during different stages of development, then it's likely that (as stated by the authors) these motifs are not used during the same stage of development. But examining the flanking regions wasn't necessary to make that statement. This type of conclusion seems tangential to the benefit of this analysis, which is to understand which TFs work together in a single organ at a single time point.
  
  We appreciate the reviewer’s comment and the opportunity to clarify our interpretation. The reviewer refers to the finding that GATA4 and FOXA2 motifs are flanked by different sets of motifs in liver enhancers, suggesting that these TFs operate within distinct regulatory contexts.
  
  Our aim was not to state that GATA4 and FOXA2 do not function simultaneously—this can indeed be inferred from their non-overlapping expression patterns. Rather, we intended to highlight the potential of our approach, even when applied to bulk data, to resolve distinct regulatory modules that may act in different subpopulations of cells or developmental windows within the same tissue.
  
  We will revise the relevant section of the manuscript to make this interpretative point clearer.
  
  Reviewer #3:
  
  Point 4
  
  This manuscript hinges on luciferase assays whose results can be difficult to translate to complex gene regulation networks. Many motifs are often clustered together, which makes designing experiments at endogenous loci important in studies such as this one.
  
  We agree with the Reviewer that luciferase assays represent an oversimplified model of gene regulation and do not fully capture the complexity of endogenous regulatory networks. We will explicitly acknowledge this limitation in the discussion.
  
  Mutagenesis of TEAD and tissue-specific TF motifs at endogenous loci would provide more conclusive evidence. However, our goal was to test the generality of the TEAD effect across multiple enhancers and tissues. Despite its limitations, a luciferase-based assay was the most feasible approach, as an endogenous strategy would not have allowed us to assess a broader set of enhancers efficiently. Additionally, the presence of recurrent motifs and the potential functional redundancy among enhancers targeting the same gene can complicate the interpretation of single-locus perturbations.
  
  References
  
  (1) Robbe ZL, Shi W, Wasson LK, Scialdone AP, Wilczewski CM, Sheng X, et al. CHD4 is recruited by GATA4 and NKX2-5 to repress noncardiac gene programs in the developing heart. Genes Dev. 2022 Apr 1;36(7–8):468–82.
  
  (2) Kim M, Kim T, Johnson RL, Lim DS. Transcriptional Co-repressor Function of the Hippo Pathway Transducers YAP and TAZ. Cell Rep. 2015 Apr;11(2):270–82.
  
  (3) Akerberg BN, Gu F, VanDusen NJ, Zhang X, Dong R, Li K, et al. A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers. Nat Commun. 2019 Oct 28;10(1):4907.
  
  AuthorResponse

URL

biorxiv.org/content/10.1101/2023.10.05.561094v2
www.biorxiv.org www.biorxiv.org

Individuality across environmental context in Drosophila melanogaster

5
1. Public_Reviews 10 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 There is a growing interest in understanding the individuality of animal behaviours. In this article, the authors build and use an impressive array of high throughput phenotyping paradigms to examine the 'stability' (consistency) of behavioural characteristics in a range of contexts and over time. They find that certain behaviours are individualistic and persist robustly across external stimuli while others are less robust to these changing parameters. The data, while extensive, are incompletely analysed/explained. With more appropriate statistical methods adopted, the findings would have important implications for the study of individual variability.
 
 Summary
2. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.
 
 Strengths:
 
 The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.
 
 The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.
 
 Weaknesses:
 
 The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?
 
 The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".
 
 The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.
 
 The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
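 
 The arithmetic behind these two figures can be checked with a short sketch. It only squares the correlation coefficients quoted above; the helper name `variance_explained` is introduced here for illustration.

```python
def variance_explained(r):
    """Coefficient of determination for a simple correlation: R^2 = r^2."""
    return r ** 2

# The two correlation coefficients quoted in the review.
for r in (0.263, -0.197):
    r2 = variance_explained(r)
    print(f"r = {r:+.3f}  R^2 = {r2:.3f}  unexplained = {1 - r2:.1%}")
```

 Squaring removes the sign, so a negative correlation of -0.197 leaves about 96% of the variance unexplained, just as a positive one of the same size would.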
 
 The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
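 
 The rank transformation suggested here can be sketched as follows. Ranking values within each condition and then computing a Pearson correlation on the ranks is Spearman's rank correlation, which is insensitive to shifts in the group mean and to any monotonic rescaling between conditions. The walking-speed values below are made up for illustration and are not data from the manuscript.

```python
from math import sqrt

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical speeds of the same five flies in two conditions; the second
# condition has a higher mean and one extreme value.
cond_a = [10.0, 12.0, 11.0, 15.0, 13.0]
cond_b = [20.5, 24.0, 21.0, 40.0, 25.0]

raw_corr = pearson(cond_a, cond_b)
rank_corr = pearson(ranks(cond_a), ranks(cond_b))
```

 In this toy example the flies keep exactly the same order in both conditions, so the rank-based correlation is 1 while the raw Pearson correlation is lower.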
 
 What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?
 
 The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.
 
 Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.
 
 The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable. The study discusses a number of interesting, stimulating ideas about inter-individual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.
 
 While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.
 
 Comments on revisions:
 
 I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications, including GLM analysis; however, some areas still require clarification for the benefit of an audience that includes geneticists.
 
 (1) GLM Analysis Explanation (Figure 9) While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:
 
 - The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other non-statistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.
 - The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.
 - The relationship between the GLM findings and their original correlation-based conclusions needs better integration and connection, leading the reader through your reasoning.
 
 (2) Documentation of Changes One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of revisions made. The authors should provide a more comprehensive summary of the specific changes implemented, particularly regarding:
 
 (3) Statistical Method Selection The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:
 
 - Why ridge regression was selected as the optimal method
 - How the regularization parameter (λ) was determined
 - How this choice affects the interpretation of environmental parameters' influence on individuality
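 
 To make the question about λ concrete, the following minimal sketch shows one common way a ridge penalty can be chosen: leave-one-out cross-validation over a small grid. This is an assumption for illustration only; the manuscript's actual selection procedure is not stated in this review. The two-predictor solver, the candidate grid, and the toy data with nearly collinear predictors are all hypothetical.

```python
def ridge_2d(X, y, lam):
    """Solve (X'X + lam*I) beta = X'y for two predictors, no intercept."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)

def loo_error(X, y, lam):
    """Mean squared leave-one-out prediction error for a given penalty."""
    total = 0.0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b1 = ridge_2d(X_train, y_train, lam)
        pred = b0 * X[i][0] + b1 * X[i][1]
        total += (y[i] - pred) ** 2
    return total / len(X)

# Toy data: the two predictors are nearly collinear.
X = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
y = [2.1, 3.9, 6.2, 7.9]

lams = [0.0, 0.1, 1.0, 10.0]  # hypothetical candidate grid
best_lam = min(lams, key=lambda lam: loo_error(X, y, lam))

def norm2(beta):
    """Squared Euclidean length of the coefficient vector."""
    return beta[0] ** 2 + beta[1] ** 2
```

 Larger penalties shrink the coefficient vector toward zero, which is why the choice of λ bears directly on how the fitted coefficients for the environmental parameters should be read.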
 
 Review 1
3. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.
 
 Strengths:
 
 The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting to their own needs.
 
 Weaknesses/Limitations:
 
 I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.
 
 I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative and, additionally, the authors have thus had to perform a great many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger and, my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!
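 
 The variance partitioning described above can be illustrated with a minimal sketch. It is not a mixed-model fit: it uses the simpler one-way ANOVA estimator of repeatability, which targets the same among- versus within-individual split when every individual has the same number of measurements. The function name and the toy data (three flies measured on three days) are hypothetical.

```python
def repeatability(groups):
    """ANOVA-based repeatability for a balanced design.

    groups: list of per-individual measurement lists, all of equal length k.
    Returns var_among / (var_among + var_within).
    """
    n = len(groups)
    k = len(groups[0])
    means = [sum(g) / k for g in groups]
    grand = sum(means) / n
    # Mean squares among and within individuals.
    ms_among = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (n * (k - 1))
    var_among = (ms_among - ms_within) / k
    return var_among / (var_among + ms_within)

# Made-up "% time walking" for three flies on three days.
flies = [
    [10.0, 11.0, 12.0],
    [20.0, 21.0, 22.0],
    [30.0, 31.0, 32.0],
]
R = repeatability(flies)
```

 A single value of this kind summarizes consistency across all three days at once, in place of three pairwise day-to-day correlations. A mixed model would additionally handle unbalanced data, covariates, and uncertainty estimates.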
 
 I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?
 
 I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.
 
 Review 2
4. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).
 
 They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:
 
 (1) Many individualistic behaviours remain stable over the course of many days.
 
 (2) Some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.
 
 (3) All the behaviours they tested fail to remain stable over a spatially varying environment (arena shape).
 
 (4) Only angular velocity (a read out of attention) remains stable across varying internal states (walking and flying).
 
 Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.
 
 The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals are large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, different temperature among others.
 
 Comments on revisions:
 
 The authors have addressed my previous concerns.
 
 Review 3
5. Public_Reviews 10 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Public Reviews:
 
 Reviewer #1 (Public Review):
 
 Summary:
 
 The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.
 
 Strengths:
 
 The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as:
 
 (1) a large set of behavioral attributes,
 
 (2) with inter-individual variability, that are
 
 (3) stable over time.
 
 A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.
 
 The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.
 
 We thank the reviewer for his exceptionally kind assessment of our work!
 
 Weaknesses:
 
 The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results.
 
 We have now uploaded a high-resolution PDF to GitHub (https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf), and this is also mentioned in the figure legend for Fig. S8.
 
 Why were five or so parameters selected from the full set? How were these selected?
 
 The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.
 
 Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?
 
 As noted above, we only included a subset of parameters in our final analysis, as many were unsuitable for comparison across assays, although they still provide valuable assay-specific information that is important for relating these results to previous publications.
 
 The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".
 
 Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency,” and decided to go with “stability” as the only descriptor, to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” by “consistency” throughout the current version of the manuscript in order to increase clarity and coherence.
 
 The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.
 
 We agree with the reviewer that a multivariate analysis adds clear advantages in terms of statistical power, in addition to our chosen approach. On the one hand, we believe that the simplicity of our initial analysis, both for correlational and mean data, makes it easy for readers to understand and reproduce our results. While preparing the previous version of the manuscript, we were skeptical, since more complex analyses often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparation of this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).
 
 The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23 °C and 32 °C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32 °C variance is predictable by the 23 °C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
 
 We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.
 
 For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.
 
 For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.
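
 The arithmetic behind the reviewer's two examples can be checked with a short sketch. The two r values are the ones quoted in the comment above; the function name and labels are illustrative assumptions and are not taken from the manuscript or its code repository.

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination for a simple correlation: R^2 = r^2."""
    return r * r


# r values as quoted in the review; the labels are only descriptive.
quoted = {
    "% time walked, 23 °C vs 32 °C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in quoted.items():
    r2 = variance_explained(r)
    # Share of variance in one condition that is NOT predicted by the other.
    unexplained = 1.0 - r2
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {unexplained:.0%}")
```

 Because R² is the square of r, the sign of the correlation does not matter here, and small correlations translate into very small shares of predictable variance even when they are statistically significant.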
 
 The authors describe a dissociation between inter-group differences and interindividual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
 
 We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9).
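
 The in-group rank transformation suggested by the reviewer can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the example values are hypothetical, and the linear model behind Fig. 9 is not reproduced here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def in_group_rank_correlation(cond_a, cond_b):
    """Rank each individual within its own condition, then correlate the ranks."""
    return pearson(average_ranks(cond_a), average_ranks(cond_b))


# Hypothetical values for five flies measured in two conditions. The second
# condition shifts and stretches the values but preserves the rank order.
cond_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cond_b = [2.0, 4.0, 8.0, 16.0, 32.0]
print(pearson(cond_a, cond_b))
print(in_group_rank_correlation(cond_a, cond_b))
```

 Ranking within each group removes differences in group mean and scale, so the correlation reflects only whether individuals keep their relative position between conditions.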
 
 What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?
 
 We agree that grouping our parameters into traits like exploration, attention, and anxiety always includes subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term “anxiety” in similar experiments (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge to compare insights from very distinct models.
 
 The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.
 
 We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.
 
 Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.
 
 The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.
 
 The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.
 
 We uploaded all code and materials to GitHub and made them available as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.
 
 The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.
 
 While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.
 
 We thank the reviewer again for the extensive and constructive feedback.
 
 Reviewer #2 (Public Review):
 
 Summary:
 
 The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.
 
 Strengths:
 
 The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.
 
 We thank the reviewer for highlighting the strengths of our study.
 
 Weaknesses/Limitations:
 
 I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.
 
 We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is differently defined, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and unbiased by any interpretation (we only modified what we stated in the experimental description, without designing a specific situation that might again be perceived differently by different individuals). For example, comparing a context with a predator and one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the others do not.
 
 The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?).
 
 The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages like more statistical power and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend the initial approach we took, since we think that the simplicity of the analysis for both correlational and mean data is easy to understand and reproduce. More complex analyses necessarily include the selection of a specific statistical toolbox by the experimenters and based on these decisions, different analyses become less comparable and more and more complicated to reproduce, unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, leading to very surprising results (Paul et al., 2024), since some paths even reversed their findings. Such a variance in conclusions is hardly possible with the rather simplistic and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for rather simple and easy to understand analyses.
 
 We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).
 
 Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial.
 
 We apologize for this inconvenience. A detailed overview of male and female sample sizes has been listed in the supplemental boxplots next to the plots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.
 
 I don't necessarily doubt the robustness of the results and my guess is that the authors' interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.
 
 As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.
 
 Reviewer #3 (Public Review):
 
 This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).
 
 They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:
 
 (1) Many individualistic behaviours remain stable over the course of many days.
 
 (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.
 
 (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).
 
 (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).
 
 Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.
 
 The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others.
 
 We thank the reviewer for this extraordinarily kind assessment of our work!
 
 Recommendations for the authors:
 
 Reviewing Editor (Recommendations For The Authors):
 
 While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.
 
 (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.
 
 (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.
 
 Reviewer #2 (Recommendations For The Authors):
 
 I want to start by saying this seems like a really cool study! It's an impressive amount of work and addressing a pretty basic question that is interesting (at least I think so!)
 
 We thank the reviewer again for this assessment!
 
 That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data and just doing Pearson correlation coefficients across time points and situations is very clunky and actually losing out on a lot of information. The most up-to-date, state-of-the-art are mixed models - these models can handle very complex (or not so complex) random structures which can estimate variance and importantly, covariance, in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models!
 
 As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).
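
 To illustrate what "variance partitioning" means in the simplest case, the sketch below computes repeatability (the intraclass correlation, ICC(1)) from a balanced one-way random-intercept design. It is an assumption-laden illustration only: it is neither the generalized linear model reported in Fig. 9 nor a full mixed model with covariance across situations, and the fly identifiers and values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def repeatability(data: Dict[str, List[float]]) -> float:
    """ICC(1) for balanced data: n individuals, k repeated measurements each.

    Splits total variance into a between-individual and a within-individual
    part and returns the share attributable to individual identity.
    """
    groups = list(data.values())
    n = len(groups)
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes the same number of repeats per individual")
    grand = mean(v for g in groups for v in g)
    # Mean squares from a one-way ANOVA with individual as the grouping factor.
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical walking speeds for four flies, each measured three times.
speeds = {
    "fly_1": [10.1, 9.8, 10.4],
    "fly_2": [14.0, 13.5, 14.2],
    "fly_3": [7.9, 8.3, 8.0],
    "fly_4": [11.2, 11.9, 11.5],
}
print(f"repeatability = {repeatability(speeds):.2f}")
```

 A value near 1 means that most of the variance lies between individuals, while a value near 0 means that repeated measurements of the same individual differ about as much as measurements of different individuals. Mixed models generalize this idea to unbalanced data, several situations, and covariance between individual intercepts.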
 
 Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering?
 
 We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.
 
 Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?
 
 We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the IndyTrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (as otherwise, sample sizes will drop over time, and the flies will have gone through too many experimental alternations). To make this clearer, we have elaborated on this point in the main text, and we added group sample sizes to the figure legends.
 
 What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?
 
 We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text, by focusing on the term ‘group’. By ‘group’ we refer to the average of all individuals tested in the same situation. Sample sizes in the figure legends now indicate group/population sizes to make this clearer.
 
 Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.
 
 This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.
 
 Reviewer #3 (Recommendations For The Authors):
 
 This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.
 
 We appreciate this suggestion. Initially, we were hoping that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both main text and methods, leading to a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.
 
Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.26.568741v3
www.biorxiv.org www.biorxiv.org

Age-dependent predictors of effective reinforcement motor learning across childhood

5
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This important study tests the development of motor reinforcement learning from toddlerhood to adulthood, using a large online sample. They show that learning improves with age in a task that, like real-life movement, involves a continuous range of response options and probabilistic rewards, and link this shift to reduced movement variability and more efficient feedback-based learning through behavioural modeling. Simplifying the task with discrete actions and deterministic outcomes boosted younger children's performance, suggesting early learning is limited by spatial and probabilistic processing. The evidence is convincing, although future work may investigate more naturalistic movement.
 
 Summary
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.
 
 Strengths:
 
 The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling. They have compared an extensive number of potential models, finding the one that best fits the data while penalizing the number of free parameters.
 
 Review 1
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.
 
 Strengths:
 
 (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.
 
 (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.
 
 (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.
 
 (4) The main and supplemental figures are clear and concise.
 
 Review 2
4. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The study investigates the development of reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise and exploration after a failure all contribute to children's subpar performance.
 
 Strengths:
 
 Experimental manipulations of both the continuity of movement options and the probabilistic nature of the reward function enable the inference of what cognitive factors differ between age groups. A large sample of participants is studied. The model-based analysis provides further insights into the development of reinforcement learning ability.
 
 Weaknesses:
 
 The conclusion that immature spatial processing and probabilistic reasoning abilities limit reinforcement learning here still needs more direct evidence.
 
 Review 3
5. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Overview of changes in the revision
 
 We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:
 
 (1) We expanded the discussion of the relevant literature in children and adults.
 
 (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.
 
 (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.
 
 (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model and this led to a different model from the original being the preferred model. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.
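
 As a rough illustration of how a set of candidate models can be ranked while penalizing free parameters, the sketch below uses the Bayesian information criterion (BIC). This is an assumption: the response above does not state which criterion was used, and the model names, log-likelihoods, parameter counts, and trial count are hypothetical placeholders rather than values from the study.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
candidates = {
    "model_A": (-520.0, 3),
    "model_B": (-515.0, 6),
}
n_trials = 200  # hypothetical number of observations per fit

scores = {name: bic(ll, k, n_trials) for name, (ll, k) in candidates.items()}
preferred = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.1f}")
print(f"preferred: {preferred}")
```

 In this made-up example the richer model fits slightly better in raw likelihood, but the penalty for its three extra parameters outweighs the gain, so the simpler model is preferred.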
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.
 
 Thank you for reviewing our work. One note of clarification. This work focuses on reinforcement-based learning throughout development but does not evaluate sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).
 
 The goal is to understand how children at different ages adjust their movements in response to reward feedback, not to evaluate sensorimotor adaptation. We now explain this distinction on line 35.
 
 Strengths:
 
 The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.
 
 Thank you.
 
 Weaknesses:
 
 Potential concerns involve inconsistent findings with secondary analyses, current assumptions that impact both interpretation and computational modelling, and a lack of clearly stated hypotheses.
 
 (1) Multiple regression and Mediation Analyses.
 
 The challenge with these secondary analyses is that:
 
 (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,
 
 (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and
 
 (c) The authors already have a trial-by-trial model that is arguably more insightful.
 
 Given this, some suggested changes are to:
 
 (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.
 
 (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.
 
 Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.
 
 (2) Variability for different phases and model assumptions:
 
 A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).
 
 However, in Figure 6, one visually observes greater variability during the success clamp (where it is assumed variability only comes from motor noise) compared to baseline, where variability would come from:

 (a) Motor noise.

 (b) Likely some exploratory noise since there were some failures.

 (c) Updates in reach aim.
 
 Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.
 
 This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.
 
 We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).
 
 Given the comment above, can the authors please:
 
 (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.
 
 Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.
 
 (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?
 
 Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.
 
 (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.
 
 We have now compared 31 models (see full details in next response), which include the 7 models in Roth et al. (2023). Several of these model variants have updating even after success (with so-called planning noise). We also now fit the model to the data that includes the clamp phases (we can't easily fit to success clamp alone as there are only 10 trials). We find that the preferred model is one that does not include updating after success.
 
 (d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al. (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of a), b), and c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.
 
 Thank you for this. We realized that the models presented in Roth et al. (2023), as well as in other papers, are all special cases of a more general model. Moreover, in total there are 30 possible variants of the full model, so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit with a Kalman smoother to the actual data (rather than to summary statistics, which has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model, we then refit this model to all trials including the clamps so as to obtain the best parameter estimates.
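
 A minimal sketch of BIC-based model selection of the kind described above. It is an illustration only, not the analysis code: the model names and log-likelihood values are hypothetical, and because the response does not state how likelihoods were pooled across participants, the sketch simply scores one set of fits over the 100 learning trials.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(fits, n_obs):
    """Pick the lowest-BIC model.

    fits maps a model name to (maximized log-likelihood, number of free parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits, for illustration only (not values from the paper).
example_fits = {
    "motor_noise_only": (-250.0, 1),
    "motor_plus_exploration": (-230.0, 2),
    "motor_plus_exploration_plus_learning_rate": (-229.5, 3),
}
best, scores = select_model(example_fits, n_obs=100)
```

 In this made-up example the extra learning-rate parameter improves the likelihood too little to offset its BIC penalty, which is the kind of trade-off the criterion is designed to capture.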
 
 The preferred model was the same whether we combined the continuous and discrete probabilistic data or examined each task separately, either for only the children or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018) and has exploration variability (after failure) and motor noise with full updating with exploration variability (if any) after success. This model differs from the model in the original submission, which included a partial update of the desired reach after exploration; this was considered the learning rate. The current model suggests a unity learning rate.
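
 The update rule described above can be sketched as a short simulation. This is one reading of the verbal description, not the authors' implementation: the parameter names (`sigma_m` for motor noise, `sigma_e` for exploration variability), the absence of exploration on the first trial, and the deterministic reward zone in `zone_reward` are all assumptions made for illustration, and the workspace limits are ignored.

```python
import random


def simulate_preferred_model(n_trials, sigma_m, sigma_e, reward_fn, x0=0.0, seed=0):
    """Simulate reaches with exploration after failure only and a unity learning rate.

    reward_fn(reach, rng) returns True when the reach is rewarded.
    """
    rng = random.Random(seed)
    x = x0                # desired reach (aim)
    prev_success = True   # assumption: no exploration on the first trial
    aims, reaches, outcomes = [], [], []
    for _ in range(n_trials):
        aims.append(x)
        # Exploration is drawn only when the previous trial failed.
        e = 0.0 if prev_success else rng.gauss(0.0, sigma_e)
        m = rng.gauss(0.0, sigma_m)   # motor noise is present on every trial
        y = x + e + m
        success = reward_fn(y, rng)
        if success:
            x = x + e     # full update by the exploration draw (if any)
        reaches.append(y)
        outcomes.append(success)
        prev_success = success
    return aims, reaches, outcomes


def zone_reward(y, rng, centre=0.3, half_width=0.1):
    """Hypothetical deterministic reward zone, used only for this example."""
    return abs(y - centre) <= half_width


aims, reaches, outcomes = simulate_preferred_model(100, 0.05, 0.1, zone_reward)
```

 Because the aim moves only by the exploration draw on rewarded trials, a run of consecutive successes leaves the aim unchanged, which is what "no exploration after success" implies in this sketch.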
 
 In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.
 
 We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.
 
 (e) Line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text.
 
 This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.
 
 (3) Hypotheses:
 
 The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.
 
 We have included our hypotheses and predictions at two points in the paper. In the introduction we modified the text to:
 
 "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."
 
 In results we modified the sentence to:
 
 "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure.”
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.
 
 Strengths:
 
 (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.
 
 (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.
 
 (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.
 
 (4) The main and supplemental figures are clear and concise.
 
 Thank you.
 
 Weaknesses:
 
 (1) Framing.
 
 One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!
 
 Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our task has features not present in typical cognitive tasks.
 
 (2) Links to other scholarship.
 
 If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.
 
 We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task. We now discuss that exploration is not one size fits all; its benefits vary depending upon the task. We have added the following paragraphs to the Discussion section:
 
 "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age indicating that adults explore less compared to children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maxima. In our task by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but minimal if any role in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of R230 increasing exploration as motor noise decreases.
 
 A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."
 
 (3) Modeling.
 
 First, I may have missed something, but it is unclear to me if the model is actually accounting for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45°, but then don't get one at 40°, I should be more likely to try 50° next than 35°). I couldn't tell from the current equations if this was the case, or if exploration was essentially "unsigned," nor if the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this was more clear in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.
 
 The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.
 
 The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.
 
 Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).
 
 In some sense, the learning rate in the model in the original submission is highly related to perseveration. For example, if the learning rate is 0, then there is complete perseveration as you simply repeat the same desired movement. If the rate is 1, there is no perseveration, and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate learning rate from perseveration. Adding perseveration as another parameter would likely make it and the learning rate unidentifiable. However, we now compare 31 models and those that have a non-unity learning rate are not preferred, suggesting there is little perseveration.
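
 The relationship between learning rate and perseveration can be made concrete with a one-line update rule. The notation is assumed here for illustration and is not taken from the manuscript: $x_t$ is the desired reach on trial $t$, $e_t$ the exploration draw, $r_t \in \{0, 1\}$ the reward outcome, and $\eta$ the learning rate.

```latex
x_{t+1} = x_t + \eta \, r_t \, e_t, \qquad 0 \le \eta \le 1
% \eta = 0:  x_{t+1} = x_t            (the desired reach is simply repeated: complete perseveration)
% \eta = 1:  x_{t+1} = x_t + r_t e_t  (full update after reward: no perseveration)
```

 Under this reading, a separate perseveration weight would act on the same term as $\eta$, which is why the two would be hard to identify independently.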
 
 (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?
 
 We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.
 
 Strengths:
 
 (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.
 
 (2) A large sample of participants were recruited.
 
 (3) The model-based analysis provides further insights into the development of reinforcement learning ability.
 
 Thank you.
 
 Weaknesses:
 
 (1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.
 
 Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.
 
 (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.
 
 We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to those that also refer to this as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.
 
 Recommendations for the authors:
 
 Reviewing Editor Comments:
 
 Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.
 
 The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below) has resulted in key recommendations for increasing the significance and strength of evidence:
 
 To increase the Significance of the findings, please consider the following:
 
 (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.
 
 We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.
 
 (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.
 
 Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.
 
 To move the "Strength of Evidence" to "Convincing", please consider doing the following:
 
 (1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments reviewers 1 and take the following additional steps: plotting r squares for individual participants, discussing whether individual values of the fitted parameters are plausible and whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.
 
 We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).
 
 (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).
 
 Please see response to public review comments of Reviewer #1 in which we explain that we have now fit a value-based model as well as 31 other noise-based models providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that 32 models span a reasonable amount of space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.
 
 (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).
 
 Please see response to public review comments of Reviewer #1. We chose to focus only on the model based analysis because it allowed us to distinguish between exploration variability and motor noise.
 
 Please see below for further specific recommendations from each reviewer.
 
 Reviewer #1 (Recommendations for the author):
 
 (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.
 
 Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).
 
 (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved. "select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)"
 
 Thank you for this comment. We have edited the sentence to reflect this suggestion.
 
 (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".
 
 Thank you for this comment. We have edited the sentence to reflect this suggestion.
 
 (4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.
 
 We use mechanistic in the standard use in modeling, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models as to what a system is trying to achieve). In mechanistic modeling one proposes a mechanism, and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer), but both are considered mechanistic, just at different levels. Quoting Levenstein "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.
 
 (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.
 
 Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.
 
 (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?
 
 Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one-way ANOVA results for each task in Supplementary Table 1.
 
 (7) Line 98. Please state that participants received reinforcement feedback during baseline.
 
 Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.
 
 (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?
 
 Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:
 
 "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."
 
 (9) The term learning distance could be improved. Perhaps use distance from target.
 
 Thank you for this comment. We appreciate that learning distance defined with 0 as the best value is counterintuitive. We have changed the language to use "distance from target" as the learning metric.
 
 (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.
 
 There appears to be no Supplemental 2 in the referenced paper, so we assume the reviewer is referring to Supplemental B, which deals with a shuffling procedure to examine lag-1 correlations.
 
 In our tasks, we are limited to only 9 trials to analyze in each clamp phase, so we do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' but instead are calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate the s.d. in the clamp phase (as we ensure in simulations that the data do not fall outside the workspace), which makes the previous analytic formulas an approximation; they are no longer used.
 
 (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).
 
 Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task to move in a way to get the video clip to play. We intentionally designed the tasks to be simple, such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and that the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks, where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3- to 5-year-olds who exhibited more mature learning ability, which gives us confidence that the younger children were able to understand the task goal.
 
 (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).
 
 Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.
 
 (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?
 
 Thank you for this comment. While motivation most probably plays a role in learning, in particular in a game environment, this was out of the scope of the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the discussion section to address this comment:
 
 "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks however our study was not designed to test these processes directly and future work would benefit from exploring these other components more explicitly."
 
 (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.
 
 Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.
 
 Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.
 
 Reviewer #2 (Recommendations for the author):
 
 (1) The path length ratio analyses in the supplemental are interesting but are not mentioned in the main paper. I think it would be helpful to mention these, as they are somewhat dramatic effects.
 
 Thank you for this comment. Path length ratios are defined in the Methods, and results are briefly summarized in the Results section with a pointer to the supplementary figures. We have updated the text to more explicitly report the age-related differences in path length ratios.
 
 (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).
 
 Thank you for this comment. We have added an additional motivating sentence to the introduction.
 
 Reviewer #3 (Recommendations for the author):
 
 The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution to motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.
 
 Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.
 
 My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.
 
 We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.
 
 Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the r-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, r squares, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.
 
 We now show fits to individual subjects. However, since this is a Kalman smoother, it fits the data perfectly by generating its best estimate of motor noise and exploration variability on each trial to fully account for the data. In that sense R2 is always 1, so it is not helpful.
 
 While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit such as standard R2 for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output: each time the model is simulated with the fit parameters it produces a different output (due to motor noise, exploration variability and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability and reward stochasticity of a trial. For example, the model could fit pure Gaussian motor noise across trials (for a poor learner) by accurately fitting the standard deviation of motor noise but would not be expected to actually match each data point, so it would have a traditional R2 of 0.
 
 To provide an overall goodness of fit we have to reduce the noise component. To do so, we examined the traditional R2 between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant) so as to reduce the stochastic variation. The results for the continuous probabilistic and discrete probabilistic task are R2 of 0.41 and 0.72, respectively.
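 
 The averaging procedure described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`r_squared`, `per_trial`), the two hypothetical participants, and the three made-up simulation runs per participant (standing in for 1000) are all invented for the example. It also shows why a prediction that only matches the mean yields a traditional R2 of 0.

```python
from statistics import mean, median


def r_squared(observed, predicted):
    """Traditional R2 = 1 - SS_res / SS_tot."""
    mean_obs = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def per_trial(series_list, reducer):
    """Apply a reducer (mean or median) across series, trial by trial."""
    return [reducer(vals) for vals in zip(*series_list)]


# Hypothetical reach positions over 4 trials for two participants.
data = [[0.0, 1.0, 2.0, 3.0], [0.0, 3.0, 2.0, 5.0]]

# Hypothetical stochastic simulation runs for each participant's fitted model.
sims = [
    [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 3.0, 4.0], [0.0, 1.5, 2.5, 3.5]],
    [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 3.0, 4.0], [0.0, 1.5, 2.5, 3.5]],
]

# Step 1: per participant, take the median across simulation runs on each trial.
model_per_participant = [per_trial(runs, median) for runs in sims]
# Step 2: average data and model curves across participants.
avg_data = per_trial(data, mean)
avg_model = per_trial(model_per_participant, mean)
# Step 3: traditional R2 between the two averaged curves.
r2_avg = r_squared(avg_data, avg_model)

# A prediction equal to the data mean on every trial gives R2 = 0.
flat = [mean(avg_data)] * len(avg_data)
r2_flat = r_squared(avg_data, flat)
print(r2_avg, r2_flat)
```

 Averaging first removes trial-level noise that the model is not expected to reproduce, which is why the comparison is made on the averaged curves and not on single trials.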
 
 Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases over age from the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.
 
 For the success clamp, we only have 9 trials to calculate variability, which limits our power to detect significance with age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data, which we now show overlaid on the fits of the model for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match, and although the variance explained is 16 and 56% (we limit to 9 trials so as to match the fail clamp), the correlations are 0.52 and 0.78, suggesting a reasonable relation, although there may be other small sources of variability not captured in the model.
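 
 The response reports both a variance explained and a correlation, and the two need not agree: 0.52 squared is about 0.27 and 0.78 squared is about 0.61, both above the reported 16 and 56%. One possible reading, which the source does not confirm, is that variance explained was computed against the model's predictions directly (1 - SS_res/SS_tot), which penalizes offsets and scaling errors that a Pearson correlation ignores. The sketch below illustrates that difference with made-up numbers; `pearson_r` and `r_squared` are hypothetical helper names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_squared(observed, predicted):
    """Variance explained by the predictions themselves: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values: predictions track the data perfectly but with a constant offset.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

r = pearson_r(observed, predicted)    # insensitive to the offset
r2 = r_squared(observed, predicted)   # penalized by the offset
print(r, r2)
```

 Here the correlation is perfect while the variance explained is only 0.2, so a model can capture the trend with age well and still explain a modest share of the variance.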
 
 Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".
 
 After removing that point from the original Fig. 5C and recalculating, the correlation was still significant, and we feel the plots mentioned in the previous point add useful information on this issue. With the new model, this figure has changed.
 
 It is still a concern that the young children did not understand the instructions. Nine 3-to-8-year-old children (out of 48) were better explained by the noisy-only model than the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noisy-only model. It appears that there is a higher percentage of the "young" children who didn't get the instruction than the older ones.
 
 Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straightforward. The child simply needs to understand the underlying goal to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.
 
 We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach, and crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65 and 86% of children (out of N = 21, 22, 20 and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5-year-old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.
 
 The reduced numbers fit by the preferred model for the other conditions likely reflect differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.
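 
 A per-participant likelihood ratio test of the kind described in this response can be sketched as below. This is an illustrative sketch under assumptions, not the authors' implementation: the log-likelihood values and the number of extra parameters (`extra_params`) are hypothetical, since the source does not state them, and the chi-square reference distribution assumes the motor-noise-only model is nested in the preferred model. The helper `chi2_sf` uses the closed-form finite sums for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(loglik_full, loglik_reduced, extra_params):
    """Return (statistic, p-value) for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods for one participant; 2 extra parameters assumed.
stat, p_value = likelihood_ratio_test(-100.0, -105.0, extra_params=2)
better_fit = p_value < 0.05

# Share of a group better fit by the preferred model, using the counts quoted above.
share = round(100 * 18 / 21)
print(stat, p_value, better_fit, share)
```

 Counting the participants for whom `better_fit` is true, condition by condition, gives percentages of the form reported above; 18 of 21 rounds to 86%.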
 
 Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.
 
 Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example, in Figure 2—figure Supplement 1, column 1, row 1 is a mouse-using participant aged 3 to 5 years, while column 3, row 2 is a touchscreen-using participant aged 6 to 8 years. We have edited the labeling on both figures to make the arrangement of the data clearer.
 
 Line 222: make this a complete sentence.
 
 This sentence has been edited to a complete sentence.
 
 Line 331: grammar.
 
 This sentence has been edited for grammar.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.09.602665v2
www.biorxiv.org www.biorxiv.org

Elucidating the Selection Mechanisms in Context-Dependent Computation through Low-Rank Neural Network Modeling

3
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This study provides an important set of analyses and theoretical derivations to understand the mechanisms used by recurrent neural networks (RNNs) to perform context-dependent accumulation of evidence. The results regarding the dimensionality and neural dynamical signatures of RNNs are convincing and provide new avenues to study the mechanisms underlying context-dependent computations. This manuscript will be of interest to a broad audience in systems and computational neuroscience.
 
 Summary
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This paper investigates how recurrent neural networks (RNNs) can perform context-dependent decision-making (CDM). The authors use low-rank RNN modeling and focus on a CDM task where subjects are presented with sequences of auditory pulses that vary in location and frequency, and they must determine either the prevalent location or frequency based on an external context signal. In particular, the authors focus on the problem of differentiating between two distinct selection mechanisms: input modulation, which involves altering the stimulus input representation, and selection vector modulation, which involves altering the "selection vector" of the dynamical system.
 
 First, the authors show that rank-one networks can only implement input modulation, and that higher-rank networks are required for selection vector modulation. Then, the authors use pathway-based information flow analysis to understand how information is routed to the accumulator based on context. This analysis allows the authors to introduce a novel definition of selection vector modulation that explicitly links it to changes in the effective coupling along specific pathways within the network.
 
 The study further generates testable predictions for differentiating selection vector modulation from input modulation based on neural dynamics. In particular, the authors find that: 1) A larger proportion of selection vector modulation is expected in networks with high-dimensional connectivity. 2) Single-neuron response kernels exhibiting specific profiles (peaking between stimulus onset and choice onset) are indicative of neural dynamics in extra dimensions, supporting the presence of selection vector modulation. 3) The percentage of explained variance (PEV) of extra dynamical modes extracted from response kernels at the population level can serve as an index to quantify the amount of selection vector modulation.
 
 Strengths:
 
 The paper is clear and well written, and it draws bridges between two recent important approaches in the study of CDM: circuit-level descriptions of low-rank RNNs, and differentiation across alternative mechanisms in terms of neural dynamics. The most interesting aspect of the study involves establishing a link between selection vector modulation, network dimensionality and dimensionality of neural dynamics. The high correlation between the networks' mechanisms and their dimensionality (Fig. 7d) is surprising since differentiating between selection mechanisms is generally a difficult task, and the strength of this result is further corroborated by its consistency across multiple RNN hyperparameters (Figure 7-figure supplement 1 and Figure 7-figure supplement 2). Interestingly, the correlation between the selection mechanism and the dimensionality of neural dynamics is also high (Fig. 7g), potentially providing a promising future avenue for the study of neural recordings in this task.
 
 Weaknesses:
 
 As acknowledged by the authors, the results linking selection vector modulation and dimensionality might not generalize to neural representations where a significant fraction of the variance encodes information unrelated to the task. Therefore, these tools might not be applicable to neural recordings or to artificial neural networks with additional high-dimensional activity unrelated to the task (e.g. RNNs trained to perform many other tasks).
 
 Review 1
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 This manuscript examines network mechanisms that allow networks of neurons to perform context-dependent decision-making. In a recent study, Pagan and colleagues identified two distinct mechanisms by which recurrent neural networks can perform such computations. They termed these two mechanisms input-modulation and selection-vector modulation. Pagan and colleagues demonstrated that recurrent neural networks can be trained to implement combinations of these two mechanisms, and related this range of computational strategies with inter-individual variability in rats performing the same task. What type of structure in the recurrent connectivity favors one or the other mechanism however remained an open question.
 
 The present manuscript addresses this specific question by using a class of mechanistically interpretable recurrent neural networks, low-rank RNNs. The manuscript starts by demonstrating that unit-rank RNNs can only implement the input-modulation mechanism, but not the selection-vector modulation. The authors then build rank-three networks which implement selection-vector modulation, and show how the two mechanisms can be combined. Finally, they relate the amount of selection-vector modulation with the effective rank, i.e., the dimensionality of activity, of a trained full-rank RNN.
 
 Strength:
 
 - The manuscript is written in an obvious manner
 - The analytic approach adopted in the manuscript is impressive
 - Very clear identification of the mechanisms leading to the two types of context-dependent modulation
 - Altogether, this manuscript reports remarkable insights on a very timely question
 
 Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.02.610896v2
www.biorxiv.org www.biorxiv.org

Opposing Regulation of TNF Responses by IFN-γ and a PGE2-cAMP Axis that is Apparent in Rheumatoid and Immune Checkpoint Inhibitor-induced Arthritis IL-1β+ Macrophages

4
1. Public_Reviews 09 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  The manuscript contains important findings regarding inflammatory macrophage subsets that have theoretical and/or practical applications beyond the field of rheumatology. The authors demonstrate with compelling evidence the effects of PGE2 on TNF signaling. This work will be of broad interest to immunologists and cell biologists.
  
  Summary
2. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This article investigates the phenotype of macrophages with a pathogenic role in arthritis, particularly focusing on arthritis induced by immune checkpoint inhibitor (ICI) therapy.
  
  Building on prior data from monocyte-macrophage coculture with fibroblasts, the authors hypothesized a unique role for the combined actions of prostaglandin PGE2 and TNF. The authors studied this combined state using an in vitro model with macrophages derived from monocytes of healthy donors. They complemented this with single-cell transcriptomic and epigenetic data from patients with ICI-RA, specifically, macrophages sorted out of synovial fluid and tissue samples. The study addressed critical questions regarding the regulation of PGE2 and TNF: Are their actions co-regulated or antagonistic? How do they interact with IFN-γ in shaping macrophage responses?
  
  This study is the first to specifically investigate a macrophage subset responsive to the PGE2 and TNF combination in the context of ICI-RA, describes a new and easily reproducible in vitro model, and studies the role of IFN-γ regulation of this particular Mφ subset.
  
  Strengths:
  
  Methodological quality: The authors employed a robust combination of approaches, including validation of bulk RNA-seq findings through complementary methods. The methods description is excellent and allows for reproducible research. Importantly, the authors compared their in vitro model with ex vivo single-cell data, demonstrating that their model accurately reflects the molecular mechanisms driving the pathogenicity of this macrophage subset.
  
  Comments on latest version:
  
  The revisions made to this manuscript followed the suggestions and improved the manuscript. The authors have thoroughly addressed my previous concerns, making several key improvements:
  
  The expanded comparison between rheumatoid arthritis (RA) and immune checkpoint inhibitor-induced RA (ICI-RA) in both cellular and molecular pathology is excellent. These additions to the literature review and discussion sections significantly strengthen the manuscript and provide valuable context.
  
  I particularly appreciate the added effort in mapping a particular cell subset onto previously published single-cell RNA-Seq embeddings. The enhanced UMAPs with cell subset projection analyses are methodologically compelling, informative, and visually easy to understand for any reader. The new Figure 3 represents a substantial improvement.
  
  More detailed comparisons with previously published single-cell datasets from 2019, 2020, and 2023 effectively contextualize this research within the broader field of rheumatoid arthritis pathogenesis. This enhances the manuscript's value for specialists in autoimmunity and myeloid immunology.
  
  I find the authors' suggestion to use the defined myeloid pathogenic phenotypes as biomarkers for therapy response prediction or dose optimization particularly insightful and clinically relevant.
  
  Overall, the authors have significantly improved both the analysis and presentation of results. The manuscript has been substantially enhanced.
  
  Review 1
3. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary/Significance of the findings:
  
  The authors have done a great job by extensively carrying out transcriptomic and epigenomic analyses in the primary human/mouse monocytes/macrophages to investigate TNF-PGE2 (TP) crosstalk and their regulation by IFN-γ in the Rheumatoid arthritis (RA) synovial macrophages. They proposed that TP induces inflammatory genes via a novel regulatory axis whereby IFN-γ and PGE2 oppose each other to determine the balance between two distinct TNF-induced inflammatory gene expression programs relevant to RA and ICI-arthritis.
  
  Strengths:
  
  The authors have done a great job on RT-qPCR analysis of gene expression in primary human monocytes stimulated with TNF and showing the selective agonists of PGE2 receptors EP2 and EP4 that signal predominantly via cAMP. They have beautifully shown IFN-γ opposes the effects of PGE2 on TNF-induced gene expression. They found that TP signature genes are activated by cooperation of PGE2-induced AP-1, CEBP, and NR4A with TNF-induced NF-κB activity. On the other hand, they found that IFN-γ suppressed induction of AP-1, CEBP, and NR4A activity to ablate induction of IL-1, Notch, and neutrophil chemokine genes but promoted expression of distinct inflammatory genes such as TNF and T cell chemokines like CXCL10 indicating that TP induces inflammatory genes via IFN-γ in the RA and ICI-arthritis.
  
  Comments on latest version:
  
  The authors have answered my questions and I recommend this manuscript for publication.
  
  Review 2
4. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This article investigates the phenotype of macrophages with a pathogenic role in arthritis, particularly focusing on arthritis induced by immune checkpoint inhibitor (ICI) therapy.
  
  Building on prior data from monocyte-macrophage coculture with fibroblasts, the authors hypothesized a unique role for the combined actions of prostaglandin PGE2 and TNF. The authors studied this combined state using an in vitro model with macrophages derived from monocytes of healthy donors. They complemented this with single-cell transcriptomic and epigenetic data from patients with ICI-RA, specifically, macrophages sorted out of synovial fluid and tissue samples. The study addressed critical questions regarding the regulation of PGE2 and TNF: Are their actions co-regulated or antagonistic? How do they interact with IFN-γ in shaping macrophage responses?
  
  This study is the first to specifically investigate a macrophage subset responsive to the PGE2 and TNF combination in the context of ICI-RA, describes a new and easily reproducible in vitro model, and studies the role of IFN-γ regulation of this particular Mφ subset.
  
  Strengths:
  
  Methodological quality: The authors employed a robust combination of approaches, including validation of bulk RNA-seq findings through complementary methods. The methods description is excellent and allows for reproducible research. Importantly, the authors compared their in vitro model with ex vivo single-cell data, demonstrating that their model accurately reflects the molecular mechanisms driving the pathogenicity of this macrophage subset.
  
  Weaknesses:
  
  Introduction: The introduction lacks a paragraph providing an overview of ICI-induced arthritis pathogenesis and a comparison with other types of arthritis. Including this would help contextualize the study for a broader audience.
  
  Thank you for this suggestion; we have added a paragraph on ICI-arthritis to the Introduction (pg. 4, middle paragraph).
  
  Results Section: At the beginning of the results section, the experimental setup should be described in greater detail to make an easier transition into the results for the reader, rather than relying just on references to Figure 1 captions.
  
  We have clarified the experimental setup (pg. 5).
  
  There is insufficient comparison between single-cell RNA-seq data from ICI-induced arthritis and previously published single-cell RA datasets. Such a comparison may include DEGs and GSEA, pathway analysis comparison for similar subsets of cells. Ideally, an integration with previous datasets with RA-tissue-derived primary monocytes would allow for a direct comparison of subsets and their transcriptomic features.
  
  We thank the Reviewer for this suggestion, which has increased the impact of our data and analysis. A computationally rigorous representation mapping approach showed that ICI-arthritis myeloid subsets predominantly mapped onto 4 previously defined RA subsets including IL-1β+ cells. This result was corroborated using a complementary data integration approach. Analysis of (TNF + PGE)-induced gene sets (TP signatures) in ICI-arthritis myeloid cells projected onto the RA subsets using the AUCell package showed elevated TP gene expression in similar ICI-arthritis and RA monocytic cell subsets. We also found mutually exclusive expression of TP and IFN signatures in distinct RA and ICI-arthritis myeloid cell subsets, which supports that the opposing cross-regulation between IFN-γ and PGE2 pathways that we identified in vitro also functions similarly in vivo. This analysis is shown in the new Fig. 3, described on pg. 7, and discussed on pp. 13-14.
  
  While it's understandable that arthritis samples are limited in numbers and myeloid cell numbers, it would still be interesting to see the results of PGE2+TNF in vitro stimulation on the primary RA or ICI-RA macrophages. It would be valuable to see RNA-Seq signatures of patient cell reactivation in comparison to primary stimulation of healthy donor-derived monocytes.
  
  We agree that this would be interesting, but given the limited samples and their distribution among many studies and investigators, this is beyond the scope of the current study.
  
  Discussion: Prior single-cell studies of RA and RA macrophage subpopulations from 2019, 2020, 2023 publications deserve more discussion. A thorough comparison with these datasets would place the study in a broader scientific context.
  
  Creating an integrated RA myeloid cell atlas that combines ICI-RA data into the RA landscape would be ideal to add value to the field.
  
  As one of the next research goals, TNF blockade data in RA and ICI-RA patients would be interesting to add to such an integrated atlas. Combining responders and non-responders to TNF blockade would help to understand patient stratification with the myeloid pathogenic phenotypes. It would be great to read the authors' opinion on this in the Discussion section.
  
  Please see our response to point 3 above. This point is addressed in Fig. 3, pg. 7, and pp. 13-14, which includes a discussion of responders and nonresponders and patient stratification.
  
  Conclusion: The authors demonstrated that while PGE2 maintains the inflammatory profile of macrophages, it also induces a distinct phenotype in simultaneous PGE2 and TNF treatment. The study of this specific subset in single-cell data from ICI-RA patients sheds light on the pathogenic mechanisms underlying this condition, however, how it compares with conventional RA is not clear from the manuscript.
  
  Given the substantial incidence of ICI-induced autoimmune arthritis, understanding the unique macrophage subsets involved so that they can be targeted therapeutically in the future is an important challenge. The findings are significant for immunologists, cancer researchers, and specialists in autoimmune diseases, making the study relevant to a broad scientific audience.
  
  Reviewer #2 (Public review):
  
  Summary/Significance of the findings:
  
  The authors have done a great job by extensively carrying out transcriptomic and epigenomic analyses in the primary human/mouse monocytes/macrophages to investigate TNF-PGE2 (TP) crosstalk and their regulation by IFN-γ in the Rheumatoid arthritis (RA) synovial macrophages. They proposed that TP induces inflammatory genes via a novel regulatory axis whereby IFN-γ and PGE2 oppose each other to determine the balance between two distinct TNF-induced inflammatory gene expression programs relevant to RA and ICI-arthritis.
  
  Strengths:
  
  The authors have done a great job on RT-qPCR analysis of gene expression in primary human monocytes stimulated with TNF and showing the selective agonists of PGE2 receptors EP2 and EP4 that signal predominantly via cAMP. They have beautifully shown that IFN-γ opposes the effects of PGE2 on TNF-induced gene expression. They found that TP signature genes are activated by cooperation of PGE2-induced AP-1, CEBP, and NR4A with TNF-induced NF-κB activity. On the other hand, they found that IFN-γ suppressed induction of AP-1, CEBP, and NR4A activity to ablate induction of IL-1, Notch, and neutrophil chemokine genes but promoted expression of distinct inflammatory genes such as TNF and T cell chemokines like CXCL10 indicating that TP induces inflammatory genes via IFN-γ in the RA and ICI-arthritis.
  
  Weaknesses:
  
  (1) The authors carried out most of the assays in monocytes/macrophages. How do APCs such as dendritic cells behave in response to TP treatment at similar dosing?
  
  We agree that this is an interesting topic, especially as TNF + PGE2 is one of the standard methods of maturing in vitro generated human DCs and promoting antigen-presenting function. As DC maturation is quite different from monocyte activation, this would represent a new study and is beyond the scope of the current manuscript. We have instead added a paragraph to the discussion (pg. 12) and cited the literature on DC maturation by TNF + PGE2, including one of our older papers (PMID: 18678606; 2008).
  
  (2) The authors studied the transcriptome and epigenome at 3h and 24h post-treatment. What happens to TP-induced inflammatory genes at 12h, 36h, 48h, and 72h post-treatment? It is critical to see whether the upregulated/downregulated genes normalise or stay the same throughout the innate immune response.
  
  We now clarify that subsets of inducible genes showed distinct kinetics of induction with transient expression at 3 hr versus sustained expression over the 24 hr stimulation period as shown in Supplementary Fig. 1 (pg. 5).
  
  (3) The authors showed IL1-axis in response to the TP-treatment. Do other cytokine axes get modulated? If yes, then how do they cooperate to reduce/induce inflammatory responses along this proposed axis?
  
  This is an interesting question, which we approached using a combination of pathway analysis and targeted inspection of pathways important in the pathogenesis of RA, which is the inflammatory condition most relevant for this study. In addition to genes in the IL-1-NF-κB core inflammatory pathway, pathway analysis of genes induced by TP co-stimulation showed enrichment of genes related to leukocyte chemotaxis, in particular neutrophil migration. Accordingly, TP co-stimulation increased expression of CSF3, which plays a key role in mobilizing neutrophils from the bone marrow, and major neutrophil chemokines CXCL1, CXCL2, CXCL3 and CXCL5 that recruit neutrophils to sites of inflammation including in inflammatory arthritis. Analysis of the late response to TNF similarly showed enrichment of genes important in chemotaxis, and suppression of genes in the cholesterol biosynthetic pathway, which we and others have previously linked to IFN responses. Targeted inspection of genes in additional pathways implicated in RA pathogenesis showed increased expression of genes in the Notch pathway. We believe that these pathways work together with the IL-1 pathway to increase immune cell recruitment and activation in inflammatory responses; these results are described on pp. 5-6 and are incorporated into Figures 1, 2 and Supplementary Fig. 2.
  
  Overall, the data looks good and acceptable but I need to confirm the above-mentioned criticisms.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The discussion section of the manuscript claims: "In this study, we utilized transcriptomics to demonstrate a 'TNF + PGE2' (TP) signature in RA and ICI-arthritis IL-1β+ synovial macrophages." This statement is misleading, as no new transcriptomic data from RA synovial samples were generated in this study. To support such a claim, the authors would need to compare primary monocytes or macrophages from RA patients using bulk RNA-seq or single-cell RNA-seq. Based on the current data, the comparison is limited to bulk RNA-seq findings from the authors' in vitro model and prior monocyte-fibroblast coculture studies.
  
  We have modified the abstract and discussion (pg. 10) to reflect that we have compared an in vitro generated TP signature with gene expression in previously identified RA macrophage subsets.
  
  AuthorResponse
URL

biorxiv.org/content/10.1101/2024.11.11.623039v2
www.biorxiv.org www.biorxiv.org

New submission 26/03/2025, 16:34:12

4
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 The authors study the context of the skeletal remains of three individuals and associated sediment samples to conclude that the hominin species Homo naledi intentionally buried their dead. Demonstration of the earliest known instance of intentional funerary practice – with a relatively small-brained hominin engaging in a highly complex behavior that has otherwise been observed from Homo sapiens and Homo neanderthalensis – would represent a landmark finding. The authors have revised their manuscript extensively in light of the reviews of their initial submission, with improved illustration, context, discussion, and theoretical frameworks, leading to an improved case supporting their conclusion that Homo naledi intentionally buried their dead. One of the reviewers concludes that the findings convincingly demonstrate intentional burial practices, while another considers evidence for such an unambiguous conclusion to be incomplete given a lack of definitive knowledge around how the hominins got into the chamber. We look forward to seeing the continued development and assessment of this hypothesis. It is worth noting that the detailed reviews (both rounds) and comprehensive author response are commendable and consequential parts of the scientific record of this study. The editors note that the authors' response repeatedly invokes precedent from previous publications to help justify the conclusions in this paper. While doing so is helpful, the editors also note that scientific norms and knowledge are constantly evolving, and that any study has to rest on its own scientific merit.
 
 Summary
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Thank you for allowing me to review the paper "Evidence for deliberate burial of the dead by Homo naledi". This remains a very important site for paleoanthropology. I appreciate the work that the crew, especially the junior members of the team, put into this massive project. I appreciate that the authors did revise the paper since that is not a requirement of eLife. Extensive reviews by peer-reviewers have been provided for this paper, as well as professionally published replies (Martinón-Torres et al., 2023; Foecke et al., 2023). The composition and citations of this version are much improved, though important information, some of it requested by reviewers, is buried in the supplementary section. It seems important that the authors make these sections more easily accessible to the general reader. The length of the paper is also unnecessary and impedes the readability of the work. Concise clarity is an expectation of most journals. The Netflix documentary was made to appeal to a mass audience; I would hope that the goal of the accompanying publication would be to enable readers to fully comprehend the work behind the claims.
 
 This version of the paper considers at great length many possibilities for how the H. naledi skeletal material came to rest in the cave system, with some additional figures and data provided. However, quite a lot is still unclear. In my original review I stated, "The authors have repeatedly described how incredibly challenging it is to get into and out of this cave system and all of its chambers." This was a point emphasized in the Netflix documentary. In this version of the paper the authors have included within the supplementary section a brief discussion of other entrances. The work by Robbins et al. 2021 (a peer-reviewed paper in the impact factor rated journal Chemical Geology) is extremely relevant here. In this revision it is noted in the supplementary section that if the Postbox chamber was used as an opening, it would have reduced the length of the access to the system by 80 m. This fact seems important. This section should be moved out of the supplementary material and expanded because the conclusions published by Robbins et al. (2021) indicate a completely different route by which H. naledi accessed the cave, but this is hardly mentioned in the revision and deserves attention. To quote Robbins et al.'s (2021) discussion section 6.3:
 
 "We acknowledge that additional data is required in order to confidently assess the relative timing of the Dragon's Back collapse and entry of H. naledi. Nonetheless, the stratigraphic and geochronologic observations presented here, together with those previously published (Dirks et al., 2017) are consistent with the following scenario. Prior to the collapse of the Dragon's Back, sometime before 241 ka (new minimum age for H. naledi from RS68), the cave could be entered by H. naledi via a shaft in the roof of the Postbox Chamber. From there H. naledi could walk along a straight passage that follows a gently descending, SW trending fracture into the Dragon's Back Chamber and, with the Dragon's Back block still attached to the roof, would have only needed to climb over a ~5 m high sill to access the Dinaledi Subsystem behind it. This sill and narrow fracture system behind the Dragon's Back block would have been a major impediment to any flood waters and most other fauna into the Dinaledi Subsystem, but it would have been a more accessible route than that today."
 
 The paper's conclusion continues, "The new dates further constrain the minimum age of H. naledi to 241 ka. Thus, H. naledi entered the subsystem between 241 ka and 335 ka, during a glacial period, when clastic sediment along the access route into the Dinaledi Subsystem experienced erosion. H. naledi would have probably entered the cave in the same way as the clastic sediments did, through an opening in the roof of the Postbox Chamber and may have entered via the Dragon's Back Chamber by climbing a 5 m high sill and passing below the Dragon's Back Block that was then still attached to the roof, to enter the Dinaledi Subsystem. In this context it is important to emphasize that it was not the Dragon's Back Block that prevented high-energy transport of coarse siliciclastic sediment from the Dragon's Back Chamber into the Dinaledi Subsystem, but rather the in situ floor block in the back wall of the Dragon's Back Chamber, against which the Dragon's Back Block slumped after it fell." This conclusion is very different from the complex pathway suggested by Berger et al. Martinón-Torres et al., 2023 also requested elaboration on this point in their reply by stating, "Moreover, recent studies by the Rising Star Cave team also point to a possible different and easier accesses for H. naledi into the fossil-bearing cave chambers than the current restricted access chute used by the research team, making clear that the degree of accessibility remains an open question (Robbins et al., 2021). Based on extensive dating studies of speleothem, this research (Robbins et al., 2021) implies that prior to 241 ka and the collapse of the Dragon's Back block hominins and other species could have more easily entered the cave via the Post Box Chamber and beneath the Dragon's Back Block before it fell. This gives access to a series of rifts that allow easier entry to the Dinaledi and other chambers beyond the present-day chute."
 
 Because this paper introduces very different sets of possibilities, it seems impossible to derive an understanding of the processes that occurred 335-241 ka throughout the cave system without going into detail on these other openings, especially openings that are hypothesized to have been used by the hominins in question.
 
 The world cares deeply about the H. naledi hominins and their story. I hope that in the coming years these issues are addressed, and perhaps other independent teams are allowed to do a full analysis since science is about replication. In any case, the excavation team has contributed important fossils to paleoanthropology.
 
 Literature cited:
 
 Foecke, Kimberly K., Queffelec, Alain, & Pickering, Robyn. (2023). No Sedimentological Evidence for Deliberate Burial by Homo naledi - A Case Study Highlighting the Need for Best Practices in Geochemical Studies Within Archaeology and Paleoanthropology. PaleoAnthropology, 2024.
 
 Martinón-Torres, M., Garate, D., Herries, A. I. R., & Petraglia, M. D. (2023). No scientific evidence that Homo naledi buried their dead and produced rock art. Journal of Human Evolution, 103464. https://doi.org/10.1016/j.jhevol.2023.103464
 
 Robbins, J. L., Dirks, P. H. G. M., Roberts, E. M., Kramers, J. D., Makhubela, T. V., Hilbert-Wolf, H. L., Elliott, M., Wiersma, J. P., Placzek, C. J., Evans, M., & Berger, L. R. (2021). Providing context to the Homo naledi fossils: Constraints from flowstones on the age of sediment deposits in Rising Star Cave, South Africa. Chemical Geology, 567, 120108. https://doi.org/10.1016/j.chemgeo.2021.120108
 
 Review 1
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Before providing my review of the revised version of this study by Berger et al., which explores potential deliberate burials of Homo naledi within the Rising Star Cave System, I would like to briefly summarize the key points from my previous review of the earlier version (in 2023). Summarizing my previous review will provide context for assessing how effectively the revised study addresses the concerns I raised previously (in 2023).
 
 In my earlier comments, I highlighted significant methodological and analytical shortcomings that, in my view, undermined the authors' claim of intentional burials by Homo naledi. While the study presented detailed geological and fossil data, I found the evidence for intentional burials unconvincing due to insufficient application of archaeothanatological principles and other methodological gaps.
 
 My key concerns included:
 
 (1) The absence of a comprehensive archaeothanatological analysis, particularly with respect to taphonomic changes, bone articulations, and displacement patterns such as the collapse of sediments and bone remains into voids created by decomposition.
 
 (2) Missing or unclear illustrations of bone arrangements, which are critical for interpreting burial positions and processes.
 
 (3) A lack of detailed discussion on the sequence of decomposition, joint disarticulation, sediment infill, and secondary bone displacement.
 
 To convincingly support claims of deliberate burial, I argued that the study must reconstruct the timeline and processes surrounding death and deposition while clearly distinguishing natural taphonomic changes from intentional human actions. I emphasized the importance of integrating established archaeothanatological frameworks, such as those outlined by Duday et al. or Boulestin et al., to provide the necessary analytical rigor.
 
 I will now explain how the revised version of this study has successfully addressed all the concerns raised in my previous review and why I now think that the authors provide sufficient evidence for the presence of "repeated and patterned" deliberate burials (referred to as "cultural burials" by the authors) by Homo naledi within the Rising Star Cave System.
 
 In their revised manuscript, the authors have implemented substantial improvements in methodology, analytical depth, and overall presentation, which have effectively resolved the critical issues I previously highlighted. These revisions greatly strengthen their argument for intentional funerary practices. Importantly, the authors remain cautious in their interpretation of the evidence, explicitly refraining from inferring "symbolic" behavior or complex cognitive motivations behind these burials. Instead, they focus on presenting clear evidence for deliberate, patterned practices while leaving the broader implications for Homo naledi's cultural and cognitive capacities open for further investigation. This cautious approach adds to the credibility of their conclusions and avoids overextending the interpretation of the data.
 
 The authors' enhanced application of archaeothanatological principles now offers a more comprehensive and convincing interpretation of the burial features. Key gaps in the earlier version, such as the absence of detailed reconstructions of taphonomic processes, bone articulations, and displacement patterns, have been addressed with thorough analyses and clearer illustrations. The study also now includes a well-structured timeline of events surrounding death and deposition, demonstrating an improved ability to differentiate between natural processes and deliberate human actions. These additions lend greater clarity and rigor to the evidence, making the argument for intentional burials both robust and persuasive.
 
 Furthermore, the revised study presents detailed data on skeletal arrangements, decomposition sequences, and spatial patterns. This information is now relatively well illustrated and contextualized, enabling readers to better understand the complex processes involved in these burial practices. Importantly, the authors provide a stronger theoretical framework, integrating established archaeothanatological methodologies and taphonomic studies that situate their findings within broader archaeological and anthropological discussions of funerary behavior.
 
 That being said, there remain relatively minor issues that could be refined further. Addressing these would help ensure the study is as clear and accessible as possible to the reader. Such adjustments would enhance the overall readability and reinforce the study's impact within the scientific community.
 
 A - Suggested changes:
 
 While the revised version of this study marks a significant improvement, successfully addresses my previous major concerns, and provides a convincing argument for deliberate burials by Homo naledi, I believe that including one summary table and one summary figure for each of the three main locations (the Hill Antechamber, and Feature 1 and the Puzzle Box in the Dinaledi Chamber) would further enhance the clarity and accessibility of the findings. Such tables and figures would serve as a valuable reference, allowing readers to more easily follow how the detailed patterns observed at each site fit the criteria for distinguishing intentional from natural processes.
 
 The summary tables should consolidate key information for each location, such as:
 
 (1) Bone articulations: A comprehensive list of articulated skeletal elements, categorized by their anatomical relationships (e.g., labile vs. stable articulations).
 
 (2) Displacement patterns: Documentation of any spatial shifts in bone positions, noting directions and extents of disarticulation.
 
 (3) Sequence of decomposition: Observations regarding the sequence of decomposition, joint disarticulation and associated changes in bone arrangements.
 
 (4) Sediment interaction: Notes on sediment infill and its timing relative to decomposition, including evidence of secondary voids or delayed sediment deposition.
 
 (5) Distinguishing criteria: Clear indications of how each observed pattern supports intentional burial (e.g., structured placement, lack of natural transport mechanisms) versus natural processes (e.g., random dispersal, sediment-driven bone displacement).
 
 Including such tables would not only summarize the complex taphonomic and archaeothanatological data but also allow readers to quickly assess how the evidence supports the authors' conclusions. This approach would bridge the gap between the detailed narrative descriptions and the criteria necessary to differentiate deliberate funerary practices from natural occurrences.
 
 To streamline the main text further, many of the detailed descriptions of individual bones, specific displacement measurements, and other intricate observations could be moved to the supplementary data. This reorganization would maintain the richness of the data for those who wish to explore it in depth, while the summary tables would present the key findings concisely in the main text. This balance between accessibility and detail would ensure that the study appeals to both specialists requiring comprehensive data and readers looking for an overarching understanding of the findings.
 
 In addition to these structural changes, it is crucial to ensure that evidence is consistently illustrated throughout the text.
 
 Importantly, the skeletal part representation is provided for Dinaledi Feature 1 in Figure 14, but similar data are not presented for the other burial features, such as those in the Hill Antechamber or Puzzle Box. This inconsistency could make it more challenging for readers to compare the features and fully appreciate the patterns of burial behavior across the different locations. Ensuring that similar types of evidence and analyses are presented uniformly for all features would strengthen the study and make its conclusions more cohesive and compelling.
 
 Adding supplementary figures to represent the skeletal part distribution (as in Figure 14) within each excavated area (i.e., not only for Dinaledi Feature 1 but also for Hill Antechamber and Puzzle Box) would significantly enhance the study's clarity and accessibility. These figures could provide a visual summary of skeletal part representation, allowing readers to easily understand the nature of human remains within each burial context.
 
 Specifically, such figures could:
 
 (1) Illustrate Skeletal Part Representation: By visually mapping the presence and location of various skeletal elements, the figures would make it easier for readers to assess the completeness and arrangement of remains in each feature. This is particularly important for interpreting patterns of bone articulation and disarticulation. For example, it is quite challenging to determine the exact number and characteristics of the human skeletal remains identified within the Puzzle Box and those recovered through the "subsurface collection" in its surrounding area. The authors state that "at least six individuals" were identified in this area (during "subsurface collection") but provide no further clarification. They simply mention that "most elements" were described previously, without specifying which elements or where this prior description can be found.
 
 (2) Highlight Articulations and Displacements: Figures could indicate which bones are articulated and their relative positions, as well as the spatial distribution of disarticulated elements. This would provide a clear visual context to support interpretations of taphonomic processes.
 
 (3) Facilitate Comparisons Across Locations: By presenting skeletal part representation consistently for each location, the figures would enable readers to directly compare features, reinforcing the argument for "repeated and patterned" behavior.
 
 (4) Simplify Complex Data: Instead of relying solely on textual descriptions, the visual format would allow readers to quickly grasp the key findings, making the study more accessible to a broader audience.
 
 By including such figures alongside the proposed summary tables in the main text, the study would achieve a balance between detailed narrative descriptions and concise, visual representation of the data. This approach would strengthen the overall presentation and support the authors' conclusions effectively.
 
 Again, by presenting the data in a structured and comparative format, the new tables + figures could also highlight the differences and similarities between the three locations. This would reinforce the argument for "repeated and patterned" behavior, as the tables would make it easier to observe consistent burial practices across different contexts within the Rising Star Cave System.
 
 Adding these summary tables + figures, ensuring consistent presentation of evidence, and reallocating detailed descriptions to supplementary materials would not require significant new analysis. However, these organizational adjustments would greatly enhance the study's clarity, readability, and overall impact.
 
 B - A few additional changes are needed:
 
 Figure 8: This figure is critical but lacks clarity. Specifically:
 
 Panels 8a-c suffer from low contrast, making details difficult to discern. Panel 8d (sediment profile) is too small and lacks annotations that would aid interpretation.
 
 Figure S7: While this figure has significantly better contrast than Figures 8a-c, I am unable to identify the "articulated foot ... at right of frame," as mentioned in the caption. Please clarify this by adding annotations directly to the figure.
 
 Page 4, 2nd paragraph: In the sentence "Researchers thus have diverse opinions about how to test whether ...," the word "opinions" should be replaced with a more precise term, such as "approaches."
 
 C - In conclusion, I am impressed by the significant effort and meticulous work that has gone into this revised version of the study. The quality of the new evidence presented is commendable, and the findings now convincingly demonstrate not only clear evidence of intentional burial practices by Homo naledi but also compelling indications of post-depositional reworking. These advancements reflect a major improvement in the study's analytical rigor and the robustness of its conclusions, making it a valuable contribution to the understanding of early hominin funerary behavior.
 
 Review 2
4. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Author response:
 
 [The following is the authors’ response to the original reviews.]
 
 We extend our sincere thanks to the editor, referees for eLife, and other commentators who have written evaluations of this manuscript, either in whole or in part. Sources of these comments were highly varied, including within the bioRxiv preprint server, social media (including many comments received on X/Twitter and some YouTube presentations and interviews), comments made by colleagues to journalists, and also some reviews of the work published in other academic journals. Some of these are formal and referenced with citations. Others were informal but nonetheless expressed perspectives that helped enable us to revise the manuscript with the inclusion of broader perspectives than the formal review process. It is beyond the scope of this summary to list every one of these, which have often been brought to the attention of different coauthors, but we begin by acknowledging the very wide array of peer and public commentary that have contributed to this work. The reaction speaks to a broad interest in open discussion and review of preprints.
 
 As we compiled this summary of changes to the manuscript, we recognized that many colleagues made comments about the process of preprint dissemination and evaluation rather than the data or analyses in the manuscript. Addressing such comments is outside the scope of this revised manuscript, but we do feel that a broader discussion of these comments would be valuable in another venue. Many commentators have expressed confusion about the eLife system of evaluation of preprints, which differs from the editorial acceptance or rejection practiced in most academic journals. As authors in many different nations, in varied fields, and in varied career stages, we ourselves are still working to understand how the academic publication landscape is changing, and how best to prepare work for new models of evaluation and dissemination.
 
 The manuscript and coauthor list reflect an interdisciplinary collaboration. Analyses presented in the manuscript come from a wide range of scientific disciplines. These include skeletal inventory, morphology, and description, spatial taphonomy, analysis of bone fracture patterns and bone surface modifications, sedimentology, geochemistry, and traditional survey and mapping. The manuscript additionally draws upon a large number of previous studies of the Rising Star cave system and the Dinaledi Subsystem, which have shaped our current work. No analysis within any one area of research stands alone within this body of work: all are interpreted in conjunction with the outcomes of other analyses and data from other areas of research. Any single analysis in isolation might be consistent with many different hypotheses for the formation of sediments and disposition of the skeletal remains. But testing a hypothesis requires considering all data in combination and not leaving out data that do not fit the hypothesis. We highlight this general principle at the outset because a number of the comments from referees and outside specialists have presented alternative hypotheses that may arguably be consistent with one kind of analysis that we have presented, while seeming to overlook other analyses, data, or previous work that exclude these alternatives. In our revision, we have expanded all sections describing results to consider not only the results of each analysis, but how the combination of data from different kinds of analysis relates to hypotheses for the deposition and subsequent history of the Homo naledi remains. We address some specific examples and how we have responded to these in our summary of changes below.
 
 General organization:
 
 The referee and editor comments are mostly general and not line-by-line questions, and we have compiled them and treated them as a group in this summary of changes, except where specifically noted.
 
 The editorial comments on the previous version included the suggestion that the manuscript should be reorganized to test “natural” (i.e. noncultural) hypotheses for the situations that we examine. The editorial comment suggested this as a “null hypothesis” testing approach. Some outside comments also viewed noncultural deposition as a null hypothesis to be rejected. We do not concur that noncultural processes should be construed as a null hypothesis, as we discuss further below. However, because of the clear editorial opinion we elected to revise the manuscript to make more explicit how the data and analyses test noncultural depositional hypotheses first, followed by testing of cultural hypotheses. This reorganization means that the revised manuscript now examines each hypothesis separately in turn.
 
 Taking this approach resulted in a substantial reorganization of the “Results” section of the manuscript. The “Results” section now begins with summaries of analyses and data conducted on material from each excavation area. After the presentation of data and analyses from each area, we then present a separate section for each of several hypotheses for the disposition and sedimentary context of the remains. These hypotheses include deposition of bodies upon a talus (as hypothesized in some previous work), slow sedimentary burial on a cave floor or within a natural depression, rapid burial by gravity-driven slumping, and burial of naturally mummified remains. We then include sections to test the hypothesis of primary cultural burial and secondary cultural burial. This approach adds substantial length to the Results. While some elements may be repeated across sections, we do consider the new version to be easier to take piece by piece for a reader trying to understand how each hypothesis relates to the evidence.
 
 The Results section includes analyses on several different excavation areas within the Dinaledi Subsystem. Each of these presents somewhat different patterns of data. We conceived of this manuscript combining these distinct areas because each of them provides information about the formation history of the Homo naledi-associated sediments and the deposition of the Homo naledi remains. Together they speak more strongly than separately. In the previous version of the manuscript, two areas of excavation were considered in detail (Dinaledi Feature 1 and the Hill Antechamber Feature), with a third area (the Puzzle Box area) included only in the Discussion and with reference to prior work. We now describe the new work undertaken after the 2013-2014 excavations in more detail. This includes an overview of areas in the Hill Antechamber and Dinaledi Chamber that have not yielded substantial H. naledi remains and that thereby help contextualize the spatial concentration of H. naledi skeletal material. The most substantial change in the data presented is a much expanded reanalysis of the Puzzle Box area. This reanalysis provides greater clarity on how previously published descriptions relate to the new evidence. The reanalysis also provides the data to integrate the detailed information on bone identification, fragmentation, and spatial taphonomy from this area with the new excavation results from the other areas.
 
 In addition to Results, the reorganization also affected the manuscript’s Introduction section. Where the previous version led directly from a brief review of Pleistocene burial into the description of the results, this revised manuscript now includes a review of previous studies of the Rising Star cave system. This review directly addresses referee comments that express some hesitation to accept previous results concerning the structure and formation of sediments, the accessibility of the Dinaledi Subsystem, the geochronological setting of the H. naledi remains, and the relation of the Dinaledi Subsystem to nearby cave areas. Some parts of this overview are further expanded in the Supplementary Information to enable readers to dive more deeply into the previous literature on the site formation and geological configuration of the Rising Star cave system without needing to digest the entirety of the cited sources.
 
 The Discussion section of the revised manuscript is differentiated from Results and focuses on several areas where the evidence presented in this study may benefit from greater context. One new section addresses hypothesis testing and parsimony for Pleistocene burial evidence, which we address at greater length in this summary below. The majority of the Discussion concerns the criteria for recognizing evidence for burial as applied in other studies. In this research we employ a minimal definition but other researchers have applied varied criteria. We consider whether these other criteria have relevance in light of our observations and whether they are essential to the recognition of burial evidence more broadly.
 
 Vocabulary:
 
 We introduce the term “cultural burial” in this revised manuscript to refer to the burial of dead bodies as a mortuary practice. “Burial” as an unmodified term may refer to the passive covering of remains by sedimentary processes. Use of the term “intentional burial” would raise the question of interpreting intent, which we do not presume based on the evidence presented in this research. The relevant question in this case is whether the process of burial reflects repeated behavior by a group. As we received input from various colleagues, it became clear that burial itself is a highly loaded term. In particular, there is a common assumption within the literature and among professionals that burial must by definition be symbolic. We do not take any position on that question in this manuscript, and it is our hope that the term “cultural burial” may focus the conversation on the extent to which the behavioral evidence is repeated and patterned.
 
 Sedimentology and geochemistry of Dinaledi Feature 1:
 
 Reviewer 4 provided detailed comments on the sedimentological and geochemical context that we report in the manuscript. One outside review (Foecke et al. 2024) included some of the points raised by reviewer 4, and additionally addressed the reporting of geochemical and sedimentological data in previous work that we cite.
 
 To address these comments we have revised the sedimentary context and micromorphology of sediments associated with Dinaledi Feature 1. In the new text we demonstrate the lack of microstratigraphy (supported by grain size analysis) in the unlithified mud clast breccia (UMCB), while such a microstratigraphy is observed in the laminated orange-red mudstones (LORM) that contribute clasts to the UMCB. Thus, we emphasize the presence and importance of a laterally continuous layer of LORM nature occurring at a level that appears to be the maximum depth of fossil occurrence. This layer is severely broken under extensive accumulation of fossils such as Feature 1 and only evidenced by abundant LORM clasts within and around the fossils.
 
 We have completely reworked the geochemical context associated with Feature 1 following the comments of reviewer 4. We describe the variations and trends observed in the major oxides separately from those in trace and rare-earth elements. We used Harker variation plots to assess relationships between these element groups and CaO and Zn, followed by principal component analysis of all elements analyzed. The new geochemical analysis clearly shows that Feature 1 is associated with localized trace element signatures that exist in the sediments only in association with the fossil bones, which suggests a lack of postdepositional mobilization of the fossils and sediments. We have additionally included a fuller description of XRF methods.
 
 To clarify the relation of all results to the features described in this study, we removed the geochemical and sedimentological samples from other sites within the Dinaledi Subsystem. These localities within the fissure network represent only surface collection of sediment, as no excavation results are available from those sites to allow for comparison in the context of assessing evidence of burial. These were initially included for comparison, but have now been removed to avoid confusion.
 
 Micromorphology of sediments:
 
 Some referees (1, 3, and 4) and other commentators (including Martinón-Torres et al. 2024) have suggested that the previous manuscript was deficient due to an insufficient inclusion of micromorphological analysis of sediments. Because these commentators have emphasized this kind of evidence as particularly important, we review here what we have included and how our revision has addressed this comment. Previous work in the Dinaledi Chamber (Dirks et al., 2015; 2017) included thin section illustrations and analysis of sediment facies, including sediments in direct association with H. naledi remains within the Puzzle Box area. The previous work by Wiersma and coworkers (2020) used micromorphological analysis as one of several approaches to test the formation history of Unit 3 sediments in the Dinaledi Subsystem, leading to the interpretation of autobrecciation of earlier Unit 1 sediment. In the previous version of this manuscript we provided citations to this earlier work. The previous manuscript also provided new thin section illustrations of Unit 3 sediment near Dinaledi Feature 1 to place the disrupted layer of orange sediment (now designated the laminated orange silty mudstone unit) into context.
 
 In the revised manuscript we have added to this information in three ways. First, as noted above in response to reviewer 4, we have revised and added to our discussion of micromorphology within and adjacent to Dinaledi Feature 1. Second, we have included more discussion in the Supplementary Information of previous descriptions of sediment facies and associated thin section analysis, with illustrations from prior work (CC-BY licensed) brought into this paper as supplementary figures, so that readers can examine these without following the citations. Third, we have included Figure 10 in the manuscript, which includes six panels with microtomographic sections from the Hill Antechamber Feature. This figure illustrates the consistency of sub-unit 3b sediment in direct contact with H. naledi skeletal material, including anatomically associated skeletal elements, with previous analyses that demonstrate the angular outlines and chaotic orientations of LORM clasts. It also shows density contrasts of sediment in immediate contact with some skeletal elements, the loose texture of this sediment with air-filled voids, and apparent invertebrate burrowing activity. To our knowledge this is the first application of microtomography to sediment structure in association with a Pleistocene burial feature.
 
 To forestall possible comments that the revised manuscript does not sufficiently employ micromorphological observations, or that any one particular approach to micromorphology is the standard, we present here some context from related studies of evidence from other research groups working at varied sites in Africa, Europe, and Asia. Hodgkins et al. (2021) noted: “Only a handful of micromorphological studies have been conducted on human burials and even fewer have been conducted on suspected burials from Paleolithic or hunter-gatherer contexts.” In that study, one supplementary figure with four photomicrographs of thin sections of sediments was presented. Interpretation of the evidence for a burial pit by Hodgkins et al. (2021) noted the more open microstructure of sediment but otherwise did not rely upon the thin section data in characterizing the sediments associated with grave fill. Martinón-Torres et al. (2021) included one Extended Data figure illustrating thin sections of sediments and bone, with two panels showing sediments (the remainder showing bone histology). The micromorphological analysis presented in the supplementary information of that paper was restricted to description of two microfacies associated with the proposed “pit” in that study. That study did carry out microCT scanning of the partially-prepared skeletal remains but did not report any sediment analysis from the microtomographic results. Maloney et al. (2022) reported no micromorphological or thin section analysis. Pomeroy et al. (2020a) included one illustration of a thin section; this study may be regarded as a preliminary account rather than a full description of the work undertaken. Goldberg et al. (2017) analyzed the geoarchaeology of the Roc de Marsal deposits in which possible burial-associated sediments had been fully excavated in the 1960s, providing new morphological assessments of sediment facies; the supplementary information to this work included five scans (not microscans) of sediment thin sections and no microphotographs. Fewlass et al. (2023) presented no thin section or micromorphological illustrations or methods. In summary of this research, we note that in one case micromorphological study provided observations that contributed to the evidence for a pit, in other cases micromorphological data did not test this hypothesis, and many researchers do not apply micromorphological techniques in their particular contexts.
 
 Sediment micromorphology is a growing area of research and may have much to provide to the understanding of ancient burial evidence as its standards continue to develop (Pomeroy et al. 2020b). In particular, microtomographic analysis of sediments, as we have initiated in this study, may open new horizons that are not possible with more destructive thin-section preparation. In this manuscript, the thin section data reveal valuable evidence about the disruption of sediment structure by features within the Dinaledi Chamber, and microtomographic analysis further documents that the Hill Antechamber Feature reflects similar processes, in addition to possible post-burial diagenesis and invertebrate activity. Following up in detail on these processes will require further analysis outside the scope of this manuscript.
 
 Access into the Dinaledi Subsystem:
 
 Reviewer 1 emphasizes the difficulty of access into the Dinaledi Subsystem as a reason why the burial hypothesis is not parsimonious. Similar comments have been made by several outside commentators who question whether past accessibility into the Dinaledi Subsystem may at one time have been substantially different from the situation documented in previous work. Several pieces of evidence are relevant to these questions and we have included some discussion of them in the Introduction, and additionally include a section in the Supplementary Information (“Entrances to the cave system”) to provide additional context for these questions. Homo naledi remains are found not only within the Dinaledi Subsystem but also in other parts of the cave system including the Lesedi Chamber, which is similarly difficult for non-expert cavers to access. The body plan, mass, and specific morphology of H. naledi suggest that this species would be vastly more suited to moving and climbing within narrow underground passages than living people. On this basis it is not unparsimonious to suggest that the evidence resulted from H. naledi activity within these spaces. We note that the accessibility of the subsystem is not strictly relevant to the hypothesis of cultural burial, although the location of the remains does inform the overall context which may reflect a selection of a location perceived as special in some way.
 
 Stuffing bodies down the entry to the subsystem:
 
 Reviewer 3 suggests that one explanation for the emplacement of articulated remains at the top of the sloping floor of the Hill Antechamber is that bodies were “stuffed” into the chute that comprises the entry point of the subsystem and passively buried by additional accumulation of remains. This was one hypothesis presented in earlier work (Dirks et al. 2015) and considered there as a minimal explanation because it did not entail the entry of H. naledi individuals into the subsystem. Further exploration (Elliott et al. 2021), ongoing survey work, and this manuscript have all resulted in data that reject this hypothesis. The revised manuscript includes a section in the Results, “Deposition upon a talus with passive burial”, that examines this hypothesis in light of the data.
 
 Recognition of pits:
 
 Referees 3 and 4 and several additional commentators have emphasized that the recognition of pit features is necessary to the hypothesis of burial, and questioned whether the data presented in the manuscript were sufficient to demonstrate that pits were present. We have revised the manuscript in several ways to clarify how all the different kinds of evidence from the subsystem test the hypothesis that pits were present. This includes the presentation of a minimal definition of burial to include a pit dug by hominins, criteria for recognizing that a pit was present, and an evaluation of the evidence in each case to make clear how the evidence relates to the presence of a pit and subsequent infill. As referee 3 notes, it can be challenging to recognize a pit when sediment is relatively homogeneous. This point was emphasized in the review by Pomeroy and coworkers (2020b), who reflected on the difficulty of seeing evidence for shallow pits constructed by hominins, and we have cited this in the main text. As a result, the evidence for pits has been a recurrent topic of debate for most Pleistocene burial sites. However, in addition to the sedimentological and contextual evidence in the cases we describe, the current version also reflects upon other possible mechanisms for the accumulation of bones or bodies. The data show that the sedimentary fill associated with the H. naledi remains in the cases we examine could not have passively accumulated slowly and is not indicative of mass movement by slumping or other high-energy flow. To further put these results into context, we added a section to the Discussion that briefly reviews prior work on distinguishing pits in Pleistocene burial contexts, including the substantial number of sites with accepted burial evidence for which no evidence of a pit is present.
 
 Extent of articulation and anatomical association:
 
 We have added significantly greater detail to the descriptions of articulated remains and orientation of remains in order to describe more specifically the configuration of the skeletal material. We also provide 14 figures in the main text (13 of them new) to illustrate the configuration of skeletal remains in our data. For the Puzzle Box area, this now includes substantial evidence on the individuation of skeletal fragments, which enables us to illustrate the spatial configuration of remains associated with the DH7 partial skeleton, as well as the spatial position of fragments refitted as part of the DH1, DH2, DH3, and DH4 crania. For Dinaledi Feature 1 and the Hill Antechamber Feature we now provide figures that key skeletal parts as identified, including material that is unexcavated where possible, and a skeletal part representation figure for elements excavated from Dinaledi Feature 1.
 
 Archaeothanatology:
 
 Reviewer 2 suggests that a greater focus on the archaeothanatology literature would be helpful to the analysis, with specific reference to the sequence of joint disarticulation, the collapse of sediment and remains into voids created by decomposition, and associated fragmentation of the remains. In the revised manuscript we have provided additional analysis of the Hill Antechamber Feature with this approach in mind. This includes greater detail and illustration of our current hypothesis for individuation of elements. We now discuss a hypothesis of body disposition, describe the persistent joints and articulation of elements, and examine likely decomposition scenarios associated with these remains. Additionally, we expand our description and illustration of the orientation of remains and degree of anatomical association and articulation within Dinaledi Feature 1. For this feature and for the Hill Antechamber Feature we have revised the text to describe how fracturing and crushing patterns are consistent with downward pressure from overlying sediment and material. In these features, postdepositional fracturing occurred subsequent to the decomposition of soft tissue and partial loss of organic integrity of the bone. We also indicate that the loss by postdepositional processes of most long bone epiphyses, vertebral bodies, and other portions of the skeleton less rich in cortical bone, poses a challenge for testing the anatomical associations of the remaining elements. This is a primary reason why we have taken a conservative approach to identification of elements and possible associations.
 
 A further aspect of the site revealed by our analysis is the selective reworking of sediments within the Puzzle Box area subsequent to the primary deposition of some bodies. The skeletal evidence from this area includes body parts with elements in anatomical association or articulation, juxtaposed closely with bone fragments at varied pitch and orientation. This complexity of events evidenced within this area is a challenge for approaches that have been developed primarily based on comparative data from single-burial situations. In these discussions we deepen our use of references as suggested by the referee.
 
 Burial positions:
 
 Reviewer 2 further suggests that illustrations of hypothesized burial positions would be valuable. We recognize that a hypothesized burial position may be an appealing illustration, and that some recent studies have created such illustrations in the context of their scientific articles. However, such illustrations generally include a great deal of speculation and artistic imagination, and tend to have an emotive character. We have added more discussion to the manuscript of possible primary disposition in the case of the Hill Antechamber Feature as discussed above. We have not created new illustrations of hypothesized burial positions for this revision.
 
 Carnivore involvement:
 
 Referee 1 suggests that the manuscript should provide further consideration of whether carnivore activity may have introduced bones or bodies into the cave system. The reorganized Introduction now includes a review of previous work, and we provide an expanded discussion in the Supplementary Information (“Hypotheses tested in previous work”). This includes a review of literature on the topic of carnivore accumulation and the evidence from the Dinaledi and Lesedi Chambers that rejects this hypothesis.
 
 Water transport and mud:
 
 The eLife referees broadly accepted previous work showing that water inundation or mass flow of water-saturated sediment did not occur within the history of Unit 2 and 3 sediments, including those associated with H. naledi remains. However, several outside commentators did refer specifically to water flow or mud flow as a mechanism for slumping of deposits and possible sedimentary covering of the remains. To address these comments we have added a section to the Supplementary Information (“Description of the sedimentary deposits of the Dinaledi Subsystem”) that reviews previous work on the sedimentary units and formation processes documented in this area. We also include a subsection specifically discussing the term “mud” as used in the description of the sedimentology within the system, as this term has clearly been confusing for nonspecialists who have read and commented on the work. We appreciate the referees’ attention to the previous work and its terminology.
 
 Redescription of areas of the cave system:
 
 Reviewer 1 suggests that a detailed reanalysis of all portions of the cave system in and around the Dinaledi Subsystem is warranted to reject the hypothesis that bodies entered the space passively and were scattered from the floor by natural (i.e. noncultural) processes. The referee suggests that National Geographic could help us with these efforts. To address this comment we have made several changes to the manuscript. As noted above, we have added material in Supplementary Information to review the geochronology of the Dinaledi Subsystem and nearby Dragon’s Back Chamber, together with a discussion of the connections between these spaces.
 
 Most directly in response to this comment we provide additional documentation of the possibility of movement of bodies or body parts by gravity within the subsystem itself. This includes detailed floor maps based on photogrammetry and LIDAR measurement, where these are physically possible, presented in Figures 2 and 3. In some parts of the subsystem the necessary equipment cannot be used due to the extremely confined spaces, and for these areas our maps are based on traditional survey methods. In addition to plan maps we have included a figure showing the elevation of the subsystem floor in a cross-section that includes key excavation areas, showing their relative elevation. All figures that illustrate excavation areas are now keyed to their location with reference to a subsystem plan. These data have been provided in previous publications but the visualization in the revised manuscript should make the relationship of areas clear for readers. The Introduction now includes text that discusses the configuration of the Hill Antechamber, Dinaledi Chamber, and nearby areas, and also discusses the instances in which gravity-driven movement may be possible, at the same time reviewing that gravity-driven movement from the entry point of the subsystem to most of the localities with hominin skeletal remains is not possible.
 
 Within the Results, we have added a section on the relationship of features to their surroundings in order to assist readers in understanding the context of these bone-bearing areas and the evidence this context brings to the hypothesis in question. We have also included within this new section a discussion of the discrete nature of these features, a question that has been raised by outside commentators.
 
 Passive sedimentation upon a cave floor or within a natural depression:
 
 Reviewer 3 suggests that the situation in the Dinaledi Subsystem may be similar to a European cave where a cave bear skeleton might remain articulated on a cave floor (or we can add, within a hollow for hibernation), later to be covered in sediment. The reviewer suggests that articulation is therefore no evidence of burial, and suggests that further documentation of disarticulation processes is essential to demonstrating the processes that buried the remains. We concur that articulation by itself is not sufficient evidence of cultural burial. To address this comment we have included a section in the Results that tests the hypothesis that bodies were exposed upon the cave floor or within a natural depression. To a considerable degree, additional data about disarticulation processes subsequent to deposition are provided in our reanalysis of the Puzzle Box area, including evidence for selective reworking of material after burial.
 
 Postdepositional movement and floor drains:
 
 Reviewer 3 notes that previous work has suggested that subsurface floor drains may have caused some postdepositional movement of skeletal remains. The hypothesis of postdepositional slumping or downslope movement has also been discussed by some external commentators (including Martinón-Torres et al. 2024). We have addressed this question in several places within the revised manuscript. As we now review, previous discussion of floor drains attempted to explain the subvertical orientation of many skeletal elements excavated from the Puzzle Box area. The arrangement of these bones reflects reworking as described in our previous work, and without considering the possibility of reworking by hominins, one mechanism that conceivably might cause reworking was downward movement of sediments into subsurface drains. Further exploration and mapping, combined with additional excavation into the sediments beneath the Puzzle Box area, provided more information relevant to this hypothesis. In particular, this evidence shows that subsurface drains cannot explain the arrangement of skeletal material observed within the Puzzle Box area. As now discussed in the text, the reworking is selective and initiated from above rather than below. This is best explained by hominin activity subsequent to burial.
 
 In a new section of the Results we discuss slumping as a hypothesis for the deposition of the remains. This includes discussion of downslope movement within the Hill Antechamber and the idea that floor drains may have been a mechanism for sediment reworking in and around the Puzzle Box area and Dinaledi Feature 1. As described in this section the evidence does not support these hypotheses.
 
 Hypothesis testing and parsimony:
 
 Referees 1 and 3 and the editorial guidance all suggested that a more appropriate presentation would adopt a null hypothesis and test it. The specific suggestion that the null hypothesis should be a natural sedimentary process of deposition was provided not only by these reviewers but also by some outside commentators. To address this comment, we have edited the manuscript in two ways. The first is the addition of a section to the Discussion that specifically discusses hypothesis testing and parsimony as related to Pleistocene evidence of cultural burial. This includes a brief synopsis of recent disciplinary conversations and citation of work by other groups of authors, none of whom adopted this “null hypothesis” approach in their published work.
 
 As we now describe in the manuscript, previous work on the Dinaledi evidence never assumed any role for H. naledi in the burial of remains. Reading the reviewer reports caused us to realize that this previous work had followed exactly the “null hypothesis” approach that some suggested we follow. By following this null hypothesis approach, we neglected a valuable avenue of investigation. In retrospect, we see how this approach impeded us from understanding the pattern of evidence within the Puzzle Box area. Thus in the revised manuscript we have mentioned this history within the Discussion and also presented more of the background to our previous work in the Introduction. We hope that by including this discussion, the manuscript will broaden the conversation about the relation of parsimony to these issues.
 
 Language and presentation style:
 
 Reviewer 4 criticizes our presentation, suggesting that the text “gives the impression that a hypothesis was formulated before data were collected.” Other outside commentators have mentioned this notion also, including Martinón-Torres et al. (2024) who suggest that the study began from a preferred hypothesis and gathered data to support it. The accurate communication of results and hypotheses in a scientific article is a broader issue than this one study. Preferences about presentation style vary across fields of study as well as across languages. We do not regret using plain language where possible. In any study that combines data and methods from different scientific disciplines, the use of plain language is particularly important to avoid misunderstandings where terms may mean different things in different fields.
 
 The essential question raised by these comments is whether it is appropriate to present the results of a study in terms of the hypothesis that is best supported. As noted above, we read carefully many recent studies of Pleistocene burial evidence. We note that in each of these studies that concluded that burial is the best hypothesis, the authors framed their results in the same way as our previous manuscript: an introduction that briefly reviews background evidence for treatment of the dead, a presentation of results focused on how each analysis supports the hypothesis of burial for the case, and then in some (but not all) cases discussion of why some alternative hypotheses could be rejected. We do not infer from this that these other studies started from a presupposition and collected data only to confirm it. Rather, this is a simple matter of presentation style.
 
 The alternative to this approach is to present an exhaustive list of possible hypotheses and to describe how the data relate to each of them, at the end selecting the best. This is the approach that we have followed in the revised manuscript, as described above under the direction of the reviewer and editorial guidance. This approach has the advantage of bringing together evidence in different combinations to show how each data point rejects some hypotheses while supporting others. It has the disadvantage of length and repetition.
 
 Possible artifact:
 
 We have chosen to keep the description of the possible artifact associated with the Hill Antechamber Feature in the Supplementary Information. We do this while acknowledging that this is against the opinion of reviewer 4, who felt the description should be removed unless the object in question is fully excavated and physically analyzed. The previous version of the manuscript did not rely upon the stone as positive evidence of grave goods or symbolic content, and it noted that the data do not test whether the possible artifact was placed or was intentionally modified. However, this did not satisfy reviewer 4, and some outside commentators likewise asserted that the object must be a “geofact” and that it should be removed.
 
 We have three arguments against this line of thinking. First, we do not omit data from our reporting. Whether Homo naledi shaped the rock or not, used it as a tool or not, whether the rock was placed with the body or not, it is unquestionably there. Omitting this one object from the report would be simply dishonest. Second, the data on this rock are at 16 micron resolution. While physical inspection of its surface may eventually reveal trace evidence and will enable better characterization of the raw material, no mode of surface scanning will produce better evidence about the object’s shape. Third, the position of this possible artifact within the feature provides significant information about the deposition of the skeletal material and associated sediments. The pitch, orientation, and position of the stone are not consistent with slow deposition but are consistent with the hypothesis that the surrounding sediment was rapidly emplaced at the same time as the articulated elements less than 2 cm away.
 
 In the current version, we have redoubled our efforts to provide information about the position and shape of this stone while not presupposing the intentionality of its shape or placement. We add here that the attitude expressed by referee 4 and other commentators, if followed at other sites, would certainly lead to the loss or underreporting of evidence, which we are trying to avoid.
 
 Consistency versus variability of behavior:
 
 As described in the revised manuscript, different features within the Dinaledi Subsystem exhibit some shared characteristics. At the same time, they vary in positioning, representation of individuals and extent of commingling. Other localities within the subsystem and broader cave system present different evidence. Some commentators have questioned whether the patterning is consistent with a single common explanation, or whether multiple explanations are necessary. To address this line of questioning, we have added several elements to the manuscript. We created a new section on secondary cultural burial, discussing whether any of the situations may reflect this practice. In the Discussion, we briefly review the ways in which the different features support the involvement of H. naledi without interpreting anything about the intentionality or meaning of the behavior. We further added a section to the Discussion to consider whether variation among the features reflects variation in mortuary practices by H. naledi. One aspect of this section briefly cites variation in the location and treatment of skeletal remains at other sites with evidence of burial.
 
 Grave goods:
 
 Some commentators have argued that grave goods are a necessary criterion for recognizing evidence of ancient burial. We added a section to the Discussion to review evidence of grave goods at other Pleistocene sites where burial is accepted.
 
 References:
 
 Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. eLife, 4, e09561. https://doi.org/10.7554/eLife.09561
 
 Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. eLife, 6, e24231. https://doi.org/10.7554/eLife.24231
 
 Elliott, M., Makhubela, T., Brophy, J., Churchill, S., Peixotto, B., Feuerriegel, E., Morris, H., Van Rooyen, D., Ramalepa, M., Tsikoane, M., Kruger, A., Spandler, C., Kramers, J., Roberts, E., Dirks, P., Hawks, J., & Berger, L. R. (2021). Expanded Explorations of the Dinaledi Subsystem, Rising Star Cave System, South Africa. PaleoAnthropology, 2021(1), 15–22. https://doi.org/10.48738/2021.iss1.68
 
 Fewlass, H., Zavala, E. I., Fagault, Y., Tuna, T., Bard, E., Hublin, J.-J., Hajdinjak, M., & Wilczyński, J. (2023). Chronological and genetic analysis of an Upper Palaeolithic female infant burial from Borsuka Cave, Poland. iScience, 26(12). https://doi.org/10.1016/j.isci.2023.108283
 
 Foecke, Kimberly K., Queffelec, Alain, & Pickering, Robyn. (n.d.). No Sedimentological Evidence for Deliberate Burial by Homo naledi – A Case Study Highlighting the Need for Best Practices in Geochemical Studies Within Archaeology and Paleoanthropology. PaleoAnthropology, 2024. https://doi.org/10.48738/202x.issx.xxx
 
 Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015. https://doi.org/10.1007/s12520-013-0163-2
 
 Maloney, T. R., Dilkes-Hall, I. E., Vlok, M., Oktaviana, A. A., Setiawan, P., Priyatno, A. A. D., Ririmasse, M., Geria, I. M., Effendy, M. A. R., Istiawan, B., Atmoko, F. T., Adhityatama, S., Moffat, I., Joannes-Boyau, R., Brumm, A., & Aubert, M. (2022). Surgical amputation of a limb 31,000 years ago in Borneo. Nature, 609(7927), 547–551. https://doi.org/10.1038/s41586-022-05160-8
 
 Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), Article 7857. https://doi.org/10.1038/s41586-021-03457-8
 
 Martinón-Torres, M., Garate, D., Herries, A. I. R., & Petraglia, M. D. (2023). No scientific evidence that Homo naledi buried their dead and produced rock art. Journal of Human Evolution, 103464. https://doi.org/10.1016/j.jhevol.2023.103464
 
 Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020a). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26. https://doi.org/10.15184/aqy.2019.207
 
 Pomeroy, E., Hunt, C. O., Reynolds, T., Abdulmutalb, D., Asouti, E., Bennett, P., Bosch, M., Burke, A., Farr, L., Foley, R., French, C., Frumkin, A., Goldberg, P., Hill, E., Kabukcu, C., Lahr, M. M., Lane, R., Marean, C., Maureille, B., … Barker, G. (2020b). Issues of theory and method in the analysis of Paleolithic mortuary behavior: A view from Shanidar Cave. Evolutionary Anthropology: Issues, News, and Reviews, 29(5), 263–279. https://doi.org/10.1002/evan.21854
 
 Robbins, J. L., Dirks, P. H. G. M., Roberts, E. M., Kramers, J. D., Makhubela, T. V., Hilbert-Wolf, H. L., Elliott, M., Wiersma, J. P., Placzek, C. J., Evans, M., & Berger, L. R. (2021). Providing context to the Homo naledi fossils: Constraints from flowstones on the age of sediment deposits in Rising Star Cave, South Africa. Chemical Geology, 567, 120108. https://doi.org/10.1016/j.chemgeo.2021.120108
 
 Wiersma, J. P., Roberts, E. M., & Dirks, P. H. G. M. (2020). Formation of mud clast breccias and the process of sedimentary autobrecciation in the hominin-bearing (Homo naledi) Rising Star Cave system, South Africa. Sedimentology, 67(2), 897–919. https://doi.org/10.1111/sed.12666
 
 AuthorResponse
Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.01.543127v2
www.biorxiv.org www.biorxiv.org

Synaptic Connectivity of Sensorimotor Circuits for Vocal Imitation in the Songbird

5
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 The songbird vocal motor nucleus HVC contains cells that project to the basal ganglia, the auditory system, or to downstream vocal motor structures. In this fundamental study, the authors conduct optogenetic circuit mapping to clarify how four distinct inputs to HVC act on these distinct HVC cell types. They provide compelling evidence that all long range projections engage inhibitory circuits in HVC and can also exhibit cell-type specific preferences in monosynaptic input strength. Understanding HVC at this microcircuit level is critical for constraining models of song learning and production.
 
 Summary
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This work has created a map of synaptic connectivity between the inputs and outputs of the song premotor nucleus, HVC, in zebra finches to understand how sensory (auditory) to motor circuits interact to coordinate song production and learning. The authors optimized the optogenetic technique via AAV to manipulate auditory inputs from a specific auditory area one-by-one and recorded synaptic activity from a neuron in HVC with whole-cell recording from slice preparation with identification of projection area by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) within three projection neurons in the HVC; all areas give monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but proportions of projection to each projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together, the authors provide a map of synaptic connections between intercortical sensory to motor areas, which is suggested to be involved in zebra finch song production and learning.
 
 Strengths:
 
 The authors optimized optogenetic tools with eGtACR1 by using AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With their technical advance and thorough experiments, they provided a detailed map of synaptic connections and gave insights into the neuronal circuit for auditory-guided vocal (motor) learning.
 
 Weaknesses:
 
 As this study is in adult brain slices, there might be a gap to the functions in developmental song learning.
 
 Review 1
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The manuscript describes the synaptic connectivity of four main classes of sensory neuron afferents in the songbird cortex onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird song. Investigators here use all male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.
 
 Strengths:
 
 The quality of the recordings is extremely high and the quantity of data is on a very significant scale; this will certainly aid the field.
 
 Weaknesses:
 
 Could make the figures a little easier to navigate by having some atlas drawings.
 
 Comments on revisions:
 
 The authors have addressed the minor concerns and suggestions.
 
 Review 2
4. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poor. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.
 
 This study is impressive in its scope, rigorous in its implementation and thoughtful regarding its limitations. The manuscript is well written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations and organization of some of the summary figures.
 
 Comments on revisions:
 
 The authors have done a very nice job addressing the reviewers' comments.
 
 Review 3
5. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This work tried to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus, HVC in zebra finches to understand how sensory (auditory) to motor circuits interact to coordinate song production and learning. The authors optimized the optogenetic technique via AAV to manipulate auditory inputs from a specific auditory area one-by-one and recorded synaptic activity from a neuron with whole-cell recording from slice preparation with identification of the projection area by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) within three projection neurons in the HVC; all areas give monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but proportions of projection to each projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together the authors provide the map of the synaptic connection between intercortical sensory to motor areas which is suggested to be involved in zebra finch song production and learning.
 
 Strengths:
 
 The authors optimized optogenetic tools with eGtACR1 by using AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With their technical advance and thorough experiments, they provided a detailed map of synaptic connections.
 
 Weaknesses:
 
 As this is a study in brain slices, the functional implication of synaptic connectivity is limited. In particular, as all the experiments were done in adult preparations, there could be a gap in discussing the functions of developmental song learning.
 
 We thank the reviewer for their appreciation of our work. Although we agree that there can be limitations to brain slice preparations, the approaches used here for synaptic connectivity mapping are well-designed to identify long-range synaptic connectivity patterns. Optogenetic stimulation of axon terminals in brain slices does not require intact axons and works well when axons are cut, allowing identification of all inputs expressing optogenetic channels from afferent regions. Terminal stimulation in slices yields stable post-synaptic responses for hours without rundown, assuring that polysynaptic and monosynaptic connections can be reliably identified in our brain slices. Additionally, conducting similar types of experiments in vivo can run into important limitations. First, the extent of TTX and 4-AP diffusion, which is necessary for identification of long-range monosynaptic connections, can be difficult to verify in vivo - potentially confounding identification of monosynaptic connectivity. Second, conducting whole-cell patch-clamp experiments in vivo, particularly in deeper brain regions, is technically challenging, and would limit the number of cells that can be patched and increase the number of animals needed.
 
 We agree that there may well be important differences between adult connectivity and connectivity patterns in the juvenile brain. Indeed, learning and experience during development almost certainly shape connectivity patterns and these patterns of connectivity may change incrementally and/or dynamically during development. Ultimately, adult connectivity patterns are the result of changes in the brain that accrue over development. Given that this is the first study mapping long-range connectivity of HVC input-output pathways, we reasoned that the adult connectivity would provide a critical reference allowing future studies to map different stages of juvenile connectivity and the changes in connectivity driven by milestones like forming a tutor song memory, sensorimotor learning, and song crystallization.
 
 In this revision we worked to better highlight the points raised above and thank the reviewer for their comments.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The manuscript describes the synaptic connectivity of four main classes of sensory neuron afferents in the songbird cortex onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. Investigators here use all male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.
 
 Strengths:
 
 The quality of the recordings is extremely high and the quantity of data is on a very significant scale; this will certainly aid the field.
 
 Weaknesses:
 
 The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images, but it would be nice to have these linked with a zebra finch atlas in more of a cartoon format accompanying each fluorescence image. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just given in the legends.
 
 The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).
 
 The authors should consider sharing the actual electrophysiology records as data.
 
 We thank the reviewer for their assessment of our research and suggestions. We have implemented many of these suggestions and provide details in our response to their specific Recommendations. Additionally, we are organizing our data and will make it publicly available with the version of record.
 
 Reviewer #3 (Public review):
 
 Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poor. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.
 
 This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.
 
 We thank the reviewer for their thoughtful assessment of our research.
 
 Recommendations for the authors:
 
 The following recommendations were considered by all reviewers to be important to incorporate for improving this paper:
 
 (1) Clarify the site of viral injection and the possibility of labeling other structures
 
 a) Show images of viral injection sites.
 
 We provide a representative image of viral expression for each pathway studied in this manuscript. Please see panel A in Figures 2-3 and 5-6 showing our viral expression in Uva, NIf, mMAN, and Av respectively.
 
 b) Include in discussion caveats that the virus may spread beyond the boundaries of structures (e.g. especially injections into NIf could spread into Field L).
 
 For each HVC afferent nucleus we have now included a sentence describing the possible spread of viral infection in surrounding structures in the Results. We have also expanded the image from the Av section to include NIf, to showcase lack of viral expression in NIf (see Fig. 6A).
 
 (2) Clarify the logic and precise methods of the TTX and 4-AP experiments
 
 a) Please see the detailed issue raised by Reviewer 3, Major Point 1 below.
 
 The application of TTX and 4AP is the gold standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu, Mao et al. 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders, Supiot et al. 2022). We now better describe the logic of this approach in the second paragraph of the Results section and cite the first description of this method from the Svoboda lab and a recent review weighing this method against other optogenetic methods for tracing synaptic connections in the brain.
 
 (3) Include caveats in discussion
 
 a) Note that there may be other inputs to HVC that were not examined in this study (e.g. CMM, Field L)
 
 In our original manuscript we did state “Although a complete description of HVC circuitry will require the examination of other potential inputs (i.e. RAHVC PNs, A11 glutamatergic neurons (Roberts, Klein et al. 2008, Ben-Tov, Duarte et al. 2023)) and a characterization of interneuron synaptic connectivity, here we provide a map of the synaptic connections between the 4 best described afferents to HVC and its 3 populations of projection neurons” in the last paragraph of the Discussion. We have now edited this sentence to include the projection from NCM to HVC and cited Louder et al., 2024.
 
 We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.
 
 b) Also note that birds in this study were adults and that some inputs to HVC likely to be important for learning may recede during development (e.g. Louder et al, 2024).
 
 In the second-to-last paragraph of the Discussion we now state: “While our opsin-assisted circuit mapping provides us with a new level of insight into HVC synaptic circuitry, there are limitations to this research that should be considered. All circuit mapping in this study was carried out in brain slices from adult male zebra finches. Future studies will be needed to examine how this adult connectivity pattern relates to patterns of connectivity in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds.”
 
 (4) Consider cosmetic changes to figures as suggested by Reviewers 2-3 below.
 
 We thank the reviewers for their suggestions and have implemented the changes as best we can.
 
 (5) Address all minor issues raised below.
 
 Reviewer #1 (Recommendations for the authors):
 
 I see this study is well designed to answer the authors' specific question, mapping synaptic auditory-motor connections within HVC. Their experiments with advanced techniques of projection-specific optogenetic manipulation of synaptic inputs and retrograde identification of projection areas revealed input-output combination selective synaptic mapping.
 
 As I found this study advanced our knowledge with the compelling dataset, I have only some minor comments here.
 
 (1) One technical concern is that we don't see how much the virus infection was focused on the target area and whether we can ignore the effect of synaptic connectivity from surrounding areas. As the amount of virus they injected is large (1.5 µl) and target areas are small, we assume the virus might spread to the surrounding area, such as Field L, which also projects to HVC, when targeting NIf. While I think the majority of the projections were from their target areas, it would be better to mention (also with images of larger view areas) the possibility of projections from surrounding areas.
 
 We share the reviewer's concern about the specificity of viral expression. For this reason, we included sample images of the viral expression in each target area (panel A in Fig. 2,3,5,6). We have now also included a sentence at the beginning of each subsection of our Results to describe how we have ensured interpretability of the results. Uva and mMAN’s surrounding areas are not known to project to HVC. Possible cross-infection is an issue for Av and NIf, and we checked each bird’s injection site to ensure that eGtACR1+ cells were not visible in the unintended HVC-projecting areas.
 
 As mentioned in our response to the public comment, consistent with Vates (Vates, Broome et al. 1996) we do not see evidence that Field L projects directly to HVC (see Fig. 3G).
 
 (2) Another technical concern is the damage to axonal projections. While I understand the authors stimulated axonal terminals, axonal projections were assumed to be cut, and their ability to release neurotransmitters would be reduced, especially after long-term survival or repeated stimulation. Mentioning whether projection pathways were within their 230 µm-thick slice (probably depends on input sites) or not, and the effect of axonal cut, would be helpful.
 
 We agree that slice electrophysiology has limitations. However, we disagree with the claim of reduced reliability or stability of the evoked response. We and others find that electrical and optogenetic repeated terminal stimulation in slices can yield stable post-synaptic responses for tens of minutes and even hours (Bliss and Gardner-Medwin 1973, Bliss and Lomo 1973, Liu, Kurotani et al. 2004, Pastalkova, Serrano et al. 2006, Xu, Yu et al. 2009, Trusel, Cavaccini et al. 2015, Trusel, Nuno-Perez et al. 2019). Indeed, long-term synaptic plasticity experiments in most preparations and across brain areas rely on such stability of the presynaptic machinery for synaptic release, despite axons being severed from their parent soma. Our assumption is that the vast majority, if not all, connections between axon terminals and their cell body in the afferent regions have been cut in our preparations. Nonetheless, the diversity of outcomes we report (currents returning after TTX+4AP or not, depending on the specific combination of input and HVC PN class) is consistent with the robustness of the synaptic interrogation method.
 
 (3) While I understand this study focused on 4 major input areas and the authors provide good pictures of synaptic HVC connections from those areas, HVC has been reported to receive auditory inputs from other areas as well (CMM, Field L, etc.). It is worth mentioning that there are other auditory inputs, and it would be interesting to discuss coordination with the inputs from other areas.
 
 We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.
 
 (4) The HVC local neuronal connections have been reported to be modified, and a recent study revealed transient auditory inputs into HVC during the song learning period. The authors discuss the functions of HVC synaptic connections in song learning (the title also says synaptic connection for song learning); however, the experiments were done in adults and the authors do not discuss the possibility of different synaptic connection mapping in juveniles in the song learning period. Mentioning the neuronal activities and connectivity changes during song learning is important. Also, it would be helpful for the readers to discuss the potential differences between juveniles and adults if they want to discuss the functions of song learning.
 
 We now mention in the Discussion that this is an important caveat of our research and that future studies will be needed to examine how these adult connectivity patterns relate to connectivity patterns in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds. Nonetheless, the title and abstract cite song learning because it is important for the broader public to understand that at least some of these afferent brain regions play an essential role in song learning (Foster and Bottjer 2001, Roberts, Gobes et al. 2012, Roberts, Hisey et al. 2017, Zhao, Garcia-Oscos et al. 2019, Koparkar, Warren et al. 2024).
 
 Reviewer #2 (Recommendations for the authors):
 
 The work is very detailed and will be an important resource to those working in the field. The recordings are of a high quality and lots of information is included, such as measures of response kinetics and amplitude, and pharmacological confirmation of excitatory and inhibitory synaptic responses. In general, I feel the quality is extremely high and the quantity of data is on a very significant exhaustive scale that will certainly aid the field. I have come to this conclusion as a non-zebra-finch person, but I feel the connection information shown will be of benefit given its high quality.
 
 Figure 7 is a nice way of showing the overall organization. Optional suggestion, consider highlighting anything in Figure 7 that results in a new understanding of the song system as compared to previous work on anatomy and function.
 
 We thank the reviewer for the kind comments about our research. We have highlighted our newly found connection between mMAN and Av, and we note that all the connections onto the HVC PNs in Panel B are newly identified in this study.
 
 Reviewer #3 (Recommendations for the authors):
 
 Major points
 
 (1) Clarification regarding methods for determining monosynaptic events:
 
 One of the manipulations that I struggled with most was the use of TTX + 4AP to isolate monosynaptic events. Not being very familiar with the use of optically based photostimulation of axons to release transmitter locally, I was initially confused by statements such as "we found that oEPSC returned after application of TTX+4AP". This might be clear to someone performing these manipulations, but a bit more clarification would be helpful. Should I assume that an existing monosynaptic EPSC would be masked by co-occurring polysynaptic IPSCs, which disappear following application of TTX + 4AP, thereby unmasking the monosynaptic EPSC and causing the EPSC to "return"? I am not sure that word works. Continuing my confusion with these experiments, I am unsure how this cocktail of drugs is added, or whether it is even added as a cocktail, which is what I initially assumed. The methods and the results do not make clear whether the drugs are added in sequence (and, if so, why), and whether traces are recorded only after the addition of both drugs or are recorded for TTX and then again for TTX + 4AP. Finally, looking at the traces in the experimental figures (e.g. Figures 2F, 3F, 5F, and 6F), it is difficult to see what is being shown, at least for me. First, the authors need to describe better in the results why they stimulate twice in short succession and why they seem to use the response to the second pulse (unless I am mistaken) to measure the monosynaptic event. Second, I was confused by the traces (which are very small) in the presence of TTX. I would have expected to see a response if there was a monosynaptic EPSC, but I only seem to see a flat line.
 
 The confusion that I list above might be due in part to my ignorance, but it is important in these types of papers not to assume too much expertise if you want readers with a less sophisticated understanding of synaptic physiology to understand the data. In other words, a little bit more clarity and hand-holding would be welcome.
 
 We understand the reviewer’s confusion about the methodology. In voltage clamp, the amplifier injects current through the electrode to hold the membrane voltage at -70 mV, which is near the equilibrium potential for Cl-; therefore, the only synaptic current evoked by light stimulation is due to cation influx, mainly through AMPA receptors (see Fig. 1). Co-occurring polysynaptic IPSCs would therefore not be visible. We examine those by holding the membrane voltage at +10 mV (see Fig. 1). TTX application blocks voltage-dependent Na+ channels and therefore stops all neurotransmission. We show the traces recorded in TTX to demonstrate that the currents we were recording prior to TTX application were of synaptic origin and not due to accidental expression of opsin in the patched cell. This also ensures that any current visible after 4AP application is due to monosynaptic transmission and not to a failure of TTX application.
 
 After recording and light stimulation with TTX, we then add 4AP, which is a blocker of presynaptic K+ channels. This prevents the repolarization of the terminals that would occur in response to opsin-mediated local depolarization. 4AP application, therefore, allows local opsin-driven depolarizations to reach the threshold for Ca2+-dependent vesicle docking and release. This procedure selectively reveals or unmasks the monosynaptic currents because any non-monosynaptically connected neuron would still need voltage-dependent Na+ channels to effectively produce indirect neurotransmission onto the patched cell. The TTX and 4AP application is the gold standard for opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu et al., 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders et al., 2022). We now include 2 more sentences near the beginning of the Results to clarify this process and directly point to the Linders review for researchers wanting a deeper explanation of this technique.
 
 The double stimulation is unrelated to our testing of monosynaptic connections. We originally conducted the experiments by delivering 2 pulses of light separated by 50 ms, a common way to examine the paired-pulse ratio (PPR), a physiological measure that is used to probe synapses for short-term plasticity and release probability. However, through discussions with colleagues we realized that the slow decay time of eGtACR1 may complicate interpretation of the response to the second light pulse. Thus, we elected not to report these results and indicated this in the Methods section: “We calculated the paired-pulse ratio (PPR) as the amplitude of the second peak divided by the amplitude of the first peak elicited by the twin stimuli, however due to slow kinetics of eGtACR1 the results would be difficult to interpret, and therefore we are not currently reporting them.”
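 For readers unfamiliar with the measure, the PPR definition quoted above can be written out as a minimal Python sketch. The function name and the amplitude values are hypothetical illustrations, not data or code from the study.

```python
def paired_pulse_ratio(first_peak_pa, second_peak_pa):
    """Return the second-peak amplitude divided by the first-peak amplitude.

    Amplitudes are taken as magnitudes so that inward (negative) currents
    give a positive ratio.
    """
    if first_peak_pa == 0:
        raise ValueError("first peak amplitude must be non-zero")
    return abs(second_peak_pa) / abs(first_peak_pa)


# Hypothetical peak amplitudes (pA) for two light pulses 50 ms apart.
ppr = paired_pulse_ratio(-120.0, -90.0)
print(f"PPR = {ppr:.2f}")  # prints "PPR = 0.75"
```

 A ratio below 1 is conventionally read as short-term depression and a ratio above 1 as facilitation; as the response notes, slow opsin kinetics can distort the second peak and make such a ratio hard to interpret.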
 
 (2) Suggestions for improving summary figures:
 
 Summary Figure 1a: The circuit diagram (schematic to the right of 1a) is OK but I initially found it a bit difficult to interpret. For example, it is not clear why pink RA projecting neurons don't reach as far to the right as X or Av projecting neurons, suggesting that they are not really projection neurons. Also, the big question marks in the intermediate zone are not entirely intuitive. It seems there might be a better way of representing this. It might also be worth stating in the figure legend that the interconnectivity patterns shown in the figure between PNs in HVC are based on specific prior studies.
 
 We thank the reviewer for the constructive criticism. We have modified the figure to extend the RA projection line and mentioned in the figure legend that connectivity between PNs is based on prior studies.
 
 Summary Figure 1a: I am not sure I love this figure. There are a few minor issues. First, there are too many browns [Nif/AV and mMAN] which makes it more challenging to clearly disambiguate the different projections. Second, it is unclear why this figure does not represent projections from RA to HVC. My biggest concern with this figure is that it oversimplifies some of the findings. From the figure, one gets the impression that Uva only projects to RA-PNs and that Av only projects to X-PNs even though the authors show connections to other PNs. With the small sample size in this current study for each projection and each PN type, one really cannot rule out that these "minority" projections are not important. I, therefore, suggest that the authors qualitatively represent the strength/probability of connections by weighting with thickness of afferent connections.
 
 We assume the reviewer is commenting on our summary figure panel 7B. We agree with the referee that this is a simplified representation of our findings. We had indeed indicated in the legend that this was just a “Schematic of the HVC afferent connectivity map resulting from the present work” and that “For conceptualization purposes, afferent connectivity to HVC-PNs is shown only when the rate of monosynaptic connectivity reaches 50% of neurons examined”. We have added a title to highlight that this is but a simplification. We have now adjusted the colors to make the figure easier to follow. Based on the reviewer’s critique, we searched for a better method for summarizing the complex connectivity patterns described in this research. We settled on a Sankey diagram of connectivity. This is now Figure 7C. In this diagram, we are able to show the proportion of connections from each input pathway onto each class of neuron and whether these connections are poly- or monosynaptic. We find this to be a straightforward way of displaying all of the connectivity patterns identified in Figures 2-3 and 4-5, and we look forward to learning whether the reviewers find this a useful way of illustrating our findings.
 
 Minor points:
 
 (1) Line 50 - typo - song circuits.
 
 Thank you for catching this.
 
 (2) Line 106 - 111 - The findings suggest that 100% of Uva projections onto HVCRA neurons are monosynaptic. However, because the authors only tested 6 neurons, their statements that their findings are so different from other studies should be somewhat tempered, since these other studies (e.g. Moll et al.) looked at 251 neurons in HVC and sampling bias could still somewhat explain the difference.
 
 We observed oEPSCs in 43 of 51 (84.3%) HVC-RA neurons recorded (mean rise time = 2.4 ms) and monosynaptic connections onto 100% of the HVC-RA neurons tested (n = 6). Moll et al. combined electrical stimulation of Uva with two-photon calcium imaging (GCaMP6s) of putative HVC-RA neurons (n = 251 neurons). We should note that these are putative HVC-RA neurons because they were not visually identified using retrograde tracing or some other molecular handle. They report that only ~16% of HVC-RA neurons showed reliable calcium responses following Uva stimulation. Although the experiments by Moll et al. are technically impressive, calcium imaging is an insensitive technique for measuring post-synaptic responses, particularly subthreshold responses, when compared to whole-cell patch-clamp recordings. This approach cannot identify monosynaptic connections and is likely sensitive only to suprathreshold activity, which likely relies on recruitment of other polysynaptic inputs onto the neurons in HVC. Furthermore, as indicated in the Discussion, our opsin-mediated synaptic interrogation recruits any eGtACR1+ Uva terminal in the slice and therefore has a high likelihood of revealing any existing connections.
 
 A limitation of whole-cell patch-clamp recording is that it is a laborious, low-throughput technique. Future experiments using better imaging approaches, like voltage imaging, may be able to weigh in on differences between what we report here using whole-cell patch-clamp recordings from visually identified HVC-RA neurons combined with optogenetic manipulations of Uva terminals and the calcium imaging results reported by Moll et al. Nonetheless, whole-cell patch-clamp recording combined with optogenetic manipulation is likely to remain the most sensitive method for identifying synaptic connectivity.
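 To make the sample-size point in this exchange concrete, the uncertainty around the two proportions quoted above (43 of 51 and 6 of 6) can be illustrated with a Wilson score interval. This is a minimal sketch: the choice of the Wilson interval and the 95% level (z = 1.96) are assumptions made for illustration, not an analysis reported by the authors or requested by the reviewer.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Counts taken from the response above.
for label, k, n in [("oEPSC observed", 43, 51), ("monosynaptic", 6, 6)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%}, 95% interval {lo:.1%} to {hi:.1%}")
```

 With all 6 of 6 tested neurons positive, the lower bound reduces to n / (n + z²), so the interval remains wide even though the point estimate is 100%.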
 
 (3) Figure 2G - the significance of white circles is not clear.
 
 The figure legend indicates that those highlight and mark the position of “retrogradely labeled HVC-projecting neurons in Uva (cyan, white circles)” to facilitate identification of colocalization with the in-situ markers.
 
 (4) Line 135 - Cardin et al. (J. Neurophys. 2004) is the first to show that song production does not require Nif.
 
 We thank the reviewer for pointing this out and we have cited this important study.
 
 (5) Line 183 - This is a confusing sentence because I initially thought that mMAN-mMANHVC PNs was a category!
 
 We replaced the dash with a colon.
 
 (6) Figure 4d could use some arrows to identify what is shown. It is assumed that the box represents mMAN. Should it be assumed that Av is not in the plane of this section? If not, this should be stated in the legend. It is also unclear where the anterograde projections are. Is this the dark highway that goes from the box to the dorsal surface? If yes, this should be indicated, but it should also be made clear why the projections go in both the dorsal and the ventral directions.
 
 The inset, as indicated by the lines around it, is a magnification of the terminal fields in Av. We added an explanation of the inset.
 
 (7) Discussion. In the introduction, the authors mention projections from RA to HVC but never end up studying them in the current manuscript which seems like a missed opportunity and perhaps even a weakness of the study. In the discussion, it would certainly be good for the authors to at least discuss the possible significance of these projections and perhaps why they decided not to study them.
 
 We thank the reviewer for the comment. Unfortunately, we couldn’t reliably evoke interpretable currents from RA, and we elected to publish the current version of the paper with these 4 major inputs. Nonetheless, we have indicated in the Introduction and in the Discussion that more inputs (e.g. RA, A11, NCM) remain to be evaluated.
 
 (8) Line 622 - Is this reference incomplete?
 
 We thank the reviewer. We have corrected the reference.
 
 Ben-Tov, M., F. Duarte and R. Mooney (2023). "A neural hub for holistic courtship displays." Curr Biol 33(9): 1640-1653.e1645.
 
 Bliss, T. V. and A. R. Gardner-Medwin (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the unanaestetized rabbit following stimulation of the perforant path." J Physiol 232(2): 357-374.
 
 Bliss, T. V. and T. Lomo (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." J Physiol 232(2): 331-356.
 
 Foster, E. F. and S. W. Bottjer (2001). "Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults." J Neurobiol 46(2): 142-165.
 
 Koparkar, A., T. L. Warren, J. D. Charlesworth, S. Shin, M. S. Brainard and L. Veit (2024). "Lesions in a songbird vocal circuit increase variability in song syntax." Elife 13.
 
 Linders, L. E., L. F. Supiot, W. Du, R. D'Angelo, R. A. H. Adan, D. Riga and F. J. Meye (2022). "Studying Synaptic Connectivity and Strength with Optogenetics and Patch-Clamp Electrophysiology." Int J Mol Sci 23(19).
 
 Liu, H. N., T. Kurotani, M. Ren, K. Yamada, Y. Yoshimura and Y. Komatsu (2004). "Presynaptic activity and Ca2+ entry are required for the maintenance of NMDA receptor-independent LTP at visual cortical excitatory synapses." J Neurophysiol 92(2): 1077-1087.
 
 Louder, M. I. M., M. Kuroda, D. Taniguchi, J. A. Komorowska-Muller, Y. Morohashi, M. Takahashi, M. Sanchez-Valpuesta, K. Wada, Y. Okada, H. Hioki and Y. Yazaki-Sugiyama (2024). "Transient sensorimotor projections in the developmental song learning period." Cell Rep 43(5): 114196.
 
 Pastalkova, E., P. Serrano, D. Pinkhasova, E. Wallace, A. A. Fenton and T. C. Sacktor (2006). "Storage of spatial information by the maintenance mechanism of LTP." Science 313(5790): 1141-1144.
 
 Petreanu, L., T. Mao, S. M. Sternson and K. Svoboda (2009). "The subcellular organization of neocortical excitatory connections." Nature 457(7233): 1142-1145.
 
 Roberts, T. F., S. M. Gobes, M. Murugan, B. P. Olveczky and R. Mooney (2012). "Motor circuits are required to encode a sensory model for imitative learning." Nat Neurosci 15(10): 1454-1459.
 
 Roberts, T. F., E. Hisey, M. Tanaka, M. G. Kearney, G. Chattree, C. F. Yang, N. M. Shah and R. Mooney (2017). "Identification of a motor-to-auditory pathway important for vocal learning." Nat Neurosci 20(7): 978-986.
 
 Roberts, T. F., M. E. Klein, M. F. Kubke, J. M. Wild and R. Mooney (2008). "Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song." J Neurosci 28(13): 3479-3489.
 
 Trusel, M., A. Cavaccini, M. Gritti, B. Greco, P. P. Saintot, C. Nazzaro, M. Cerovic, I. Morella, R. Brambilla and R. Tonini (2015). "Coordinated Regulation of Synaptic Plasticity at Striatopallidal and Striatonigral Neurons Orchestrates Motor Control." Cell Rep 13(7): 1353-1365.
 
 Trusel, M., A. Nuno-Perez, S. Lecca, H. Harada, A. L. Lalive, M. Congiu, K. Takemoto, T. Takahashi, F. Ferraguti and M. Mameli (2019). "Punishment-Predictive Cues Guide Avoidance through Potentiation of Hypothalamus-to-Habenula Synapses." Neuron 102(1): 120-127.e124.
 
 Vates, G. E., B. M. Broome, C. V. Mello and F. Nottebohm (1996). "Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches." Journal of Comparative Neurology 366(4): 613-642.
 
 Xu, T., X. Yu, A. J. Perlik, W. F. Tobin, J. A. Zweig, K. Tennant, T. Jones and Y. Zuo (2009). "Rapid formation and selective stabilization of synapses for enduring motor memories." Nature 462(7275): 915-919.
 
 Zhao, W., F. Garcia-Oscos, D. Dinh and T. F. Roberts (2019). "Inception of memories that guide vocal learning in the songbird." Science 366: 83-89.
 
 AuthorResponse

URL

biorxiv.org/content/10.1101/2022.11.08.515692v6
www.biorxiv.org www.biorxiv.org

Human Brain-Wide Activation of Sleep Rhythms

5
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 The study reports valuable findings from a very rich EEG-fMRI dataset, including 107 participants, which was collected during nocturnal naps. Using overall solid methods, the authors link activity in memory-related brain regions (e.g., hippocampus, thalamus, and medial prefrontal cortex), and their functional connectivity to the occurrence of canonical sleep rhythms (spindles and slow oscillations) in non-rapid eye movement sleep. This work will be of broad interest to sleep and memory researchers and beyond.
 
 Summary
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.
 
 The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling.
 
 Comments on revisions:
 
 While the authors have sufficiently addressed some of my previous comments, I still have severe concerns regarding several key aspects of the methodology, which were even corroborated by the supplementary results presented in response to the last round of reviews. I have the following specific comments (numbers refer to comments raised in the previous review):
 
 Re 1: The revised introduction now cites a couple of papers but discusses them only very superficially, lumping together several studies with very different key results. This is still not very informative for the reader and does not sufficiently acknowledge previously published work. Here are two examples to illustrate this:
 a. "These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions." Several studies even showed, e.g., a clear activation of the hippocampus and parahippocampal gyrus associated with spindles, not just the thalamus.
 b. "Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024) [, ...]." Previous studies have used, e.g., spindle event-related regressors with individual spindle amplitudes as parametric modulators, first- and second-order derivatives of the HRF, as well as PPI connectivity analyses, which I would consider rather sophisticated temporal modelling.
 
 Re 4+9: The short overall recordings in some subjects on the one hand and the large number of spindles and SOs detected in N1 sleep stages are still highly concerning, in fact even more so now that the actual numbers have been provided in the Supplementary Tables. Either the sleep staging or the detection of SO and spindle events must be incorrect. I understand that for specific EEG analysis and fMRI modelling purposes sometimes slightly different thresholds are used as compared to clinical sleep staging, but several parameters here are alarmingly off.
 a. Given that proper NREM sleep (N2+N3) is the relevant stage for the analyses conducted in this paper, some of the N2+N3 durations are very short (e.g., 7-8 min), while those subjects' results have the same impact on the group-level analyses as those with >100 min of N2+N3. Either subjects with very little relevant data (not overall recording time but N2+N3 time) should be excluded, or subject data should be weighted in the group analyses according to the amount of contributed data.
 b. The authors argue that the SO and spindle detection algorithms are valid since widely used and that they were developed for N2+N3 stages, which is why they will also detect events in other stages: "While, because the detection methods for SO and spindle are based on percentiles, this method will always detect a certain number of events when used for other stages (N1 and REM) sleep data, but the differences between these events and those detected in stage N23 remain unclear." I do agree that with very liberal thresholds, SO and spindle events may also be detected in other stages, but it shouldn't be that many. If the percentiles of amplitude thresholds were defined based on properly scored N2+N3 stages only, very few events should be detected (erroneously!) in N1, as the occurrence of K-complexes (isolated SOs) and spindles by definition makes it N2, and during REM sleep only very few spindles and SOs are allowed to occur without scoring it NREM instead. The first subject (just as an example, but with similar numbers for the rest of the sample) shows as many as 60 SOs and 31 spindles within 8 min of N1 sleep (Table S2) as well as 13 SOs and 7 spindles within 2 min of REM sleep (Table S4). These numbers are completely unrealistic and question the correctness of the sleep staging as well as the physiological relevance of the EEG graphoelements identified as SOs and spindles. It also completely undermines the interpretability of the respective event regressors for the fMRI analyses.
 c. Given the large numbers of coupled SO-spindle events and the apparently very low amplitude criteria for event identification, the number of SO-spindle couplings is also likely severely overestimated.
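 The arithmetic behind points (a) and (b) can be sketched in a few lines of Python. The event counts and durations are the ones quoted above for the first subject; the subject-level values passed to the weighting helper are hypothetical, and duration-weighted averaging is shown only as one possible reading of the reviewer's suggestion, not as the authors' analysis.

```python
def events_per_minute(count, minutes):
    """Event density: number of detected events per minute of a sleep stage."""
    if minutes <= 0:
        raise ValueError("stage duration must be positive")
    return count / minutes


def weighted_group_mean(values, minutes):
    """Group mean of subject-level values, weighted by N2+N3 minutes."""
    total = sum(minutes)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(v * m for v, m in zip(values, minutes)) / total


# Counts quoted in the review for the first subject (Tables S2 and S4).
print(events_per_minute(60, 8))  # SOs in N1: 7.5 per minute
print(events_per_minute(31, 8))  # spindles in N1: 3.875 per minute
print(events_per_minute(13, 2))  # SOs in REM: 6.5 per minute
print(events_per_minute(7, 2))   # spindles in REM: 3.5 per minute

# Hypothetical subject-level effect estimates and N2+N3 durations (minutes).
estimates = [0.2, 0.8]
n23_minutes = [8, 100]
print(sum(estimates) / len(estimates))              # unweighted mean
print(weighted_group_mean(estimates, n23_minutes))  # duration-weighted mean
```

 In the hypothetical example, the subject with 8 min of N2+N3 counts as much as the subject with 100 min in the unweighted mean, but contributes less than a tenth of the weight once durations are taken into account.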
 
 Re 10: The rationale for using a lateralized frontal electrode (F3) for both SO (should have been at least bilateral or central) and spindle detection (should have been a centro-parietal electrode) is not convincing. Other EEG-fMRI spindle or SO papers have used a number of frontal (SO) or centro-parietal (spindles) electrodes averaged or even approaches including all EEG electrodes. Searching for events with low thresholds at suboptimal recording sites does not do this highly valuable dataset justice.
 
 Re 7: It is not clear to me why/how larger voxels would reduce susceptibility-related distortions and partial volume effects. Usually, the opposite is true. This should be elaborated.
 
 Review 1
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.
 
 This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights.
 
 Comments on revisions:
 
 The authors' efforts in revising the manuscript and addressing the reviewers' comments are certainly commendable. However, I remain concerned about potential issues in detecting sleep-related oscillations (SOs, spindles, and consequently coupled SO-spindle events), which may arise due to suboptimal parameter selection or inaccurate sleep staging, potentially impacting all subsequent analyses.
 
 A review of Supplementary Tables 1-4 reveals an unusually high number of detected SOs and spindles during sleep stage N1 and REM sleep. While the authors correctly note that a percentile-based detection approach will always identify a certain number of events across sleep stages, the particularly high counts in N1 and REM are concerning. To mitigate the limitations of this method, the authors could have performed event detection independently of sleep stages (i.e., across the entire dataset for each participant) and subsequently assigned the detected events to the corresponding sleep stages. If the event counts in N1 and REM remained disproportionately high, this would indicate a fundamental issue with the detection procedure.
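 The check proposed above (detect events across the whole recording, then assign them to scored stages) can be sketched as follows. This is a minimal illustration under stated assumptions: the 30-second epoch length, the function names, the toy hypnogram, and the event onsets are all hypothetical and are not taken from the manuscript.

```python
from collections import Counter

EPOCH_S = 30.0  # assumed scoring epoch length in seconds


def count_events_by_stage(event_onsets_s, hypnogram, epoch_s=EPOCH_S):
    """Assign each detected event to the sleep stage of the epoch it falls in."""
    counts = Counter()
    for onset in event_onsets_s:
        idx = int(onset // epoch_s)
        if 0 <= idx < len(hypnogram):  # ignore events outside the scored range
            counts[hypnogram[idx]] += 1
    return counts


def density_by_stage(counts, hypnogram, epoch_s=EPOCH_S):
    """Events per minute for each stage present in the hypnogram."""
    epochs = Counter(hypnogram)
    return {
        stage: counts.get(stage, 0) / (n * epoch_s / 60.0)
        for stage, n in epochs.items()
    }


# Toy hypnogram (one label per epoch) and event onsets in seconds.
hypnogram = ["W", "N1", "N2", "N2", "N3", "REM"]
onsets = [35.0, 62.5, 70.1, 95.0, 130.0, 170.0]

counts = count_events_by_stage(onsets, hypnogram)
print(dict(counts))
print(density_by_stage(counts, hypnogram))
```

 Comparing the resulting per-stage densities, rather than raw counts, would show whether N1 and REM remain disproportionately rich in detected events once stage duration is accounted for.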
 
 Review 2
4. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Wang et al. examined brain activity patterns during sleep, especially when time-locked to canonical sleep rhythms such as SOs, spindles, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the up-state of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. The authors next performed functional connectivity analyses and found enhanced connectivity between the hippocampus and the thalamus and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.
 
 Strengths:
 
 There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results. The results now characterize the hemodynamic activity that coincides with SO-spindle coupling.
 
 Weaknesses:
 
 My earlier comments were about the inability to make inferences on memory given the lack of memory tasks, and the weakness in using the open-ended cognitive state decoding.
 
 The current revision has addressed these major concerns. The authors expanded discussions regarding the theoretical implications of the work in a more nuanced manner.
 
 Review 3
5. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.
 
 The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.
 
 Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we address your specific comments and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript. Thank you again for the constructive feedback.
 
 (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.
 
 We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles concern sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on EEG-fMRI methodology, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review; we will expand the introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.
 
 Introduction, Page 4 Lines 62-76
 
 “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signal, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024), to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”
 
 (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), which assumes certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, even though 16 of the 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).
 
 Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.
 
 Discussion, Page 17-18 Lines 323-332
 
 “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”
 
 (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.
 
 We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.
 
 Introduction, Page 5 Lines 87-103
 
 “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.
 
 We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”
 
 Discussion, Page 16-17 Lines 292-307
 
 “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.
 
 The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”
 
 (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.
 
 We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.
 
 However, most of the excluded participants were unable to fall asleep or slept too briefly and therefore had essentially no NREM sleep, so their NREM sleep duration and their SO, spindle, and coupling counts could not be computed.
 
 Supplementary Materials, Page 42-54, Table S1-S4
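
 As an illustration of how the density columns in such tables relate to the counts and durations, a minimal Python sketch follows. The subject records, function names, and numbers are hypothetical and are not taken from Tables S1-S4; density is assumed here to mean events per minute of the given sleep stage, matching the events/min unit used elsewhere in these reviews.

```python
from statistics import mean, stdev

def event_density(n_events, duration_min):
    """Events per minute; None when no sleep of that stage was recorded."""
    if duration_min <= 0:
        return None
    return n_events / duration_min

def describe(values):
    """Mean/SD/range over subjects that have a defined density."""
    vals = [v for v in values if v is not None]
    return {"mean": mean(vals), "sd": stdev(vals), "min": min(vals), "max": max(vals)}

# Hypothetical per-subject records: (N2/3 minutes, coupled SO-spindle events)
subjects = [(30.0, 75), (12.5, 25), (0.0, 0), (45.0, 90)]

densities = [event_density(n, d) for d, n in subjects]
summary = describe(densities)
print(densities)
print(summary)
```

 Subjects without any sleep in a stage get no density rather than a zero, which mirrors the point above that counts cannot be meaningfully reported for participants with no NREM period.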
 
 (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.
 
 Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.
 
 We have made this clearer in the revised manuscript.
 
 Methods, Page 20 Lines 385-392
 
 “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”
 
 (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.
 
 Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.
 
 We have made this clearer in the revised manuscript.
 
 Methods, Page 19-20 Lines 371-383
 
 “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”
 
 (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?
 
 We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.
 
 We have made this clearer in the revised manuscript.
 
 Methods, Page 20-21 Lines 398-408
 
 “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm³, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”
 
 (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.
 
 We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.
 
 Discussion, Page 18 Lines 333-341
 
 “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”
 
 (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.
 
 We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and the potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those data, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the SO and spindle detection methods are percentile-based, they will always detect a certain number of events when applied to data from other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.
 
 Methods, Page 25 Lines 515-524
 
 “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
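
 The point about percentile-based detection can be made concrete with a small Python sketch. This is not the detection pipeline used in the manuscript: the 75th-percentile cutoff, the function names, and the synthetic amplitudes are assumptions for illustration only, and the threshold is assumed to be computed within each stage's own data.

```python
import math
import random

def percentile_threshold(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def detect_events(amplitudes, pct=75):
    """Keep candidates whose amplitude exceeds the percentile threshold of the same data."""
    threshold = percentile_threshold(amplitudes, pct)
    return [a for a in amplitudes if a > threshold], threshold

rng = random.Random(0)
# Hypothetical candidate amplitudes (µV): large in N2/N3, small in REM.
n23 = [abs(rng.gauss(0, 30)) for _ in range(1000)]
rem = [abs(rng.gauss(0, 5)) for _ in range(1000)]

events_n23, thr_n23 = detect_events(n23)
events_rem, thr_rem = detect_events(rem)
print(len(events_n23), len(events_rem))  # about the same number of "events"
print(thr_n23 > thr_rem)                 # but very different amplitude cutoffs
```

 Because the cutoff adapts to whatever data it is given, roughly the same fraction of candidates is labelled as events in every stage; only the absolute amplitude of those events differs, which is why counts in N1 and REM are hard to interpret on their own.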
 
 (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?
 
 We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.
 
 (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.
 
 We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the current analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived multiple comparison correction. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.
 
 Results, Page 14 Lines 248-255
 
 “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t(106) = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t(106) = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in all other pathways (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not isolated SOs or spindle events alone.”
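
 For readers less familiar with PPI, the logic of the interaction term can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the analysis reported above: it ignores hemodynamic convolution and nuisance regressors, treats the event regressor as a binary on/off label per volume, and uses made-up time courses and hypothetical function names. Under those assumptions, the PPI effect equals the change in the seed-to-target regression slope between event and non-event volumes.

```python
def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ppi_effect(seed, target, during_event):
    """Difference in seed->target coupling: event volumes minus non-event volumes."""
    on = [(s, t) for s, t, e in zip(seed, target, during_event) if e]
    off = [(s, t) for s, t, e in zip(seed, target, during_event) if not e]
    return slope(*zip(*on)) - slope(*zip(*off))

# Made-up signals: the target follows the seed more steeply during events.
seed = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
during_event = [False, False, False, False, True, True, True, True]
target = [0.5, 1.0, 1.5, 2.0, 2.5, 4.0, 5.5, 7.0]

print(ppi_effect(seed, target, during_event))
```

 A positive value indicates stronger seed-to-target coupling during events than outside them, which is the kind of effect the comparison across coupled events, isolated SOs, and isolated spindles is meant to test.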
 
 (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).
 
 We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and the two amplitudes are indeed highly correlated across all subjects in our results (r = 0.98). Future studies could address this by leveraging high-temporal-resolution techniques. While implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
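
 To give a sense of how severe this collinearity would be, the variance inflation factor (VIF) for two regressors can be computed directly from their correlation. The short Python sketch below assumes that the two parametric modulators would share the reported amplitude correlation of r = 0.98 and that no other regressors are involved; the function name is illustrative.

```python
import math

def variance_inflation(r):
    """VIF for two regressors with Pearson correlation r: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)

r = 0.98  # correlation between SO-up and SO-down amplitudes reported above
vif = variance_inflation(r)
se_inflation = math.sqrt(vif)  # factor by which standard errors grow
print(round(vif, 2), round(se_inflation, 2))
```

 Under these assumptions the variance of each modulator's estimate is inflated roughly 25-fold (standard errors about 5-fold) relative to uncorrelated regressors, so separate UP- and DOWN-state effects would be very difficult to estimate reliably.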
 
 Discussion, Page 17 Lines 308-322
 
 “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”
 
 Discussion, Page 18 Lines 333-341
 
 “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”
 
 (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.
 
 We appreciate your concern regarding our interpretation of diminished DMN activity reflecting the SO down-state. We acknowledge that the current wording is somewhat misleading. Our interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. We will make this clear in the Discussion section.
 
 Discussion, Page 17 Lines 308-322
 
 “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”
 
 (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).
 
 We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will revise the manuscript to delete this misleading sentence in the Introduction section and acknowledge in the Discussion that our results cannot directly demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.
 
 Discussion, Page 18 Lines 333-341
 
 “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”
 
 Reviewer #2 (Public review):
 
 In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.
 
 This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.
 
 The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.
 
 Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.
 
 We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.
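To illustrate why a percentile-based amplitude criterion returns events in every stage, the minimal sketch below applies a hypothetical top-25% rule separately to two synthetic sets of candidate-wave amplitudes with very different scales. The 75th-percentile cut-off, the per-stage thresholding, the amplitude distributions, and the function names are all assumptions made for illustration; the actual detection pipeline includes further criteria (such as duration) that are not modeled here.

```python
import random

def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def detect_by_percentile(amplitudes, q=75.0):
    """Keep candidates whose amplitude exceeds the q-th percentile of the same set."""
    threshold = percentile(amplitudes, q)
    return [a for a in amplitudes if a > threshold]

rng = random.Random(0)
# Hypothetical peak-to-peak amplitudes (µV) of candidate waves in two stages.
n2n3_candidates = [rng.gauss(60.0, 15.0) for _ in range(400)]
rem_candidates = [rng.gauss(15.0, 4.0) for _ in range(400)]

n2n3_events = detect_by_percentile(n2n3_candidates)
rem_events = detect_by_percentile(rem_candidates)

# A relative threshold keeps the same fraction of candidates in both stages,
# even though the REM candidates are much smaller in absolute amplitude.
print(len(n2n3_events), len(rem_events))
print(min(n2n3_events) > max(rem_events))
```

Because the threshold is relative, the event count alone says little about whether the retained waves resemble genuine N2/N3 slow oscillations, which is why absolute amplitudes and waveforms are worth inspecting alongside the counts.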
 
 Regarding the reported SO amplitudes (~25 µV), during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, our SO detections in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.
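The amplitude reduction follows from how a projection works: removing the component of the data that lies along an artifact direction can shrink the signal norm but never increase it. The sketch below applies one such projector to a single hypothetical four-channel sample. The channel values and the uniform artifact topography are illustrative assumptions and do not reproduce the preprocessing applied in the study.

```python
import math

def project_out(sample, artifact_dir):
    """Remove the component of `sample` along `artifact_dir` (a single projector)."""
    norm = math.sqrt(sum(a * a for a in artifact_dir))
    unit = [a / norm for a in artifact_dir]
    coeff = sum(s * u for s, u in zip(sample, unit))
    return [s - coeff * u for s, u in zip(sample, unit)]

def power(sample):
    """Sum of squared channel values."""
    return sum(s * s for s in sample)

# Hypothetical values (µV) on four channels at one time point.
sample = [40.0, 35.0, 30.0, 45.0]
# Assumed artifact topography: equal weight on every channel.
artifact_dir = [1.0, 1.0, 1.0, 1.0]

cleaned = project_out(sample, artifact_dir)
print(cleaned)
print(power(sample), power(cleaned))
```

Any genuine brain signal that overlaps spatially with the projected-out direction is attenuated together with the artifact, which is one plausible reason for lower absolute amplitudes after this kind of cleaning.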
 
 In addition, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.
 
 Methods, Page 25 Lines 515-524
 
 “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
 
 Supplementary Materials, Page 42-54, Table S1-S4
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Wang et al. examined brain activity patterns during sleep, especially when locked to canonical sleep rhythms such as SOs, spindles, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found that a variety of cognitive processes may be involved during SO-spindle coupling and during other sleep events. The authors next conducted functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.
 
 Strengths:
 
 There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.
 
 Weaknesses:
 
 Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.
 
 Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize our advanced analysis methods, the richness of our neuroimaging and neural oscillation results, and the large sample size. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.
 
 (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.
 
 We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.
 
 Discussion, Page 18 Lines 333-341
 
 “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”
 
 (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.
 
 Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.
 
 We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7.
 
 Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.
 
 Discussion, Page 17-18 Lines 323-332
 
 “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”
 
 (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.
 
 We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.
 
 That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.
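The distinction between a coupling phase at the transition and one near the peak can be made concrete with a circular mean. The sketch below assumes a phase convention in which 0 is the SO peak, ±π is the trough, and -π/2 is the DOWN-to-UP transition, and it uses a handful of hypothetical spindle-peak phases; neither the values nor the function names come from the study.

```python
import cmath
import math

def circular_mean(phases):
    """Mean direction (radians) of a list of phase angles."""
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(resultant)

def resultant_length(phases):
    """Concentration of the phases: 0 = uniform, 1 = identical."""
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)

# Hypothetical SO phases (radians) at which spindle peaks occurred.
spindle_phases = [-0.6, -0.4, -0.5, -0.3, -0.7, -0.45]

preferred = circular_mean(spindle_phases)
strength = resultant_length(spindle_phases)

# Compare the distance to the SO peak (0) with the distance to the transition (-pi/2).
nearer_peak = abs(preferred - 0.0) < abs(preferred - (-math.pi / 2))
print(round(preferred, 3), round(strength, 3), nearer_peak)
```

A preferred phase that is negative but closer to 0 than to -π/2 corresponds to spindles occurring shortly before the SO peak rather than at the transition itself.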
 
 Discussion, Page 16 Lines 283-291
 
 “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”
 
 (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the authors wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory," without any references to working memory and without delineating why WM is considered an external task even though working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why would DMN activity be reduced during SOs or during memory consolidation? Were there differences in DMN activity between SO and SO-spindle coupling events?
 
 We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.
 
 For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state.
 
 To address your final question, we have conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t(106) = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.
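A statistic reported as t(106) from 107 participants is consistent with a within-subject (paired) comparison, where the degrees of freedom equal the number of participants minus one. The sketch below computes such a statistic on synthetic per-subject values. The paired design, the variable names, and the numbers are assumptions for illustration and are not the study's data.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    return t_value, n - 1

n_subjects = 107
# Synthetic DMN estimates per subject (arbitrary units) for the two event types.
dmn_so = [-0.30 + 0.01 * (i % 7) for i in range(n_subjects)]
dmn_coupling = [0.10 + 0.02 * (i % 5) for i in range(n_subjects)]

t_value, dof = paired_t(dmn_so, dmn_coupling)
print(dof, t_value < 0)
```

A negative t value here means that the first condition (isolated SOs) has the lower mean, matching the direction of the reported effect.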
 
 Results, Page 11 Lines 199-208
 
 “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”
 
 Discussion, Page 17-18 Lines 308-332
 
 “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.
 
 To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”
 
 Recommendations for the authors:
 
 Reviewing Editor Comment:
 
 The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.
 
 We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.
 
 Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.
 
 (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.
 
 Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.
 
 Reviewer #2 (Recommendations for the authors):
 
 In order to allow for a better estimation of the reliability of the detected sleep events, please:
 
 (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).
 
 Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) duration of each sleep stage; 2) number of detected SOs; 3) number of detected spindles; 4) number of detected SO-spindle coupling events; 5) density of detected SOs; 6) density of detected spindles; 7) density of detected SO-spindle coupling events.
 
 Supplementary Materials, Page 43-54, Table S2-S4
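The density entries in such tables follow directly from the counts and stage durations. As a minimal sketch, assuming density is defined as events per minute of the corresponding stage (the unit used in the public review), one table row could be derived as follows; the field names and numbers are hypothetical.

```python
def event_density(n_events, stage_minutes):
    """Events per minute of a sleep stage; None if the stage did not occur."""
    if stage_minutes <= 0:
        return None
    return n_events / stage_minutes

# Hypothetical single-subject row: 30 min of N2/N3 containing 75 coupled events.
row = {"stage": "N2/N3", "minutes": 30.0, "n_coupling": 75}
row["coupling_density"] = event_density(row["n_coupling"], row["minutes"])
print(row["coupling_density"])  # 2.5 events/min
```

Returning None for a stage that never occurred keeps subjects without, for example, REM sleep from contributing a misleading density of zero.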
 
 (2) Show ERPs for all detected SOs and spindles (per sleep stage).
 
 Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.
 
 Methods, Page 25, Line 525-532
 
 “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
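The epoching and baseline steps described in this passage can be sketched for a single recording as below. The [-2 s, 2 s] window and the [-2 s, -0.5 s] baseline follow the text; the sampling rate, the toy signal, the event indices, and the function names are assumptions for illustration, and the subsequent averaging across subjects is not shown.

```python
def extract_epoch(signal, center_idx, sfreq, tmin=-2.0, tmax=2.0):
    """Cut a window around an event; return None if it runs off the recording."""
    start = center_idx + int(round(tmin * sfreq))
    stop = center_idx + int(round(tmax * sfreq)) + 1
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]

def baseline_correct(epoch, sfreq, tmin=-2.0, baseline=(-2.0, -0.5)):
    """Subtract the mean of the baseline interval from every sample."""
    b0 = int(round((baseline[0] - tmin) * sfreq))
    b1 = int(round((baseline[1] - tmin) * sfreq)) + 1
    offset = sum(epoch[b0:b1]) / (b1 - b0)
    return [v - offset for v in epoch]

def average_erp(signal, event_indices, sfreq):
    """Average baseline-corrected epochs over all usable events."""
    epochs = []
    for idx in event_indices:
        epoch = extract_epoch(signal, idx, sfreq)
        if epoch is not None:
            epochs.append(baseline_correct(epoch, sfreq))
    if not epochs:
        raise ValueError("no complete epochs available")
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]

sfreq = 100  # Hz, assumed for this toy example
signal = [5.0] * 2000          # constant 5 µV offset
for trough in (500, 1200):     # two hypothetical SO troughs
    signal[trough] = -45.0
events = [50, 500, 1200]       # the first event is too close to the start

erp = average_erp(signal, events, sfreq)
print(len(erp), erp[0], erp[200])
```

The sample at index 200 corresponds to time zero (the event itself); baseline correction removes the constant offset so that only the event-locked deflection remains.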
 
 (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).
 
 Thank you for this suggestion. As in the response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.
 
 Supplementary Materials, Page 42, Table S1 (same as above)
 
 (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?
 
 Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).
 
 Furthermore, in order to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted amplitude percentile thresholds for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that when the parameters gradually became stricter, the results were similar to or even better than the current results. However, when we continued to increase the threshold, the results began to gradually decrease until the threshold was increased to 80%, and the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
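The trade-off between stricter thresholds and statistical power can be illustrated with a simple sweep. The sketch below counts how many synthetic candidate waves survive increasingly strict amplitude percentiles and reports 1/sqrt(n), the factor by which the standard error of an event-locked average scales for independent events. The percentile values, amplitude distribution, and names are assumptions for illustration and do not reproduce the sensitivity analysis reported in Fig. S6.

```python
import math
import random

def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

rng = random.Random(1)
# Hypothetical amplitudes (µV) of 1000 candidate slow waves.
amplitudes = [rng.gauss(50.0, 12.0) for _ in range(1000)]

sweep = []
for q in (75, 80, 85, 90):
    cutoff = percentile(amplitudes, q)
    n_events = sum(1 for a in amplitudes if a > cutoff)
    relative_se = 1.0 / math.sqrt(n_events)  # grows as fewer events remain
    sweep.append((q, n_events, relative_se))

for q, n_events, relative_se in sweep:
    print(q, n_events, round(relative_se, 4))
```

Stricter thresholds may select cleaner events, but each step also removes trials, so an effect can lose significance simply because the estimate becomes noisier.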
 
 Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.
 
 Results, Page 11, Line 199-208
 
 “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”
 
 Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.
 
 References:
 
 Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742.
 
 Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188.
 
 Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309.
 
 Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71.
 
 Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118.
 
 Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PLoS ONE, 4(8), e6749.
 
 Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241.
 
 Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.
 
 Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PLoS ONE, 12(4), e0174755.
 
 Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. eLife, 9, e53730.
 
 Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120.
 
 Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667.
 
 Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.
 
 Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230.e224.
 
 Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682.
 
 Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185.
 
 Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796.
 
 Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769.
 
 Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75.
 
 Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579.
 
 Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870.
 
 Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98.
 
 Molle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421.
 
 Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. Elife, 9, e57011.
 
 Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical– subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72.
 
 Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169.
 
 Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature communications, 13(1), 5231.
 
 Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112.
 
 Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119.
 
 Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387.
 
 Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686.
 
 Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9.
 
 Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670.
 
 Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.
 
URL

biorxiv.org/content/10.1101/2024.10.14.618165v2
www.biorxiv.org www.biorxiv.org

Neural dynamics of reversal learning in the prefrontal cortex and recurrent neural networks

5
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 The findings of this study are valuable, offering insights into the neural representation of reversal probability in decision-making tasks, with potential implications for understanding flexible behavior in changing environments. The study contains interesting comparisons between neural data and models, including evidence for partial consistency with line attractor models in this probabilistic reversal learning task. However, the evidence remains incomplete due to issues with how the RNN was trained and how its dynamics were analyzed.
 
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 The authors aimed to investigate how the probability of a reversal in a decision-making task is computed in cortical neurons. They analyzed neural activity in the prefrontal cortex of monkeys and units in recurrent neural networks (RNNs) trained on a similar task. Their goal was to understand how the dynamical systems that implement computation perform a probabilistic reversal learning task in RNNs and nonhuman primates.
 
 Major strengths and weaknesses:
 
 Strengths:
 
 (1) Integrative Approach: The study exemplifies a modern approach by combining empirical data from monkey experiments with computational modeling using RNNs. This integration allows for a more comprehensive understanding of the dynamical systems that implement computation in both biological and artificial neural networks.
 
 (2) The focus on using perturbations to identify causal relationships in dynamical systems is a good goal. This approach aims to go beyond correlational observations.
 
 (3) The revised manuscript provides a more nuanced interpretation of the dynamics, reconciling the observations with aspects of line attractor models.
 
 Weaknesses:
 
 (1) The use of targeted dimensionality reduction (TDR) to identify the axis determining reversal probability may not necessarily isolate the dimension along which the RNN computes reversal probability. This should be computed from the RNN update itself rather than through a readout of network variance. Depending on how this is formulated, it could be something like the Jacobian of the state update with respect to inputs at input onset and with respect to the state during relaxation dynamics. This is worth thinking through further. It's important to try to take advantage of access afforded by using RNNs rather than solely relying on analyses available to us in neural data.
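
 To make this suggestion concrete, the following is a minimal sketch of the two Jacobians for a generic leaky tanh RNN. The update rule, the leak `alpha`, and the names `step`, `jacobians`, `W`, and `B` are illustrative assumptions; the network in the manuscript (an inhibitory model with fixed random inputs) may use a different update equation.

```python
import math

def preactivation(h, u, W, B):
    """z = W h + B u for a hypothetical RNN with recurrent weights W and input weights B."""
    n, m = len(h), len(u)
    return [sum(W[i][j] * h[j] for j in range(n)) +
            sum(B[i][k] * u[k] for k in range(m)) for i in range(n)]

def step(h, u, W, B, alpha=0.2):
    """One update of the assumed leaky tanh RNN: h' = (1 - alpha) h + alpha tanh(z)."""
    z = preactivation(h, u, W, B)
    return [(1 - alpha) * h[i] + alpha * math.tanh(z[i]) for i in range(len(h))]

def jacobians(h, u, W, B, alpha=0.2):
    """Analytic Jacobians of the update with respect to the state and the input."""
    n, m = len(h), len(u)
    z = preactivation(h, u, W, B)
    gain = [1 - math.tanh(zi) ** 2 for zi in z]  # derivative of tanh at z
    J_state = [[(1 - alpha) * (1.0 if i == j else 0.0) + alpha * gain[i] * W[i][j]
                for j in range(n)] for i in range(n)]
    J_input = [[alpha * gain[i] * B[i][k] for k in range(m)] for i in range(n)]
    return J_state, J_input

# Toy two-unit network with inhibitory recurrent weights (illustrative values only).
W = [[0.0, -0.5], [-0.3, 0.0]]
B = [[1.0], [0.5]]
h, u = [0.1, -0.2], [0.3]
J_state, J_input = jacobians(h, u, W, B)
```

 Evaluated at a candidate fixed point, the eigenvalues of `J_state` would indicate which directions are contracting and which are close to marginal, while `J_input` shows how a reward input displaces the state at input onset.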
 
 Appraisal of aims and conclusions:
 
 The authors have substantially revised their interpretation of the results to reconcile their findings with line attractor models. They now acknowledge that their observation of reward integration explaining reversal probability activity (x_rev) is compatible with line attractor models, which addresses one of my main concerns.
 
 Their expanded analysis now differentiates between two activity modes: (1) substantial non-stationary dynamics during a trial (incompatible with line attractors) and (2) stationary and stable dynamics at trial start (compatible with point attractors and line attractor models). This dual characterization provides a more complete picture of the dynamical system and highlights the composability of dynamical features.
 
 Likely impact and utility:
 
 This work makes a stronger contribution to our understanding of how probabilistic information is represented in neural circuits with intervening behaviors. The augmented model that combines elements of attractor dynamics with non-stationary trajectories offers a more comprehensive framework for understanding neural computations in decision-making tasks.
 
 The data and methods could be useful to the community. While the authors have improved their analysis of network dynamics, additional reverse engineering that takes full advantage of access to the RNN's update equations could further strengthen the work.
 
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this work, the authors trained RNNs to perform a reversal task that was also performed by animals while PFC activity was recorded. The authors devised a new method to train RNNs on this type of reversal task, which in principle ensures that the behavior of the RNN matches the behavior of the animal. They then analyzed neural activity in both the RNNs and the PFC recordings, focusing on the neural representation of the reversal probability and its evolution across trials. Given the analysis presented, it has been difficult for me to assess to what extent the RNN can reasonably be compared to the PFC recordings.
 
 Strengths:
 
 Focusing on a reversal task, the authors address a challenge in RNN training, as they do not use a standard supervised learning procedure where the desired output is available for each trial. They propose a new way of doing that.
 
 They attempt to compare RNN activity with neural recordings in behaving animals.
 
 Weaknesses:
 
 It would be nice to better articulate the analysis results of the two training set-ups (with and without 0 response during fixation). The dynamical system analysis is confusing: the notions of stationary and non-stationary dynamics and their relationship with attractors are puzzling. Is there a line attractor in one case (with inputs orthogonal to the integration direction being called back to the attractor, and reward input aligned with the stable direction)? In the other case, do we have a cylindrical attracting manifold on which activity circles around and is pushed along the axis of the cylinder by reward inputs? Which case is closest to the PFC recordings?
 
4. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Kim et al. present a study of the neural dynamics underlying reversal learning in monkey PFC and neural networks. Their main finding is that neural activity during fixation resembles a line attractor storing the current belief of the reversal state of the task. This is followed by richer dynamics unfolding throughout the remainder of the trial, which eventually converge to a new point on the line attractor by the start of the next trial. The idea of studying neural dynamics throughout the task (including intervening behaviour) is interesting, and the data provides some insights into the neural dynamics driving reversal learning. The modelling seems to support the analyses, but both the modelling and analyses also leave several open questions.
 
 Strengths:
 
 The paper addresses an interesting topic of the neural dynamics underlying reversal learning in PFC, using a combination of biological and simulated data. Reversal learning has been studied extensively in neuroscience, but this paper takes a step further by analysing neural dynamics throughout the trials instead of focusing on just the evidence integration epoch.
 
 The authors show some close parallels between the experimental data and RNN simulations, both in terms of behaviour and neural dynamics. The analyses of how rewarded and unrewarded trials differentially affect dynamics throughout the trials in RNNs and PFC were particularly interesting. This work has the potential to provide new insights into the neural underpinnings of reversal learning.
 
 Weaknesses:
 
 Data analyses:
 
 While the analyses seem mostly sound, one shortcoming is that they are all aligned to the inferred reversal trial rather than the true experimental reversal trial. For example, the analyses showing that 'x_rev' decays strongly after the reversal trial, irrespective of the reward outcome, seem like they are true essentially by design. The choice to align to the inferred reversal trial also makes this trial seem 'special' (e.g. in Fig 2 & Fig 6A), but it is unclear whether this is a real feature of the data or an artifact of effectively conditioning on a change in behaviour. It would be useful to investigate whether any of these analyses differ when aligned to the true reversal trial. It is also unsurprising that x_rev increases before the reversal and decreases after the reversal (it is hard to imagine a system where this is not the case), yet all of Fig 6 and several other analyses are devoted to this point.
 
 Most of the analyses focus on the dynamics specifically in the x_rev subspace, but a major point of the paper is to say that biological (and artificial) networks may also have to do other things at different times in the trial. If that is the case, it would be interesting to also ask what happens in other subspaces of neural activity, which are not specifically related to evidence integration or choice - are there other subspaces that explain substantial variance? Do they relate to any meaningful features of the experiment?
 
 This is especially important when considering analyses trying to establish the presence (or absence) of attractor dynamics in the circuit. In particular, activity in the x_rev subspace both affects and depends on other subspaces of neural activity, so it is not as meaningful to analyse the dynamics of this subspace in isolation. It would e.g. have been preferable to analyse the early-trial dynamics in the full state space and then possibly projecting onto x_rev, rather than first projecting activity onto x_rev and then fitting a linear autoregressive model.
 
 Modelling:
 
 There are a number of surprising and non-standard modelling choices made in this paper. For example, the choice to only use inhibitory neurons is non-conventional and it is not clear whether and how this impacts the results. The inputs are also provided without any learnable input weights, which makes it harder to interpret the input-driven dynamics during the different phases of a trial.
 
 It is surprising that the RNN is "trained to flip its preferred choice a few trials after the inferred scheduled reversal trial", with the reversal trial inferred by an ideal Bayesian observer. A more natural approach would be to directly train the RNN to solve the task (by predicting the optimal choice) and then investigating the emergent behaviour & dynamics. If the authors prefer their imitation learning approach, it is also surprising that the network is trained to predict the reversal trial inferred using Bayesian smoothing instead of Bayesian filtering.
 
 Finally, it was surprising that the network is trained and tested with different block lengths (24 & 36 trials, respectively), and it is not mentioned whether or how this affects behaviour.
 
5. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Main revision made to the manuscript
 
 The main revision made to the manuscript is to reconcile our findings with the line attractor model. The revision is based on Reviewer 1’s comment on reinterpreting our results as a superposition of an attractor model with fast timescale dynamics. We expanded our analysis regime to the start of a trial and characterized the overall within-trial dynamics to reinterpret our findings.
 
 We first acknowledge that our results do not contradict evidence integration on a line attractor. As pointed out by the reviewers, our finding that the integration of reward outcome explains the reversal probability activity x_rev (Figure 3) is compatible with the line attractor model. However, the reward integration equation is an algebraic relation and does not characterize the dynamics of reversal probability activity. A closer analysis of the neural dynamics is therefore needed to assess the feasibility of a line attractor.
 
 In the revised manuscript, we show that x_rev exhibits two different activity modes (Figure 4). First, x_rev has substantial non-stationary dynamics during a trial, and this non-stationary activity is incompatible with the line attractor model, as claimed in the original manuscript. Second, we present new results showing that x_rev is stationary (i.e., constant in time) and stable (i.e., contracting) at the start of a trial. These two properties indicate that x_rev sits at a point attractor at the start of a trial, which is compatible with the line attractor model.
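
 As a rough illustration of how these two properties could be checked on a one-dimensional projection, the sketch below computes a mean absolute time derivative (stationarity) and fits a pooled first-order autoregressive slope (a slope with magnitude below 1 indicates contraction). The function names and the toy trajectories are assumptions made for illustration and are not the analysis code used for Figure 4.

```python
def mean_abs_derivative(x, dt=1.0):
    """Average |dx/dt| along one trajectory; values near zero suggest stationarity."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (dt * (len(x) - 1))

def ar1_slope(trajectories):
    """Least-squares slope a in x[t+1] = a * x[t] + c, pooled across trajectories."""
    xs, ys = [], []
    for traj in trajectories:
        xs.extend(traj[:-1])
        ys.extend(traj[1:])
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Toy data: trajectories relaxing toward a fixed point at 1.0 with factor 0.8 per step.
toy = [[1.0 + (x0 - 1.0) * 0.8 ** t for t in range(10)] for x0 in (0.0, 2.0, 3.0)]
slope = ar1_slope(toy)          # close to 0.8, i.e. contracting
is_contracting = abs(slope) < 1.0
```

 Fitting the slope on the projected activity alone ignores coupling to other subspaces, so a check of this kind is best read as a necessary rather than sufficient indication of an attractor.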
 
 We further analyze how the two activity modes are linked (Figure 4, Support vector regression). We show that the non-stationary activity is predictable from the stationary activity if the underlying dynamics can be inferred. In other words, the non-stationary activity during a trial is generated by underlying dynamics whose initial condition is the stationary state at the start of the trial.
 
 These results suggest an extension of the line attractor model where an attractor state at the start of a trial provides an initial condition from which non-stationary activity is generated during a trial by an underlying dynamics associated with task-related behavior (Figure 4, Augmented model).
 
 The separability of non-stationary trajectories (Figure 5 and 6) is a property of the non-stationary dynamics that allows separable points in the initial stationary state to remain separable during a trial, thus making it possible to represent distinct probabilistic values in non-stationary activity.
 
 This revised interpretation of our results (1) retains our original claim that the non-stationary dynamics during a trial are incompatible with the line attractor model and (2) introduces an attractor state at the start of a trial, which is compatible with the line attractor model. Our analysis shows that the two activity modes are linked by underlying dynamics, and that the attractor state serves as the initial state that launches the non-stationary activity.
 
 Responses to the Public Reviews:
 
 Reviewer # 1:
 
 (1) To better explain the reversal learning task and the network training method, we added a detailed description of the RNN and monkey task structure (Result Section 1), included a schematic of target outputs (Figure 1B), explained the rationale behind using an inhibitory network model (Method Section 1), and explained the supervised RNN training scheme (Result Section 1). This information can also be found in the Methods.
 
 (2) Our understanding is that the augmented model discussed above is aligned with the model suggested by Reviewer 1: “a curved line attractor, with faster timescale dynamics superimposed on this structure”. It is likely that the “fast” non-stationary activity observed during the trial is driven by task-related behavior and is therefore transient. For instance, we do not observe such non-stationary activity in the inter-trial interval, when task-related behavior is absent. For this reason, the non-stationary trajectories were not considered to be part of the attractor. Instead, they are transient activity generated by the underlying neural dynamics associated with task-related behavior. We believe this characterization of the faster timescale dynamics is consistent with Reviewer 1’s view, and we wanted to clarify that there are two different activity modes.
 
 (3) We appreciate the reviewers’ (Reviewer 1 and Reviewer 2) comment that TDR may be limited in isolating the neural subspace of interest. Our study presents what can be learned from TDR, but TDR is by no means the only way to interpret the neural data. Applying other methods for isolating task-related neural activity is left for future work.
 
 We would appreciate it if the reviewers could share thoughts on what other alternative methods could better isolate the reversal probability activity.
 
 Reviewer # 2:
 
 (1) (i) We respectfully disagree with Reviewer 2’s comment that “no action is required to be performed by neurons in the RNN”. In our network setup, the output of the RNN learns to choose a sign (+ or -), as Reviewer 2 pointed out, to make a choice. This is how the RNN takes an action. It is unclear to us what Reviewer 2 intended by “action” and how reaching a target value (not just taking a sign) would make a significant difference in how the network performs the task.
 
 (ii) From Reviewer 2’s comment that “no intervening behavior is thus performed by neurons”, we noticed that the term “intervening behavior” has caused confusion. It refers to task-related behavior, such as making choices or receiving reward, that the subject must perform across trials before reversing its preferred choice. These are the behaviors that intervene before the reversal of the preferred choice. To clarify its meaning, in the revised manuscript, we changed the term to “task-related behavior” and put it in context. For example, in the Introduction we state that “However, during a trial, task-related behavior, such as making decisions or receiving feedback, produced …”
 
 (iii) As pointed out by Reviewer 2, the lack of a fixation period in the RNN could lead to differences between the neural dynamics of the RNN and the PFC, especially at the start of a trial. We address this issue in Result Section 4, where we analyze the stationary activity at the start of a trial. We find that fixing the choice output to zero before a choice is made promotes stationary activity and makes the RNN activity more similar to the PFC activity.
 
 Reviewer #3:
 
 (1) (i) In the previous study (Figure 1 in [Bartolo and Averbeck ‘20]), it was shown that neural activity can predict the behavioral reversal trial. This is the reason we examined the neural activity in the trials centered at the behavioral reversal trial. We explained in Result Section 2 that we followed this line of analysis in our study.
 
 (ii) We would like to emphasize that the main point of Figures 4 and 5 is to show the separability of neural trajectories: the entire trajectory shifts without overlapping. It is not obvious that high-dimensional neural population activity from two trials should remain separated when their activities are compressed into a one-dimensional subspace. The one-dimensional activities can easily collide since their activities are compressed into a low-dimensional space. We revised the manuscript to bring out these points. We added an opening paragraph that discusses separability of trajectories and revised the main text to bring out the findings on separability.
 
 (iii) We agree with Reviewer 3 that it would be interesting to look at what happens in other subspaces of neural activity that are not related to reversal probability and to characterize how different neural subspaces interact with each other. However, the focus of this paper was the reversal probability activity, and we consider these questions outside the scope of the current paper. We point out that, using the same dataset, neural activity related to other experimental variables was analyzed in other papers [Bartolo and Averbeck ’20; Tang, Bartolo and Averbeck ‘21].
 
 (2) (i) In the revised manuscript, we added an explanation of the rationale behind choosing an inhibitory network as a simplified model for the balanced state. In brief, a network with strong inhibitory recurrent connections and strong excitatory external input operates in the balanced state, as in the standard excitatory-inhibitory network. We included references that studied this inhibitory network. We also explained the technical reason (GPU memory) for choosing the inhibitory model.
 
 (ii) We thank the reviewer for pointing out that the original manuscript did not mention how the feedback and cue were initialized. They were random vectors sampled from a Gaussian distribution. We added this information in the revised manuscript. In our opinion, it is common to use random external inputs for training RNNs, as it is a priori unclear how to choose them. In fact, it is possible to analyze the effects of random feedback on the one-dimensional x_rev dynamics by projecting the random feedback vector onto the reversal probability vector. This is shown in Figure 4F.
 
 (iii) We agree that it would be more natural to train the RNN to solve the task without using the Bayesian model. We point out this issue in the Discussion in the revised manuscript.
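
 The projection mentioned in point (2)(ii) can be sketched as follows. The vector names, the number of units, and the unit-variance Gaussian are illustrative assumptions; in the manuscript, the reversal probability vector is obtained from TDR rather than drawn at random.

```python
import math
import random

def scalar_projection(vec, axis):
    """Component of `vec` along the direction of `axis` (axis need not be unit length)."""
    norm = math.sqrt(sum(a * a for a in axis))
    return sum(v * a for v, a in zip(vec, axis)) / norm

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_units = 100            # hypothetical network size

# Random feedback input vector, sampled from a Gaussian distribution.
feedback = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
# Stand-in for the reversal probability vector (random here, for illustration only).
rev_axis = [rng.gauss(0.0, 1.0) for _ in range(n_units)]

# Size of the feedback-driven push along the one-dimensional x_rev axis.
drive_along_x_rev = scalar_projection(feedback, rev_axis)
```

 The sign and magnitude of `drive_along_x_rev` indicate in which direction, and how strongly, the feedback input moves activity along the x_rev axis.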
 
 Recommendations for the authors:
 
 Reviewer #1:
 
 (1) My understanding of network training was that a Bayesian ideal observer signaled target output based on previous reward outcomes. However, the authors never mention that networks are trained by supervised learning in the main text until the last paragraph of the discussion. There is no mention that there was an offset in the target based on the behavior of the monkeys in the main text. These are really important things to consider in the context of the network solution after training. I couldn't actually find any figure that presents the target output for the network. Did I miss something key here?
 
 In Result Section 1, we added a paragraph that describes in detail how the RNN is trained. We explained that the network is first simulated and then the choice outputs and reward outcomes are fed into the Bayesian model to infer the scheduled reversal trial. A few trials are added to the inferred reversal trial to obtain the behavioral reversal trial, as found in a previous study [Bartolo and Averbeck ‘20]. Then the network weights are updated by backpropagation-through-time via supervised learning.
 
 In the original manuscript, the target output for the network was described in Methods Section 2.5, Step 4. To make this information readily accessible, we added a schematic in Figure 1B that shows the scheduled, inferred and behavioral reversal trials. It also shows how the target choice outputs are defined. They switch abruptly at the behavioral reversal trial.
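
 The target construction described above can be sketched as follows, under simplifying assumptions: two options, a reward probability of 0.7 for the better option and 0.3 for the other, a uniform prior over the reversal trial, and an offset of 2 trials between the inferred and behavioral reversal. These values and the function names are hypothetical; the actual Bayesian model and offset are specified in the Methods and in [Bartolo and Averbeck ‘20].

```python
import math

def infer_reversal(choices, rewards, p_high=0.7):
    """Most probable reversal trial given all choices (0/1) and rewards (0/1) in a block.

    Option 0 is assumed to be the better option before the reversal and option 1 after.
    All trials in the block are used, so this is a smoothing-style estimate.
    """
    n = len(choices)
    log_post = []
    for k in range(n):  # candidate reversal trial (uniform prior)
        ll = 0.0
        for t, (c, r) in enumerate(zip(choices, rewards)):
            best = 0 if t < k else 1
            p_reward = p_high if c == best else 1.0 - p_high
            ll += math.log(p_reward if r else 1.0 - p_reward)
        log_post.append(ll)
    return max(range(n), key=log_post.__getitem__)

def target_outputs(n_trials, behavioral_reversal):
    """Target choice sign per trial: +1 before the behavioral reversal, -1 from it onward."""
    return [1 if t < behavioral_reversal else -1 for t in range(n_trials)]

# Toy block: option 0 is always chosen; it stops being rewarded from trial 5 onward.
choices = [0] * 10
rewards = [1] * 5 + [0] * 5
inferred = infer_reversal(choices, rewards)
offset = 2  # hypothetical number of trials added to the inferred reversal
targets = target_outputs(len(choices), inferred + offset)
```

 In a training loop, `targets` would be compared with the network's choice outputs to define the supervised loss that is minimized by backpropagation through time.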
 
 (2) The role of block structure in the task is an important consideration. What are the statistics of block switches? The authors say on average the reversals are every 36 trials, but also say there are random block switches. The reviewer's notes suggest that both the networks and monkeys may be learning about the typical duration of blocks, which could influence their expectations of reversals. This aspect of the task design should be explained more thoroughly and considered in the context of Figure 1E and 5 results.
 
 We provided a more detailed description of the reversal learning task in Result Section 1. We clarified that (1) a task is completed by executing a block of a fixed number of trials and (2) the reversal of the reward schedule occurs at a random trial around the middle trial of a block. The difference in the number of trials per block performed by the RNNs (36) and the monkeys (80) is also explained. We also pointed out the differences in how the reversal trial is randomly sampled.
 
 However, it is unclear what Reviewer 1 meant by random block switches. Our reversal learning task is completed when a block of a fixed number of trials is executed. The reversal of the reward schedule occurs only once, on a randomly selected trial in the block, and the reversed reward schedule is maintained until the end of the block. This differs from other versions of reversal learning in which the reward schedule switches multiple times across trials. We clarified this point in Result Section 1.
 
 (3) The relationship between the supervised learning approach used in the RNNs and reinforcement learning was confused in the discussion. "Although RNNs in our study were trained via supervised learning, animals learn a reversal-learning task from reward feedback, making it into a reinforcement learning (RL) problem." This is fundamentally not true. In the case of this work, the outcome of the previous trial updates the target output, rather than the trial and error type learning as is typical in reinforcement learning. Networks are not learning by reinforcement learning and this statement is confusing.
 
 We agree with Reviewer 1’s comment that the statement in the original manuscript is confusing. Our intention was to point out that our study used supervised learning, which differs from how animals learn in real life, namely by reinforcement learning. We revised the sentence in the Discussion as follows:
 
 “The RNNs in our study were trained via supervised learning. However, in real life, animals learn a reversal learning task via reinforcement learning (RL), i.e., learn the task from reward outcomes.”
 
 (4) The distinction between line attractors and the dynamic trajectories described by the authors deserves further investigation. A significant concern arises from the authors' use of targeted dimensionality reduction (TDR), a form of regression, to identify the axis determining reversal probability. While this approach can reveal interesting patterns in the data, it may not necessarily isolate the dimension along which the RNN computes reversal probability. This limitation could lead to misinterpretation of the underlying neural dynamics.
 
 a) This manuscript cites work described in "Prefrontal cortex as a meta-reinforcement learning system," which examined a similar task. In that study, the authors identified a v-shaped curve in the principal component space of network states, representing the probability of choosing left or right.
 
 Importantly, this curve is topologically equivalent to a line and likely represents a line attractor. However, regressing against reversal probability in such a case would show that a single principal component (PC2) directly correlates with reversal probability.
 
 b) The dynamics observed in the current study bear a striking resemblance to this structure, with the addition of intervening loops in the network state corresponding to within-trial state evolution. Crucially, these observations do not preclude the existence of a line attractor. Instead, they may reflect the network's need to produce fast timescale dynamics within each trial, superimposed on the slower dynamics of the line attractor.
 
 c) This alternative interpretation suggests that reward signals could function as inputs that shift the network state along the line attractor, with information being maintained across trials. The fast "intervening behaviors" observed by the authors could represent faster timescale dynamics occurring on top of the underlying line attractor dynamics, without erasing the accumulated evidence for reversals.
 
 d) Given these considerations, the authors' conclusion that their results are better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The observed dynamics could potentially be reconciled with a more nuanced understanding of line attractor models, where the attractor itself may be curved and coexist with faster timescale dynamics.
 
 We appreciate the insightful comments on (1) the similarity of the work by Wang et al ’18 with our findings and (2) an alternative interpretation that augments the line attractor with fast timescale dynamics.
 
 (1) We added a discussion of the work by Wang et al ’18 in Result Section 2 to point out the similarity of their findings in the principal component space with ours in the x_rev and x_choice space. We commented that such network dynamics could emerge when learning to perform the reversal learning task, regardless of the training scheme.
 
 We also mention that the RL approach in Wang et al ’18 does not consider within-trial dynamics and therefore lacks the non-stationary activity observed during the trial in the PFC of monkeys and in our trained RNNs.
 
 (2) We revised our original manuscript substantially to reconcile the line attractor model with the non-stationary activity observed during a trial.
 
 Here are the highlights of the revised interpretation of the PFC and RNN activity:
 
 - The dynamics of x_rev consist of two activity modes, i.e., stationary activity at the start of a trial and non-stationary activity during the trial. A schematic of the augmented model that reconciles the two activity modes is shown in Figure 4A. Analyses of the time derivative (dx_rev / dt) and of the contractivity of the stationary state are shown in Figure 4B,C to demonstrate the two activity modes.
 
 - We discuss in Result Section 4 main text that the stationary activity is consistent with the line attractor model, but the non-stationary activity deviates from the model.
 
 - The two activity modes are linked dynamically. There are underlying dynamics that map the stationary state to the non-stationary trajectory. This is shown by predicting the non-stationary trajectory from the stationary state using a support vector regression model. The prediction results are shown in Figure 4D,E,F.
 
 - We discuss in Result Section 4 an extension of the standard line attractor model: points on the line attractor can serve as initial states that launch non-stationary activity associated with task-related behavior.
 
 - The separability of neural trajectories presented in Result Section 5 is framed as a property of the non-stationary dynamics associated with task-related behavior.
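
The extended picture in the highlights above (a stationary state on a line attractor that launches a transient, non-stationary excursion) can be illustrated with a toy two-dimensional system. This is only a sketch under stated assumptions, not the trained RNN or the authors' model: the variable `x` stands for position along the line, `y` for an off-line direction, and the parameters `shift`, `kick`, `pulse_steps` and the 20 ms / 500 ms time scales are chosen purely for illustration.

```python
def simulate_trial(x, y, shift=0.0, kick=0.0, tau=20.0, dt=1.0,
                   steps=500, pulse_steps=50):
    """Toy system with a line attractor along y = 0 (hypothetical example).

    x: position along the line; it integrates input and never decays.
    y: off-line direction; it relaxes back to 0 with time constant tau.
    A transient input (shift, kick) is applied for the first pulse_steps steps.
    Returns the list of (x, y) states visited during the trial.
    """
    trajectory = []
    for k in range(steps):
        in_pulse = k < pulse_steps
        u_x = shift / pulse_steps if in_pulse else 0.0  # moves state along the line
        u_y = kick if in_pulse else 0.0                 # pushes state off the line
        x += u_x                                        # no leak along the attractor
        y += dt * (-y + u_y) / tau                      # leaky relaxation toward y = 0
        trajectory.append((x, y))
    return trajectory


# One trial: the input shifts the state along the line and kicks it off the line.
traj = simulate_trial(x=0.0, y=0.0, shift=0.5, kick=1.0)
x_end, y_end = traj[-1]
peak_y = max(y for _, y in traj)
print(f"end of trial: x = {x_end:.3f}, y = {y_end:.2e}, peak excursion y = {peak_y:.3f}")
```

In this toy system the excursion in `y` is transient, while the shift in `x` persists to the start of the next trial, which is the sense in which fast within-trial dynamics can coexist with a slowly updated stationary state.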
 
 To strengthen their claims, the authors should:
 
 (1) Provide a more detailed description of their RNN training paradigm and task structure, including clear illustrations of target outputs.
 
 (2) Discuss how their findings relate to and potentially extend previous work on similar tasks, particularly addressing the similarities and differences with the v-shaped state organization observed in reinforcement learning contexts. (https://www.nature.com/articles/s41593-018-0147-8 Figure1).
 
 (3) Explore whether their results could be consistent with a curved line attractor model, rather than treating line attractors and dynamic trajectories as mutually exclusive alternatives.
 
 Our response to these three comments is described above.
 
 Addressing these points would significantly enhance the impact of the study and provide a more nuanced understanding of how reversal probabilities are represented in neural circuits.
 
 In conclusion, while this study provides interesting insights into the neural representation of reversal probability, there are several areas where the methodology and interpretations could be refined.
 
 Additional Minor Concerns:
 
 (1) Network Training and Reversal Timing: The authors mention that the network was trained to switch after a reversal to match animal behavior, stating "Maximum a Posterior (MAP) of the reversal probability converges a few trials past the MAP estimate." More explanation of how this training strategy relates to actual animal behavior would enhance the reader's understanding of the meaning of the model's similarity to animal behavior in Figure 1.
 
 In Method Section 2.5, we described how our observation that the running estimate of MAP converges a few trials after the actual MAP is analogous to the animal’s reversal behavior.
 
 “This observation can be interpreted as follows. If a subject performing the reversal learning task employs the ideal observer model to detect the trial at which reward schedule is reversed, the subject can infer the reversal of reward schedule a few trials past the actual reversal and then switch its preferred choice. This delay in behavioral reversal, relative to the reversal of reward schedule, is analogous to the monkeys switching their preferred choice a few trials after the reversal of reward schedule.”
 
 In Step 4, we also mentioned that the target choice outputs are defined based on our observation in Step 3.
 
 “We used the observation from Step 3 to define target choice outputs that switch abruptly a few trials after the reversal of reward schedule, denoted as $t^*$ in the following. An example of target outputs are shown in Fig.\,\ref{fig_behavior}B.”
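
The ideal observer idea described above can be sketched as a posterior over the reversal trial under a uniform prior. This is a minimal illustration, not the authors' code: the function name `reversal_posterior`, the option labels `'A'` and `'B'`, and the reward probability `p_high=0.7` are assumptions made for the example.

```python
import math


def reversal_posterior(choices, rewards, p_high=0.7):
    """Posterior over the reversal trial, assuming a uniform prior.

    choices[t] is 'A' or 'B'; rewards[t] is 0 or 1.
    Hypothesis r: option A is the high-value option on trials t < r,
    and option B is the high-value option on trials t >= r.
    The high-value option is rewarded with probability p_high (assumed),
    the other option with probability 1 - p_high.
    """
    n = len(choices)
    log_like = []
    for r in range(n + 1):
        total = 0.0
        for t in range(n):
            good = 'A' if t < r else 'B'
            p_reward = p_high if choices[t] == good else 1.0 - p_high
            total += math.log(p_reward if rewards[t] else 1.0 - p_reward)
        log_like.append(total)
    # Uniform prior: the posterior is the normalized likelihood.
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    norm = sum(weights)
    return [w / norm for w in weights]


# Deterministic toy block: the subject always picks A, which is rewarded
# on trials 0..9 and unrewarded on trials 10..19.
choices = ['A'] * 20
rewards = [1] * 10 + [0] * 10
posterior = reversal_posterior(choices, rewards)
map_trial = max(range(len(posterior)), key=posterior.__getitem__)
print("MAP reversal trial:", map_trial)
```

Because rewards are probabilistic, a single unrewarded trial is also likely under the no-reversal hypothesis, so evidence for a reversal has to accumulate over several trials before the estimate settles.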
 
 (2) How is the network simulated in step 1 of training? Is it just randomly initialized? What defines this network structure?
 
 The initial state at the start of a block was random. We think the initial state is less relevant as the external inputs (i.e., cue and feedback) are strong and drive the network dynamics. We mentioned this setup and observation in Step 1 of training.
 
 “Step 1. Simulate the network starting from a random initial state, apply the external inputs, i.e., cue and feedback inputs, at each trial and store the network choices and reward outcomes at all the trials in a block. The network dynamics is driven by the external inputs applied periodically over the trials.”
 
 (3) Clarification on Learning Approach: More description of the approach in the main text would be beneficial. The statement "Here, we trained RNNs that learned from a Bayesian inference model to mimic the behavioral strategies of monkeys performing the reversal learning task [2, 4]" is somewhat confusing, as the model isn't directly fit to monkey data. A more detailed explanation of how the Bayesian inference model relates to monkey behavior and how it's used in RNN training would improve clarity.
 
 We described the learning approach in more detail, but also tried to be concise without going into technical details.
 
 We revised the sentence in Introduction as follows:
 
 “We sought to train RNNs to mimic the behavioral strategies of monkeys performing the reversal learning task. Previous studies \cite{costa2015reversal, bartolo2020prefrontal} have shown that a Bayesian inference model can capture a key aspect of the monkey's behavioral strategy, i.e., adhere to the preferred choice until the reversal of reward is detected and then switch abruptly. We trained the RNNs to replicate this behavioral strategy by training them on target behaviors generated from the Bayesian model.”
 
 We also added a paragraph in Result Section 1 that explains in detail how the training approach works.
 
 (4) In Figure 1B, it would be helpful to show the target output.
 
 We added a figure in Fig1B that shows a schematic of how the target output is generated.
 
 (5) An important point to consider is that a line attractor can be curved while still being topologically equivalent to a line. This nuance makes Figure 4A somewhat difficult to interpret. It might be helpful to discuss how the observed dynamics relate to potentially curved line attractors, which could provide a more nuanced understanding of the neural representations.
 
 As discussed above, we interpret the “curved” activity during the trial as non-stationary activity. We do not think this non-stationary activity would be characterized as an attractor. An attractor is (1) a minimal set of states that is (2) invariant under the dynamics and (3) attracting when perturbed into its neighborhood [Strogatz, Nonlinear dynamics and chaos]. If we consider the autonomous system without the behavior-related external input as the base system, then the non-stationary states could satisfy (2) and (3) but not (1), so they are not part of the attractor. If we include the behavior-related external input in the autonomous dynamics, then it may be possible that the non-stationary trajectories are part of the attractor. We adopted the former interpretation as the behavior-related inputs are external and transient.
 
 (6) The results of the perturbation experiments seem to follow necessarily from the way x_rev was defined. It would be valuable to clarify if there's more to these results than what appears to be a direct consequence of the definition, or if there are subtleties in the experimental design or analysis that aren't immediately apparent.
 
 The neural activity x_rev is correlated to the reversal probability, but it is unclear if the activity in this neural subspace is causally linked to behavioral variables, such as choice output. We added this explanation at the beginning of Results Section 7 to clarify the reason for performing the perturbation experiments.
 
 “The neural activity $x_{rev}$ is obtained by identifying a neural subspace correlated to reversal probability. However, it remains to be shown if activity within this neural subspace is causally linked to behavioral variables, such as choice output.”
 
 Reviewer #2:
 
 Below is a list of things I have found difficult to understand, and been puzzled/concerned about while reading the manuscript:
 
 (1) It would be nice to say a bit more about the dataset that has been used for PFC analysis, e.g. number of neurons used and in what conditions is Figure 2A obtained (one has to go to supplementary to get the reference).
 
 We added information about the PFC dataset in the opening paragraph of Result Section 2 to provide an overview of what type of neural data we’ve analyzed. It includes information about the number of recorded neurons, recording method and spike binning process.
 
 (2) It would be nice to give more detail about the monkey task and better explain its trial structure.
 
 In Result Section 1 we added a description of the overall task structure (and its differences from other versions of the reversal learning task), the RNN / monkey trial structure and the differences between the RNN and monkey tasks.
 
 (3) In the introduction it is mentioned that during the hold period, the probability of reversal is represented. Where does this statement come from?
 
 The fact that neural activity during a hold period, i.e., fixation period before presenting the target images, encodes the probability of reversal was demonstrated in a previous study (Bartolo and Averbeck ’20).
 
 We realize that our intention was to state that, during the hold period, the reversal probability activity is stationary as in the line attractor model, rather than to emphasize that the probability of reversal is represented during this period. We revised the sentence to convey this message. In addition, we revised the entire paragraph to reinterpret our findings: there are two activity modes where the stationary activity is consistent with the line attractor model but the non-stationary activity deviates from it.
 
 (4) "Around the behavioral reversal trial, reversal probabilities were represented by a family of rank-ordered trajectories that shifted monotonically". This sentence is confusing and hard to understand.
 
 Thank you for pointing this out. We rewrote the paragraph to reflect our revised interpretation. This sentence was removed, as it can be considered part of the result on separable trajectories.
 
 (5) For clarity, in the first section, when it is written that "The reversal behavior of trained RNNs was similar to the monkey's behavior on the same task" it would be nice to be more precise, that this is to be expected given the strategy used to train the network.
 
 We removed this sentence as it makes a blanket statement. Instead, we compared the behavioral outputs of the RNNs and the monkeys one by one.
 
 We added a sentence in Result Section 1 that the RNN’s abrupt behavioral reversal is expected as they are trained to mimic the target choice outputs of the Bayesian model.
 
 “Such abrupt reversal behavior was expected as the RNNs were trained to mimic the target outputs of the Bayesian inference model.”
 
 (6) What is the value of tau used in eq (1), and how does it compare to trial duration?
 
 We described the value of time constant tau in Eq (1) and also discussed in Result Section 1 that tau=20ms is much faster than trial duration 500ms, thus the persistent behavior seen in trained RNNs is due to learning.
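
The separation of time scales mentioned above can be checked with a short calculation. The sketch assumes that Eq (1) contains a standard leak term of the form tau dx/dt = -x + (inputs); the exact form of Eq (1) is not reproduced here, so this is an assumption. Under it, activity with no recurrent or external drive decays as exp(-t / tau).

```python
import math


def decay_fraction(duration_ms, tau_ms):
    """Fraction of initial activity left after duration_ms of pure leak."""
    return math.exp(-duration_ms / tau_ms)


def euler_decay(x0, duration_ms, tau_ms, dt_ms=1.0):
    """Same quantity by forward-Euler integration of tau dx/dt = -x."""
    x = x0
    for _ in range(int(round(duration_ms / dt_ms))):
        x += dt_ms * (-x / tau_ms)
    return x


# tau = 20 ms and trial duration = 500 ms are the values quoted in the response.
print(f"analytic fraction left after one trial: {decay_fraction(500, 20):.2e}")
print(f"Euler estimate:                         {euler_decay(1.0, 500, 20):.2e}")
```

A trial spans 25 time constants, so a unit driven by leak alone would retain essentially none of its initial activity by the next trial; persistence across trials must therefore come from the learned recurrent connectivity.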
 
 (7) It would be nice to expand around the notion of « temporally flexible representation » to help readers grasp what this means.
 
 Instead of stating that the separable dynamic trajectories have a “temporally flexible representation”, we break down in what sense it is temporally flexible: separable dynamic trajectories can accommodate the effects that task-related behavior has on generating non-stationary neural dynamics.
 
 “In sum, our results show that, in a probabilistic reversal learning task, recurrent neural networks encode reversal probability by adopting, not only stationary states as in a line attractor, but also separable dynamic trajectories that can represent distinct probabilistic values while accommodating non-stationary dynamics associated with task-related behavior.”
 
 Reviewer #3:
 
 (1) Data:
 
 It would be useful to describe the experimental task, recording setup, and analyses in much more detail - both in the text and in the methods. What part of PFC are the recordings from? How many neurons were recorded over how many sessions? Which other papers have they been used in? All of these things are important for the reader to know, but are not listed anywhere. There are also some inconsistencies, with the main text e.g. listing the 'typical block length' as 36 trials, and the methods listing the block length as 24 trials (if this is a difference between the biological data and RNN, that should be more explicit and motivated).
 
 We provided a more detailed description of the monkey experimental task and PFC recordings in Result Section 1. We also added a new section in Methods 2.1 to describe the monkey experiment.
 
 The experimental analyses should be explained in more detail in the methods. There is e.g. no detailed description of the analysis in Figure 6F.
 
 We added a new section in Methods 6 to describe how the residual PFC activity is computed. It also describes the RNN perturbation experiments.
 
 Finally, it would be useful for more analyses of monkey behaviour and performance, either in the main text or supplementary figures.
 
 We did not pursue this comment as it is unclear how additional behavioral analyses would improve the manuscript.
 
 (2) Model:
 
 When fitting the network, 'step 1' of training in 2.3 seems superfluous. The posterior update from getting a reward at A is the same as that from not getting a reward at B (and vice versa), and it is therefore completely independent of the network choice. The reversal trial can therefore be inferred without ever simulating the network, simply by generating a sample of which trials have the 'good' option being rewarded and which trials have the 'bad' option being rewarded.
 
 We respectfully disagree with Reviewer 3’s comment that the reversal trial can be inferred without ever simulating the network. The only way for the network to know about the underlying reward schedule is to perform the task by itself. By simulating the network, it can sample the options and the reward outcomes.
 
 Our understanding is that Reviewer 3 described a strategy that a human would use to perform this task. Our goal was to train the RNN to perform the task.
 
 Do the blocks always start with choice A being optimal? Is everything similar if the network is trained with a variable initial rewarded option? E.g. in Fig 6, would you see the appropriate swap in the effect of the perturbation on choice probability if choice B was initially optimal?
 
 Thank you for pointing out that the initial high-value option can be random. When setting up the reward schedule, the initial high-value option was chosen randomly from two choice outputs and, at the scheduled reversal, it was switched to the other option. We did not describe this in the original manuscript.
 
 We added a description in Training Scheme Step 4 that the initial high-value option is selected randomly. This is also explained in Result Section 1 when we give an overview of the RNN training procedure.
 
 (3) Content:
 
 It is rarely explained what the error bars represent (e.g. Figures 3B, 4C, ...) - this should be clear in all figures.
 
 We added that the error bars represent the standard error of the mean.
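
For reference, the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The helper below is a generic sketch; the function name and the example values are illustrative and are not taken from the manuscript.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Sample standard deviation (n - 1 denominator) divided by sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(len(values))


print(standard_error_of_mean([1.0, 2.0, 3.0, 4.0, 5.0]))
```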
 
 Figure 2A: this colour scheme is not great. There are abrupt colour changes both before and after the 'reversal' trial, and both of the extremes are hard to see.
 
 We changed the color scheme to contrast pre- and post-reversal trials without the abrupt color change.
 
 Figure 3E/F: how is prediction accuracy defined?
 
 We added that the prediction accuracy is based on Pearson correlation.
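
A prediction accuracy based on Pearson correlation can be computed as below. This is a generic sketch of the correlation coefficient between predicted and observed values; how the authors pooled time points, trials, or sessions before computing it is not stated here, and the example vectors are made up.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Note that this measure is insensitive to offset and scale: a prediction that is a shifted or rescaled copy of the observed trajectory still scores 1.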
 
 Figure 4B: why focus on the derivative of the dynamics? The subsequent plots looking at the actual trajectories are much easier to understand. Also - what is 'relative trial' relative to?
 
 The derivative was analyzed to demonstrate stationarity or non-stationarity of the neural activity. We think it will be clearer in the revised manuscript that the derivative allows us to characterize those two activity modes.
 
 Relative trial number indicates the trial position relative to the behavioral reversal trial. We added this description to the figures when “relative trial” is used.
 
 Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories? As it is now, there will presumably be more rewarded trials early and late in each block, and more unrewarded trials around the reversal point. Does this introduce biases in the analysis? A related question is (i) why the black lines are different in the top and bottom plots, and (ii) why the ends of the black lines are discontinuous with the beginnings of the red/blue lines.
 
 We could not understand what Reviewer 3 was asking in this comment. It’d help if Reviewer 3 could clarify the following question:
 
 “Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories?”
 
 Question (i): We wanted to look at how the trajectory shifts in the subsequent trial if a reward is or is not received in the current trial. The top panel analyzed all the trials in which the subsequent trial did not receive a reward. The bottom panel analyzed all the trials in which the subsequent trial received a reward. So, the trials analyzed in the top and bottom panels are different, and the black lines (x_rev of “current” trial) in the top and bottom panels are different.
 
 Question (ii): The black line is from the trial preceding the red/blue lines, so if trials were designed to be continuous across the inter-trial interval, then the black and red/blue lines would be continuous. However, in the monkey experiment, the inter-trial intervals were variable, so the end of the current trial does not match the start of the next trial. The neural trajectories presented in the manuscript did not include the activity in this inter-trial interval.
 
 Figure 6C: are the individual dots different RNNs? Claiming that there is a decrease in Delta x_choice for a v_+ stimulation is very misleading.
 
 Yes, individual dots are different RNN perturbations. We added an explanation of the dots in the Figure 7C caption.
 
 We agree with the comment that \Delta x_choice did not decrease. This sentence was removed. Instead, we revised the manuscript to state that x_choice for v_+ stimulation was smaller than the x_choice for v_- stimulation. We performed a KS test to confirm statistical significance.
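
The two-sample Kolmogorov-Smirnov statistic underlying such a comparison is the largest gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic; the p-value calculation is omitted, and the sample values are invented for illustration rather than taken from the perturbation experiments.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(v) - ECDF_b(v)| over observed values."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    for v in sorted(set(a) | set(b)):
        ecdf_a = bisect.bisect_right(a, v) / len(a)  # fraction of a that is <= v
        ecdf_b = bisect.bisect_right(b, v) / len(b)  # fraction of b that is <= v
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Hypothetical x_choice values under two stimulation conditions.
print(ks_statistic([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]))
```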
 
 Discussion: "...exhibited behaviour consistent with an ideal Bayesian observer, as found in our study". The RNN was explicitly trained to reproduce an ideal Bayesian observer, so this can only really be considered an assumption (not a result) in the present study.
 
 We agree that the statement in the original manuscript is inaccurate. It was revised to reflect that, in the other study, behavioral outputs similar to those of a Bayesian observer emerged by simply learning to do the task, instead of directly mimicking the outputs of a Bayesian observer as done in our study.
 
 “Authors showed that trained RNNs exhibited behavior outputs consistent with an ideal Bayesian observer without explicitly learning from the Bayesian observer. This finding shows that the behavioral strategies of monkeys could emerge by simply learning to do the task, instead of directly mimicking the outputs of Bayesian observer as done in our study.”
 
 Methods: Would the results differ if your Bayesian observer model used the true prior (i.e. the reversal happens in the middle 10 trials) rather than a uniform prior? Given the extensive literature on prior effects on animal behaviour, it is reasonable to expect that monkeys incorporate some non-uniform prior over the reversal point.
 
 Thank you for pointing out the non-uniform prior. We haven’t conducted this analysis, but would guess that the convergence to the posterior distribution would be faster. We’d have to perform further analysis, which is out of the scope of this paper, to investigate whether the posterior distribution would be different from what we obtained from the uniform prior.
 
 Making the code available would make the work more transparent and useful to the community.
 
 The code is available in the following Github repository: https://github.com/chrismkkim/LearnToReverse
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.14.613033v2
arxiv.org arxiv.org

A stochastic explanation for observed local-to-global foraging states in Caenorhabditis elegans

3
1. Public_Reviews 09 Jun 2025
  
  in eLife (unscoped)
  
  eLife Assessment
  
  This valuable paper uses a quantitative modeling approach to explore a well-studied transition in motor behavior in the nematode C. elegans. The authors provide solid evidence that this transition, which has been considered by many in the field to be a two-state behavior, can instead be described as a process whose parameters are smoothly modulated within a single state. This finding provides insight into the relationships between latent internal states and observable behavioral states, and suggests that relatively simple neuronal mechanisms can drive behavioral sequences that appear more complex.
  
  Summary
2. Public_Reviews 09 Jun 2025
  
  in eLife (unscoped)
  
  Reviewer #1 (Public review):
  
  Summary of what the authors were trying to achieve
  
  This paper concerns mechanisms of foraging behavior in C. elegans. Upon removal from food, C. elegans first executes a stereotypical local search behavior in which it explores a small area by executing many random, undirected reversals and turns called "reorientations." If the worm fails to find food, it transitions to a global search in which it explores larger areas by suppressing reorientations and executing long forward runs (Hills et al., 2004). At the population level, reorientation rate declines gradually. Nevertheless, about 50% of individual worms appear to exhibit an abrupt transition between local and global search, which is evident as a discrete transition from high to low reorientation rate (Lopez-Cruz et al., 2019). This observation has given rise to the hypothesis that local and global search correspond to separate internal states with the possibility of sudden transitions between them (Calhoun et al., 2014). The objective of the paper is to demonstrate that it is not necessary to posit distinct internal states to account for discrete transitions from high to low reorientation rate. On the contrary, discrete transitions can occur simply because of the stochastic nature of the reorientation behavior itself.
  
  Major strengths and weaknesses of the methods and results
  
  The model was not explicitly designed to match the sudden, stable changes in reorientation rates observed in the experimental data from individual worms. Kinetic parameters were simply chosen to match the average population behavior. Nevertheless, many sudden stable changes in reorientation rates occurred. This is a strong argument that apparent state changes can arise as an epiphenomenon of stochastic processes.
  
  The new stochastic model is more parsimonious than the reorientation-state change model because it posits one state rather than two.
  
  A prominent feature of the empirical data is that 50% of the worms exhibit a single (apparent) state change and the rest show either no state changes or multiple state changes. Does the model reproduce these proportions? This obvious question was not addressed.
  
  There is no obvious candidate for the neuronal basis of the decaying factor M. The authors speculate that decreasing sensory neuron activity might be the correlate of M but then provide contradictory evidence that seems to undermine that hypothesis. The absence of a plausible neuronal correlate of M weakens the case for the model.
  
  Appraisal of whether the authors achieved their aims, and whether the results support their conclusions
  
  The authors have made a solid case that it is not necessary to posit distinct internal states to account for discrete transitions from high to low reorientation rate. On the contrary, discrete transitions can occur simply because of the stochastic nature of the reorientation behavior itself.
  
  Impact of the work on the field, and the utility of the methods and data to the community
  
  Positing hidden internal states to explain behavioral sequences is gaining acceptance in behavioral neuroscience. The likely impact of the paper is to establish a compelling example of how statistical reasoning can reduce the number of hidden states to achieve more parsimonious models.
  
  Review 1
3. Public_Reviews 09 Jun 2025
  
  in eLife (unscoped)
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this study, the authors build a statistical model that stochastically samples from a time-interval distribution of reorientation rates. The form of the distribution is extracted from a large array of behavioral data, and is then used to describe not only the dynamics of individual worms (including the inter-individual variability in behavior), but also the aggregate population behavior. The authors note that the model does not require an assumption about behavioral state transitions, or evidence accumulation, as has been done previously, but rather that the stochastic nature of behavior is "simply the product of stochastic sampling from an exponential function".
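
To make the idea of sampling reorientation events from a decaying rate concrete, here is a minimal sketch. It is not the authors' implementation: the reviews state that the paper uses a Gillespie algorithm, whereas this example uses the thinning method for an inhomogeneous Poisson process, and the rate function and its parameters (`r0`, `r_inf`, `M`, all in arbitrary time units) are hypothetical stand-ins for the paper's equation 1, which is not reproduced here.

```python
import math
import random


def reorientation_rate(t, r0=2.0, r_inf=0.1, M=10.0):
    """Hypothetical rate: decays exponentially from r0 toward r_inf with time scale M."""
    return r_inf + (r0 - r_inf) * math.exp(-t / M)


def sample_reorientation_times(T, rng, r0=2.0, r_inf=0.1, M=10.0):
    """Sample event times on [0, T] by thinning (assumes r0 >= r_inf)."""
    rate_max = r0  # the rate only decays, so its maximum is at t = 0
    t = 0.0
    events = []
    while True:
        t += rng.expovariate(rate_max)  # candidate event from a constant-rate process
        if t > T:
            return events
        # Keep the candidate with probability rate(t) / rate_max.
        if rng.random() < reorientation_rate(t, r0, r_inf, M) / rate_max:
            events.append(t)


rng = random.Random(0)
worms = [sample_reorientation_times(40.0, rng) for _ in range(200)]
early = sum(1 for events in worms for t in events if t < 20.0)
late = sum(1 for events in worms for t in events if t >= 20.0)
print("events in first half:", early, "events in second half:", late)
```

Each simulated worm draws from the same single-state process, yet individual event trains differ from one another, which is the kind of variability the reviews discuss.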
  
  Strengths:
  
  This model provides a strong juxtaposition to other foraging models in the worm. Rather than evoking a behavioral transition function (that might arise from a change in internal state or the activity of a cell type in the network), or evidence accumulation (which again maps onto a cell type, or the activity of a network) - this model explains behavior via the stochastic sampling of a function of an exponential decay. The underlying model and the dynamics being simulated, as well as the process of stochastic sampling are well described and the model fits the exponential function (equation 1) to data on a large array of worms exhibiting diverse behaviors (1600+ worms from Lopez-Cruz et al). The work of this study is able to explain or describe the inter-individual diversity of worm behavior across a large population. The model is also able to capture two aspects of the reorientations, including the dynamics (to switch or not to switch) and the kinetics (slow vs fast reorientations). The authors also work to compare their model to a few others including the Levy walk (whose construction arises from a Markov process) to a simple exponential distribution, all of which have been used to study foraging and search behaviors.
  
  Weaknesses:
  
  This manuscript has two weaknesses that dampen the enthusiasm for the results. First, in all of the examples the authors cite where a Gillespie algorithm is used to sample from a distribution, be it the kinetics associated with chemical dynamics, or a Lotka-Volterra Competition Model, there are underlying processes that govern the evolution of the dynamics, and thus the sampling from distributions. In one of their references for instance, the stochasticity arises from the birth and death rates, thereby influencing the genetic drift in the model. In these examples, the processes governing the dynamics (and thus generating the distributions from which one samples) are distinct from the behavior being studied. In this manuscript, the distribution being sampled from is the exponential decay function of the reorientation rate (lines 100-102). This appears to be tautological - a decay function fitted to the reorientation data is then sampled to generate the distributions of the reorientation data. That the model performs well and matches the data is commendable, but it is unclear how that could not be the case if the underlying function generating the distribution was fit to the data.
  
  The second weakness is somewhat related to the first, in that absent an underlying mechanism or framework, one is left wondering what insight the model provides. Stochastically sampling a function generated by fitting the data to produce stochastic behavior is where one ends up in this framework, and the authors indeed point this out: "simple stochastic models should be sufficient to explain observably stochastic behaviors." (Line 233-234). But if that is the case, what do we learn about how the foraging is happening? The authors suggest that the decay parameter M can be considered a memory timescale, which offers some suggestion, but then go on to say that the "physical basis of M can come from multiple sources". Here is where one is left for want: the mechanisms suggested include loss of sensory stimuli, alterations in motor integration, ionotropic glutamate signaling, dopamine, and neuropeptides. This is basically all of the possible biological sources that can govern behavior, and one is left not knowing what insight the model provides. The array of biological processes listed is so variable in dynamics and meaning that their explanation of what governs M is at best unsatisfying. Molecular dynamics models that generate distributions can point to certain properties of the model, such as the binding kinetics (on and off rates, etc.) as explanations for the mechanisms generating the distributions, and therefore point to how a change in the biology affects the stochasticity of the process. It is unclear how this model provides such a connection, especially taken in aggregate with the previous weakness.
  
  Providing a roadmap of how to think about the processes generating M, the meaning of those processes in search, and potential frameworks that are more constrained and with more precise biological underpinning (beyond the array of possibilities described) would go a long way to assuaging the weaknesses.
  
  Comments on revised version:
  
  The authors have addressed the main concerns of the manuscript.
  
  Review 2

Tags

Summary

Review 1

Review 2

Annotators

Public_Reviews

URL

arxiv.org/abs/2309.15174v3
www.biorxiv.org www.biorxiv.org

Heterozygosity at a conserved candidate sex determination locus is associated with female development in the clonal raider ant (Ooceraea biroi)

5
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This study provides valuable insights into the evolutionary conservation of sex determination mechanisms in ants by identifying a candidate sex-determining region in a parthenogenetic species. The strength of evidence is solid, using genomic analyses to identify differences in heterozygosity between females and diploid males, though the conclusions are limited by the lack of functional analysis.
 
 Summary
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion - that the sex determination locus is conserved in ants - was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.
 
 Other comments:
 
 The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above). In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?
 
 The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.
 
 Review 1
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.
 
 This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.
 
 That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:
 
 (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.
 
 (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.
 
 (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.
 
 (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.
 
 Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.
 
 Review 2
4. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.
 
 Strengths:
 
 (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.
 
 (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.
 
 Weaknesses:
 
 (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.
 
 (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.
 
 Review 3
5. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Author response:
 
 Reviewer #1 (Public review):
 
 This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion, that the sex determination locus is conserved in ants, was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.
 
 Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.
 
 Other comments:
 
 The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).
 
 Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.
 
 In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
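
 As an illustration only (this is not the authors' code), the second approach can be sketched with the counts quoted in this exchange: 19 females and 16 diploid males, 11 of which are homozygous at the candidate locus. The sketch assumes that all 19 females are heterozygous, as the reviews state, and the p-values for the other genomic windows are hypothetical placeholders included only to show how the Benjamini-Hochberg adjustment works.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Rows: females, diploid males. Columns: homozygous, heterozygous.
# Assumption: all 19 females are heterozygous at the candidate locus.
p_locus = fisher_exact_two_sided(0, 19, 11, 5)

# Hypothetical p-values for three other windows (placeholders, not real data)
adjusted = benjamini_hochberg([p_locus, 0.2, 0.5, 0.8])
print(f"raw p = {p_locus:.2e}, BH-adjusted p = {adjusted[0]:.2e}")
```

 In a real genome scan the adjustment would run over every tested window rather than four values, so the adjusted p-value for the candidate locus would depend on the number of windows tested.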
 
 We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.
 
 The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.
 
 In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.
 
 In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?
 
 In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”
 
 We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.
 
 We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that the existence of additional, undetected CSD loci is one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion: “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”
 
 Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”
 
 The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.
 
 We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”
 
 Reviewer #2 (Public review):
 
 The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.
 
 This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.
 
 That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:
 
 We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”
 
 (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.
 
 We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.
 
 Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the following sentence to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”
 
 (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.
 
 We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.
 
 The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.
 
 (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.
 
 In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”
 
 As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene was inferred to have evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and then was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.
 
 Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below:
 
 “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”
 
 (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.
 
 As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).
 
 Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.
 
 Strengths:
 
 (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.
 
 (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.
 
 Weaknesses:
 
 (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.
 
 See response below.
 
 (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.
 
 We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.01.24.634795v2
www.biorxiv.org www.biorxiv.org

Intrinsic dynamic shapes responses to external stimulation in the human brain

3
1. Public_Reviews 09 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This manuscript presents an interesting new framework (VARX) for simultaneously quantifying effective connectivity in brain activity during sensory stimulation and how that brain activity is being driven by that sensory stimulation. The reviewers thought the model was original and its conclusion that intrinsic connectivity is reduced (rather than increased) during sensory stimulation is very interesting, but that for ideal performance, one must specify all sensory features in the model, which is not possible. Overall, however, this work is important with convincing evidence for its conclusions - it will be of interest to neuroscientists working on brain connectivity and dynamics.
  
  Summary
2. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  This manuscript presents an interesting new framework (VARX) for simultaneously quantifying effective connectivity in brain activity during sensory stimulation and how that brain activity is being driven by that sensory stimulation. The core idea is to combine the Vector Autoregressive model that is often used to infer Granger-causal connectivity in brain data with an encoding model that maps the features of a sensory stimulus to that brain data. The authors do a nice job of explaining the framework. And then they demonstrate its utility through some simulations and some analysis of real intracranial EEG data recorded from subjects as they watched movies. They infer from their analyses that the functional connectivity in these brain recordings is essentially unaltered during movie watching, that accounting for the driving movie stimulus can protect one against misidentifying brain responses to the stimulus as functional connectivity, and that recurrent brain activity enhances and prolongs the putative neural responses to a stimulus.
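
  As a rough illustration of the idea described above (a sketch under stated assumptions, not the authors' implementation), a first-order VARX model can be written as y(t) = A y(t-1) + B x(t-1) + e(t), where A captures recurrent (intrinsic) connectivity between channels and B captures the direct effect of the stimulus. The matrices, noise level, and white-noise stimulus below are all made up for the demonstration; the fit is ordinary least squares on lagged regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

# Hypothetical ground truth: channel 0 drives channel 1,
# and the stimulus drives channel 0 only.
A = np.array([[0.5, 0.0],
              [0.3, 0.4]])
B = np.array([[1.0],
              [0.0]])

# Simulate a one-dimensional stimulus feature and two recorded channels
x = rng.standard_normal((T, 1))
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + B @ x[t - 1] + 0.1 * rng.standard_normal(2)

# Stack lagged brain activity and lagged stimulus as joint regressors,
# then solve y[t] = [y[t-1], x[t-1]] @ coef by least squares.
Z = np.hstack([y[:-1], x[:-1]])
coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)

A_hat = coef[:2].T  # estimated recurrent connectivity
B_hat = coef[2:].T  # estimated stimulus response

print(np.round(A_hat, 2))
print(np.round(B_hat, 2))
```

  Because the recurrent and stimulus terms are estimated jointly, the direct stimulus effect lands in B_hat and does not have to be absorbed by A_hat. The model in the manuscript uses longer lags and real stimulus features, which this toy example does not attempt to reproduce.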
  
  Overall, I thought this was an interesting manuscript with some rich and intriguing ideas.
  
  Comments on revisions:
  
  The responses to the previous comments are very helpful. I think the manuscript does a nice job now of presenting its interesting findings in a convincing and measured manner.
  
  I had only one small remaining suggestion - to maybe link the finding of reduced intrinsic connectivity during stimulation to previous work on that topic. I thought of Nauhaus et al., Nature Neurosci, 2009.
  
  Review 1
3. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors apply the recently developed VARX model, which explicitly models intrinsic dynamics and the effect of extrinsic inputs, to simulated data and intracranial EEG recordings. This method provides a directed measure of 'intrinsic connectivity'. They argue this model is better suited to the analysis of task neuroimaging data because it separates the intrinsic and extrinsic activity. They show that intrinsic connectivity is largely unaltered during a movie-watching task compared to eyes-open rest; that intrinsic noise is reduced in the task; and that there is intrinsic directed connectivity from sensory to higher-order brain areas.
  
  Strengths:
  
  (1) The paper tackles an important issue with an appropriate method.
  
  (2) The authors validated their method on data simulated with a neural mass model.
  
  (3) They use intracranial EEG, which provides a direct measure of neuronal activity.
  
  (4) Code is made publicly available and the paper is written well.
  
  Comments on revisions:
  
  The authors have addressed my comments.
  
  Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.05.606665v3
www.biorxiv.org www.biorxiv.org

Interpretable Protein-DNA Interactions Captured by Structure-Sequence Optimization

3
1. Public_Reviews 09 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This valuable work presents an interpretable protein-DNA Energy Associative (IDEA) model for predicting binding sites and affinities of DNA-binding proteins. While the method is convincing, it requires some adaptation for application to different proteins. The IDEA method is available and can be potentially used for predicting genome-wide protein-DNA binding sites.
  
  Summary
2. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Zhang et al. present a methodology to model protein-DNA interactions via learning an optimizable energy model, taking into account a representative bound structure for the system and binding data. The methodology is sound and interesting. They apply this model for predicting binding affinity data and binding sites in vivo.
  
  Strengths:
  
  The manuscript is well organized with good visualizations and is easy to follow. The methodology is discussed in detail. The IDEA energy model seems like an interesting way to study a protein-DNA system in the context of a given structure and binding data. The authors show that an IDEA model trained on one system can be transferred to other structurally similar systems. The authors show good performance in discriminating between binding-vs-decoy sequences for various systems, and binding affinity prediction. The authors also show evidence of the ability to predict genome-wide binding sites.
  
  Weaknesses:
  
  An energy-based model which needs to be optimized for specific systems is inherently an uncomfortable idea. Prediction of binding affinity is a well-studied domain and many competitors exist, some of which are well used. Only time will tell how useful this method is. The methodology is interpretable in a limited sense. The model is dependent on preserved interface geometry, which might lead to suboptimal results for novel folds. The model predicts different outputs for reverse-complement sequences (which in reality are the same as far as the double helix is concerned). This is unintuitive.
  
  Comments on revisions:
  
  The authors have addressed my points regarding comparisons with existing methods, clarifying discussion terminologies and proper discussion of the existing literature. This resulted in a stronger manuscript with a clearer understanding of applicability.
  
  Review 1
3. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Protein-DNA interactions and sequence readout represent a challenging and rapidly evolving field of study. Recognizing the complexity of this task, the authors have developed a compact and elegant model. They applied well-established approaches to address a difficult problem, effectively enhancing the information extracted from sparse contact maps by integrating an artificial decoy sequence set and available experimental data. This has resulted in a practical tool that can be adapted for use with other proteins.
  
  Strengths:
  
  The authors integrate sparse information with available experimental data to construct a model whose utility extends beyond the limited set of structures used for training.
  
  A comprehensive methods section is included, ensuring reproducibility.
  
  The authors provide a well-represented performance comparison between their model and other existing models.
  
  Additionally, the authors have shared their model as a GitHub project, reflecting their commitment to research transparency.
  
  Weaknesses:
  
  The coarse-graining procedure is quite convoluted, but the authors provide reasoning for the proposed scheme. The authors acknowledge discrepancies between data-driven and simulation models.
  
  Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.26.595895v3
www.biorxiv.org www.biorxiv.org

https://www.biorxiv.org/content/10.1101/2023.02.28.530247v4

5
1. Public_Reviews 09 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 In this important paper, the authors use single-cell RNA sequencing to understand post-mitotic cone and rod developmental states and identify cone-specific features that contribute to retinoblastoma genesis. The authors report findings that have practical implications for retinal development, gene expression, and cell fate specification. The evidence is compelling as the experimental design and analysis are exceptionally rigorous.
 
 Summary
2. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.
 
 Strengths:
 
 The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.
 
 The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.
 
 New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.
 
 Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.
 
 The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.
 
 The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.
 
 Weaknesses:
 
 Functional tests of the many new hypotheses regarding potential players in cone genesis are not performed, but these are beyond the scope of the current work.
 
 Validation of the SYK inhibitor data, e.g. by genetic means, is not included, but the authors acknowledge this caveat throughout.
 
 Review 1
3. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.
 
 Strengths:
 
 This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.
 
 Weaknesses:
 
 The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.
 
 Comments on revisions:
 
 The authors have done quite thorough work addressing concerns raised by myself and other reviewers. The identification of an unresolved developmental state of rod/cone precursor cells is interesting and intriguing. I do not have much more to add.
 
 Review 2
4. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.
 
 Strengths:
 
 The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.
 
 Weaknesses:
 
 Relatively minor. This is a technically strong and thorough study that is broadly useful to investigators studying retinal development and retinoblastoma.
 
 Comments on revisions:
 
 The authors have addressed all points raised in the review and considerably strengthened the manuscript. No additional changes are required.
 
 Review 3
5. Public_Reviews 09 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Public Reviews:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.
 
 Strengths:
 
 (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.
 
 (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.
 
 (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.
 
 (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.
 
 (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.
 
 (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.
 
 We thank the reviewer for describing the study’s strengths, reflecting the major conclusions of the initially submitted manuscript. However, based on new analyses, including the requested analyses of other scRNA-seq datasets, our revision clarifies that:
 
 - related to point (1), cone and rod transcripts do not appear to be mixed together at first (i.e., in immediately post-mitotic immature cone and rod precursors) but appear to be co-expressed in subsequent cone and rod precursor stages; and
 
 - related to point (3), CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).
 
 Weaknesses:
 
 (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.
 
 (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.
 
 We thank the reviewer for noting these important issues. Accordingly, in the revised manuscript:
 
 (1) We improve the writing and clarify the nomenclature and contradictory statements, particularly those noted in the Reviewer’s Recommendations for Authors.
 
 (2) We scale back claims related to the role of SYK in the cone precursor response to RB1 loss, with wording changes in the Abstract, Results, and Discussion, which now recognize that the inhibitor studies only support the possibility that cone-intrinsic SYK expression contributes to retinoblastoma initiation, as detailed in our responses to Reviewer’s Recommendations for Authors. We agree and now mention that genetic perturbation of SYK is required to prove its role.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.
 
 Strengths:
 
 This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.
 
 We thank the reviewer for summarizing the main findings and noting the compelling support for the conclusions, the intriguing cell type-dependent splicing of rod and cone lineage factors, and the insights into retinoblastoma development.
 
 Weaknesses:
 
 The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.
 
 We agree that the manuscript covers a range of topics resulting from the full-length scRNA-seq analyses and concur that some studies of developing photoreceptors were not well connected to retinoblastoma. However, we also note that the connection to retinoblastoma is emphasized in several places in the Introduction and throughout the manuscript and was a significant motivation for pursuing the analyses. We suggest that it was valuable to highlight how deep, full-length scRNA-seq of developing retina provides insights into retinoblastoma, including i) the similar biased expression of NRL transcript isoforms in cone precursors and RB tumors, ii) the cone precursors’ co-expression of rod- and cone-related genes such as NR2E3 and GNAT2, which may explain similar co-expression in RB cells, and iii) the expression of SYK in early cones and RB cells. While the earlier version had mainly highlighted point (iii), the revised Discussion further refers to points (i) and (ii), as described further in the response to the Reviewer’s Recommendations for Authors.
 
 We address the Reviewer’s request to cross-validate our findings with those of other single-cell studies of developing human retina by relating the different photoreceptor-related cell populations identified in our study to those characterized by Zuo et al. (PMID 39117640). That study was specifically highlighted by the reviewer and is especially useful for such cross-validation given its extraordinarily large ~220,000-cell dataset, which covers a wide range of retinal ages (pcw 8–23) and is spatiotemporally stratified by macular or peripheral retina location. Relevant analyses of the Zuo et al. dataset are shown in Supplementary Figures S3G-H, S10B, S11A-F, and S13A,B.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.
 
 Strengths:
 
 The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.
 
 Human developing fetal retinal tissue samples were collected between 13-19 gestational weeks and this provides a substantially higher depth of sequencing coverage, thereby identifying both rare transcripts and alternative splice forms, and thereby representing an important advance over previous droplet-based scRNA-Seq studies of human retinal development.
 
 Weaknesses:
 
 The weaknesses identified are relatively minor. This is a technically strong and thorough study that is broadly useful to investigators studying retinal development and retinoblastoma.
 
 We thank the reviewer for describing the strengths of the study. Our revision addresses the concerns raised separately in the Reviewer’s Recommendations for Authors, as detailed in the responses below.
 
 Recommendations for the authors:
 
 Reviewing Editor Comments:
 
 The reviewers have completed their reviews. Generally, they note that your work is important and that the evidence is generally convincing. The reviewers are in general agreement that the paper adds to the field. The findings of rod/cone fate determination at a very early stage are intriguing. Generally, the paper would benefit from clarifications in the writing and figures. Experimentally, the paper would benefit from validation of the drug data, for example using RNAi or another assay. Alternatively, the authors could note the caveats of the drug experiments and describe how they could be improved. In terms of analysis, the paper would be improved by additional comparisons of the authors' data to previously published datasets.
 
 We thank the reviewing editor for this summary. As described in the individual reviewer responses, we clarify the writing and figures and provide comparisons to previously published datasets (in particular, the large snRNA-seq dataset of Zuo et al., 2024 (PMID 39117640)). With regard to the drug (i.e., SYK inhibitor) studies, we opted to provide caveats and describe the need for genetic approaches to validate the role of SYK, owing to the infeasibility of completing genetic perturbation experiments in the appropriate timeframe. We are grateful for the opportunity to present our findings with appropriate caveats.
 
 Reviewer #1 (Recommendations for the authors):
 
 Shayler et al. cell-sort human progenitor/rod/cone populations and then perform full-length single-cell RNA-seq to expose features that distinguish paths towards rods or cones. They initially distinguish progenitors (RPCs), immature photoreceptor precursors (iPRPs), long/medium wavelength (LM) cones, late-LM cones, short wavelength (S) cones, early rods (ER) and late rods (LR), which exhibit distinct transcription factor regulons (Figures 1, 2). These data expose expected and novel enriched genes, and support the notion that S cones are a default state lacking expression of rod (NRL) or cone (THRB) determinants but retaining expression of generic photoreceptor drivers (CRX/OTX2/NEUROD1 regulons). They identify changes in regulon activity, such as increasing NRL activity from iPRP to ER to LR, but decreasing from iPRP to cones, or increasing RAX/ISL2/THRB regulon activity from iPRP to LM cones, but decreasing from iPRP to S cones or rods.
 
 They report co-expression of rod/cone determinants in LM and ER clusters, and the ratios are in the expected directions (NRL > THRB or RXRG in ER). A novel insight from the FL seq is that differing variants are generated in each cell population. Full-length NRL (FL-NRL) predominates in the rod path, whereas truncated NRL (Tr-NRL) does so in the cone path; similar (but opposite) findings are presented for THRB (Fig 3, 4), whereas isoforms are not a feature of RXRG expression, just the higher expression in cones.
 
 The authors then further subcluster and perform RNA velocity to uncover decision points in the tree (Figure 5). They identify two photoreceptor precursor streams, the Transitional Rods (TRs) that provide one source for rod maturation and (reusing the name from the initial clustering) iPRPs that form cones, but also provide a second route to rods. TR cells closest to RPCs (immediately post-mitotic) have higher levels of the rod determinant NR2E3 and NRL, whereas the higher resolution iPRPs near RPCs lack NR2E3 and have higher levels of ONECUT1, THRB, and GNAT2, a cone bias. These distinct rod-biased TR and cone-biased high-resolution iPRPs were not evident in published scRNAseq with 3′ end-counting (i.e. not FL seq). Regulon analysis confirmed higher NRL activity in TR cells, with higher THRB activity in high-resolution iPRP cells.
 
 Many of the more mature high-resolution iPRPs show combinations of rod (GNAT1, NR2E3) and cone (GNAT2, THRB) paths as well as both NRL and THRB regulons, but with a bias towards cone-ness (Figure 6). Combined FISH/immunofluorescence in fetal retina uncovers cone-biased RXRG-protein-high/NR2E3-protein-absent cone-fated cells that nevertheless expressed NR2E3 mRNA. Thus early cone-biased iPRP cells express rod gene mRNA, implying a rod-cone hybrid in early photoreceptor development. The authors refer to these as "bridge region iPRP cells".
 
 In Figure 7, they identify CHRNA1 as the most specific marker of these bridge cells (overlapping with ATOH7 and DLL3, previously linked to cone-biased precursors), and FISH shows it is expressed in rod-biased NRL protein-positive and cone-biased RXRG protein-positive cones at fetal week 12.
 
 Figure 8 outlines the graded expression of various lncRNAs during cone maturation, a novel pattern.
 
 Finally (Figure 9), the authors identify differential genes expressed in early rods (ER cluster from Figure 1) vs early cones (LM cluster, excluding the most mature opsin+ cells), revealing high levels of MYCN targets in cones. They also find SYK expression in cones. SYK was previously linked to retinoblastoma, so intrinsic expression may predispose cone precursors to transformation upon RB loss. They finish by showing that a SYK inhibitor blocks the proliferation of dividing RB1 knockdown cone precursors in the human fetal retina.
 
 Overall, the authors have uncovered interesting patterns of biased expression in cone/rod developmental paths, especially relating to the isoform differences for NRL and THRB which add a new layer to our understanding of this fate choice. The analyses also imply that very soon after RPCs exit the cell cycle, they generate post-mitotic precursors biased towards a rod or cone fate, that carry varying proportions of mixed rod/cone determinants and other rod/cone marker genes. They also introduce new markers that may tag key populations of cells that precede the final rod/cone choice (e.g. CHRNA1), catalogue a new lncRNA gradient in cone maturation, and provide insight into potential genes that may contribute to retinoblastoma initiation, like SYK, due to intrinsic expression in cone precursors. However, as detailed below, the text needs to be improved considerably, and overinterpretations need to be moderated, removed, or tested more rigorously with extra data.
 
 Major Comments
 
 The manuscript is very difficult to follow. The nomenclature is at times torturous, and the description of hybrid rod/cone cells is confusing in many aspects.
 
 (1) A single term, iPRP, is used to refer to an initial low-resolution cluster, and then to a subset of that cluster later in the paper.
 
 We agree that using immature photoreceptor precursor (iPRP) for both high-resolution and low-resolution clusters was confusing. We kept this name for the low-resolution cluster (which includes both immature cone and immature rod precursors), renamed the high-resolution iPRP cluster immature cone precursors (iCPs), and renamed their transitional rod (TR) counterparts immature rod precursors (iRPs). These designations are based on
 
 - the biased expression of THRB, ONECUT1, and the THRB regulon in iCPs (Fig. 5D,E);
 
 - the biased expression of NRL, NR2E3, and the NRL regulon in iRPs (Fig. 5D,E);
 
 - the partially distinct iCP and iRP UMAP positions (Figure 5C); and
 
 - the evidence of similar immature cone versus rod precursor populations in the Zuo et al 3’ snRNA-seq dataset, as noted below and described in two new paragraphs starting at the bottom of p. 12.
 
 (2) To complicate matters further, the reader needs to understand the subset within the iPRP referred to as bridge cells, and we are told at one point that the earliest iPRPs lack NR2E3, then that they later co-express NR2E3, and while the authors may be referring to protein and RNA, it serves to further confuse an already difficult to follow distinction. I had to read and re-read the iPRP data many times, but it never really became totally clear.
 
 We agree that the description of the high-resolution iPRP (now “iCP”) subsets was unclear, although our further analyses of a large 3’ snRNA-seq dataset in Figure S11 support the impression given in the original manuscript that the earliest iCPs lack NR2E3 and then later co-express NR2E3 while the earliest iRPs lack THRB and then later express THRB. As described in new text in the Two post-mitotic immature photoreceptor precursor populations section (starting on line 7 of p. 13):
 
 When considering only the main cone and rod precursor UMAP regions, early (pcw 8 – 13) cone precursors expressed THRB and lacked NR2E3 (Figure S11D,E, blue arrows), while early (pcw 10 – 15) rod precursors expressed NR2E3 and lacked THRB (Figure S11D,E, red arrows), similar to RPC-localized iCPs and iRPs in our study (Figure 5D).
 
 Next, as summarized in new text in the Early cone and rod precursors with rod- and cone-related RNA co-expression section (new paragraph at top of p. 16):
 
 Thus, a 3’ snRNA-seq analysis confirmed the initial production of immature photoreceptor precursors with either L/M cone-precursor-specific THRB or rod-precursor-specific NR2E3 expression, followed by lower-level co-expression of their counterparts, NR2E3 in cone precursors and THRB in rod precursors. However, in the Zuo et al. analyses, the co-expression was first observed in well-separated UMAP regions, as opposed to a region that bridges the early cone and early rod populations in our UMAP plots. These findings are consistent with the notion that cone- and rod-related RNA co-expression begins in already fate-determined cone and rod precursors, and that such precursors aberrantly intermixed in our UMAP bridge region due to their insufficient representation in our dataset.
 
 Importantly, and as noted in our ‘Public response’ to Reviewer 1, “CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).” In support of this notion, the immature cone precursors expressing CHRNA1 and other populations did not overlap in UMAP space in the Zuo et al. dataset. We hope the new text cited above along with other changes will significantly clarify the observations.
 
 (3) The term "cone/rod precursor" shows up late in the paper (page 12), but it was clear (was it not?) much earlier in this manuscript that cone and rod genes are co-expressed because of the co-expressed NRL and THRB isoforms in Figures 3/4.
 
 We thank the reviewer for noting that the differential NRL and THRB isoform expression already implies that cone and rod genes are co-expressed. However, as we now state, the co-expression of RNAs encoding an additional cone marker (GNAT2) and rod markers (GNAT1, NR2E3) was
 
 “suggestive of a proposed hybrid cone/rod precursor state more extensive than implied by the co-expression of different THRB and NRL isoforms” (first paragraph of “Early cone and rod …” section on p. 14; new text underlined).
 
 (4) The (incorrect) impression given later in the manuscript is that the rod/cone transcript mixture applies to just a subset of the iPRP cells, or maybe just the bridge cells (writing is not clear), but actually, neither of those is correct as the more abundant and more mature LM and ER populations analyzed earlier coexpress NRL and THRB mRNAs (Figures 2, 3). Overall, the authors need to vastly improve the writing, simplify/clarify the nomenclature, and better label figures to match the text and help the reader follow more easily and clearly. As it stands, it is, at best, obtuse, and at worst, totally confusing.
 
 We thank the reviewer for bringing the extent of the confusing terminology and wording to our attention. We revised the terminology (as in our response to point 1) and extensively revised the text. We also performed similar analyses of the Zuo et al. data (as described in more detail in our response to Reviewer 2), which clarifies the distinct status of cells with the “rod/cone transcript mixture” and cells co-expressing early cone and rod precursor markers.
 
 To more clearly describe data related to cells with rod- and cone-related RNA co-expression, we divided the former Figure 6 into two figures, with Figure 6 now showing the cone- and rod-related RNA co-expression inferred from scRNA-seq and Figure 7 showing GNAT2 and NR2E3 co-expression in FISH analyses of human retina plus a new schematic in the new panel 7E.
 
 To separate the conceptually distinct analyses of cone- and rod-related RNA co-expression and the expression of early photoreceptor precursor markers (both of which were found in the so-called bridge region, yet are now recognized to involve different subpopulations), we moved the analyses of the early photoreceptor precursor markers into a new section, “Developmental expression of photoreceptor precursor markers and fate determinants,” starting on p. 16.
 
 Additionally, we further review the findings and their implications in four revised Discussion paragraphs (starting at the bottom of p. 23).
 
 (5) The data showing that overexpressing Tr-NRL in murine NIH3T3 fibroblasts blocks FL-NRL function is presented at the end of page 7 and in Figure 3G. Subsequent analysis two paragraphs and two figures later (end page 8, Figure 5C + supp figs) reveals that Tr-NRL protein is not detectable in retinoblastoma cells, which derive from cone precursor cells and express Tr-NRL mRNA, and the protein is also not detected upon lentiviral expression of Tr-NRL in human fetal retinal explants, suggesting it is unstable or not translated. It would be preferable to have the 3T3 data and retinoblastoma/explant data juxtaposed. E.g. they could present the latter, then show the 3T3 that even if it were expressed (e.g. briefly) it would interfere with FL-NRL. The current order and spacing are somewhat confusing.
 
 We thank the reviewer for this suggestion and moved the description of the luciferase assays to follow the retinoblastoma and explant data and switched the order of Figure panels 3G and 3H.
 
 (6) On page 15, regarding early rod vs early cone gene expression, the authors state: "although MYCN mRNA was not detected....", yet on the volcano plot in Figure S14A MYCN is one of the marked genes that is higher in cones than rods, meaning it was detected, and a couple of sentences later: "Concordantly, the LM cluster had increased MYCN RNA". The text is thus confusing.
 
 With respect, we note that the original text read, “although MYC RNA was not detected,” which related to a statement in the previous sentence that the gene ontology analysis identified “MYC targets.” However, given that this distinction is subtle and may be difficult for readers to recognize, we revised the text (now on p. 19) to more clearly describe expression of MYCN (but not MYC) as follows:
 
 “The upregulation of MYC target genes was of interest given that many MYC target genes are also targets of MYCN, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss[8–10]. Indeed, whereas MYC RNA was not detected, the LM cone cluster had increased MYCN RNA …”
 
 (7) The authors state that the SYK drug is "highly specific". They provide no evidence, but no drug is 100% specific, and it is possible that off-target hits are important for the drug phenotype. This data should be removed or validated by co-targeting the SYK gene along with RB1.
 
 We agree that our data only show the potential for SYK to contribute to the cone proliferative response; however, we believe the inhibitor study retains value in that a negative result (no effect of the SYK inhibitor) would disprove its potential involvement. To reflect this, we changed wording related to this experiment as follows:
 
 In the Abstract, we changed:
 
 (1) “SYK, which contributed to the early cone precursors’ proliferative response to RB1 loss” To: “SYK, which was implicated in the early cone precursors’ proliferative response to RB1 loss.”
 
 (2) “These findings reveal … and a role for early cone-precursor-intrinsic SYK expression.” To: “These findings reveal … and suggest a role for early cone-precursor-intrinsic SYK expression.”
 
 In the last paragraph of the Results, we changed:
 
 (1) “To determine if SYK contributes…” To: “To determine if SYK might contribute…”
 
 (2) “the highly specific SYK inhibitor” To: “the selective SYK inhibitor”
 
 (3) “indicating that cone precursor intrinsic SYK activity is critical to the proliferative response” To: “consistent with the notion that cone precursor intrinsic SYK activity contributes to the proliferative response.”
 
 In the Results, we added a final sentence:
 
 “However, given potential SYK inhibitor off-target effects, validation of the role of SYK in retinoblastoma initiation will require genetic ablation studies.”
 
 In the Discussion (2nd-to-last paragraph), we changed:
 
 “SYK inhibition impaired pRB-depleted cone precursor cell cycle entry, implying that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation.” To: “…the pRB-depleted cone precursors’ sensitivity to a SYK inhibitor suggests that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation, although genetic ablation of SYK is needed to confirm this notion.”
 
 In the last sentence of the Discussion, we changed:
 
 “enabled the identification of developmental stage-specific cone precursor features that underlie retinoblastoma predisposition.” To: “enabled the identification of developmental stage-specific cone precursor features that are associated with the cone precursors’ predisposition to form retinoblastoma tumors.”
 
 Minor/Typos
 
 Figure 7 legend, H should be D.
 
 We corrected the figure legend (now related to Figure 8).
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) The author should take advantage of recently published human fetal retina data, such as PMID:39117640, which includes a larger dataset of cells that could help validate the findings. Consequently, statements like "To our knowledge, this is the first indication of two immediately post-mitotic photoreceptor precursor populations with cone versus rod-biased gene expression" may need to be revised.
 
 We thank the reviewer for noting the evidence of distinct immediately post-mitotic rod and cone populations published by others after we submitted our manuscript. In response, we omitted the sentence mentioned and extensively cross-checked our results including:
 
 - comparison of our early versus late cone and rod maturation states to the cone and rod precursor versus cone and rod states identified by Zuo et al. (new paragraph on the top half of p. 6 and new figure panels S3G,H);
 
 - detection of distinct immediately post-mitotic versus later cone and rod precursor populations (two new paragraphs on pp. 12-13 and new Figures S10B and S11A-E);
 
 - identification of cone and rod precursor populations that co-express cone and rod marker genes (two new paragraphs starting at the bottom of p. 15 and new Figures S11D-F);
 
 - comparison of expression patterns of immature cone precursor (iCP) marker genes in our dataset and the Zuo et al. dataset (new paragraph on top half of p. 17 and new Figure S13).
 
 We also compare the cell states discerned in our study and the Zuo et al. study in a new Discussion paragraph (bottom of p. 23) and new Figure S17.
 
 (2) The data generated comes from dissociated cells, which inherently lack spatial context. Additionally, it is unclear whether the dataset represents a pool of retinas from multiple developmental stages, and if so, whether the developmental stage is known for each cell profiled. If this information is available, the authors should examine the distribution of developmental stages on the UMAP and trajectory analysis as part of the quality control process.
 
 We thank the reviewer for highlighting the importance of spatial context and developmental stage.
 
 Related to whether the dataset represents a pool of retinae from multiple developmental stages, the different cell numbers examined at each time point are indicated in Figure S1A. To draw the readers’ attention to this detail, Figure S1A is now cited in the first sentence of the Results.
 
 Related to the age-related cell distributions in UMAP plots, the distribution of cells from each retina and age was (and is) shown in Fig. S1F. In addition, we now highlight the age distributions by segregating the FW13, FW15-17, and FW17-18-19 UMAP positions in the new Figure 1C. We describe the rod temporal changes in a new sentence at the top of p. 5:
 
 “Few rods were detected at FW13, whereas both early and late rods were detected from FW15-19 (Figure 1C), corroborating prior reports [15,20].”
 
 We describe the cone temporal changes and note the likely greater discrimination of cell state changes that would be afforded by separately analyzing macula versus peripheral retina at each age in a new sentence at the bottom of p. 5:
 
 “L/M cone precursors from different age retinae occupied different UMAP regions, suggesting age-related differences in L/M cone precursor maturation (Figure 1C).”
 
 Moreover, they should assess whether different developmental stages impact gene expression and isoform ratios. It is well established that cone and rod progenitors typically emerge at different developmental times and in distinct regions of the retina, with minimal physical overlap. Grouping progenitor cells based solely on their UMAP positioning may lead to an oversimplified interpretation of the data.
 
 (2a) We agree that different developmental stages may impact gene expression and isoform ratios, and evaluated stages primarily based on established Louvain clustering rather than UMAP position. However, we also used UMAP position to segregate so-called RPC-localized and non-RPC-localized iCPs and iRPs, as well as to characterize the bridge region iCP sub-populations. In the revision, we examine whether cell groups defined by UMAP positions helped to identify transcriptomically distinct populations and further examine the spatiotemporal gene expression patterns of the same genes in the Zuo et al. 3’ snRNA-seq dataset.
 
 (2b) Related to analyses of immediately post-mitotic iRPs and iCPs, the new Figure S10A expanded the violin plots first shown in Figure 5D to compare gene expression in RPC-localized versus non-RPC-localized iCPs and iRPs and subsequent cone and rod precursor clusters (also presented in response to Reviewer 3). The new Figure S10C shows a similar analysis of UMAP region-specific regulon activities. These figures support the idea that there are only subtle UMAP region-related differences in the expression of the selected genes and regulons.
 
 To further evaluate early cone and rod precursors, we compared expression patterns in our cluster- and UMAP-defined cell groups to those of the spatiotemporally defined cell groups in the Zuo et al. 3’ snRNA-seq study. The results revealed similar expression timing of the genes examined, although the cluster assignments of a subset of cells were brought into question, especially the assigned rod precursors at pcw 10 and 13, as shown in new Figures S10B (grey columns) and S11, and as described in two new paragraphs starting near the bottom of p.12.
 
 (2c) Related to analyses of iCPs in the so-called bridge region, our analyses of the Zuo et al. dataset helped distinguish early cone and rod precursor populations (expressing early markers such as ATOH7 and CHRNA1) from the later stages exhibiting rod- and cone-related gene co-expression, which had intermixed in the UMAP bridge region in our dataset. Further parsing of early cone precursor marker spatiotemporal expression revealed intriguing differences as now described in the second half of a new paragraph at the top of p. 17, as follows:
 
 “Also, different iCP markers had different spatiotemporal expression: CHRNA1 and ATOH7 were most prominent in peripheral retina with ATOH7 strongest at pcw 10 and CHRNA1 strongest at pcw 13; CTC-378H22.2 was prominently expressed from pcw 10-13 in both the macula and the periphery; and DLL3 and ONECUT1 showed the earliest, strongest, and broadest expression (Figure S13B). The distinct patterns suggest spatiotemporally distinct roles for these factors in cone precursor differentiation.”
 
 (3) I would commend the authors for performing a validation experiment via RNA in situ to validate some of the findings. However, drawing conclusions from analyzing a small number of cells can still be dangerous. Furthermore, it is not entirely clear how the subclustering is done. Some cells change cell type identities in the high-resolution plot. For example, some iPRP cells from the low-resolution plots in Figure 1 are assigned as TR in high-resolution plots in Figure 5.
 
 The authors should provide justification for the identities of RPC-localized iPRP and TR.
 
 Comparison of their data with other publicly available data should strengthen their annotation
 
 We agree that drawing conclusions from scRNA-seq or in situ hybridization analysis of a small number of cells can be dangerous and have followed the reviewer’s suggestion to compare our data with other publicly available data, focusing on the 3’ snRNA-seq of Zuo et al. given its large size and extensive annotation. Our analysis of the Zuo et al. dataset helped clarify cell identities by segregating cone and rod precursors with similar gene expression properties in distinct UMAP regions. However, we noted that the clustering of early cone and rod precursors likely gave numerous mis-assigned cells (as noted in response 2b above and shown in the new Figure S11). It would appear that insights may be derived from the combination of relatively shallow sequencing of a high number of cells and deep sequencing of substantially fewer cells.
 
 Related to how subclustering was done, the Methods state, “A nearest-neighbors graph was constructed from the PCA embedding and clusters were identified using a Louvain algorithm at low and high resolutions (0.4 and 1.6)[70],” citing the Blondel et al. reference for the Louvain clustering algorithm used in the Seurat package. To clarify this, the Results text was revised such that it now indicates the resolutions used to cluster at low resolution (0.4, p. 4, 2nd paragraph) and at high resolution (1.6, top of p. 11).
 
 Related to the assignment of some iPRP cells from the low-resolution plots in Figure 1 to the TR cluster (now called the ‘iRP’ cluster) in the high-resolution plots in Figure 5, we suggest that this is consistent with Louvain clustering, which does not follow a single dendrogram hierarchy.
 
 The justification for referring to these groups as RPC-localized iCPs and iRPs relates to their biased gene and regulon expression in Fig. 5D and 5E, as stated on p. 12:
 
 “In the RPC-localized region, iCPs had higher ONECUT1, THRB, and GNAT2, whereas iRPs trended towards higher NRL and NR2E3 (p= 0.19, p=0.054, respectively).”
 
 (4) Late-stage LM5 cluster Figure 9 is not defined anywhere in previous figures, in which LM clusters only range from 1 to 4. The inconsistency in cluster identification should be addressed.
 
 We revised the text related to this as follows:
 
 “Indeed, our scRNA-seq analyses revealed that SYK RNA expression increased from the iCP stage through cluster LM4, in contrast to its minimal expression in rods (Figure 10E). Moreover, SYK expression was abolished in the five-cell group with properties of late maturing cones (characterized in Figure 1E), here displayed separately from the other LM4 cells and designated LM5 (Figure 10E).” (p. 19-20)
 
 (5) Syk inhibitor has been shown to be involved in RB cell survival in previous studies. The manuscript seems to abruptly make the connection between the single-cell data to RB in the last figure. The title and abstract should not distract from the bulk of the manuscript focusing on the rod and cone development, or the manuscript should make more connection to retinoblastoma.
 
 We appreciate the reviewer’s concern that the title may seem to over-emphasize the connection to retinoblastoma based solely on the SYK inhibitor studies. However, we suggest the title also emphasizes the identification and characterization of early human photoreceptor states, per se, and that there are a number of important connections beyond the SYK studies that could warrant the mention of cell-state-specific retinoblastoma-related features in the title.
 
 Most importantly, a prior concern with the cone cell-of-origin theory was that retinoblastoma cells express RNAs thought to mark retinal cell types other than cones, especially rods. The evidence presented here, that cone precursors also express the rod-related genes, helps resolve this issue. The issue is noted numerous times in the manuscript, as follows:
 
 In the Introduction, we write:
 
 “However, retinoblastoma cells also express rod lineage factor NRL RNAs, which – along with other evidence – suggested a heretofore unexplained connection between rod gene expression and retinoblastoma development[12,13]. Improved discrimination of early photoreceptor states is needed to determine if co-expression of rod- and cone-related genes is adopted during tumorigenesis or reflects the co-expression of such genes in the retinoblastoma cell of origin.” (bottom, p. 2)
 
 And:
 
 “In this study, we sought to further define the transcriptomic underpinnings of human photoreceptor development and their relationship to retinoblastoma tumorigenesis.” (last paragraph, p. 3)
 
 The Discussion also alluded to this issue and in the revised Discussion, we aimed to make the connection clearer. We previously ended the 3rd-to-last paragraph with,
 
 “iPRP [now iCP] and early LM cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin.”
 
 We now separate and elaborate on this point in a new paragraph as follows:
 
 “Our characterization of cone and rod-related RNA co-expression may help resolve questions about the retinoblastoma cell of origin. Past studies suggested that retinoblastoma cells co-express RNAs associated with rods, cones, or other retinal cells due to a loss of lineage fidelity[12]. However, the early L/M cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin. This idea is further supported by the retinoblastoma cells’ preferential expression of cone-enriched NRL transcript isoforms (Figure S5B).” (middle of p. 24)
 
 Based on the above, we elected to retain the title.
 
 Minor comments:
 
 (1) It is difficult to see the orange and magenta colors in the Fig 3E RNA-FISH image. The colors should be changed, or the contrast threshold needs to be adjusted to make the puncta stand out more.
 
 We re-assigned colors, with red for FL-NRL puncta and green for Tr-NRL puncta.
 
 (2) Figure 5C on page 8 should be corrected to Supplementary Figure 5C.
 
 We thank the reviewer for noting this error and changed the figure citation.
 
 Reviewer #3 (Recommendations for the authors):
 
 (1) Minor concerns
 
 a. Abbreviation of some words needs to be included, example: FW.
 
 We now provide abbreviation definitions for FW and others throughout the manuscript.
 
 b. Cat # does not match the 'key resource table' for many reagents/kits. Some examples are: CD133-PE mentioned on Page # 22 on # 71, SMART-Seq V4 Ultra Low Input RNA Kit and SMARTer Ultra Low RNA Kit for the Fluidigm C1 System on Page # 22 on # 77, Nextera XT DNA Library preparation kit on Page # 23 on # 77.
 
 We thank the reviewer for noting these discrepancies. We have now checked all catalog numbers and made corrections as needed.
 
 c. Cat # and brand name of few reagents & kits is missing and not mentioned either in methods or in key resource table or both. Eg: FBS, Insulin, Glutamine, Penicillin, Streptomycin, HBSS, Quant-iT PicoGreen dsDNA assay, Nextera XT DNA LibraryPreparation Kit, 5' PCR Primer II A with CloneAmp HiFi PCR Premix.
 
 Catalog numbers and brand names are now provided for the tissue culture and related reagents within the methods text and for kits in the Key Resources Table. Additional descriptions of the primers used for re-amplification and RACE were added to the Methods (p. 28-29).
 
 d. A spelling and grammar check is needed throughout the manuscript. Example: on Page # 46, RXRγlo is misspelled as RXRlo.
 
 Spelling and grammar were reviewed throughout the manuscript.
 
 (2) Methods & Key Resource table.
 
 a. In Page # 21, IRB# needs to be stated.
 
 The IRB protocols have been added, now at top of p. 26.
 
 b. In Page # 21, Did the authors dissociate retinae in ice-cold phosphate-buffered saline or papain?
 
 The relevant sentence was corrected to “dissected while submerged in ice-cold phosphate-buffered saline (PBS) and dissociated as described[10].” (p. 26)
 
 c. In Page # 21, How did the authors count or enumerate the cell count? Provide the details.
 
 We now state, “… a 10 µl volume was combined with 10 µl trypan blue and counted using a hemocytometer” (top of p. 27)
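 
 As an illustration of the arithmetic behind such a count, the following is a minimal sketch that assumes a standard hemocytometer in which each large square holds 0.1 µl (so the conversion factor to cells per ml is 10^4). The function name and the per-square counts are hypothetical and are not taken from the manuscript; only the 10 µl + 10 µl mixing volumes come from the quoted sentence.

```python
def cells_per_ml(square_counts, sample_ul=10.0, dye_ul=10.0):
    """Estimate cell concentration from hemocytometer counts.

    Assumes each counted large square holds 0.1 ul (standard chamber
    geometry), so mean count per square * 1e4 gives cells per ml of the
    loaded mixture. The dilution factor corrects for the trypan blue volume.
    """
    dilution = (sample_ul + dye_ul) / sample_ul  # 10 ul + 10 ul -> 2-fold
    mean_count = sum(square_counts) / len(square_counts)
    return mean_count * dilution * 1e4


# Hypothetical counts from four large squares
print(cells_per_ml([52, 48, 50, 50]))  # 1000000.0
```

 With equal sample and trypan blue volumes, the dilution factor is 2, so a mean of 50 cells per square corresponds to about 1 x 10^6 cells per ml in the original suspension.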
 
 d. Why did the authors choose to specifically use only 8 cells for cDNA preparation in Page # 22? State the reason and provide the details.
 
 The reasons for using 8 cells (to prevent evaporation and to manually transfer one slide-worth of droplets to one strip of PCR tubes) and additional single cell collection details are now provided as follows (new text underlined):
 
 “Single cells were sorted on a BD FACSAria I at 4°C using 100 µm nozzle in single-cell mode into each of eight 1.2 µl lysis buffer droplets on parafilm-covered glass slides, with droplets positioned over pre-defined marks … . Upon collection of eight cells per slide, droplets were transferred to individual low-retention PCR tubes (eight tubes per strip) (Bioplastics K69901, B57801) pre-cooled on ice to minimize evaporation. The process was repeated with a fresh piece of parafilm for up to 12 rounds to collect 96 cells.” (p. 27, new text underlined)
 
 e. Key resource table does not include several resources used in this study. Example - NR2E3 antibody.
 
 We added the NR2E3 antibody and checked for other omissions.
 
 (3) Results & Figures & Figure Legends
 
 a. Regulon-defined RPC and photoreceptor precursor states
 
 i. On page # 4, 1 paragraph - Clarify the sentence 'Exclusion of all cells with <100,000 cells read and 18 cells.........Emsembl transcripts inferred'. Did the authors use 18 cells or 18FW retinae?
 
 The sentence was changed to:
 
 “After sequencing, we excluded all cells with <100,000 read counts and 18 cells expressing one or more markers of retinal ganglion, amacrine, and/or horizontal cells (POU4F1, POU4F2, POU4F3, TFAP2A, TFAP2B, ISL1) and concurrently lacking photoreceptor lineage marker OTX2. This yielded 794 single cells with averages of 3,750,417 uniquely aligned reads, 8,278 genes detected, and 20,343 Ensembl transcripts inferred (Figure S1A-C).” (p. 4, new words underlined)
 
 To clarify that 18 retinae were used, the first sentence of the Results was revised as follows:
 
 “To interrogate transcriptomic changes during human photoreceptor development, dissociated RPCs and photoreceptor precursors were FACS-enriched from 18 retinae, ages FW13-19 …” (p. 4).
 
 Why did the authors 'exclude cells lacking photoreceptor lineage marker OTX2' from analysis especially when the purpose here was to choose photoreceptor precursor states & further results in the next paragraph clearly state that 5 clusters were comprised of cells with OTX2 and CRX expression. This is confusing.
 
 We apologize for the imprecise diction. We divided the evidently confusing sentence into two sentences to more clearly indicate that we removed cells that did not express OTX2, as in the first response to the previous question.
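 
 To make the two exclusion criteria in the revised sentence concrete, here is a minimal sketch of the filtering logic. The marker gene names and the 100,000-read threshold are taken from the quoted text; the data structure, the function name, and the example cells are hypothetical, and the definition of an "expressed" gene is left as an assumption rather than the criterion actually used in the study.

```python
# Marker genes named in the revised sentence (retinal ganglion, amacrine,
# and/or horizontal cell markers).
OFF_TARGET_MARKERS = {"POU4F1", "POU4F2", "POU4F3", "TFAP2A", "TFAP2B", "ISL1"}
MIN_READS = 100_000  # cells with fewer read counts are excluded


def keep_cell(read_count, expressed_genes):
    """Return True if a cell passes both exclusion criteria."""
    if read_count < MIN_READS:
        return False
    has_off_target = bool(OFF_TARGET_MARKERS & set(expressed_genes))
    lacks_otx2 = "OTX2" not in expressed_genes
    # Exclude only cells that express an off-target marker AND lack OTX2.
    return not (has_off_target and lacks_otx2)


# Hypothetical cells: (read count, set of expressed genes)
cells = {
    "cell_a": (3_500_000, {"OTX2", "CRX"}),
    "cell_b": (50_000, {"OTX2"}),             # too few reads
    "cell_c": (2_000_000, {"POU4F2", "ISL1"}),  # off-target marker, no OTX2
    "cell_d": (4_100_000, {"ISL1", "OTX2"}),    # off-target marker but OTX2+
}
kept = sorted(name for name, (reads, genes) in cells.items()
              if keep_cell(reads, genes))
print(kept)  # ['cell_a', 'cell_d']
```

 Under this reading, a cell expressing one of the listed markers is retained as long as it also expresses OTX2, which is why the hypothetical cell_d passes the filter.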
 
 ii. In Page # 5, the authors reported the number of cell populations (363 large and 5 distal) identified in the THRB+ L/M-cone cluster. What were the # of cell populations identified in the remaining 5 clusters of the UMAP space?
 
 We added the cell numbers in each group to Fig. 1B. We corrected the large LM group to 366 cells (p. 5) and note 371 LM cells, which includes the five distal cells, in Figure 1B.
 
 b. Differential expression of NRL and THRB isoforms in rod and cone precursors
 
 i. In Figure 3B, the authors compare and show the presence of 5 different NRL isoforms for all the 6 clusters that were defined in 3A. However, in the results, the ENST# of just 2 highly assigned transcript isoforms is given. What are the annotated names of the three other isoforms which are shown in 3B? Please explain in the Results.
 
 As requested, we now annotate the remaining isoforms as encoding full-length or truncated NRL in Fig. 3B and show isoform structures in new Supplementary Figure S4B. We also refer to each transcript isoform in the Results (p. 7, last paragraph) and similarly evaluate all isoforms in RB31 cells (Fig. S5B).
 
 ii. What does the Mean FPM in the y-axis of Fig 3C refer to?
 
 Mean FPM represents mean read counts (fragments per million, FPM) for each position across Ensembl NRL exons for each cluster, as now stated in the 6th line of the Fig. 3 legend.
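 
 For readers unfamiliar with the unit, the following is a minimal sketch of one way a per-position mean FPM could be computed for a cluster. It assumes each cell's coverage is first scaled to fragments per million of that cell's total fragments and then averaged across the cells in the cluster; whether the figure was generated in exactly this order is an assumption, and the function name and numbers are hypothetical.

```python
def mean_fpm(position_counts_per_cell, total_fragments_per_cell):
    """Mean fragments-per-million at each position across one cluster's cells.

    position_counts_per_cell: one list of per-position counts per cell.
    total_fragments_per_cell: total fragment count for each cell.
    """
    n_cells = len(position_counts_per_cell)
    n_positions = len(position_counts_per_cell[0])
    means = []
    for pos in range(n_positions):
        fpm_sum = 0.0
        for counts, total in zip(position_counts_per_cell,
                                 total_fragments_per_cell):
            # Scale this cell's count to fragments per million.
            fpm_sum += counts[pos] * 1_000_000 / total
        means.append(fpm_sum / n_cells)
    return means


# Two hypothetical cells, three exon positions
counts = [[10, 20, 0], [30, 0, 10]]
totals = [2_000_000, 1_000_000]
print(mean_fpm(counts, totals))  # [17.5, 5.0, 5.0]
```

 Scaling per cell before averaging keeps deeply sequenced cells from dominating the cluster mean.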
 
 iii. A clear explanation of the results for Figures 3E-3F is missing.
 
 We revised the text to more clearly describe the experiment as follows:
 
 “The cone cells’ higher proportional expression of Tr-NRL first exon sequences was validated by RNA fluorescence in situ hybridization (FISH) of FW16 fetal retina in which NRL immunofluorescence was used to identify rod precursors, RXRg immunofluorescence was used to identify cone precursors, and FISH probes specific to truncated Tr-NRL exon 1T or FL-NRL exons 1 and 2 were used to assess Tr-NRL and FL-NRL expression (Figure 3E,F).” (p. 8, new text underlined).
 
 c. Two post-mitotic photoreceptor precursor populations
 
 i. Although deep-sequencing and SCENIC analysis clarified the identities of four RPC-localized clusters as MG, RPC, and iPRP indicative of cone-bias and TR indicative of rod-bias. It would be interesting to see the discriminating determinant between the TR and ER by SCENIC and deep-sequencing gene expression violin/box plots.
 
 We agree it is of interest to see the discriminating determinant between the TR [now termed iRP] and ER clusters by SCENIC and deep-sequencing gene expression violin/box plots. We now provide this information for selected genes and regulons of interest in the new Supplementary Figures S10A and S10C, along with a similar comparison between the prior high-resolution iPRP (now termed iCP) cluster and the first high-resolution LM cluster, LM1, as described for gene expression on p. 12:
 
 “Notably, THRB and GNAT2 expression did not significantly change while ONECUT1 declined in the subsequent non-RPC-localized iCP and LM1 stages, whereas NR2E3 and NRL dramatically increased on transitioning to the ER state (Figure S10A).”
 
 And as described for regulon activities on pp. 13-14:
 
 “Finally, activities of the cone-specific THRB and ISL2 regulons, the rod-specific NRL regulon, and the pan-photoreceptor LHX3, OTX2, CRX, and NEUROD1 regulons increased to varying extents on transitioning from the immature iCP or iRP states to the early-maturing LM1 or ER states (Figure 10C).”
 
 We also show expression of the same genes for spatiotemporally grouped cells from the Zuo et al. dataset in the new Figure S10B, which displays a similar pattern (apart from the possibly mixed pcw 10 and pcw13 designated rod precursors).
 
 d. Early cone precursors with cone- and rod-related RNA expression
 
 i. On page #12, the last paragraph where the authors explain the multiplex RNA FISH results of RXRγ and NR2E3 by citing Figure S8E. However, in Fig S8E, the authors used NRL to identify the rods. Please clarify which one of the rod markers was used to perform RNA FISH?
 
 Figure S8E (where NRL was used as a rod marker) was cited to remind readers that RXRg has low expression in rods and high expression in cones, rather than to describe the results of this multiplex FISH section. To avoid confusion on this point, Figure S8E is now cited using “(as earlier shown in Figure S8E).” With this issue clarified, we expect the markers used in the FISH + IF analysis will be clear from the revised explanation,
 
 “… we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .” (p. 14-15).
 
 To provide further clarity, we provide a diagram of the FISH probes, protein markers, and expression patterns in the new Figure 7E.
 
 ii. The Y-axis of Fig 6G-6H needs to be labelled.
 
 The axes have been re-labeled from “Nb of cells” to “Number of RXRg+ outermost NBL cells in each region” (original Fig. 6G, now Fig. 7C) and “Number of RXRg+ middle NBL cells in each region” (original Fig. 6H, now Fig. 7D).
 
 iii. The legends of Figures 6G and 6H are unclear. In the Figure 6G legend, the authors indicate 'all cells are NR2E3 protein-'. Does that imply the yellow and green bars alone? Similarly, clarify the Figure 6H legend, what does the dark and light magenta refer to? What does the light magenta color referring to NR2E3+/ NR2E3- and the dark magenta color referring to NR2E3+/ NR2E3+ indicate?
 
 We regret the insufficient clarity. We revised the Fig. 6G (now Fig. 7C) key, which now reads
 
 “All outermost NBL cells are NR2E3 protein-negative.” We added to the figure legend for panels 7C,D “(n.b., italics are used for RNAs, non-italics for proteins).” The new scheme in Figure 7E shows RNAs in italics and proteins in non-italics. We hope these changes will clarify when RNA or protein is represented in each histogram category.
 
 Overall, the results (on page # 13) reflecting Figures 6E-6H & Figure S11 are confusing and difficult to understand. Clear descriptions and explanations are needed.
 
 We revised this results section described in the paragraph now spanning p. 14:
 
 - We now refer to the bar colors in Figures 7C and 7D that support each statement.
 
 - We provide an illustration of the findings in Figure 7E.
 
 iv. Previously published literature has shown that cells of the inner NBL are RXRγ+ ganglion cells. So, how were these RXRγ+ ganglion cells in the inner NBL discriminated during multiplex RNA FISH (in Fig 6E-6H and in Fig S11)?
 
 We thank the reviewer for requesting this clarification. We agree that “inner NBL” is the incorrect term for the region in which we examined RXRg+ photoreceptor precursors, as this could include RXRγ+ nascent RGCs. We now clarify that
 
 “we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .” (p. 14-15) We further state,
 
 “Limiting our analysis to the outer and middle NBL allowed us to disregard RXRγ+ retinal ganglion cells in the retinal ganglion cell layer or inner NBL (top of p. 15)”
 
 Figure 7E is provided to further aid the reader in understanding the positions examined, and the legend states “RXRg+ retinal ganglion cells in the inner NBL and ganglion cell layer not shown.”
 
 v. In Figure 6E, what marker does each color cell correspond to?
 
 In this figure (now panel 7A), we declined to provide the color key since the image is not sufficiently enlarged to visualize the IF and FISH signals. The figure is provided solely to document the regions analyzed and readers are now referred to “see Figure S12 for IF + FISH images” (2nd line, p. 15), where the marker colors are indicated.
 
 vi. In Figure S11 & 6E, Protein and RNA transcript color of NR2E3, GNAT2 are hard to distinguish. Usage of other colors is recommended.
 
 We appreciate the reviewer’s concern related to the colors (in the now redesignated Figure S12 and 7A); however, we feel this issue is largely mitigated by our use of arrows to point to the cells needed to illustrate the proposed concepts in Figure S12B. All quantitation was performed by examining each color channel separately to ensure correct attribution, which is now mentioned in the Methods (2nd-to-last line of Quantitation of FISH section, p. 35).
 
 vii.
 
 With due respect, we suggest that labeling each box (now in Figure 8B) would make the figure rather busy and obscure the main point, which is that boxed regions were examined at various distances from the center (denoted by the “C” and “0 mm”), with distances periodically indicated. We suggest the addition of such markers would not improve and might worsen the figure for most readers.
 
 e. An early L/M cone trajectory marked by successive lncRNA expression
 
 i. In Figure 8C - color-coded labelling of LM1-4 clusters is recommended.
 
 We note Fig. 8C (now 9C) is intended to use color to display the pseudotemporal positions of each cell. We recognize that an additional plot with the pseudotime line imposed on LM subcluster colors could provide some insights, yet we are unaware of available software for this and are unable to develop such software at present. To enable readers to obtain a visual impression of the pseudotime vs subcluster positions, we now refer the reader to Figure 5A in the revised figure legend, as follows: (“The pseudotime trajectory may be related to LM1-LM4 subcluster distributions in Figure 5A.”).
 
 ii. In Figure 8G - what does the horizontal color-coded bar below the lncRNAs name refer to? These bars are similar in all four graphs of the 8G figure.
 
 As stated in the Fig. 8G (now 9G) legend, “Colored bars mark lncRNA expression regions as described in the text.” We revised the text to more clearly identify the color code. (p. 18-19)
 
 f. Cone intrinsic SYK contributions to the proliferative response to pRB loss
 
 i. In Fig 9F - The expression of ARR3+ cells (indicated by the green arrow in FW18) is poorly or rarely seen in the peripheral retina.
 
 We thank the reviewer for finding this oversight. In panel 9F (now 10F), we removed the green arrows from the cells in the periphery, which are ARR3- due to the immaturity of cones in this region.
 
 ii. In Figure 9F - Did the authors stain the FW16 retina with ARR3?
 
 Unfortunately, we did not stain the FW16 retina for ARR3 in this instance.
 
 iii. Inclusion of DAPI staining for Fig 9F is recommended to justify the ONL & INL in the images.
 
 We regret that we are unable to merge the DAPI in this instance due to the way in which the original staining was imaged. A more detailed analysis corroborating and extending the current results is in progress.
 
 iv. Immunostaining images for Figure 9G are missing & are required to be included. What does shSCR in Fig 9G refer to?
 
 We now provide representative immunostaining images below the panel (now 10G). The legend was updated: “Bottom: Example of Ki67, YFP, and RXRg co-immunostaining with DAPI+ nuclei (yellow outlines). Arrows: Ki67+, YFP+, RXRg+ nuclei.” The revised legend now notes that shSCR refers to the scrambled control shRNA.
 
 v. For Figure 9H - Is the presence and loss of SYK activity consistent with all the subpopulations (S & LM) of early maturing and matured cones?
 
 We appreciate the reviewer’s question and interest (relating to the redesignated Figure 10H); however, we have not yet completed a comprehensive evaluation of SYK expression in all the subpopulations (S & LM) of early maturing and matured cones and will reserve such data for a subsequent study. We suggest that this information is not critical to the study’s major conclusions.
 
 vi. Figure 9A is not explained in the results. Why were MYCN proteins assessed along with ARR3 and NRL? What does this imply?
 
 We thank the reviewer for noting that this figure (now Figure 10A) was not clearly described.
 
 As per the response to Reviewer 1, point 6, the text now states,
 
 “The upregulation of MYC target genes was of interest given that many MYC target genes are also MYCN targets, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss [8–10].” (middle, p. 19, new text underlined).
 
 Hence, the figure demonstrates the cone cell specificity of high MYCN protein. This is further noted in the Fig. 10A legend: “A. Immunofluorescent staining shows high MYCN in ARR3+ cones but not in NRL+ rods in FW18 retina.”
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.02.28.530247v4
www.biorxiv.org www.biorxiv.org

Material Damage to Multielectrode Arrays after Electrolytic Lesioning is in the Noise

3
1. Public_Reviews 09 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This useful manuscript addresses a stability issue for long-term chronically implanted array recordings and electrolytic lesioning, which is relevant to both basic science and translational research. The authors provide a systematic scanning electron microscopy (SEM) of explanted arrays, evaluating electrode damage and sharing extensive datasets accessible through interactive plots. The strength of the evidence is solid, but it can be improved by performing additional analyses on complementary neurophysiology, functional, or histological data.
  
  Summary
2. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This work presents a GUI with SEM images of 8 Utah arrays (8 of which were explanted, and 4 of which were used for creating cortical lesions).
  
  Strengths:
  
  Visual comparison of electrode tips with SEM images, showing that electrolytic lesioning did not appear to cause extra damage to electrodes.
  
  Weaknesses:
  
  Given that the analysis was conducted on explanted arrays, and no functional or behavioural in vivo data or histological data are provided, any damage to the arrays may have occurred after explantation. This makes the results limited and inconclusive (firstly, that there was no significant relationship between degree of electrode damage and use of electrolytic lesioning, and secondly, that electrodes closer to the edge of the arrays showed more damage than those in the center).
  
  Overall, these results do not add new insight to the field, although they do add more data and reference images.
  
  Review 1
3. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  In this study, the authors used scanning electron microscopy (SEM) to image and analyze eleven Utah multielectrode arrays (including eight chronically implanted in four macaques). Four of the eight arrays had previously been used to deliver electrolytic lesions. Each intact electrode was scored in five damage categories. They found that damage disproportionately occurred to the outer edges of arrays. Importantly, the authors conclude that their electrolytic lesioning protocol does not significantly increase material degradation compared to normal chronic use without lesion. Additionally, the authors have released a substantial public dataset of single-electrode SEM images of explanted Utah arrays.
  
  The paper is well-written and addresses an important stability issue for long-term chronically implanted array recordings and electrolytic lesioning, which is relevant to both basic science and translational research. By comparing lesioning and non-lesioning electrodes on the same array and within the same animal, the study effectively controls for confounds related to the animal and surgical procedures. The shared dataset, accessible via interactive plots, enhances transparency and serves as a valuable reference for future investigations. Below, we outline some major and minor concerns that could help improve the work.
  
  Major concerns:
  
  (1) Electrode impedance is a critical measurement to evaluate the performance of recording electrodes. It would be helpful if the authors could provide pre-explant and post-explant impedance values for each electrode alongside the five SEM damage scores. This would allow the readers to assess how well the morphological scores align with functional degradation.
  
  (2) The lesion parameters differ across experiments and electrodes. It would be helpful if the authors could evaluate whether damage scores (and/or impedance changes) correlate with total charge, current amplitude, duration, or frequency.
  
  Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.26.645429v1
www.biorxiv.org www.biorxiv.org

Hatching with Numbers: Pre-natal Light Exposure Affects Number Sense and the Mental Number Line in young domestic chicks

4
1. Public_Reviews 09 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This fundamental study demonstrates how a left-right bias in the relationship between numerical magnitude and space depends on brain lateralization. The evidence is compelling, and the manuscript could be strengthened by improving its contextualization, presentation, and discussion. The results will be of interest to researchers studying numerical cognition, brain lateralization, and cognitive brain development more broadly.
  
  Summary
2. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. The development of comparable experimental manipulations in the mammalian brain, to address this general question (the functional significance of brain laterality), is also expected.
  
  Review 1
3. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This is the first study to show how a L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation, and moreover, how it is modulated by in ovo conditions.
  
  Strengths:
  
  Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.
  
  Weaknesses:
  
  I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.
  
  The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).
  
  This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans, found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.
  
  If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc.
  
  What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.
  
  This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.
  
  Review 2
4. Public_Reviews 09 Jun 2025
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public review):
  
  Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. The development of comparable experimental manipulations in the mammalian brain, to address this general question (the functional significance of brain laterality), is also expected.
  
  We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This is the first study to show how a L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation, and moreover, how it is modulated by in ovo conditions.
  
  Strengths:
  
  Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.
  
  We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.
  
  Weaknesses:
  
  I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.
  
  We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.
  
  The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).
  
  We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.
  
  This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans, found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.
  
  We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.
  
  We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.
  
  If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc.
  
  What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.
  
  In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA) in place of 'Mental Number Line'. We thank the Reviewer for this valuable comment.
  
  We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.
  
  This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.
  
  We wish to thank the Reviewer again for his/her work.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.09.642211v1
www.biorxiv.org www.biorxiv.org

Episodic boundaries affect neural features of representational drift in humans

1
1. Public_Reviews 08 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer 1:
  
  (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model.
  
  Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the temporal context model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.
  
  El-Kalliny and colleagues do use a similar experimental set-up; however, they define drift differently. They identified patients with a tendency to temporally cluster and observed that those patients tend to drift less between temporally clustered items; however, they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as spectral change between two adjacent items, which is a relative measure between any two items rather than a measure in relation to a fixed point like SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.
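  The difference between the two drift definitions discussed above (similarity relative to a fixed serial position such as SP1, versus similarity between adjacent items) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the feature vectors are toy values standing in for per-item spectral features, and the function names `similarity_to_anchor`, `adjacent_similarity`, and `toy_features` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_to_anchor(features, anchor_index=0):
    """Similarity of every item to one fixed item (e.g. serial position 1)."""
    anchor = features[anchor_index]
    return [cosine_similarity(anchor, f) for f in features]


def adjacent_similarity(features):
    """Similarity between each item and the item presented right after it."""
    return [
        cosine_similarity(features[i], features[i + 1])
        for i in range(len(features) - 1)
    ]


def toy_features(n_items, step=0.2):
    """Toy 2-D feature vectors that rotate by a constant angle per item.

    This is an assumption made purely for illustration: it mimics a
    representation that drifts at a constant rate across serial positions.
    """
    return [[math.cos(i * step), math.sin(i * step)] for i in range(n_items)]


features = toy_features(6)
anchored = similarity_to_anchor(features)   # falls as distance from SP1 grows
adjacent = adjacent_similarity(features)    # constant for a constant drift rate
```

  In this toy case the anchored measure decreases with serial position while the adjacent measure stays flat, which is why the two definitions can lead to different descriptions of the same underlying drift.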
  
  (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings.
  
  Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.
  
  However, our results show that a boundary item representation is most similar to the prior list's first item rather than its last item. Our results conflict with the extension of TCM because the shared similarity of boundary items suggests the context state for the first item in the list is not a recency-weighted average of the items presented immediately prior. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex. Those regions do not show similarity between items at the beginning of each list.
  
  Our main conclusion from these data was that the medial parietal lobe activity seems to be specifically sensitive to task boundaries, defined by the first event or the get ready prompt, while other regions are not.
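  The prediction attributed above to a single, continuously updated context state can be sketched with the standard context-drift update, c_i = rho * c_(i-1) + beta * f_i. This is a simplified illustration, not the model of Lohnas et al. (2015) and not the authors' analysis: it assumes mutually orthogonal item inputs, an arbitrary drift parameter `beta = 0.5`, and no reset at list boundaries; `simulate_context` is a hypothetical helper.

```python
import math


def simulate_context(n_items, beta=0.5):
    """Return the context state after each of n_items is presented.

    Assumptions (illustrative only): each item contributes a one-hot input
    orthogonal to everything already in context, and rho is chosen so the
    context vector keeps unit length.
    """
    rho = math.sqrt(1.0 - beta ** 2)
    context = [0.0] * (n_items + 1)
    context[0] = 1.0  # pre-experimental context occupies its own dimension
    states = []
    for i in range(1, n_items + 1):
        context = [rho * c for c in context]  # decay the existing context
        context[i] += beta                    # blend in the new item's input
        states.append(list(context))
    return states


def dot(a, b):
    """Dot product; equals cosine similarity here because states are unit length."""
    return sum(x * y for x, y in zip(a, b))


list_length = 5
states = simulate_context(2 * list_length)  # two lists, no reset in between

first_of_list1 = states[0]
last_of_list1 = states[list_length - 1]
first_of_list2 = states[list_length]

sim_to_prior_last = dot(first_of_list2, last_of_list1)
sim_to_prior_first = dot(first_of_list2, first_of_list1)
```

  Under these assumptions `sim_to_prior_last` exceeds `sim_to_prior_first`, since similarity decays with the number of intervening items. A first item that instead resembles the prior list's first item, as reported in the response, is therefore not what this continuous-drift sketch produces.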
  
  (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work, which makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.
  
  The following text was added and revised in the discussion to discuss hippocampal activity shown in our results and its lack of sensitivity to boundaries.
  
  “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting boundaries reset temporal context. In the temporal cortex, our findings extend prior studies which suggest the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014; El-Kalliny et al. 2019). We found items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ben-Yakov et al. 2018; Ezzyat et al. 2014; Griffiths et al. 2020) which shows hippocampal sensitivity to event boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions.”
  
  (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context, however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances.
  
  We agree our results could suggest the MPL creates a generalized situational model or schematic of the task. Unfortunately, our behavioral task does not allow us to differentiate between these ideas and pure boundary representation. However, given boundaries are a component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.
  
  (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue.
  
  The study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of nonrecalled items in all serial positions to demonstrate the lack of boundary representation in first list items, when neural fatigue is presumably least present.
  
  In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.
  
  (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.
  
  The citation was removed from the corresponding sentence.
  
  (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity.
  
  The citation was removed from the corresponding sentence.
  
  (8) P.3 Line 117, Hsieh et al (2014) and Hsieh and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers.
  
  The text has been updated to reflect this distinction by modifying the statement to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”
  
  (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature.
  
  We would not suggest there is no boundary-related activity in the hippocampus. Similar to an earlier point made by the reviewer, to clarify our interpretation of regional differences, the following text has been added to the discussion.
  
  “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”
  
  (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal.
  
  The serial position effects reported by Serruya and colleagues involve decreasing HFA with increasing serial position in the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global neural fatigue model does not account for our results.
  
  Notably, the authors do not characterize HFA trends in the MPL. Nevertheless, their findings do not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.
  
  Next, the neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.
  
  In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.
  
  (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session?
  
  Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.
  
  (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D.
  
  The text was updated to reference the correct figure.
  
  (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one.
  
  The t-tests described by Figure 3B represent the mean parameter estimate of the predictor for boundary proximity contrasted by region for all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each of the regions. The numbers within parentheses represent the sample size, or number of subjects, contributing electrodes to each region.
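  
  To illustrate why an unpaired comparison can be reported alongside two group sizes, the following minimal Python sketch computes a pooled-variance (Student) unpaired t statistic. The parameter estimates and the function name `unpaired_t` are hypothetical, and the pooled-variance form is an assumption, since the response does not state which variant was used. With group sizes n1 and n2, the test still has a single degrees-of-freedom value, n1 + n2 - 2.

```python
import math
from statistics import mean, variance

def unpaired_t(group_a, group_b):
    """Student's unpaired t-test with pooled variance (equal-variance assumption)."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled sample variance, weighting each group by its own degrees of freedom.
    pooled = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2

# Hypothetical per-subject parameter estimates for two regions (not real data).
region_a_betas = [0.42, 0.35, 0.51, 0.29, 0.46]
region_b_betas = [0.10, 0.18, 0.05, 0.22]
t_stat, dof = unpaired_t(region_a_betas, region_b_betas)
print(f"t({dof}) = {t_stat:.2f}  (n1={len(region_a_betas)}, n2={len(region_b_betas)})")
```

  Reporting the statistic as t(dof) together with the two sample sizes keeps the degrees of freedom and the number of contributing subjects visibly separate.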
  
  Reviewer 2:
  
  (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary-specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary – for instance, a change in background color. The power of this design is that all aspects between stimuli are strictly controlled – in particular, the timing – meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner, thus it is not possible to fully control for effects of time or that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundary-specific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.
  
  The neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.
  
  In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.
  
  (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.
  
  In our study, all pairs are defined by the lag between a reference and target item. The results in Figure 3 show the similarity between each serial position in relation to SP1; Figure 4 shows lag between each serial position relative to SP2 and 3; and Figure 5 shows lag relative to SP12. Each statistical model accounts for the lag by ordering the data by increased inter-item distance. Further, our definition of lag is significantly more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which aggregates similarity between pairs such as SP1 to SP2 with SP4 to SP5, which may be recalled via different memory mechanisms.
  
  In Figure 5, we agree that your characterization, ‘similarity with SP12 generally increases across serial position’, is a more accurate description of the trend. The text has been updated to reflect this by changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”
  
  Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When recalling SP12, the subsequent items recalled exhibit significantly lower similarity to SP12 (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This stands in contrast to our observations illustrated in Figures 3 and 4, where successfully recalled start-list items demonstrate greater similarity to later items in similar serial positions.
  
  (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation.
  
  We agree with the statement that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all reference to perceptual boundaries.
  
  (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.
  
  We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.
  
  Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global attentional decline or neural fatigue model does not account for our results.
  
  Notably, HFA trends in the MPL are poorly described. Further, gamma power decline does not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.
  
  (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift.
  
  The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).
  
  We revised the sentence and encompassing paragraph to describe the temporal context model more accurately and emphasize how our findings align with the stated version of CMR. The revised text is below:
  
  “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items distant in time but adjacent to boundaries. This pattern suggests spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval model which extends the predictions of TCM to encompass broader relationships among items. Our results demonstrate boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”
  
  (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue
  
  Thank you for your suggestion. The citation has been added to the text.
  
  (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.
  
  We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares list x, SP12 to all succeeding items (in list x+1, x+2, etc.). Therefore, this statement refers to items in the next lists, which is why we performed an across-list analysis rather than a within-list one.
  
  (8) It is unclear why it is necessary to use PCA to estimate similarity between items.
  
  PCA was used to reduce the dimensionality of the time-frequency matrix for the gamma band. This technique allowed us to compare predominant trends in gamma between items. In addition, we added a figure showing 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.
  
  (9) Lags are listed as -4, 4 (Page 8), however with a list length of 12, possible lags should be -11, 11.
  
  The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
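  
  For readers unfamiliar with the measure, a lag-CRP over the full range of transitions can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `lag_crp`, the input layout (one list of recalled serial positions per trial), and the choice to break the transition chain at repeats and intrusions are all hypothetical.

```python
from collections import Counter

def lag_crp(recall_sequences, list_length=12):
    """Conditional response probability for every lag in [-(L-1), L-1].

    recall_sequences: one list per trial of recalled serial positions (1-based),
    in output order. Repeats and out-of-range entries (intrusions) are skipped.
    """
    actual, possible = Counter(), Counter()
    for recalls in recall_sequences:
        seen = set()
        prev = None
        for pos in recalls:
            if not 1 <= pos <= list_length or pos in seen:
                prev = None  # break the transition chain at intrusions/repeats
                continue
            if prev is not None:
                actual[pos - prev] += 1
                # Every not-yet-recalled item was an available transition target.
                for cand in range(1, list_length + 1):
                    if cand not in seen:
                        possible[cand - prev] += 1
            seen.add(pos)
            prev = pos
    lags = [lag for lag in range(-(list_length - 1), list_length) if lag != 0]
    return {lag: (actual[lag] / possible[lag] if possible[lag] else float("nan"))
            for lag in lags}

# Two hypothetical trials from a 12-item list.
crp = lag_crp([[1, 2, 3], [12, 11, 1]])
print(sorted(crp)[0], sorted(crp)[-1])  # smallest and largest lag covered
```

  Dividing observed transitions by the transitions that were still available is what makes the measure conditional; with a 12-item list the lags span -11 to 11, excluding 0.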
  
  (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3).
  
  The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”
  
  (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).
  
  The statement was changed to “item-item similarity is affected based on successful recall” given recollection cannot be determined in our paradigm.
  
  Reviewer 3:
  
  (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presences of intervening boundaries" (lines 272-3).
  
  We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity, and list distance, which models a stepwise decrease in similarity from adjacent lists.
  
  However, we agree with the comment that the presented data do not directly support the claim that lateral temporal cortex drift is independent of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them.”
  
  (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.
  
  Our study aims to connect the neural response after boundaries with the neural representation and context of these items. Prior studies (Manning et al. 2011, El Kalliny et al. 2019) have interpreted similarity in neural spectra as a memory-relevant phenomenon. We use very similar methods to perform our analysis.
  
  In addition, we compare the fit of our boundary similarity model to behavioral performance to show increased boundary representation correlates with improved boundary item recall.
  
  While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses include log-scaled, broadband time-frequency data (e.g., 3-100 Hz) from which we specify the relevance of a much narrower spectral band.
  
  Finally, we tried to study which time–frequency components contributed to the increased similarity, but it varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analyses to compare the features showing the most variation for each given participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.
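  
  As a point of reference for the similarity measure named in the reviewer's comment, cosine similarity between two flattened time-frequency matrices can be sketched in Python as follows. The matrices, their 2 x 3 shape, and the helper names `cosine_similarity` and `flatten` are hypothetical; the sketch omits the PCA step and does not reproduce the authors' pipeline.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def flatten(matrix):
    """Turn a (frequency x time) matrix into one feature vector."""
    return [value for row in matrix for value in row]

# Hypothetical gamma-band power, 2 frequencies x 3 time bins per item.
item_a = [[1.0, 0.8, 0.2], [0.9, 0.5, 0.1]]
item_b = [[0.9, 0.7, 0.3], [1.0, 0.4, 0.2]]
item_a_scaled = [[3 * x for x in row] for row in item_a]

print(cosine_similarity(flatten(item_a), flatten(item_b)))
# Multiplying the whole pattern by a positive constant leaves the value unchanged.
print(cosine_similarity(flatten(item_a), flatten(item_a_scaled)))
```

  Because the measure normalizes by vector length, only changes in the shape of the time-frequency pattern (for example, stronger power confined to the early part of the window) alter it, not a uniform gain applied to the whole pattern.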
  
  (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.
  
  The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.
  
  For start- and end-list boundaries, we used different models because primacy and recency effects are unique phenomena. Primacy memory is classically thought to arise from rehearsal during the encoding time (Polyn et al. 2009, Lohnas et al. 2015). Alternatively, recency memory is thought to arise from strong contextual cues of recency items during recall due to their temporal proximity. Therefore, we have a limited basis on which to assume their spectral representation in relation to task boundaries would be symmetric.
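  
  The functional forms discussed in this exchange can be written out as a short Python sketch. Only the expressions e^(1-d), e^((1-d)/2) and d/#items come from the text above; the function names and the 12-item list length used as a default are illustrative assumptions, and the authors' exact parameterization is not given here.

```python
import math

LIST_LENGTH = 12  # items per list in the task discussed above

def start_boundary_proximity(d):
    """Exponential falloff from the start-of-list boundary: e^(1 - d)."""
    return math.exp(1 - d)

def scaled_start_proximity(d, scale=2.0):
    """Alternative falloff e^((1 - d) / scale), the variant the reviewer mentions."""
    return math.exp((1 - d) / scale)

def end_boundary_proximity(d, n_items=LIST_LENGTH):
    """Linear ramp toward the end-of-list boundary: d / n_items."""
    return d / n_items

# Tabulate the three candidate regressors across serial positions.
for d in range(1, LIST_LENGTH + 1):
    print(d,
          round(start_boundary_proximity(d), 4),
          round(scaled_start_proximity(d), 4),
          round(end_boundary_proximity(d), 4))
```

  Tabulating the forms side by side makes the reviewer's point concrete: the start-list models decay from 1 at the first serial position at a rate set by the scale, while the end-list model rises linearly to 1 at the last position.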
  
  (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?
  
  Yes. Fig 2- Supplement 1 shows that drift occurs in both the HC and MPL. However, the interaction term is not significant, which suggests that the rate of drift between recalled and non-recalled items is not significantly different.
  
  In contrast, Fig 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative and statistically significant. This suggests that successfully encoded items presented far apart share more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”
  
  (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.
  
  Parameter fits for the ‘list distance’ regressor are now shown in the supplementary portion of Figures 3 and 5. The difference between regions is non-significant.
  
  (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.
  
  We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. A model which included a ‘temporal context regressor’ would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity, and list distance, which models a stepwise decrease in similarity from adjacent lists. These regressors allow the model to differentiate between intra-list changes (the boundary regressor) versus inter-list changes (the list distance regressor).
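  
  One way to inspect this kind of collinearity is sketched below in Python on a hypothetical design grid: SP1 of one list compared with every item of the next three 12-item lists. The smooth-drift proxy (number of item presentations between reference and target, ignoring distractor and recall periods), the variable names, and the helper `pearson` are all assumptions for illustration; the sketch does not use the authors' data or their exact regressors.

```python
import math

LIST_LENGTH = 12
# Hypothetical design: SP1 of list x compared with every item of lists x+1..x+3.
pairs = [(k, d) for k in (1, 2, 3) for d in range(1, LIST_LENGTH + 1)]

list_distance = [k for k, _ in pairs]                      # stepwise regressor
boundary_proximity = [math.exp(1 - d) for _, d in pairs]   # e^(1 - d)
# Smooth drift proxy: item presentations between the reference and the target.
items_elapsed = [LIST_LENGTH * k + (d - 1) for k, d in pairs]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

r_list = pearson(items_elapsed, list_distance)
r_boundary = pearson(items_elapsed, boundary_proximity)
print(f"drift vs list distance:      r = {r_list:.3f}")
print(f"drift vs boundary proximity: r = {r_boundary:.3f}")
```

  On this grid the drift proxy is by construction 12 times the list distance plus the serial-position offset, so it shares most of its variance with the list distance term; a variance inflation factor computed on the real design matrix would be the more complete diagnostic.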
  
  (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.
  
  Figure 3 – supplementary figure 2 A-C has been added to show varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, we included 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.
  
  (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected post hoc.
  
  We agree with the reviewer’s suggestion that the ideal way to fit a model to the trend would be to use a model-fitting process. However, given the size of our dataset and our limited computational resources, we were not able to perform it.
  
  (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified.
  
  In the discussion, we have added the following text to clarify our interpretation of the regional differences shown in the mentioned figures.
  
  “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”
  
  (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.
  
  We have added parameter fits for the ‘list distance’ regressor in the supplementary portion of Figures 3 and 5. The difference between regions is non-significant. Therefore, our results show a very similar stepwise decrease in similarity across lists between regions (list distance regressor; Figure 3 – supplementary figure 1B).
  
  We could not compare these parameters to a separate model which includes a smoothly drifting ‘temporal-context’ regressor due to the regressors collinearity with any representation of boundary. See our response to Reviewer 3 –comment 6.
  
  (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system.
  
  In the discussion we include the following statements to clarify what we suggest regarding the temporal context model.
  
  “Our findings suggest a broader scope of contextual association than just prior items, where temporal proximity as well as task structure in the form of boundaries, play intertwined roles in contextual construction. Our data therefore have implications for updated iterations of the temporal context model incorporating (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”
  
  (12) Minor typos and corrections:
  
  52: using -> use
  
  108: patients -> patients'
  
  156: list -> lists
  
  The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.
  
  Each of these has been corrected in the text.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.20.553078v2
www.biorxiv.org www.biorxiv.org

Physical constraints and biological regulations underlie universal osmoresponses

3
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This manuscript develops a theoretical model of osmotic pressure adaptation in microbes by osmolyte production and wall synthesis. The prediction of a rapid increase in growth rate on osmotic shock is experimentally validated using fission yeast. By using phenomenological rules rather than detailed molecular mechanisms, the model can potentially apply to a wide range of microbes, providing important insights that would be of interest to the wider community studying the regulation of cell size and mechanics. The level of coarse-graining and the assumptions and limitations of the model have been well described, providing a convincing foundation for making predictions. However, further experimental work on the validity of the core assumptions across a range of microbial organisms is needed to assess the universality of the model.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  A theoretical model for microbial osmoresponse is proposed. The model assumes simple phenomenological rules: (i) the free water volume in the cell changes under osmotic imbalance according to pressure balance; (ii) osmoregulation, in which proteome partitioning changes with osmotic pressure and thereby affects the production of osmolyte-producing proteins; (iii) cell-wall synthesis regulation, in which a change in turgor pressure alters the cell-wall synthesis efficiency so that the cell returns to the target turgor pressure; and (iv) intracellular crowding, in which biochemical reactions slow down as crowding increases and stop when the protein density (protein mass divided by free water volume) reaches a critical value. The parameter values were found in the literature or obtained by fitting to the experimental data. The authors compare the model behavior with various microorganisms (E. coli, B. subtilis, S. cerevisiae, S. pombe) and successfully reproduce the overall trend (steady-state behavior for many of them, dynamics for S. pombe). In addition, the model predicts non-trivial behavior such as the fast cell growth just after the hypoosmotic shock, which is consistent with experimental observation. The authors further make experimentally testable predictions regarding mutant behavior and transient dynamics.
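  
  As a rough illustration of rule (iv) only, the crowding effect can be sketched as a rate multiplier that falls as protein density rises and reaches zero at a critical density. The linear form, the function name, and the parameter names below are assumptions made for illustration; the review does not state the functional form used in the manuscript.

```python
def crowding_factor(protein_mass: float,
                    free_water_volume: float,
                    critical_density: float) -> float:
    """Hypothetical rate multiplier in [0, 1] for intracellular crowding.

    Assumed linear form (not taken from the manuscript): reactions run at
    full speed at zero protein density and stop at the critical density.
    """
    if free_water_volume <= 0:
        # No free water left: treat reactions as fully stalled.
        return 0.0
    # Protein density as defined in the review: mass / free water volume.
    density = protein_mass / free_water_volume
    # Clamp at zero so the multiplier never goes negative past the threshold.
    return max(0.0, 1.0 - density / critical_density)
```

  Under this toy form, losing free water after a hyperosmotic shock raises the density and lowers the multiplier, which is the qualitative behavior the rule describes.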
  
  The theory assumes simple mechanistic dependence between core variables without going into specific molecular mechanisms of regulations. The simplicity allows the theory to apply to different organisms by adjusting the time scales with parameters, and the model successfully explains broad classes of observed behaviours. Mathematically, the model provides analytical expressions of the parameter dependencies and an understanding of the dynamics through the phase space without being buried in the detail. This theory can serve as a base to discuss the universality and diversity of microbial osmoresponse.
  
  The coarse-grained nature of the model is the strength of the model in terms of its generality. However, it does not consider various regulations at the molecular level. Hence, certain adaptation features are not considered in the current version of the model. The updated manuscript discusses the pros and cons of the current approach.
  
  Review 1
3. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this study, Ye et al. have developed a theoretical model of osmotic pressure adaptation by osmolyte production and wall synthesis.
  
  Strengths:
  
  They validate their model predictions of a rapid increase in growth rate on osmotic shock experimentally using fission yeast. The study has several interesting insights which are of interest to the wider community of cell size and mechanics.
  
  Comments on revisions:
  
  The authors have in the revised manuscript addressed the aspects of the writing that were unclear, which were listed previously as major and minor comments. We believe the issues raised by this reviewer have been adequately addressed in the manuscript.
  
  Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.02.601668v3
www.biorxiv.org www.biorxiv.org

An initial report of c241,000 to 335,000 Year old Rock Engravings and their relation to Homo naledi in the Rising Star cave system, South Africa

4
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This paper presents important information about potential Homo naledi-associated markings discovered on the walls of the Hill Antechamber of the Rising Star Cave system, South Africa. If confirmed, the antiquity, intentionality, and authorship of the reported markings will have profound archaeological implications, as such behaviors are otherwise widely considered to be unique to our species, Homo sapiens. This report concerns preliminary findings and as it stands the study is incomplete, with further work needed in the future to support the claims about the anthropogenic nature, age, and author of the engravings.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  In a characteristically bold fashion, Lee Berger and colleagues argue here that markings they have found in a dark isolated space in the Rising Star Cave system are likely over a quarter of a million years old and were made intentionally by Homo naledi, whose remains nearby they have previously reported. As in a European and much later case they reference ('Neanderthal engraved 'art' from the Pyrenees'), the entangled issues of demonstrable intentionality, persuasive age and likely authorship will generate much debate among the academic community of rock art specialists. The title of the paper and the reference to 'intentional designs', however, leave no room for doubt as to where the authors stand, despite an avoidance of the word art, entering a very disputed terrain. Iain Davidson's (2020) 'Marks, pictures and art: their contributions to revolutions in communication', also referenced here, forms a useful and clearly articulated evolutionary framework for this debate. The key questions are: 'are the markings artefactual or natural?', 'how old are they?' and 'who made them?', questions often intertwined and here, as in the Pyrenees, completely inseparable. I do not think that these questions are definitively answered in this paper and I guess from the language used by the authors (may, might, seem etc) that they do not think so either.
  
  Before considering the specific arguments of the authors to justify the claims of the title, we should recognise the shift in the academic climate of those concerned with 'ancient markings' that has taken place over the past two or three decades. Before those changes, most specialists would probably have expected all early intentional markings to have been made by Homo sapiens after the African diaspora as part of the explosion of innovative behaviours thought to characterise the 'origins of modern humans'. Now, claims for earlier manifestations of such innovations from a wider geographic range are more favourably received, albeit often fiercely challenged as the case for Pyrenean Neanderthal 'art' shows (White et al. 2020). This change in intellectual thinking does not, however, alter the strict requirements for a successful assertion of earlier intentionality by non-sapiens species. We should also note that stone, despite its ubiquity in early human evolutionary contexts, is a recalcitrant material not easily directly dated whether in the form of walling, artefact manufacture or potentially meaningful markings. The stakes are high but the demands no less so.
  
  Why are the markings not natural? Berger and co-authors seem to find support for the artefactual nature of the markings in their location along a passage connecting chambers in the underground Rising Star Cave system. The presumption is that the hominins passed by the marked panel frequently. I recognise the thinking but the argument is weak. More confidently they note that "In previous work researchers have noted the limited depth of artificial lines, their manufacture from multiple parallel striations, and their association into clear arrangement or pattern as evidence of hominin manufacture (Fernandez-Jalvo et al. 2014)". The markings in the Rising Star Cave are said to be shallow, made by repeated grooving with a pointed stone tool that has left striations within the grooves, and to form designs that are "geometric expressions" including crosshatching and cruciform shapes. "Composition and ordering" are said to be detectable in the set of grooved markings. Readers of this and their texts will no doubt have various opinions about these matters, mostly related to rather poorly defined or quantified terminology. I reserve judgement, but would draw little comfort from the similarities among equally unconvincing examples of early, especially very early, 'designs'. Two or even three half convincing arguments do not add up to one convincing one.
  
  The authors draw our attention to one very interesting issue: given the extensive grooving into the dolomite bedrock by sharp stone objects, where are these objects? Only one potential 'lithic artefact' is reported, a "tool-shaped rock [that] does resemble tools from other contexts of more recent age in southern Africa, such as a silcrete tool with abstract ochre designs on it that was recovered from Blombos Cave (Henshilwood et al. 2018)", also figured by Berger and colleagues. A number of problems derive from this comparison. First, 'tool-shaped rock' is surely a meaningless term: in a modern toolshed 'tool-shaped' would surely need to be refined into 'saw-shaped', 'hammer-shaped' or 'chisel-shaped' to convey meaning? The authors here seem to mean that the Rising Star Cave object is shaped like the Blombos painted stone fragment? But the latter is a painted fragment not a tool and so any formal similarity is surely superficial and offers no support to the 'tool-ness' of the Rising Star Cave object. Does this mean that Homo naledi took (several?) pointed stone tools down the dark passageways, used them extensively and, whether worn out or still usable, took them all out again when they left? Not impossible, of course. And the lighting?
  
  The authors rightly note that the circumstance of the markings "makes it challenging to assess whether the engravings are contemporary with the Homo naledi burial evidence from only a few metres away" and more pertinently, whether the hominins did the markings. Despite this honest admission, they are prepared to hypothesise that the hominins made the markings, without, it seems, any convincing evidence. If archaeologists took juxtaposition to demonstrate authorship, there would be any number of unlikely claims for the authorship of rock paintings or even stone tools. The idea that there were no entries into this Cave system between the Homo naledi individuals and the last two decades is an assertion not an observation and the relationship between hominins and designs no less so. In fact the only 'evidence' for the age of the markings is given by the age of the Homo naledi remains, as no attempt at the, admittedly very difficult, perhaps impossible, task of geochronological assessment, has been made.
  
  The claims relating to artificiality, age and authorship made here seem entangled, premature and speculative. Whilst there is no evidence to refute them, there isn't convincing evidence to confirm them.
  
  References:
  
  Davidson, I. 2020. Marks, pictures and art: their contribution to revolutions in communication. Journal of Archaeological Method and Theory 27(3): 745-770.
  
  Henshilwood, C.S. et al. 2018. An abstract drawing from the 73,000-year-old levels at Blombos Cave, South Africa. Nature 562: 115-118.
  
  Rodriguez-Vidal, J. et al. 2014. A rock engraving made by Neanderthals in Gibraltar. Proceedings of the National Academy of Sciences.
  
  White, Randall et al. 2020. Still no archaeological evidence that Neanderthals created Iberian cave art.
  
  Comments on latest version:
  
  The authors have not modified their stance or the authority of their arguments since the original paper.
  
  Review 1
3. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #4 (Public review):
  
  Thank you for the opportunity to provide a peer-review of this manuscript, which I first reviewed in 2023 under the title of '241,000 to 335,000 Years Old Rock Engravings Made by Homo naledi in the Rising Star Cave system, South Africa'. My review is brief as the authors state they have made "relatively minimal changes", so most of the comments I made in 2023 still stand. Some of the language is a little more temperate but the main issues of this potentially landmark study remain and undermine scientific acceptance of the claimed findings. The fact that this is an initial report does not excuse it from the normal conventions of building arguments supported by empirical data. Again, the absence of a rock art expert on the authorial team causes recurring weaknesses still to be evident (would one ask a rock art expert to analyse a new fossil hominin skull for example?). Specifically, there are two major issues that need to be resolved before there is necessary and sufficient cause to assign the term 'rock engravings' to the marks in the Dinaledi chamber. These are authorship and dating.
  
  Authorship: The assertion that the 'rock engravings' are anthropogenic remains unsupported by empirical evidence, with a number of possible natural factors that could just as likely have caused the marks. Not to use image enhancements - which is standard in most rock art research and has been for some time - is a critical omission. The concerns stated about AI and data standards are not developed and the authors are directed to the literature in this field, for example this 2025 overview - https://www.sciencedirect.com/science/article/pii/S1296207424002516. Again, having a rock art expert would show the AI concern to be valid but easily addressed using Data Standards. In the almost 2 years since the first pre-print was released, there has been ample time for high resolution photographs and scans of the purported 'rock engravings'; analysis of which by relevant experts could properly physically characterise the marks and thus establish more or less likely agents for their production. European-based researchers in particular have utilised this approach on material such as the Blombos ochre and marked bone from Europe and Africa. None of these methods is invasive or destructive.
  
  To then go on and link Homo naledi to these markings is premature, especially when this landscape has been home to multiple hominins. Most rock art sites do not contain the physical bodily remains of their makers so we assign authorship based on dating (such as for Neanderthal era art in Europe for example); the second critical issue in this report:
  
  Dating: There is no direct or closely associated chronometric dating of the 'rock engravings' or their immediate context, so the age range claimed is unsupported. Rock art dating is notoriously difficult, which is why researchers closely scrutinise the dates produced. In this case, however, the chronological context is physically so far removed from these rock markings as to be misleading at best, and it needs to be discounted until a proper programme of dating has commenced. The sources cited for rock art dating tend to be out of date and it would be standard practice to have a geochronologist assess the rock-marked areas and then establish dating protocols.
  
  Authorship and dating are cornerstones of archaeological/paleoanthropological work and need to be established in the first instance. Until that has been done commensurate with current standards in global rock art research this potentially landmark finding cannot be taken as probable, only as possible. This is a pity as the last decade or so has revolutionised our understanding of the socially complex world multiple hominin species lived in, and marked in utilitarian and symbolic ways. The conditions for acceptance of ancient rock art have thus never been better, but the Dinaledi example needs to revisit research first principles around authorship and dating to be included as a credible part of this larger context. It would have been good to see a commitment to a coherent research programme to this end for this case study.
  
  I hope these observations are useful. As above I keep them short as there has been minimal change to the 2023 ms, and my detailed comments on that remain with the first version of the work.
  
  Review 2
4. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  We thank the reviewers for their very constructive and helpful comments on the previous version of this manuscript. They have focused on some important issues and have raised many valuable questions that we expect to answer as research begins on these markings. As has been often the case with preprints, a number of experts beyond the four reviewers and editor have provided comments, questions, and suggestions, and we have taken these on board in our revision of the manuscript. In particular, Martinón-Torres et al. (2024) focused several comments upon this manuscript and raised some points that were not considered by the reviewers, and so we discuss those points here in addition to the reviewer comments.
  
  Some of us have been engaged in other aspects of the possible cultural activities of Homo naledi. After the discovery of these markings we considered it indefensible to publish further research on the activity of H. naledi within this part of the cave system without making readers aware that the H. naledi skeletal remains occur in a spatial context near markings on cave walls. Of course, the presence of markings leaves many questions open. A spatial context does not answer all questions about the temporal context. The situation of the Dinaledi Subsystem does entail some constraints that would not apply to markings within a more open cave or rock wall, and we discuss those in the text.
  
  We find ourselves in agreement with most of the reviewers on many points. As reflected by several of the reviewers, and most pointedly in the remarks by reviewer 1, the purpose of this preprint is a preliminary report on the observation of the markings in a very distinctive location. This initial report is an essential step to enable further research to move forward. That research requires careful planning due to the difficulty of working within the Dinaledi Subsystem where the markings are located. This pattern of initial publication followed by more detailed study is common with observations of rock art and other markings identified in South Africa and elsewhere. We appreciate that the reviewers have understood the role of this initial study in that process of research.
  
  Because of this, the revised manuscript represents relatively minimal changes, and all those at the advice of reviewers. Many thanks to all the reviewers for noting various typographic errors, missed references and other issues that we have done our best to fix in the revised manuscript.
  
  Expertise of authors. Reviewer 4 mentions that the expertise of the authors does not include previous publication history on the identification of rock art, and other reviewers briefly comment that experts in this area would enhance the description. AF does have several publications on ancient engravings and other markings; LRB has geological training and field experience with rock art. Notwithstanding this, we do take on board the advice to include a wider array of subject experts in this research, and this is already underway.
  
  Image enhancement. We appreciate the suggestions of some reviewers for possible strategies to use software filters to bring out details that may not be obvious even with our cross-polarization lighting and filtering. These are great ideas to try. In this manuscript we thought that going very far into software editing or image enhancement might be perceived by some readers as excessive manipulation, particularly in an age of AI. In future work we will experiment with the suggested approaches.
  
  Natural weathering. In the process of review and commentary by experts and the public there has been broad acceptance that many of the markings illustrated in this paper are artificial and not a product of natural weathering of the dolomite rock. We deeply appreciate this. At the same time, we accept the comments from reviewers that some markings may be difficult to differentiate from natural weathering, and that some natural features that were elaborated or altered may be among the markings we recognize. On pages 3 and 4 we present a description of the process of natural subaerial weathering of dolomite, which we have rooted in several references as well as our own observations of the natural weathering visible on dolomite cave walls in the Rising Star cave system. This includes other cave walls within the Dinaledi Subsystem. We discuss the “elephant skin” patterning of natural dolomite surface weathering, how that patterning emerges, and how that differs from the markings that are the subject of this manuscript.
  
  Animal claw marks. Martinón-Torres et al. 2024 accept that some of the markings illustrated on Panel A are artificial, but they offer the hypothesis that some of those markings may be consistent with claw marks from carnivores or other mammals. They provide a photo of claw marks within a limestone cave in Europe to illustrate this point. On pages 5 and 6 of the revised manuscript we discuss the hypothesis of claw marks. We discuss the presence of animals in southern Africa that may dig in caves or mark surfaces. However the key aspect of the Malmani dolomite caves is that the hardness of dolomitic limestone rock is much greater than many of the limestone caves in other regions such as Europe and Australia, where claw marks have been noted in rock walls. As we discuss, we have not been able to find evidence of claw marks within the dolomite host bedrock of caves in this region, although carnivores, porcupines, and other animals dig into the soft sediments within and around caves. The form of the markings themselves also counter-indicates the hypothesis that they are claw marks.
  
  Recent manufacture. One comment that occurs within the reviews and from other readers of the preprint is that recent human visitors to the cave, either in historic or recent prehistoric times, may have made these marks. We discuss this hypothesis on page 6 of the revised manuscript. The simple answer is that no evidence suggests that any human groups were in the Dinaledi Subsystem between the presence of H. naledi and the entry of explorers within the last 25 years. The list of all explorers and scientific visitors to have entered this portion of the cave system is presented in a table. We can attest that these people did not make the marks. More generally, such marks have not been known to be made by cavers in other contexts within southern Africa.
  
  Panels B and C. We have limited the text related to these areas, other than indicating that we have observed them. The analysis of these areas and quantification of artificial lines does not match what we have done for the Panel A area and we leave these for future work.
  
  Presence of modern humans. We have observed no evidence of modern humans or other hominin populations within the Dinaledi Subsystem, other than H. naledi. Several reviewers raise the question of whether the absence of evidence is evidence of absence of modern humans in this area. This is connected by two of the reviewers to the observation that the investigation of other caves in recent years has shown that markings or paintings were sometimes made by different groups over tens of thousands of years, in some cases including both Neanderthals and modern humans. We have decided it is best for us not to attempt to prove a negative. It is simple enough to say that there is no evidence for modern humans in this area, while there is abundant evidence of H. naledi there.
  
  Association with H. naledi. Reviewer 2 made an incisive point that the previous version contained some text that appeared contradictory: on the one hand we argued that modern humans were not present in the subsystem due to the absence of evidence of them, yet we accepted that H. naledi may have been present for a longer time than currently established by geochronological methods.
  
  We appreciate this comment because it helped us to think through the way to describe the context and spatial association of these markings and the skeletal remains, and how it may relate to their timeline. Other reviewers also raised similar questions, whether the context by itself demonstrates an association with H. naledi. We have revised the text, in particular on pages 5 and 7, to simply state that we accept as the most parsimonious alternative at present the hypothesis that the engravings were made by H. naledi, which is the only hominin known to be present in this space.
  
  Age of H. naledi in the system. At one place in the previous manuscript we indicated that we cannot establish that H. naledi was only active in the cave system within the constraints of the maximum and minimum ages for the Dinaledi Subsystem skeletal remains (viz., 335 ka – 241 ka), because some localities with skeletal material are undated. We have adjusted this paragraph on page 7 to be clear that we are discussing this only to acknowledge uncertainty about the full range of H. naledi use of the cave system.
  
  Geochronological methods. Several reviewers discuss the issue of geochronology as applied to these markings. This is an area of future investigation for us after the publication of this initial report. As some reviewers note, the prospects for successful placement of these engraved features and other markings with geochronological methods depends on factors that we cannot predict without very high-resolution investigation of the surfaces. We have included greater discussion of the challenges of geochronological placement of engravings on page 6, including more references to previous work on this topic. We also briefly note the ethical problems that may arise as we go further with potentially invasive, destructive or contact studies of these engravings, which must be carefully considered by not just us, but the entire academy.
  
  Title. Some reviewers suggested that the title should be rephrased because this paper does not use chronological methods to derive date constraints for the markings. We have rephrased the title to reflect less certainty while hopefully retaining the clear hypothesis discussed in the paper.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.01.543133v2
www.medrxiv.org www.medrxiv.org

Early menarche and childbirth accelerate aging-related outcomes and age-related diseases: Evidence for antagonistic pleiotropy in humans

4
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This important study uses Mendelian Randomization to provide evidence that early-life reproductive phenotypes (i.e., age at onset of menarche and age at first birth) have a significant impact on numerous health outcomes later in life. The empirical evidence provided by the authors supporting the antagonistic pleiotropy theory is solid. Theories of aging should be empirically tested and this study provides a good first step in that direction.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The present study aims to determine possible associations between reproduction and the prevalence of age-related diseases, based on the antagonistic pleiotropy hypothesis of ageing and predominantly using Mendelian Randomization. The authors provide evidence that menarche before the age of 11 and childbirth before 21 increase the risk of several diseases, almost doubling the risk of diabetes and heart failure and quadrupling the risk of obesity.
  
  Strengths:
  
  Large sample size. Many analyses.
  
  Review 1
3. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identify 128 fertility-related SNPs that associate with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.
  
  Strengths:
  
  The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.
  
  Weaknesses:
  
  The authors report evidence in support of the antagonistic pleiotropy theory in aging and discuss the disposable soma theory. Although both theories describe distinct mechanisms, separating them in empirical research is complicated and needs further studies in future research.
  
  Review 2
4. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.
  
  Strengths:
  
  Large sample size. Many analyses.
  
  Weaknesses:
  
  Still a number of doubts with regard to some of the results and their interpretation.
  
  Reviewer #1 (Recommendations for the authors):
  
  Thank you for the opportunity to review a revised version.
  
  I still have serious doubts with regard to a number of datasets presented. For example, the results on essential hypertension and cervical cancer show very small effect sizes, but according to the authors still reach the level of statistical significance. This is unlikely to be accurate. For MR analyses, this is nearly impossible. The analyses of these data and the statistical analysis need to be checked for errors and repeated. While BOLT-LMM might not be relevant here, other issues may be at play. The authors should therefore always interpret the results with regard to the observed effect sizes as well, instead of only looking at the p-values (an odds ratio of 0.999 means that there is a 0.1% lower risk).
  
  Thank you for your suggestions. We have updated the results for essential hypertension, GAD, and cervical cancer in results, figures, and supplemental tables (lines 65-89, Figure 1, Tables S3-S4).
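  
  The reviewer's point about reading effect sizes alongside p-values can be illustrated with a short sketch. It assumes an MR estimate reported as a log-odds coefficient with a standard error; the function name and the numeric inputs are hypothetical and are not taken from the manuscript. Strictly, the percentage refers to odds, which approximates risk only for rare outcomes.

```python
import math

def summarize_log_odds(beta, se):
    """Turn a log-odds estimate into an odds ratio, 95% CI,
    percent change in odds, and a two-sided normal p-value."""
    z = beta / se
    # Two-sided p-value under a normal approximation: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return {
        "or": odds_ratio,
        "ci": ci,
        "pct_change": (odds_ratio - 1.0) * 100.0,
        "p": p_value,
    }

# Hypothetical inputs: an odds ratio of 0.999 with a very small standard error.
result = summarize_log_odds(math.log(0.999), 0.0002)
print(result)
```

  With these made-up inputs the estimate is highly significant even though the odds change by only about 0.1%, which is the situation the reviewer asks the authors to interpret with care.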
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth may have a positive effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identify 128 fertility-related SNPs that associate with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.
  
  Strengths:
  
  The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.
  
  The authors addressed the remarks on the previous version very well. Addressing the two points below would further increase the quality of the manuscript.
  
  (1) In the previous version the authors mentioned that their results are also consistent with the disposable soma theory: "These results are also consistent with the disposable soma theory that suggests aging as an outcome tradeoff between an organism's investment in reproduction and somatic maintenance and repair."
  
  Although the antagonistic pleiotropy and disposable soma theories describe different mechanisms, both provide frameworks for understanding how genes linked to fertility influence health. The antagonistic pleiotropy theory posits that genes enhancing fertility early in life may have detrimental effects later. In contrast, the disposable soma theory suggests that energy allocation involves a trade-off, where investment in fertility comes at the expense of somatic maintenance, potentially leading to poorer health in later life.
  
  To strengthen the manuscript, a discussion section should be added to clarify the overlap and distinctions between these two evolutionary theories and suggest directions for future research in disentangling their specific mechanisms.
  
  Thank you for your suggestions to clarify the overlap and distinctions between the antagonistic pleiotropy and disposable soma theories. While our primary focus is on the antagonistic pleiotropy framework, we acknowledge that the disposable soma theory also provides a relevant perspective on the trade-offs between reproduction and somatic maintenance.
  
  To address this, we have expanded the discussion section to highlight how both theories contribute to our understanding of the relationship between fertility-related traits and aging-related health outcomes. We also suggest potential future research directions, such as integrating genetic data with biomarkers of somatic maintenance to further explore the mechanisms underlying these trade-offs (lines 213-223).
  
  (2) In response to the question why the authors did not include age at menopause in addition to the already included age at first child and age at menarche the following explanation was provided: "Our manuscript focuses on the antagonistic pleiotropy theory, which posits that inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (like menarche and childbirth) may have costly consequences later. So, we only included age at menarche and age at first childbirth as exposures in our research."
  
  It remains, however, unclear why genes beneficial for early survival and reproduction would be reflected only in age at menarche and age at first childbirth, but not in age at menopause. While age at menarche marks the onset of fertility, age at menopause signifies its end. Since evolutionary selection acts directly until reproduction is no longer possible (though indirect evolutionary pressures persist beyond this point), the inclusion of additional fertility-related measures could have strengthened the analysis. A more detailed justification for focusing exclusively on age at menarche and first childbirth would enhance the clarity and rigor of the manuscript.
  
  Thank you for your question regarding the age at menopause in our analysis. Our decision was based on the theoretical framework of antagonistic pleiotropy, which emphasizes early-life reproductive advantages that may have trade-offs later in life. Age at menarche and age at first childbirth are direct markers of early reproductive investment, which align closely with this framework.
  
  While age at menopause marks the cessation of reproductive capability, its evolutionary role is distinct. The selective pressures acting on menopause are complex and may involve post-reproductive contributions rather than direct reproductive fitness benefits. Moreover, the genetic architecture of menopause may be influenced by different biological pathways compared to early reproductive traits.
  
  Nonetheless, we acknowledge that including age at menopause could provide additional insights into reproductive aging. Several papers (refs. 1, 2) have already been published on age at menopause and age-related outcomes, including diabetes, AD, osteoporosis, cancers, and cardiovascular diseases.
  
  Reviewing Editor (Recommendations for the authors):
  
  Above/below you will find the remaining comments from the reviewers. One of the main issues remaining is that some of the data seems to be incorrectly analysed and some of the findings may not be correct. To clarify this a lot more, I asked the reviewer for some details and received the following:
  
  - In Figure 1B one of their main outcomes is "age of menopause", but they report the data as an odds ratio. This is not correct and should be fixed (it seems the authors can run the right analysis, but just reported it with the wrong heading in the figure). This likely also applies to the outcome "facial aging". Also the heading in Figure 1A should be Beta instead of OR.
  
  We have updated the figures to ensure that the beta values of continuous outcomes and odds ratio values of categorical outcomes are presented in Figure 1.
  
  - With essential hypertension, GAD and cervical cancer, the estimates are so small that the authors need to re-examine their results. The current MR analysis is not sufficiently powered to yield such narrow confidence intervals. Essential hypertension was based on data from UK Biobank; although I was unable to find which program was used to generate the GWAS results, I strongly suspect it was also BOLT-LMM. The same applies to cervical cancer. Both datasets include related individuals, so they were very likely derived with BOLT-LMM.
  
  I hope this will help to solve this issue.
  
  According to the published paper, the GWAS for gastrointestinal or abdominal disease (GAD) (GWAS ID: ebi-a-GCST90038597) was generated with BOLT-LMM. According to the MRC IEU UK Biobank GWAS pipeline, versions 1 and 2, the GWAS for essential hypertension (GWAS ID: ukb-b-12493) and cervical cancer (GWAS ID: ukb-b-8777) were also generated with BOLT-LMM. We have updated the MR analysis results and figures (lines 65-89, Figure 1, Tables S3-S4) as well as the subsequent IPA analysis (lines 106-162 and 255-280, Figures 2-3).
  
  (1) Magnus, M. C., Borges, M. C., Fraser, A. & Lawlor, D. A. Identifying potential causal effects of age at menopause: a Mendelian randomization phenome-wide association study. Eur J Epidemiol 37, 971-982 (2022). https://doi.org/10.1007/s10654-022-00903-3
  
  (2) Zhang, X., Huangfu, Z. & Wang, S. Review of mendelian randomization studies on age at natural menopause. Front Endocrinol (Lausanne) 14, 1234324 (2023). https://doi.org/10.3389/fendo.2023.1234324
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2024.09.23.24314197v3
www.biorxiv.org www.biorxiv.org

CRISPR-Edited DPSCs, Constitutively Expressing BDNF Enhance Dentin Regeneration in Injured Teeth

2
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This study examines the effect of the trophic factor BDNF on dental cells, an understudied subject that is relevant to dental regeneration and repair. Given that the topic is new and has not been covered previously, the report is a useful foray into a new area of investigation, although several experimental results could be strengthened. The connection between BDNF and dental health is a solid attempt at translating trophic factor signaling to the clinic, which has been stymied in past efforts.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Joint Public Review:
  
  This work employs both in vitro and in vivo methods to investigate the contribution of BDNF/TrkB signaling to enhancing the differentiation and dentin-repair capabilities of dental pulp stem cells in the context of exposure to a variety of inflammatory cytokines. A particular emphasis of the approach is the use of dental pulp stem cells in which BDNF expression has been enhanced using CRISPR technology. Transplantation of such cells is proposed to improve dentin regeneration in a mouse model of tooth decay. The study provides several interesting findings, including a demonstration that exposure to several cytokines/inflammatory agents increases the quantity of activated phospho-TrkB in dental pulp stem cells. One issue that was not covered is the involvement of the p75 neurotrophin receptor, which is also highly sensitive to inflammation and injury. The conclusions could be further strengthened by demonstrating the specificity of the antibodies via immunoblot methods, both in the presence and absence of BDNF and of the other neurotrophins, NT-3 and NT-4, which can also bind to the TrkB receptor.
  
  Review 1

Tags

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.11.627879v2
www.biorxiv.org www.biorxiv.org

Characterization of binding kinetics and intracellular signaling of new psychoactive substances targeting cannabinoid receptor using transition-based reweighting method

4
1. Public_Reviews 06 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 A combination of molecular dynamics simulation and state-of-the-art statistical post-processing techniques provided valuable insight into GPCR-ligand dynamics. This manuscript provides solid evidence for differences in the binding/unbinding of classical cannabinoid drugs from new psychoactive substances. The results could aid in mitigating the public health threat these drugs pose.
 
 Summary
2. Public_Reviews 06 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through β-arrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning.
 
 The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are.
 
 For the metadynamics simulations, were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.
 
 It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022).
 
 What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it's not clear what distributions are being compared.
 
 I suggest being more careful with the language of universality. It can be "supported", but "showing" or "proving" it's universal would require looking at all possible chemicals in the class.
 
 Comments on revisions:
 
 The authors provided appropriate responses to the comments above.
 
 Review 1
3. Public_Reviews 06 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics.
 
 Strengths:
 
 The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true in many cases, especially those with skewed binding affinities. In this regard, the authors' usage of TRAM, which harnesses both biased and unbiased simulations to extract the same quantities, provides a more appropriate way out.
 
 Weaknesses:
 
 (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case.
 
 (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues.
 
 (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw Research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much shorter biased simulation times, would fare against the experimental kinetics or even the unbiased simulated kinetics of the previous report.
 
 (4) The method section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations that were spawned from docked structure, instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures.
 
 (5) The last part, which uses a machine learning-based approach to analyse allosteric interactions, seems very much forced, as there are numerous more traditional distance-based analyses with precedent that do a fair job of identifying allosteric interactions.
 
 (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of the two ligand binding mechanisms are.
 
 Comments on revisions:
 
 The authors have addressed most of the reviewer's queries adequately. However, the current code availability section just provides a link to Python files that generate the plots, which is not very useful in its current form. The code availability section should point to a proper GitHub page that shows how to use TRAM so that readers can run it themselves. While PyEMMA has been cited for TRAM, a Python notebook that reproduces the TRAM analysis would be very instructive.
 
 Review 2
4. Public_Reviews 06 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Public Reviews:
 
 Reviewer #1 (Public Review):
 
 This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through β-arrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning.
 
 We thank the reviewer for the comments. We have provided point-by-point responses to the reviewer’s comments below and incorporated the suggestions in our revised manuscript. Modified parts of the manuscript are highlighted in yellow.
 
 Comments:
 
 (1) The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are.
 
 We thank the reviewer for pointing this out. We have added the color scheme to the figure caption.
 
 (2) For the metadynamics simulations were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.
 
 We thank the reviewer for the suggestion. We agree that the Gaussian height and width may affect the unbinding pathway. However, we would like to point out that we used the well-tempered variant of metadynamics, in which the effective Gaussian height decreases as bias deposition progresses. We therefore expect the Gaussian height and width to have minimal impact on the unbinding pathway. To address the reviewer's suggestion, we conducted additional well-tempered metadynamics simulations varying key parameters such as the bias height, bias factor, and deposition rate, all of which can influence the sampled space. The values originally used in the paper for bias height, bias factor, and deposition rate are 0.4 kcal/mol, 15, and 1/5 ps⁻¹, respectively. We explored different values for these parameters and projected the newly sampled space onto the previously sampled region (Figure S4). The new simulations sample a similar unbinding pathway in the extracellular direction and explore a similar region of the binding pocket.
 
 Results and Discussion (Page 10)
 
 “We also performed unbinding simulations using well-tempered metadynamics parameters (bias height, bias deposition rate and bias factor) to confirm the existence of alternative pathways (Figure S4). However, the simulations show that ligands follow the similar pathway for all metadynamics runs.”
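 
 The statement that the effective Gaussian height decays in well-tempered metadynamics can be sketched numerically. The snippet below uses the standard well-tempered rescaling, height = w0 * exp(-V(s) / (kB * ΔT)) with ΔT = (bias factor - 1) * T, together with the initial height (0.4 kcal/mol) and bias factor (15) quoted in the response. The temperature of 300 K and the function name are assumptions made for illustration only.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def wt_gaussian_height(w0, bias_at_s, bias_factor, temperature=300.0):
    """Height of the next deposited Gaussian in well-tempered metadynamics.

    w0          : initial Gaussian height (kcal/mol)
    bias_at_s   : bias already accumulated at the current CV value (kcal/mol)
    bias_factor : gamma = (T + dT) / T
    temperature : simulation temperature in K (assumed value, not from the paper)
    """
    delta_t = (bias_factor - 1.0) * temperature
    return w0 * math.exp(-bias_at_s / (KB_KCAL * delta_t))

# Height as a function of the bias already deposited at the current position.
for accumulated in (0.0, 2.0, 5.0, 10.0, 20.0):
    height = wt_gaussian_height(0.4, accumulated, 15)
    print(f"accumulated bias {accumulated:5.1f} kcal/mol -> height {height:.4f} kcal/mol")
```

 The height shrinks smoothly as bias accumulates in a region, which is why the initial height matters less than it does in standard metadynamics.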
 
 (3) It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as the Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022).
 
 We appreciate the reviewer's feedback. We have incorporated additional citations of studies demonstrating the use of TRAM as an estimator for both kinetics and thermodynamics (e.g. ligand binding: Ge, Y. and Voelz, V.A., JCP, 2022[1]; peptide-protein binding kinetics: Paul, F. et al., Nat. Commun., 2017[2], Ge, Y. et al., JCIM, 2021[3]). Additionally, we have included references to studies in which biased simulations were first used to explore the conformational space and the results were then used to seed unbiased simulations for building a Markov state model (metadynamics: Sun, X. et al., eLife, 2018[4]; umbrella sampling: Abella, J. R. et al., PNAS, 2020[5]; replica exchange: Paul, F. et al., Nat. Commun., 2017[2]).
 
 (4) What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it is not clear what distributions are being compared.
 
 We apologize for this confusion. The KL divergence analysis was performed on the probability distributions of the inverse distances between residue pairs from any two macrostates. Each macrostate was represented by 1000 frames that were selected proportional to the TRAM stationary density. All possible pair-wise inverse distances were calculated per frame for the purpose of these calculations. Although KL divergence is inherently asymmetric, we symmetrized the measurement by calculating the average. Per-residue K-L divergence, which is shown in the main figures as color and thickness gradient, was calculated by taking the sum of all pairs corresponding to the residue. We have included a detailed discussion of K-L divergence in Methods section. We have also modified the result section to add a brief discussion of K-L divergence methodology.
 
 Results and Discussion (Page 15)
 
 “We further performed Kullback-Leibler divergence (K-L divergence) analysis between inverse distance of residue pairs of two macrostates to highlight the protein region that undergoes high conformational change with ligand movement.”
 
 Methods (Page 33)
 
 “Kullback–Leibler divergence (K-L divergence) analysis was performed to show the structural differences in protein conformations in different macrostates[4,114]. In this study, this technique was used to calculate the difference in the pairwise inverse distance distributions between macrostates. Each macrostate was represented by 1000 frames that were selected proportional to their TRAM weighted probabilities. Although K-L divergence is an asymmetric measurement, for this study, we used a symmetric version of the K-L divergence by taking the average between two macrostates. Per residue contribution of K-L divergence was calculated by taking the sum of all the pairwise distances corresponding to that residue. This analysis was performed by in-house Python code.”
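 
 The procedure described above can be sketched as follows. This is a minimal illustration and not the authors' in-house code: the histogram binning, the pseudocount `eps`, the function names, and the synthetic data are all assumptions. The inputs are taken to be per-frame inverse distances for each residue pair, already sampled from the two macrostates.

```python
import numpy as np

def symmetric_kl(x, y, bins=50, eps=1e-10):
    """Symmetrized K-L divergence between two 1-D samples, using shared histogram bins."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    # A small pseudocount avoids log(0) in empty bins (an assumption of this sketch).
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)

def per_residue_kl(inv_dist_a, inv_dist_b, pairs, n_residues, bins=50):
    """Sum the pairwise divergences over every pair that involves each residue.

    inv_dist_a, inv_dist_b : arrays of shape (n_frames, n_pairs), one per macrostate
    pairs                  : list of (i, j) residue indices, one per column
    """
    scores = np.zeros(n_residues)
    for k, (i, j) in enumerate(pairs):
        d = symmetric_kl(inv_dist_a[:, k], inv_dist_b[:, k], bins=bins)
        scores[i] += d
        scores[j] += d
    return scores

# Synthetic example: 3 residues, 1000 frames per macrostate.
rng = np.random.default_rng(0)
pairs = [(0, 1), (0, 2), (1, 2)]
state_a = rng.normal(0.10, 0.01, size=(1000, 3))
state_b = state_a.copy()
state_b[:, 0] += 0.05  # only the (0, 1) pair differs between the two macrostates
print(per_residue_kl(state_a, state_b, pairs, n_residues=3))
```

 In this toy case only residues 0 and 1 receive a nonzero score, since only their pairwise inverse distance distribution differs between the two synthetic macrostates.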
 
 (5) I suggest being more careful with the language of universality. It can be "supported" but "showing" or "proving" its universal would require looking at all possible chemicals in the class.
 
 We thank the reviewer for the suggestion. In response, we have revised the manuscript to ensure that the language reflects that our findings are based on observations from a limited set of ligands, namely one NPS and one classical cannabinoid. We have replaced references to ligand groups (such as NPS or classical cannabinoid) with the specific ligand names (such as MDMB-FUBINACA or HU-210) to avoid claims of universality and prevent any potential confusion.
 
 Results and Discussion (Page 19)
 
 “In this work, we trained the network with the NPS (MDMB-FUBINACA) and classical cannabinoid (HU-210) bound unbiased trajectories (Method Section). Here, we compared the allosteric interaction weights between the binding pocket and the NPxxY motif, which is involved in triad interaction formation. Results show that each binding pocket residue in the MDMB-FUBINACA bound ensemble shows higher allosteric weights with the NPxxY motif, indicating larger dynamic interactions between the NPxxY motif and binding pocket residues (Figure S9). The probability of triad formation was estimated to observe the effect of the difference in allosteric control. TRAM weighted probability calculation showed that MDMB-FUBINACA bound CB1 has the higher probability of triad formation (Figure 8A). Comparison of the pairwise interactions of the triad residues shows that the interaction between Y397^7.53 and T210^3.46 is relatively more stable in the case of MDMB-FUBINACA bound CB1, while the other two interactions have similar behavior for both systems (Figures S10A, S10B, and S10C). Therefore, higher interaction between Y397^7.53 and T210^3.46 in the MDMB-FUBINACA bound receptor causes the triad interaction to be more probable.
 
 Furthermore, we also compared TM6 movement for both ligand bound ensembles, which is another activation metric involved in both G-protein and β-arrestin binding. Comparison of TM6 distance from the DRY motif of TM3 shows similar distribution for HU-210 and MDMB-FUBINACA (Figure 8B). These observations support that NPS binding causes higher β-arrestin signaling by allosterically controlling triad interaction formation.”
 
 Reviewer #2 (Public Review):
 
 Summary:
 
 The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics.
 
 Strengths:
 
 The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true in many cases, especially those with skewed binding affinities. In this regard, the authors' usage of TRAM, which harnesses both biased and unbiased simulations to extract the same quantities, provides a more appropriate way out.
 
 Weaknesses:
 
 (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case.
 
 We thank the reviewer for the comment. While we agree that the thermodynamic comparisons between MSM and TRAM provide similar values in this instance, we would like to emphasize the underlying reasoning behind our choice of TRAM.
 
 MSM can struggle to accurately estimate thermodynamic and kinetic properties in cases where local state reversibility (detailed balance) is not easily achieved with unbiased sampling. This is especially relevant in ligand unbinding processes, which often involve overcoming high free energy barriers. TRAM, by incorporating biased simulation data (such as umbrella sampling) in addition to unbiased data, can better achieve local reversibility and provide more robust estimates when unbiased sampling is insufficient.
 
 The similarity in thermodynamic estimates between MSM and TRAM in our study can be attributed to the relatively long unbiased sampling period (> 100 µs) employed. With sufficient sampling, MSM can approach detailed balance, leading to results comparable to those from TRAM. However, as we demonstrated in our manuscript (Figure 4D), when the amount of unbiased sampling is reduced, the uncertainties in both the thermodynamics and kinetics estimates increase significantly for MSM compared to TRAM. Thus, while MSM and TRAM perform similarly under the conditions of extensive sampling, TRAM's advantage lies in its robustness when unbiased sampling is limited or difficult to achieve.
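 
 The role of detailed balance in MSM estimation can be made concrete with a small NumPy sketch. It is not TRAM and does not use the PyEMMA API: it only counts transitions in a discrete trajectory, row-normalizes them into a transition matrix, and measures how far the stationary fluxes are from detailed balance (pi_i * T_ij = pi_j * T_ji). The naive count symmetrization used for the "reversible" variant is a deliberate simplification, and all names and the toy trajectory are hypothetical.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j observed at the given lag time."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a, b] += 1
    return counts

def transition_matrix(counts, reversible=False):
    """Row-normalize counts; optionally symmetrize them first (naive reversible estimate)."""
    if reversible:
        counts = 0.5 * (counts + counts.T)
    return counts / counts.sum(axis=1, keepdims=True)

def stationary_distribution(tmat):
    """Left eigenvector of the transition matrix with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(tmat.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()

def detailed_balance_violation(tmat, pi):
    """Largest absolute difference between forward and backward stationary fluxes."""
    flux = pi[:, None] * tmat
    return np.abs(flux - flux.T).max()

# Toy trajectory that cycles 0 -> 1 -> 2 -> 0, i.e. strongly irreversible sampling.
dtraj = np.array([0, 1, 2] * 100)
counts = count_matrix(dtraj, n_states=3)

for reversible in (False, True):
    tmat = transition_matrix(counts, reversible=reversible)
    pi = stationary_distribution(tmat)
    print(reversible, detailed_balance_violation(tmat, pi))
```

 When sampling is limited, the raw counts can be far from reversible, as in this toy cycle; estimators that also use biased data, as the response argues for TRAM, are intended to help in that regime.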
 
 (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues.
 
 We thank the reviewer for the comment. We acknowledge that biased simulations could potentially introduce hysteresis or result in the identification of unphysical pathways. However, we believe this issue is mitigated by using well-tempered metadynamics, which gradually deposits a decaying bias. This approach enables the simulation to explore orthogonal directions of collective variable (CV) space, reducing the likelihood of hysteresis effects (Invernizzi, M. and Parrinello, M., JCTC, 2019[6]).
 
 Furthermore, there is precedent for using metadynamics-derived pathways to initiate unbiased simulations for constructing Markov State Models (MSMs). This methodology has been successfully applied in studying G-protein activation (Sun, X. et al., eLife, 2018[4]).
 
 Additional support for our observation can be found in two independent binding/unbinding studies of ligands from cannabinoid receptors, which discovered a similar pathway using different CVs (Saleh, et al., Angew. Chem., 2018[7]; Hua, T. et al., Cell, 2020[8]).
 
 (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw Research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much shorter biased simulation times, would fare against the experimental kinetics or even the unbiased simulated kinetics of the previous report.
 
 We would like to address the reviewer's concerns regarding the choice of ligands, lack of direct experimental comparison, and the use of TRAM, and clarify our rationale point by point:
 
 Ligand Choice: The ligands selected for this study were chosen for their relevance and well-characterized binding properties. MDMB-FUBINACA is a well-known NPS ligand with documented binding properties. It is still the only NPS ligand with an experimentally determined CB1-bound structure (Krishna Kumar, K. et al., Cell, 2019[9]). Similarly, the classical cannabinoid used in this study (HU-210) has established binding characteristics and is one of the earliest known synthetic classical cannabinoids. These ligands therefore serve as representative compounds within their respective categories, making them suitable for our comparative analysis.
 
 Experimental Comparison: We have indeed compared our simulation results to experimental data, focusing in particular on binding free energies. In the results section, we show that the relative binding free energy estimated from our simulations aligns closely with the experimentally measured values. Additionally, the absolute binding energy estimates are within ~3 kcal/mol of the experimentally predicted value.
 
 TRAM Performance: TRAM-estimated free energies and rates have been benchmarked against experimental predictions in various studies as well as in our own (peptide-protein binding: Paul, F. et al., Nat. Commun., 2017[2]; ligand unbinding: Wu, H. et al., PNAS, 2016[10]). As the primary goal of this study is to compare ligand unbinding mechanisms, we believe benchmarking against other datasets, such as the D.E. Shaw Research GPCR/ligand binding paper, is not essential for this work.
 
 (4) The method section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations that were spawned from docked structure, instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures.
 
 We thank the reviewer for the comment. We would like to clarify that we indeed used an experimentally derived pose for one of the ligands (MDMB-FUBINACA) as the cryo-EM structure of MDMB-FUBINACA bound to the protein was available (PDB ID: 6N4B) (Krishna Kumar K. et al., Cell, 2019[9]). However, as the cryo-EM structure had missing loops, we modeled these regions using Rosetta. We apologize for this confusion and have modified our method section to make this point clearer.
 
 Regarding HU-210, we acknowledge that a crystallographic or cryo-EM structure for this specific ligand was not available. We selected HU-210 because it is the most commonly used example of a classical cannabinoid in the literature, with extensively studied thermodynamic properties. Importantly, our docking results for HU-210 align closely with previously experimentally determined poses for other classical cannabinoids (Figure S11) and replicate key polar interactions, such as those with S383^7.39, which are characteristic of this class of compounds.
 
 System Preparation (Page 22)
 
 “Modeling of this membrane proximal region was also performed using the Remodel protocol of Rosetta loop modeling. A distance constraint is added during this modeling step between C98^N-term and C107^N-term to create the disulfide bond between the residues. [74,76]
 
 As the cryo-EM structure of MDMB-FUBINACA was known, the ligand coordinates of MDMB-FUBINACA were added to the modeled PDB structure. The “Ligand Reader & Modeler” module of CHARMM-GUI was used for ligand (e.g., MDMB-FUBINACA) parameterization using the CHARMM General Force Field (CGenFF).[77]”
 
 (5) The last part of using a machine learning-based approach to analyze allosteric interaction seems to be very much forced, as there are numerous distance-based more traditional precedent analyses that do a fair job of identifying an allosteric job.
 
 We thank the reviewer for the valuable comment. The neural relational inference (NRI) method, which leverages a VAE (Variational Autoencoder) architecture, attempts to reconstruct the conformation (X) at time t + τ based on the conformation at time t. In doing so, it captures the non-linear dynamic correlations between residues in the VAE latent space. We chose this method because it does not rely on specific metrics such as distance or angle, making it potentially more robust in predicting allosteric effects between the binding pocket residues and the NPxxY motif.
 
 In response to the reviewer's suggestion, we have also performed a more traditional allosteric analysis by calculating the mutual information between the binding pocket residues and the NPxxY motif. Mutual information was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. Our results indicate that the mutual information between the binding pocket residues and the NPxxY motif is indeed higher for the NPS binding simulation (Figure S11).
 
 Method
 
 Mutual information calculation
 
 Mutual information was calculated on the same trajectory data as the NRI analysis. The Python package MDEntropy was used to estimate the mutual information between the backbone dihedral angles of two residues.
 
 Results and Discussion (Page 21)
 
 “To further validate our observations, we estimated allosteric weights between the binding pocket and the NPxxY motif by calculating mutual information between residue movements. Mutual information analysis reaffirms that allosteric weights between these residues are indeed higher for the MDMB-FUBINACA bound ensemble (Figure S11).”
 
 Mutual Information Estimation (Page 37)
 
 “Mutual information between the dynamics of residue pairs was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. The calculations were done on the same trajectory data as the NRI analysis. The Python package MDEntropy was used to estimate the mutual information between the backbone dihedral angles of two residues.[124]”
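 To make the quantity concrete, the following is a minimal sketch of a histogram-based mutual information estimate between two dihedral-angle time series. It is not the MDEntropy implementation used in the manuscript; the function name `dihedral_mutual_information`, the bin count, and the synthetic angle series are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np


def dihedral_mutual_information(angles_a, angles_b, n_bins=24):
    """Histogram estimate of mutual information (in nats).

    angles_a, angles_b: 1-D arrays of dihedral angles in radians,
    one value per trajectory frame, each within [-pi, pi].
    n_bins: number of equal-width bins per angle (illustrative choice).
    """
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    joint_counts, _, _ = np.histogram2d(angles_a, angles_b, bins=[edges, edges])

    # Joint and marginal probabilities from the binned counts.
    p_ab = joint_counts / joint_counts.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # shape (n_bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # shape (1, n_bins)

    # Sum p(a,b) * log(p(a,b) / (p(a) p(b))) over occupied bins only.
    occupied = p_ab > 0
    product = p_a @ p_b
    return float(np.sum(p_ab[occupied] * np.log(p_ab[occupied] / product[occupied])))


def wrap_angle(angles):
    """Map arbitrary angles (radians) back into [-pi, pi]."""
    return np.arctan2(np.sin(angles), np.cos(angles))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = rng.uniform(-np.pi, np.pi, size=20000)                    # hypothetical residue 1
    psi_coupled = wrap_angle(phi + rng.normal(0.0, 0.1, size=20000))  # tracks residue 1
    psi_free = rng.uniform(-np.pi, np.pi, size=20000)               # independent residue

    print("coupled pair:    ", dihedral_mutual_information(phi, psi_coupled))
    print("independent pair:", dihedral_mutual_information(phi, psi_free))
```

 A coupled pair of angles gives a clearly larger value than an independent pair, which is the sense in which higher mutual information between the binding pocket and the NPxxY motif is read as stronger allosteric coupling. Histogram estimators carry a small positive bias that depends on bin count and sample size, so values should be compared between ensembles analyzed with identical settings.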
 
 (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of two ligand binding mechanisms are.
 
 We thank the reviewer for the insightful comment. In the manuscript, we discussed that the overall ligand (un)binding pathways are indeed similar for both ligands. Therefore, they interact with similar residues during the unbinding process. However, we have focused on two key differences in unbinding mechanism between the two ligands:
 
 (1) MDMB-FUBINACA exhibits two distinct unbinding mechanisms. In one, the linked portion of the ligand exits the receptor first. In the other, the ligand rotates within the pocket, allowing the tail portion to exit first. By contrast, for HU-210 we observe only a single unbinding mechanism, in which the benzopyran ring leads the ligand out of the receptor. We have highlighted these differences in Figures 6 and 7 and discussed the intermediate states that appear along these different unbinding mechanisms. To further clarify these differences, we have added arrows to the free energy landscapes to highlight the distinct pathways.
 
 (2) In the bound state, a significant difference is observed in the interaction profiles. HU-210, a classical cannabinoid, forms strong polar interactions with TM7, while MDMB-FUBINACA shows weaker polar interactions with this region.
 
 We have discussed these differences in the Results and Discussion section (Pages 13-18) and the Conclusion section (Pages 23-24).
 
 Recommendations for the authors:
 
 Reviewer #2 (Recommendations For The Authors):
 
 (1) The authors should choose at least one case where the ligand's crystallographic pose is known and show how TRAM works in comparison to MSM or experimental report.
 
 We thank the reviewer for the comment. We have used the experimentally determined cryo-EM pose for one of the ligands (i.e. MDMB-FUBINACA). We have modified the manuscript to avoid confusion. (Please refer to the response of comment 4 of reviewer 2)
 
 (2) The authors should consider existing traditional methods that are used to detect allostery and compare their machine-learning-based approach to show its relevance.
 
 We appreciate the reviewer’s comment. We have performed the traditional analysis by calculating mutual information between residue dynamics, and we show that it agrees with the machine-learning-based NRI calculation. (Please refer to the response to comment 5 of reviewer 2)
 
 (3) Figure 3 doesn't provide a guide on the pathway of ligand. Without a proper arrow, it is difficult to surmise what is the start and end of the pathway. The figures should be improved.
 
 We appreciate the reviewer’s suggestion. In response, we have revised Figure 3 to clearly indicate the ligand’s unbinding pathway by adding directional arrows and labeling the bound pose. Additionally, we have updated the figure caption to better clarify the color scheme used in the illustration.
 
 (4) The Figure 5 presentation of free energetics has a very similar shape for the two ligands. More clarity is required on how these two ligands are different.
 
 We thank the reviewer for the comment. While the overall shapes of the free energy profiles for the two ligands are indeed similar, this is expected as both ligands dissociate from the same pocket and follow a comparable pathway. However, key differences in their unbinding mechanisms arise due to variations in the ligand motion within the pocket. Specifically, the intermediate metastable minima in the free energy landscapes reflect these differences. For instance, in the NPS unbinding free energy landscape, the intermediate metastable state I1 corresponds to a conformation where the NPS ligand maintains a polar interaction with TM7, while the tail of the ligand has shifted away from TM5. This intermediate state is absent in the classical cannabinoid unbinding pathway, where no equivalent conformation appears in the landscape.
 
 (6) Page 30: TICA is wrongly expressed as 'Time-independent component analysis'. It is not a time-independent process. Rather it is 'Time structured independent component analysis'.
 
 We thank the reviewer for pointing this out. TICA should be expressed as Time-lagged independent component analysis or Time-structure independent component analysis. We have used the first expression and modified the manuscript accordingly.
 
 (7) The manuscript's MSM theory part is quite well-known which can be removed and appropriate papers can be cited.
 
 We thank the reviewer for the comment. We have removed the theory discussion of MSM and cited relevant papers.
 
 “Markov State Model
 
 Markov state model (MSM) is used to estimate the thermodynamics and kinetics from the unbiased simulation.[56,91] MSM characterizes a dynamic process using the transition probability matrix and estimates its relevant thermodynamics and kinetic properties from the eigendecomposition of this matrix. This matrix is usually calculated using either maximum likelihood or Bayesian approach.[56,97] The prevalence of MSM as a post-processing technique for MD simulations was due to its reliance on only local equilibration of MD trajectories to predict the global equilibrium properties.[92,93] Hence, MSM can combine information from distinct short trajectories, which can only attain the local equilibrium.[94–96]
 
 The following steps are taken for the practical implementation of the MSM from the MD data. [4,17,98–100]”
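 As an illustration of the description above, here is a minimal sketch of how a transition probability matrix can be estimated from discretized trajectories and then eigendecomposed to obtain equilibrium populations and relaxation timescales. It uses simple row normalization of transition counts (the non-reversible maximum likelihood estimate), not the Bayesian or reversible estimators cited in the manuscript, and the function names, toy trajectory, and lag time are assumptions made for this example.

```python
import numpy as np


def estimate_transition_matrix(dtrajs, n_states, lag=1):
    """Row-normalized transition counts at the given lag time.

    dtrajs: list of 1-D integer sequences (cluster index per frame).
    Counts are pooled across all short trajectories.
    """
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        # Accumulate every observed i -> j transition separated by `lag` frames.
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some state is never left; cannot normalize its row")
    return counts / row_sums


def stationary_and_timescales(transition_matrix, lag=1):
    """Stationary distribution and implied timescales (in frames)."""
    # Left eigenvectors of T are the right eigenvectors of T transposed.
    eigvals, eigvecs = np.linalg.eig(transition_matrix.T)
    order = np.argsort(-eigvals.real)
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # The eigenvalue-1 eigenvector, normalized to sum to 1, is the stationary distribution.
    stationary = np.abs(eigvecs[:, 0].real)
    stationary /= stationary.sum()

    # Implied timescales: t_i = -lag / ln|lambda_i| for the remaining eigenvalues.
    timescales = -lag / np.log(np.abs(eigvals[1:]))
    return stationary, timescales


if __name__ == "__main__":
    toy_dtrajs = [[0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0]]  # hypothetical two-state data
    T = estimate_transition_matrix(toy_dtrajs, n_states=2, lag=1)
    pi, ts = stationary_and_timescales(T, lag=1)
    print("transition matrix:\n", T)
    print("stationary distribution:", pi)
    print("implied timescales (frames):", ts)
```

 Free energies of the states follow from the stationary distribution as -kT ln(pi), and because counts are pooled across trajectories, many short simulations that each reach only local equilibrium can contribute to one global model.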
 
 (8) A proper VAMP score-based analysis should be provided to show confidence in MSM's clustering metric and other hyperparameters.
 
 We thank the reviewer for the recommendation. The VAMP-2 score-based analysis is discussed in the Methods section. We estimated the VAMP-2 scores of MSMs built with different numbers of clusters and input TIC dimensions (Figure S15). The model with the best VAMP-2 score was selected for comparison with the TRAM result.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.29.560261v3
www.biorxiv.org www.biorxiv.org

Forward genetics in C. elegans reveals genetic adaptations to polyunsaturated fatty acid deficiency

3
1. Public_Reviews 06 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This fundamental study investigates the role of polyunsaturated fatty acids (PUFAs) in physiology and membrane biology, using a unique model to perform a thorough genetic screen that demonstrates that PUFA synthesis defects cannot be compensated for by mutations in other pathways. These findings are supported by compelling evidence from a high quality genetic screen, functional validation of their hits, and lipid analyses. This study will appeal to researchers in membrane biology, lipid metabolism, and C. elegans genetics.
 
 Summary
2. Public_Reviews 06 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This study addresses the roles of polyunsaturated fatty acids (PUFAs) in animal physiology and membrane function. A C. elegans strain carrying the fat-2(wa17) mutation possesses a very limited ability to synthesize PUFAs, and there is no dietary input because the E. coli diet consumed by lab-grown C. elegans does not contain any PUFAs. The fat-2 mutant strain was characterized to confirm that the worms grow slowly, have rigid membranes, and have a constitutive mitochondrial stress response. The authors showed that chemical treatments or mutations known to increase membrane fluidity did not rescue growth defects. A thorough genetic screen was performed to identify genetic changes that compensate for the lack of PUFAs. The newly isolated suppressor mutations that compensated for FAT-2 growth defects included intragenic suppressors in the fat-2 gene, as well as constitutive mutations in the hypoxia sensing pathway components EGL-9 and HIF-1, and loss-of-function mutations in ftn-2, a gene encoding the iron storage protein ferritin. Taken together, these mutations lead to the model that increased intracellular iron, an essential cofactor for fatty acid desaturases, allows the minimally functional FAT-2(wa17) enzyme to be more active, resulting in increased desaturation and increased PUFA synthesis.
 
 Strengths:
 
 (1) This study provides new information further characterizing fat-2 mutants. The authors measured increased rigidity of membranes compared to wild-type worms; however, this rigidity could not be rescued with other fluidity treatments such as detergent or mutants. Rescue was only achieved with polyunsaturated fatty acid supplementation. (2) A very thorough genetic suppressor screen was performed. In addition to some internal fat-2 compensatory mutations, the only changes in pathways identified as capable of compensating for deficient PUFA synthesis were in the hypoxia pathway and the iron storage protein ferritin. Suppressor mutations included an egl-9 mutation that constitutively activates HIF-1, and gain-of-function mutations in hif-1 that are dominant. The increased activity of HIF conferred by specific egl-9 and hif-1 mutations leads to decreased expression of ftn-2. Indeed, loss of ftn-2 leads to higher intracellular iron. The increased iron apparently makes the FAT-2 fatty acid desaturase enzyme more active, allowing for the production of more PUFAs. (3) The mutations isolated in the suppressor screen show that the only mutations able to compensate for lack of PUFAs were ones that increased PUFA synthesis by the defective FAT-2 desaturase, thus demonstrating the essential need for PUFAs that cannot be overcome by changes in other pathways. This is a very novel study, taking advantage of genetic analysis of C. elegans, and it confirms the observations in humans that certain essential PUFAs are required for growth and development. (4) Overall, the paper is well written, and the experiments were carried out carefully and thoroughly. The conclusions are well supported by the results.
 
 Weaknesses:
 
 Overall, there are not many weaknesses. The main one I noticed concerns the lipidomic analysis shown in Figs 3C, 7C, S1 and S3. While these data are an essential part of the analysis and provide strong evidence for the conclusions of the study, it is unfortunate that the methods used did not enable the distinction between two 18:1 isomers. These two isomers of 18:1 are important in C. elegans biology, because one is a substrate for FAT-2 (18:1n-9, oleic acid) and the other is not (18:1n-7, cis-vaccenic acid). Although rarer in mammals, cis-vaccenic acid is the most abundant fatty acid in C. elegans and is likely the most important structural MUFA. The measurement of these two isomers is not essential for the conclusions of the study, but the manuscript should include a comment about the abundance of oleic vs vaccenic acid in C. elegans (authors can find this information, even in the fat-2 mutant, in other publications of C. elegans fatty acid composition). Otherwise, readers who are not familiar with C. elegans might assume the 18:1 that is reported is likely to be mainly oleic acid, as is common in mammals.
 
 Other suggestions to authors to improve the paper: (1) The title could be less specific; it might be confusing to readers to include the allele name in the title. (2) There are two errors in the pathway depicted in Figure 1A. The 16:0-16:1 desaturation can be performed by FAT-5, FAT-6, and FAT-7. The 18:0-18:1 desaturation can only be performed by FAT-6 and FAT-7.
 
 Review 1
3. Public_Reviews 06 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors use a genetic screen in C. elegans to investigate the physiological roles of polyunsaturated fatty acids (PUFAs). They screen for mutations that rescue fat-2 mutants, which have strong reductions in PUFAs. As a result, either mutations in fat-2 itself, or mutations in genes involved in the HIF-1 pathway, were found to rescue fat-2 mutants. Mutants in the HIF-1 pathway rescue fat-2 mutants by boosting its catalytic activity (via upregulated Fe2+). Thus, the authors show that in the context of fat-2 mutation, the sole genetic means to rescue PUFA insufficiency is to restore PUFA levels.
 
 Strengths:
 
 As C. elegans can produce PUFAs de novo as essential lipids, the genetic model is well suited to study the fundamental roles of PUFAs. The genetic screen finds mutations in convergent pathways, suggesting that it has reached near-saturation. The authors extensively validate the results of the screening and provide sufficient mechanistic insights to show how PUFA levels are restored in HIF-1 pathway mutants. As many of the mutations found to rescue fat-2 mutants are gain-of-function mutations, it is unlikely that similar discoveries could have been made with other approaches such as genome-wide CRISPR screens, making the current study distinctive. Consequently, the study provides important messages. First, it shows that PUFAs are essential for life. The inability to genetically rescue PUFA deficiency, except for mutations that restore PUFA levels, suggests that they have pleiotropic essential functions. In addition, the results suggest that the most essential functions of PUFAs are not in fluidity regulation, which is consistent with recent reviews proposing that the importance of unsaturation goes beyond fluidity (doi: 10.1016/j.tibs.2023.08.004 and doi: 10.1101/cshperspect.a041409). Thus, the study provides fundamental insights about how membrane lipid composition can be linked to biological functions.
 
 Weaknesses:
 
 The authors made considerable efforts to answer the questions that arose through peer review, and now all the claims seem to be supported by experimental data. Thus, I do not see obvious weaknesses. Of course, it still remains unclear what PUFAs do beyond fluidity regulation, but this is something that cannot be answered by a single study. I just have one final proposition to make.
 
 I still do not agree with the answer to my previous comment 6 regarding Figure S2E. The authors claim that hif-1(et69) suppresses fat-2(wa17) in a ftn-2 null background (in Figure S2 legend for example). To claim so, they would need to compare the triple mutant with fat-2(wa17);ftn-2(ok404) and show some rescue. However, we see in Figure 5H that ftn-2(ok404) alone rescues fat-2(wa17). Thus, by comparing both figures, I see no additional effect of hif-1(et69) in an ftn-2(ok404) background. I actually think that this makes more sense, since the authors claim that hif-1(et69) is a gain-of-function mutation that acts through suppression of ftn-2 expression. Thus, I would expect that without ftn-2 from the beginning, hif-1(et69) does not have an additional effect, and this seems to be what we see from the data. Thus, I would suggest that the authors reformulate their claims regarding the effect of hif-1(et69) in the ftn-2(ok404) background, which seems to be absent (consistently with what one would expect).
 
 Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.11.08.622646v2
www.biorxiv.org www.biorxiv.org

The oneirogen hypothesis: modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms

4
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This paper provides a useful new theory of the hallucinatory effects of psychedelics. The authors present convincing evidence that a computational model trained with the Wake-Sleep algorithm can reproduce some features of hallucinations by varying the strength of top-down connections in the model, but discussion of the model's relationships to the psychedelics and sleep literatures is incomplete. The work will be of interest to researchers studying hallucinations or offline activity and learning more broadly.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Bredenberg et al. aim to model some of the visual and neural effects of psychedelics via the Wake-Sleep algorithm. This is an interesting study with findings that go against certain mainstream ideas in psychedelic neuroscience (that I largely agree with). I cannot speak to the math in this manuscript, but it seems like quite a conceptual leap to set a parameter of the model in between wake and sleep and state that this is a proxy to acute psychedelic effects (point #20). My other concerns below are related to the review of the psychedelic literature:
  
  (1) Page 1, Introduction, "...they are agonists for the 5-HT2a serotonin receptor commonly expressed on the apical dendrites of cortical pyramidal neurons..." It is a bit redundant to say "5-HT2A serotonin receptor," as serotonin is already captured by its abbreviation (i.e., 5-HT).
  
  While psychedelic research has focused on 5-HT2A expression on cortical pyramidal cells, note that the 5-HT2A receptor is also expressed on interneurons in the medial temporal lobe (entorhinal cortex, hippocampus, and amygdala) with some estimates being >50% of these neurons (https://doi.org/10.1016/j.brainresbull.2011.11.006, https://doi.org/10.1007/s00221-013-3512-6, https://doi.org/10.7554/eLife.66960, https://doi.org/10.1016/j.mcn.2008.07.005, https://doi.org/10.1038/npp.2008.71, https://doi.org/10.1038/s41386-023-01744-8, https://doi.org/10.1016/j.brainres.2004.03.016, https://doi.org/10.1016/S0022-3565(24)37472-5, https://doi.org/10.1002/hipo.22611, https://doi.org/10.1016/j.neuron.2024.08.016). However, with ~1:4 ratio of inhibitory to excitatory neurons in the brain (https://doi.org/10.1101/2024.09.24.614724), this can make it seem as if 5-HT2A expression is negligible in the MTL. I think it might be important to mention these receptors, as this manuscript discusses replay.
  
  I see now that Figure 1 mentions that PV cells also express 5-HT2A receptors. This should probably be mentioned earlier.
  
  (2) Page 1, Introduction, "They have further been used for millennia as medicine and in religious rituals..." This might be a romanticization of psychedelics and indigenous groups, as anthropological evidence suggests that intentional psychedelic use might actually be more recent (see work by Manvir Singh and Andy Letcher).
  
  (3) When discussing oneirogens, it could be worth differentiating psychedelics from kappa opioid agonists such as ibogaine and salvinorin A, another class of hallucinogens that some refer to as "oneirogens" (similar to how "psychedelic" is the colloquial term for 5-HT2A agonists). Note that studies have found the effects of Salvia divinorum (which contains salvinorin A) to be described more similarly to dreams than psychedelics (https://doi.org/10.1007/s00213-011-2470-6). This makes me wonder why the present study is more applicable to 5-HT2A psychedelics than to kappa opioid agonists or other classes of hallucinogens (e.g., NMDA antagonists, muscarinic antagonists, GABAA agonists).
  
  (4) Page 2, Introduction, "Replay sequences have been shown to be important for learning during sleep [14, 15, 16, 17, 18]: we propose that mechanisms supporting replay-dependent learning during sleep are key to explaining the increases in plasticity caused by psychedelic drug administration." I'm not sure I follow the logic of this point. Dreams happen during REM sleep, whereas replay is most prominent during non-REM sleep. Moreover, while it's not clear what psychedelics do to hippocampal function, most evidence would suggest they impair it. As mentioned, most 5-HT2A receptors in the hippocampus seem to be on inhibitory neurons, and human and animal work finds that psychedelics impair hippocampal-dependent memory encoding (https://doi.org/10.1037/rev0000455, https://doi.org/10.1037/rev0000455, https://doi.org/10.3389/fnbeh.2014.00180, https://doi.org/10.1002/hipo.22712). One study even found that psilocin impairs hippocampal-dependent memory retrieval (https://doi.org/10.3389/fnbeh.2014.00180). Note that this is all in reference to the acute effects (psychedelics may post-acutely enhance hippocampal-dependent memory, https://doi.org/10.1007/s40265-024-02106-4).
  
  (5) Page 2, Introduction, "In total, our model of the functional effect of psychedelics on pyramidal neurons could provide a explanation for the perceptual psychedelic experience in terms of learning mechanisms for consolidation during sleep..." In contrast to my previous point, I think this could be possible. Three datasets have found that psychedelics may enhance cortical-dependent memory encoding (i.e., familiarity; https://doi.org/10.1037/rev0000455, https://doi.org/10.1037/rev0000455), and two studies found that post-encoding administration of psychedelics retroactively enhanced memory that may be less hippocampal-dependent/more cortical-dependent (https://doi.org/10.1016/j.neuropharm.2012.06.007, https://doi.org/10.1016/j.euroneuro.2022.01.114). Moreover, and as mentioned below, 5 studies have found decoupling between the hippocampus and the cortex (https://doi.org/10.3389/fnhum.2014.00020, https://doi.org/10.1002/hbm.22833, https://doi.org/10.1016/j.celrep.2021.109714, https://doi.org/10.1162/netn_a_00349, https://doi.org/10.1038/s41586-024-07624-5), something potentially also observed during REM sleep that is thought to support consolidation (https://doi.org/10.1073/pnas.2123432119). These findings should probably be discussed.
  
  (6) Page 2, Introduction, "In this work, we show that within a neural network trained via Wake-Sleep, it is possible to model the action of classical psychedelics (i.e. 5-HT2a receptor agonism)..." Note that 5-HT2A agonism alone is not sufficient to explain the effects of psychedelics, given that there are 5-HT2A agonists that are non-hallucinogenic (e.g., lisuride).
  
  (7) Page 2, Introduction, "...by shifting the balance during the wake state from the bottom-up pathways to the top-down pathways, thereby making the 'wake' network states more 'dream-like'." I could have included this in the previous point, but I felt that this idea deserved its own point. There has been a rather dogmatic assertion that psychedelics diminish top-down processing and/or enhance bottom-up processing, and I appreciate that the authors have not accepted this as fact. However, because this is an unfortunately prominent idea, I think it ought to be fleshed out more by first mentioning that it's one of the tenets of REBUS. REBUS has become a popular model of psychedelic drug action, but it's largely unfalsifiable (it's based on two unfalsifiable models, predictive processing and integrated information theory), so the findings from this study could tighten it up a bit. Second, there have now been a handful of studies that have attempted to study directionality in information flow under psychedelics, and the findings are rather mixed including increased bottom-up/decreased top-down effects (https://doi.org/10.7554/eLife.59784, https://doi.org/10.1073/pnas.1815129116; note that the latter "bottom-up" effect involves subcortical-cortical connections in which it's less clear what's actually "higher-/lower-level"), increased top-down/decreased bottom-up effects (https://doi.org/10.1038/s41380-024-02632-3, https://doi.org/10.1016/j.euroneuro.2016.03.018), or both (https://doi.org/10.1016/j.neuroimage.2019.116462, https://doi.org/10.1016/j.neuropharm.2017.10.039), though most of these studies are aggregating across largely inhomogeneous states (i.e., resting-state). 
Lastly, and somewhat problematically, facilitated top-down processing is also an idea proposed in psychosis that's based partially on findings with acute ketamine administration (note that all hallucinations to some degree might rely on top-down facilitation, as a hallucination involves a high-level concept that impinges on lower-level sensory areas; see work by Phil Corlett). While psychosis and the effects of ketamine have some similarities with psychedelics, there are certainly differences, and I think the goal of this manuscript is to uniquely describe 5-HT2A psychedelics (again, I'm left wondering why tweaking alpha in the Wake-Sleep algorithm is any more applicable to psychedelics than other hallucinogenic conditions).
  
  (8) Figure 2 equates alpha with a "psychedelic dose," but this is a bit misleading, as neither the algorithm nor an individual was administered a psychedelic. Alpha is instead a hypothetical proxy for a psychedelic dose. Moreover, if the model were recapitulating the effects of psychedelics, shouldn't these images look more psychedelic as alpha increases (e.g., they may look like images put through the DeepDream algorithm)?
  
  (9) Page 11, Methods, "...and the gate α ensures that learning only occurs during sleep mode... The (1 − α) gate in this case ensures that plasticity only occurs during the Wake mode." Much of the math escapes me, so perhaps I'm misunderstanding these statements, but learning and plasticity certainly happen during both wake and sleep, making me wonder what is meant by these statements. Moreover, if plasticity is simply neural changes, couldn't plasticity be synonymous with neural learning? Perhaps plasticity and learning are meant to refer to different types of neural changes. It might be worth clarifying this, as a general problem in psychedelic research is that psychedelics are described as facilitating plasticity when brains are changing at every moment (hence not experiencing every moment as the same), and psychedelics don't impact all forms of plasticity equally. For example, psychedelics may not necessarily enhance neurogenesis or the addition of certain receptor types, and they impair certain forms of learning (i.e., episodic memory encoding). What is typically meant by plasticity enhancements induced by psychedelics (and where there's the most evidence) is dendritic plasticity (i.e., the growth of dendrites and spines). Whatever is meant by "plasticity" should be clarified in its first instance in this manuscript.
  
  (10) Page 12, Methods, "During training, neural network activity is either dominated entirely by bottom-up inputs (Wake, α = 0) or by top-down inputs (Sleep, α = 1)." Again, I could be misunderstanding the mathematical formulation, but top-down inputs operate during wake, and bottom-up inputs can operate during sleep (people can wake up or even incorporate noise from their environments into sleep).
  
  (11) Page 4, Results, "Thus, we can capture the core idea behind the oneirogen hypothesis using the Wake-Sleep algorithm, by postulating that the bottom-up basal synapses are predominantly driving neural activity during the Wake phase (when α is low)." However, several pieces of evidence (and the first circuit model of psychedelic drug action) suggest that psychedelics enhance functional connectivity and potentially even effective connectivity from the thalamus to the cortex (https://doi.org/10.1093/brain/awab406). Note that psychedelics may not equally impact all subcortical structures. REBUS proposes the opposite of the current study, that psychedelics facilitate bottom-up information flow, with one of the few explicit predictions being that psychedelics should facilitate information flow from the hippocampus to the default mode network. However, as mentioned earlier, 5 studies have found that psychedelics diminish functional connectivity between the hippocampus and cortex (including the DMN but also V1).
  
  (12) Page 4, Results, "...and have an excitatory effect that positively modulates glutamatergic transmission..." Note that this may not be brainwide. While psychedelics were found to increase glutamatergic transmission in the cortex, they were also found to decrease hippocampal glutamate (consistent with inhibition of the hippocampus, https://doi.org/10.1038/s41386-020-0718-8).
  
  (13) Page 5, "...which are similar to the 'breathing' and 'rippling' phenomena reported by psychedelic drug users at low doses..." Although it's sometimes unclear what is meant by "low doses," the breathing/rippling effect of psychedelics occurs at moderate and high doses as well.
  
  (14) I watched the videos, and it's hard for me to say there was some stark resemblance to psychedelic imagery. In contrast, for example, when the DeepDream algorithm came out, it did seem to capture something quite psychedelic.
  
  (15) Page 5, "This form of strongly correlated tuning has been observed in both cortex and the hippocampus." If this has been observed under non-psychedelic conditions, what does this tell us about this supposed model of psychedelics?
  
  (16) Page 6, with regards to neural variability, "...but whether this phenomenon [increased variability] is general across tasks and cortical areas remains to be seen." First, is variability here measured as variance? In fMRI datasets that have been used to support the Entropic Brain Hypothesis, note that variance tends to decrease, though certain measures of entropy increase (e.g., Figure 4A here https://doi.org/10.1073/pnas.1518377113 shows global variance decreases, and this reanalysis of those data https://doi.org/10.1002/hbm.23234 finds some entropy increases). Thus, variance and entropy should not be confused (in theory, one could cycle through several more brain states that are however, similar to each other, which would produce more entropy with decreased variance). Second, and perhaps more problematically for the EBH, is that the entropy effects of psychedelics completely disappear when one does a task, and unfortunately, the authors of these findings have misinterpreted them. What they'll say is that engaging in boring cognitive tasks or watching a video decreases entropy under psychedelics, but what you can see in Figure 1b of https://doi.org/10.1021/acschemneuro.3c00289 and Figure 4b of https://doi.org/10.1038/s41586-024-07624-5 is that entropy actually increases under sober conditions when you do a task. That is, it's a rather boring finding. Essentially, when resting in a scanner while sober, many may actually rest (including falling asleep, especially when subjects are asked to keep their eyes closed), and if you perform a task, brain activity should become more complex relative to doing nothing/falling asleep. When under a psychedelic, one can't fall asleep and thus, there's less change (though note that both of the above studies found numerical increases when performing tasks). 
  Lastly, again, I should note that the findings of the present study actually go against EBH/REBUS, given that the present study finds increased top-down effects whereas EBH/REBUS predicts decreased top-down/increased bottom-up effects.
  
  (17) Page 6, "Because psychedelic drug administration increases influence of apical dendritic inputs on neural activity in our model, we found that silencing apical dendritic activity reduced across stimulus neural variability more as the psychedelic drug dose increases." I again want to point out that alpha is not the equivalent of a psychedelic dose here, but rather a parameter in the model that is being proposed as a proxy.
  
  (18) Page 8, "Experimentally, plasticity dynamics which could, theoretically, minimize such a prediction error have been observed in cortex [66, 67], and it has also been proposed that behavioral timescale plasticity in the hippocampus could subserve a similar function [68]. We found that plasticity rules of this kind induce strong correlations between inputs to the apical and basal dendritic compartments of pyramidal neurons, which have been observed in the hippocampus and cortex [55, 56]." Note that the plasticity effects of psychedelics are sometimes not observed in the hippocampus or are even observed as decreases (reviewed in https://doi.org/10.1038/s41386-022-01389-z).
  
  (19) Page 9, as is mentioned, REBUS proposes that there should be a decrease in top-down effects under psychedelics, which goes against what is found here, but as I describe above, the effects of psychedelics on various measures of directionality have been quite mixed.
  
  (20) Unless I'm misunderstanding something, it seems to be a bit of a jump to infer that simply changing alpha in your model is akin to psychedelic dosing. Perhaps if the model implemented biologically plausible 5-HT2A expression and/or its behavior were constrained by common features of a psychedelic experience (e.g., fractal-like visuals imposed onto perception, inability to fall asleep, etc.), I'd be more inclined to see the parallels between alpha and psychedelics dosing. However, it would still need to recapitulate unique effects of psychedelics (e.g., impairments in hippocampal-dependent memory with sparing/facilitation of cortical memory). At the moment, it seems like whatever the model is doing is applicable to any hallucinogenic drug or even psychosis.
  
  Review 1
3. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  This work is a nice contribution to the literature in articulating a specific, testable theory of how psychedelics act to generate hallucinations and plasticity. The connection to replay, however - including in the title, abstract, and framing throughout the paper - is not well fleshed out.
  
  In particular, the paper's framing seems to conflate replay, dreams, and top-down processing, but these are not one and the same. Picard-Delano et al. TICS 2023 provides a useful review of the differences between replay and dreams. One key point is that most replay has been observed during NREM sleep, but our canonically bizarre / vivid dreams occur during REM. Top-down connections have also been proposed to be used for many processes aside from replay. The paper would benefit from much more precision and nuance on these points.
  
  I believe the paper is missing demonstrations or speculation about how plasticity under various doses of psychedelics relates to changes in performance, which would be an important link to the replay-dependent learning literature.
  
  Are there renderings available for 'ripple' effects of psychedelics that could be included, to allow readers to compare the model's hallucinations to humans'? Short of this, it would be useful to have a more detailed description of what rippling is. (For those readers without firsthand knowledge!) It is currently difficult to assess how close the match is.
  
  Review 2
4. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Author response:
  
  We thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article will be considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we will make to the text.
  
  Common Concerns (Reviewer 1 & Reviewer 2):
  
  Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?
  
  Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep, which is related to classical psychedelic drug mechanisms of action.
  
  First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrach, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.
  
  Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), in which case a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022).
  
  Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical both for synaptic consolidation of monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in the present manuscript; we will add them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay.
  
  Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?
  
  While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.
  
  In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.
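  
  As a purely illustrative sketch of the annealing analogy (a generic toy example, not the model or plasticity rule from the manuscript), the code below runs gradient descent on a one-dimensional double-well loss. The loss function, the linear noise schedule, and the names `loss`, `grad`, and `descend` are all assumptions introduced here. Without noise, descent started at x = 1.0 settles in the shallow minimum; with an annealed noise term it has a chance of crossing the barrier toward the deeper minimum, though this is not guaranteed on any given run.

```python
import numpy as np

def loss(x):
    # Toy double-well: shallow local minimum near x = 0.93, deeper one near x = -1.06.
    return x**4 - 2.0 * x**2 + 0.5 * x

def grad(x):
    # Analytic derivative of the toy loss above.
    return 4.0 * x**3 - 4.0 * x + 0.5

def descend(x0, steps=2000, lr=0.01, temperature=0.0, seed=0):
    """Gradient descent with optional noise whose scale decays linearly to zero.

    Returns (final_x, best_x), where best_x is the lowest-loss point visited.
    """
    rng = np.random.default_rng(seed)
    x = best = float(x0)
    for step in range(steps):
        t = temperature * (1.0 - step / steps)        # annealing schedule (assumed linear)
        noise = np.sqrt(2.0 * lr * t) * rng.normal()  # zero when temperature is zero
        x = float(np.clip(x - lr * grad(x) + noise, -3.0, 3.0))
        if loss(x) < loss(best):
            best = x
    return x, best

plain_final, _ = descend(1.0)                            # no noise: stays in the shallow well
noisy_final, noisy_best = descend(1.0, temperature=1.0)  # annealed noise: may escape
```

  The point of the sketch is only that injected variability is neither good nor bad by itself: whether it helps depends on where the system started and on what learning follows.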
  
  We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We will provide a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.
  
  Reviewer 1 Concerns:
  
  Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?
  
  We will clarify that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
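  
  A minimal sketch of this simplification, assuming (hypothetically) that each neuron's drive is a convex combination of its basal and apical inputs, with α = 0 corresponding to Wake and α = 1 to Sleep. The function name `mix_drive`, the random inputs, and the per-neuron values are illustrative only and are not taken from the manuscript.

```python
import numpy as np

def mix_drive(basal, apical, alpha):
    """Convex mix of basal (bottom-up) and apical (top-down) drive.

    alpha may be a scalar or a per-neuron array with values in [0, 1].
    alpha = 0 gives purely basal drive; alpha = 1 gives purely apical drive.
    """
    alpha = np.asarray(alpha, dtype=float)
    return (1.0 - alpha) * basal + alpha * apical

rng = np.random.default_rng(0)
basal = rng.normal(size=5)    # stand-in bottom-up input to five neurons
apical = rng.normal(size=5)   # stand-in top-down input to the same neurons

per_neuron_alpha = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # hypothetical heterogeneity
scalar_alpha = per_neuron_alpha.mean()                   # single population-level value

heterogeneous = mix_drive(basal, apical, per_neuron_alpha)
homogeneous = mix_drive(basal, apical, scalar_alpha)
```

  Under this assumption the two versions agree on average across neurons but differ neuron by neuron, which is the heterogeneity the scalar parameter disregards.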
  
  There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.
  
  Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?
  
  While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We will elaborate on this point, and will move the discussion earlier in the text.
  
  Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.
  
  We will take great care to conduct a more thorough literature review to reevaluate our statement regarding indigenous psychedelic use (including the citations you suggested), and will either provide a more careful statement or remove this discussion from our introduction entirely, as it has little bearing on the rest of the text. The Ethics Statement will also be modified accordingly.
  
  You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?
  
  Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.
  
  Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. We will discuss two such factors in our revision: 5-HT receptor binding affinity and cellular membrane permeability.
  
  Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).
  
  Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.
  
  Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We will provide a considerably extended discussion of these nuances in our revision.
  
  Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?
  
  Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).
  
  Two experimental results that appear to contradict our hypothesis deserve special consideration in our revision. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action and the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).
  
  The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), and we believe it would be best explained within our framework by a topographically connected recurrent network architecture trained on video data, a potentially fruitful direction for future research.
  
  Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?
  
  We believe that the differences in hallucination visualization quality between our algorithm and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic, but we believe this falls outside the scope of the present study.
  
  We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b). We will provide a more detailed explanation of this phenomenon when we discuss our model limitations in our revised manuscript.
  
  Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?
  
  Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We will make sure to discuss this in our ‘Model Limitations’ section.
  
  Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.
  
  We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).
  
  You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?
  
  Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.
  
  Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, similar to sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our best explanation for the variance reduction reported by (Carhart-Harris et al., 2016) is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
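  
  To make the distinction between these two notions of variance concrete, here is a minimal sketch on synthetic data (not data or analysis code from any of the cited studies). It assumes a recording shaped (time, units) and splits it into the variance of the unit-averaged signal and the mean variance of what remains after that average is subtracted. The function name `variance_decomposition`, the sinusoidal shared signal, and all amplitudes are hypothetical choices made for illustration.

```python
import numpy as np

def variance_decomposition(x):
    """Split a (time, units) array into shared and residual variability.

    Returns (variance of the unit-averaged signal,
             mean per-unit variance after subtracting that average).
    """
    shared = x.mean(axis=1, keepdims=True)   # synchronous component, shape (time, 1)
    residual = x - shared                    # unit-specific fluctuations
    return shared.var(), residual.var(axis=0).mean()

def synthetic_recording(shared_amp, noise_sd, n_time=2000, n_units=50, seed=0):
    # A common oscillation plus independent noise per unit (toy data only).
    rng = np.random.default_rng(seed)
    t = np.arange(n_time)
    oscillation = shared_amp * np.sin(2.0 * np.pi * t / 100.0)
    return oscillation[:, None] + noise_sd * rng.normal(size=(n_time, n_units))

# Condition A: strong synchrony, little unit-specific variability.
# Condition B: weak synchrony, more unit-specific variability.
shared_a, resid_a = variance_decomposition(synthetic_recording(1.0, 0.1))
shared_b, resid_b = variance_decomposition(synthetic_recording(0.2, 0.5))
```

  In this toy setting the shared variance is lower in condition B while the residual variability is higher, so the two measures can move in opposite directions on the same data without contradiction.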
  
  As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much smaller relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.
  
  One final point of clarification: here we are distinguishing Entropic Brain Theory (EBT) from the REBUS model. The oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.
  
  You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?
  
  When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.
  
  Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.
  
  Reviewer 2 Concerns:
  
  Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?
  
  We will do this! For now, you can get a decent understanding of what the ‘ripple effect’ looks like from the ‘eyes closed’ hallucination condition for networks trained on CIFAR10 (Fig. 2d). The ripple effect that we are referring to is very similar, except it is superimposed on a naturalistic image under ordinary viewing conditions; to give a higher quality visualization of the ripple phenomenon itself, we will subtract out the static contribution of the image itself, leaving only the ripple phenomenon.
  
  Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?
  
  For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.
  
  There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.
  
  In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.
  
  To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.27.615483v2
www.biorxiv.org www.biorxiv.org

Trial-Level Representational Similarity Analysis

3
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This study proposes a potentially useful improvement on a popular fMRI method for quantifying representational similarity in brain measurements by focusing on representational strength at the single trial level and adding linear mixed effects modeling for group-level inference. The manuscript demonstrates increased sensitivity with no loss of precision compared to more classic versions of the method. However, the framing of the work with respect to these prior approaches is incomplete, several assumptions are insufficiently motivated, and it is unclear to what extent the approach would generalize to other paradigms.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The paper presents a novel method for RSA, called trial-level RSA (tRSA). The method first constructs a trial x trial representation dissimilarity matrix using correlation distances, assuming that (as in the empirical example) each trial has a unique stimulus. Whereas "classical RSA" correlates the entire upper triangular matrix of the RDM / RSM to a model RDM / RSM, tRSA first calculates the correlation to the model RDM per row, and then averages these values. The paper claims that tRSA has increased sensitivity and greater flexibility than classical RSA.
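  The difference between the two statistics summarized above can be made concrete with a minimal sketch. It is not the authors' implementation: the function names and the toy 4 x 4 matrices are hypothetical, and Spearman correlation is written out in plain Python only to keep the example self-contained.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def classical_rsa(brain, model):
    """Matrix-wise: one correlation over all upper-triangular cells."""
    pairs = list(combinations(range(len(brain)), 2))
    return spearman([brain[i][j] for i, j in pairs],
                    [model[i][j] for i, j in pairs])


def trial_level_rsa(brain, model):
    """Row-wise: one correlation per trial, with the diagonal excluded."""
    n = len(brain)
    values = []
    for i in range(n):
        cols = [j for j in range(n) if j != i]
        values.append(spearman([brain[i][j] for j in cols],
                               [model[i][j] for j in cols]))
    return values


# Hypothetical toy dissimilarity matrices (symmetric, zero diagonal).
model = [[0, 1, 2, 3],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [3, 2, 1, 0]]
brain = [[0.0, 0.2, 0.5, 0.9],
         [0.2, 0.0, 0.3, 0.4],
         [0.5, 0.3, 0.0, 0.1],
         [0.9, 0.4, 0.1, 0.0]]

matrix_wise = classical_rsa(brain, model)
row_wise = trial_level_rsa(brain, model)
print(matrix_wise, sum(row_wise) / len(row_wise))
```

  Because every off-diagonal cell appears in two rows, the row-wise values reuse the same dissimilarities, so their average need not equal the matrix-wise value.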
  
  Strengths & Weaknesses:
  
  I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.
  
  Main issues:
  
  (1) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (pattern component modelling; Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is closely related to RSA (Diedrichsen and Kriegeskorte 2017). The authors may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA in which there is more than one repetition per stimulus.
  
  (2) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. By this, the authors mean the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least 2 times, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs.
In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data comes from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors could spell out the specific problem addressed by their method right at the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".
  
  (3) The notation in the paper is often conflicting and should be clarified. The actual true and measured activity patterns should receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that $\sigma_{ijk}$ is the noise variance (not the standard deviation)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as seems to be done here).
  
  (4) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.
  
  (5) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.
  
  (6) The trial-level Spearman correlations between the brain RDM and the model RDM were analyzed using a mixed effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, whereas the mixed effects model assumes that they are. This does not seem to induce an increase in Type I errors in the conditions studied, but the procedure still needs a clear justification.
  
  (7) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.
  
  (8) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.
  
  (9) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli. Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here. One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.
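  The variance estimator described in point (2), and the role of repeated measurements raised again in point (9), can be written out as a short derivation. This is a sketch under assumptions not stated in the review: $P$ voxels and $J$ stimuli, zero-mean patterns and noise, noise variance $\sigma^2_e$, and noise that is independent of $u_j$ and independent across measurements.

$$
y_{j,n} = u_j + e_{j,n}, \qquad \mathrm{Var}(u_{j,p}) = \sigma^2_u, \qquad \mathrm{Var}(e_{j,n,p}) = \sigma^2_e
$$

$$
\mathbb{E}\left[\tfrac{1}{P}\, y_{j,n}^\top y_{j,n}\right] = \sigma^2_u + \sigma^2_e, \qquad
\mathbb{E}\left[\tfrac{1}{P}\, y_{j,n}^\top y_{j,m}\right] = \sigma^2_u \quad (n \neq m)
$$

$$
\hat{\sigma}^2_u = \frac{1}{JP} \sum_{j=1}^{J} y_{j,1}^\top y_{j,2}
$$

  The noise term drops out only in the cross-measurement product, which is why at least two measurements per stimulus are needed to separate $\sigma^2_u$ from $\sigma^2_e$ under these assumptions.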
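  The positive semi-definiteness constraint raised in point (4) can be illustrated with a small worked example. The sketch below is not taken from the manuscript: the function names are hypothetical, and it covers only the 3 x 3 case, where a symmetric matrix with unit diagonal is a valid correlation matrix exactly when every pairwise value lies in [-1, 1] and the determinant is non-negative.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def is_valid_correlation_3x3(r12, r13, r23):
    """Check positive semi-definiteness of a 3 x 3 unit-diagonal matrix.

    With a unit diagonal, the principal minors are 1, 1 - r**2 for each
    pair, and the full determinant; all must be non-negative.
    """
    matrix = [[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]]
    pairwise_ok = all(abs(r) <= 1.0 for r in (r12, r13, r23))
    return pairwise_ok and det3(matrix) >= 0.0


# Each cell is a legal correlation on its own, but the combination is not:
# A and B are similar, A and C are similar, yet B and C are dissimilar.
print(is_valid_correlation_3x3(0.9, 0.9, -0.9))
# The same magnitudes with consistent signs form a valid matrix.
print(is_valid_correlation_3x3(0.9, 0.9, 0.9))
```

  Drawing each cell independently ignores exactly this joint constraint, so some sampled matrices cannot arise from any set of activity patterns.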
  
  References:
  
  Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789
  
  Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.
  
  Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.
  
  Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.
  
  Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.
  
  Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. ELife, 12. https://doi.org/10.7554/eLife.82566
  
  Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.
  
  Review 1
3. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This methods paper proposes two changes to classic RSA, a popular method to probe neural representation in neuroimaging experiments: computing RSA at row/column level of RDM, and using mixed linear modeling to compute second-level statistics, using the individual row/columns to estimate a random effect of stimulus. The benefit of the new method is demonstrated using simulations and a re-analysis of a prior fMRI dataset on object perception and memory encoding.
  
  Strengths:
  
  (1) The paper is clearly written and features clear illustrations of the proposed method.
  
  (2) The combination of simulation and real data works well, with the same factors being examined in both simulations and real data, resulting in a convincing demonstration of the benefits of tRSA in realistic experimental scenarios.
  
  (3) I find the authors' claim that tRSA is a promising approach, not only to perform more complete modeling of cogneuro data but also to conceptualize representation at the single trial/event level (cf Discussion section on P42), quite appealing.
  
  Weaknesses:
  
  (1) While I generally welcome the contribution (see above), I take some issue with the accusatory tone of the manuscript in the Introduction. The text there (using words such as 'ignored variances', 'erroneous inferences', 'one must', 'not well-suited', 'misleading') appears aimed at turning cRSA into a 'straw man' with many limitations that other researchers have not recognized but that the new proposed method supposedly resolves. This can be written in a more nuanced, constructive manner without accusing the numerous users of this popular method of ignorance.
  
  (2) The described limitations are also not entirely correct, in my view: for example, statistical inference in cRSA is not always done using classic parametric statistics such as t-tests (cf Figure 1): the rsatoolbox paper by Nili et al. (2014) outlines non-parametric alternatives based on permutation tests, bootstrapping and sign tests, which are commonly used in the field. Nor is this the first time that RSA has been conducted at the row/column level (here referred to by the authors as 'trial level'; cf King et al., 2019).
  
  (3) One of the advantages of cRSA is its simplicity. Adding linear mixed effects modeling to RSA introduces a host of additional 'analysis parameters' pertaining to the choice of the model setup (random effects, fixed effects, interactions, what error terms to use) - how should future users of tRSA navigate this?
  
  (4) Here, only a single real fMRI dataset is used with a quite complicated experimental design for the memory part; it's not clear if there is any benefit of using tRSA on a simpler real dataset. What's the benefit of tRSA in classic RSA datasets (e.g., Kriegeskorte et al., 2008), with fixed stimulus conditions and no behavior?
  
  (5) The cells of an RDM/RSM reflect pairwise comparisons between response patterns (typically a brain but can be any system; cf Sucholutsky et al., 2023). Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. Does this raise issues with the validity of the linear mixed effects model? Does it assume the observations are linearly independent?
  
  (6) The manuscript assumes the reader is familiar with technical statistical terms such as Type I/II error, sensitivity, specificity, homoscedasticity assumptions, as well as linear mixed models (fixed effects, random effects, etc). I am concerned that this jargon makes the paper difficult to understand for a broad readership or even researchers currently using cRSA that might be interested in trying tRSA.
  
  (7) I could not find any statement on data availability or code availability. Given that the manuscript reuses prior data and proposes a new method, making data and code/tutorials openly available would greatly enhance the potential impact and utility for the community.
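  Weakness (2) mentions permutation-based inference as an alternative to parametric tests. A minimal sketch of one such test, a stimulus-label permutation test, is given below. It is not the rsatoolbox implementation: the function names and the toy matrix are hypothetical, and Pearson correlation is used only to keep the example short (a rank correlation could be swapped in).

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Upper-triangular cells after relabelling rows and columns by `order`."""
    pairs = combinations(range(len(order)), 2)
    return [matrix[order[i]][order[j]] for i, j in pairs]


def permutation_p_value(brain, model, n_perm=2000, seed=0):
    """One-sided p-value for the brain-model RDM correlation.

    Stimulus labels of the brain RDM are shuffled (rows and columns
    together), which keeps the matrix a valid RDM under the null.
    """
    rng = random.Random(seed)
    identity = list(range(len(brain)))
    model_vec = upper_triangle(model, identity)
    observed = pearson(upper_triangle(brain, identity), model_vec)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(upper_triangle(brain, perm), model_vec) >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical toy RDM: dissimilarity grows with distance along a line.
toy = [[abs(i - j) for j in range(5)] for i in range(5)]
observed, p_value = permutation_p_value(toy, toy)
print(observed, p_value)
```

  Shuffling labels rather than individual cells respects the dependence between cells of the RDM, which is the concern raised in weakness (5).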
  
  References
  
  King, M. L., Groen, I. I., Steel, A., Kravitz, D. J., & Baker, C. I. (2019). Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage, 197, 368-382.
  
  Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., ... & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.
  
  Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS computational biology, 10(4), e1003553.
  
  Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., ... & Griffiths, T. L. (2023). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.
  
  Review 2

Tags

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.27.645646v1
www.biorxiv.org www.biorxiv.org

Coordinated spinal locomotor network dynamics emerge from cell-type-specific connectivity patterns

4
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  In this valuable study, Wandler et al. provide convincing theoretical evidence for alternate mechanisms of rhythm generation by CPGs. Their model shows that cell-type-specific connectivity and a dominant inhibitory drive could underlie rhythm generation. Excitatory input could act to enhance the frequency range of these rhythms. This modeling study could motivate further experimental investigation of these mechanisms to understand CPG rhythmogenesis.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  This study explores the connectivity patterns that could lead to fast and slow undulating swim patterns in larval zebrafish using a simplified theoretical framework. The authors show that a pattern of connectivity based only on inhibition is sufficient to produce realistic patterns with a single frequency. Two such networks, coupled with inhibition but with distinct time constants, can produce a range of frequencies. Adding excitatory connections further increases the range of obtainable frequencies, albeit at the expense of sudden transitions in the mid-frequency range.
  
  Strengths:
  
  (1) This is an elegant approach to answering the question of how spinal locomotor circuits generate coordinated activity using a theoretical approach based on moving bump models of brain activity.
  
  (2) The models make specific predictions on patterns of connectivity while discounting the role of connectivity strength or neuronal intrinsic properties in shaping the pattern.
  
  (3) The models also propose that there is an important association between cell-type-specific intersegmental patterns and the recruitment of speed-selective subpopulations of interneurons.
  
  (4) Having a hierarchy of models creates a compelling argument for explaining rhythmicity at the network level. Each model builds on the last and reveals a new perspective on how network dynamics can control rhythmicity. I liked that each model can be used to probe questions in the next/previous model.
  
  Major Issues:
  
  (1) How is this simplified model representative of what is observed biologically? A bump model does not naturally produce oscillations. How would the dynamics of a rhythm generator interact with this simplistic model?
  
  (2) Would this theoretical construct survive being expressed in a biophysical model? It seems that it should, but even a simple biological model with the basic patterns of connectivity shown here would greatly increase confidence in the biological plausibility of the theory.
  
  (3) How stable is this model in its output patterns? Is it robust to noise? Does noise, in fact, smooth out the abrupt transitions in frequency in the middle range?
  
  (4) All figure captions are inadequate. They should have enough information for the reader to understand the figure and the point that was meant to be conveyed. For example, the caption of Figure 1 does not explain what the red dot is, what is black, what is white, or what the gradations of gray are. Nor does it say whether this is the representative connectivity of one node or whether it shows all the connections. The authors should not leave the reader guessing.
  
  Review 1
3. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors aimed to show that connectivity patterns within spinal circuits composed of specific excitatory and inhibitory connectivity and with varying degrees of modularity could achieve tail beats at various frequencies as well as proper left-right coordination and rostrocaudal propagation speeds.
  
  Strengths:
  
  The model is simple, and the connectivity patterns explored are well supported by the literature.
  
  The conclusions are intuitive and support many experimental studies on zebrafish spinal circuits for swimming. The simulations provide strong support for the sufficiency of connectivity patterns to produce and control many hallmark features of swimming in zebrafish.
  
  Weaknesses:
  
  I only have two minor suggestions:
  
  (1) Figure 1A: if I interpret Figure 1B correctly, should there not also be long descending projections? These do not seem to be illustrated.
  
  (2) Page 5: it would be good to define what is meant by slow and fast here, and at what developmental age, as this definition changes with age in zebrafish.
  
  Review 2
4. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Central pattern generator (CPG) circuits underlie rhythmic motor behaviors. To date, it is thought that these CPG networks are rather local and multiple CPG circuits are serially connected to allow locomotion across the entire body. Distributed CPG networks that incorporate long-range connections have not been proposed, although such connectivity has been experimentally shown for several different spinal populations. In this manuscript, the authors use this existing literature on long-range spinal interneuron connectivity to build a new computational model that reproduces basic features of locomotion like left-right alternation, rostrocaudal propagation, and independent control of frequency and amplitude. Interestingly, the authors show that a model solely based on inhibitory neurons can recapitulate these basic locomotor features. Excitatory sources were then added that increased the dynamic range of frequencies generated. Finally, the authors were also able to reproduce experimentally observed consequences of cell-type-specific ablations, showing that local and long-range, cell-type-specific connectivity could be sufficient for generating locomotion.
  
  Strengths:
  
  This work is novel, offering distributed CPGs as an interesting alternative to the local networks traditionally predicted. It shows that cell-type-specific network connectivity is as important as, if not more important than, intrinsic cell properties for rhythmogenesis and that inhibition plays a crucial role in shaping locomotor features. Given the importance of local CPGs in understanding motor control, this alternative concept will be of broad interest to the larger motor control field, including invertebrate and vertebrate species.
  
  Weaknesses:
  
  I have the following minor concerns/clarifications:
  
  (1) The authors describe a single unit as a neuron, be it excitatory or inhibitory, and the output of the simulation is the firing rate of these neurons. Experimentally and in other modeling studies, motor neurons are incorporated in the model, and the output of the network is based on motor neuron firing rate, not the interneurons themselves. Why did the authors choose to build the model this way?
  
  (2) In the single population model (Figure 1), the authors use ipsilateral inhibitory connections that are long-range in an ascending direction. Experimentally, these connections have been shown to be local, while long-range ipsilateral connections have been shown to be descending. What were the reasons the authors chose this connectivity? Do the authors think local ascending inhibitions contribute to rostrocaudal propagation, and how?
  
  (3) In the two-population model, the authors show independent control of frequency and amplitude, as has been reported experimentally. In these previous experimental studies, however, frequency and amplitude are regulated by different neurons, suggesting different networks dedicated to frequency and amplitude control. In the current model, by contrast, the same population with the same connections can contribute to frequency or amplitude depending on relative tonic drive. Can the authors please address these differences either by changes in the model or by adding to the Discussion?
  
  (4) It would be helpful to add a paragraph in the Discussion on how these results could be applicable to other model systems beyond zebrafish. Cell intrinsic rhythmogenesis is a popular concept in the field, and these results show an interesting and novel alternative. It would help to know if there is any experimental evidence suggesting such network-based propagation in other systems, invertebrates, or vertebrates.
  
  Review 3

Tags

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.20.629829v2
www.biorxiv.org www.biorxiv.org

Neuroanatomical foundations of social tolerance across macaque species

4
1. Public_Reviews 06 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This important work compares the size of two brain areas, the amygdala and the hippocampus, across 12 species belonging to the Macaca genus. The authors find, using a convincing methodological approach, that amygdala - but not hippocampal - volume varies with social tolerance grade, with high tolerance species showing larger amygdala than low tolerance species of macaques. Interestingly, their findings also suggest an inverted developmental effect, with intolerant species showing an increase in amygdala volume across the lifespan, compared to tolerant species exhibiting the opposite trend. Overall, this paper offers new insights into the neural basis of social and emotional processing.
  
  Summary
2. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades, such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.
  
  Strengths:
  
  (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old - an age that is rare in the wild but more common in captive settings.
  
  (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.
  
  (3) The methodology and supplemental figures for acquiring brain MRI images are well detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.
  
  Weaknesses:
  
  (1) The nature vs. nurture distinction is an important one, but it may be difficult to draw conclusions about "nature" in this case, given that only two data points (from grades 3 and 4) come from animals under one year of age (Method Figure 1D). Most brains were collected after substantial social exposure (typically after age 1 or 1.5), so the data may better reflect developmental changes due to early life experience rather than innate wiring. It might be helpful to frame the findings more clearly in terms of how early experiences shape development over time, rather than as a nature vs. nurture dichotomy.
  
  (2) It would be valuable to clarify how the older individuals, especially those 20+ years old, may have influenced the observed age-related correlations (e.g., positive in grades 1-2, negative in grades 3-4). Since primates show well-documented signs of aging, some discussion of the potential contribution of advanced age to the results could strengthen the interpretation.
  
  (3) The authors categorize the behavioral traits previously described in Thierry (2021) into 3 self-defined cognitive requirements; however, they do not discuss under what conditions specific traits were assigned to categories or justify why these cognitive requirements were chosen. It is not fully clear from Thierry (2021) alone how each trait would align with the authors' categories. Given that these traits/categories are drawn on for their neuroanatomical hypotheses, it is important that the authors clarify this. It would be helpful to include a table with all behavioral traits with their respective categories, and explain their reasoning for selecting each cognitive requirement category.
  
  (4) One of the main distinctions the authors make between high social tolerance species and low tolerance species is the level of complex socio-cognitive demands, with more tolerant species experiencing the highest demands. However, socio-cognitive demands can also be very complex for less tolerant species because they need to strategically balance behaviors in the presence of others. The relationships between socio-cognitive demands and social tolerance grades should be viewed in a more nuanced and context-specific manner.
  
  (5) While the limitations section touches on species-related considerations, the issue of individual variability within species remains important. Given that amygdala volume can be influenced by factors such as social rank and broader life experience, it might be useful to further emphasize that these factors could introduce meaningful variation across individuals. This doesn't detract from the current findings but highlights the importance of considering life history and context when interpreting subcortical volumes, particularly in future studies.
  
  Review 1
3. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This comparative study of macaque species and their type of social interaction is ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.
  
  To achieve any sort of power, they have combined data from 4 centres, which have all used different scanning methods, and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focussed on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.
  
  This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: (1) that more intolerant species have relatively larger amygdalae, and (2) that with development, there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case, otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable.
  
  It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, to which the authors indulge. In the case of Grade 1 species, the individuals have a lot to learn, especially if they are not top of the hierarchy, but at the same time, there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite of how I read them, in which case the Table and preceding text need to align.)
  
  Review 2
4. Public_Reviews 06 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species.
  
  There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus, which remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The results look quite strong, although the authors could bring up some more clarity in their replies regarding the data they are working with. From one figure to the other, we switch from model-calculated ratio to model-predicted volume. Note that if one was to sample a brain at age 20 in all the grades according to the model-predicted volumes, it would not seem that the difference for amygdala would differ much across grades, mostly driven with Grade 1 being smaller (in line with the main result), but then with Grade 2 bigger than Grade 3, and then Grade 4 bigger once again, but not that different from Grade 2.
  
  Overall, despite this, I think the results are pretty strong, the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be seen under 3 possible aspects:
  
  (1) Classification of the social grade
  
  While it may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world, there is no list included of the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which of the different traits correspond to what, whether they overlap, and crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align to, but it would be a good addition to have them individually matched to, if not the 18 behavioral traits, at least the 3 different broad categories of cognitive requirements.
  
  (2) Issue of nature vs nurture
  
  Another way to look at the debate between nature vs nurture is to look at phylogeny. For now, there is no phylogenetic tree that shows where the different grades are realized. For example, it would be illuminating to know whether more related species, independently of grades, have similar amygdala or hippocampus sizes. Then the question will go to the details, and whether the grades are realized in particular phylogenetic subdivisions. This would go in line with the general point of the authors that there could be general species differences.
  
  With respect to nurture, it is likely more complicated: one needs to take into account the idiosyncrasies of the life of the individual. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. Right, but this finding is at the individual level with a documented life history. Do we have any of this information for any of the individuals considered (this is likely out of the scope of this paper to look at this, especially for individuals that did not originate from CdP)?
  
  (3) Issue of the discussion of the amygdala's function
  
  The entire discussion/goal of the paper states that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state (L333/34) that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, there is nowhere in the paper a demonstration that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested, say, if grade 4 species are more afraid of snakes than grade 1 species. If so, one could predict they would also have a bigger amygdala, and they would probably also find it in the model. My point is not that the authors should try to correlate any kind of potential aspect that has been connected to the amygdala in the literature with their data (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8), but they should refrain from saying they have challenged a particular aspect if they have not even tested it. I would rather encourage the authors to try and discuss the amygdala as a multipurpose center that includes social cognition and emotion.
  
  Strengths:
  
  Methods & breadth of species tested.
  
  Weaknesses:
  
  Interpretation, which can be described as 'oriented' and should rather offer additional views.
  
  Review 3

Tags

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.12.642838v1
www.biorxiv.org

BEHAV3D Tumor Profiler to map heterogeneous cancer cell behavior in the tumor microenvironment

5
1. Public_Reviews 05 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This is a useful tool for code-less analysis of patterns in cell migratory behaviours in vivo using intravital microscopy data and allows correlation with spatial features of the tumour microenvironment. There is a need for these tools to make quantitative analysis, comparison and interpretation of complex cell tracking data more accessible and evidence is provided of its applicability to tracks generated by both proprietary and open tracking software. However, it is incomplete due to limitations imposed by the assumptions that apply to the statistical tests employed.
 
 Summary
2. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Intravital microscopy (IVM) is a powerful tool that facilitates live imaging of individual cells over time in vivo in their native 3D tissue environment. Extracting and analysing multi-parametric data from IVM images, however, is challenging, particularly for researchers with limited programming and image analysis skills. In this work, Rios-Jimenez and Zomer et al. have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of IVM data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module') and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). It is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. Demo datasets are also available on the authors' GitHub repository to aid user training and enhance the usability of the developed pipeline.
 
 To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour-bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge), which were used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation.
 
 The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches.
 
 Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment. When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'.
 
 Strengths:
 
 - Figures are clearly presented, and the manuscript is easy to follow.

 - The pipeline appears to be intuitive and user-friendly for researchers with limited computational expertise. A detailed step-by-step video and demo datasets are also included to support its uptake.

 - The different computational modules have been tested using relevant datasets, including imaging data of normal and tumour cells in vivo.

 - All code is open source, and the pipeline can be implemented with Google Colab.

 - The tool combines multiple dynamic parameters extracted from timelapse IVM images to identify single-cell behavioural patterns and to cluster cells into distinct groups sharing similar behaviours, and provides avenues to map these onto in vivo or ex vivo imaging data of the tumour microenvironment.
 
 Weaknesses:
 
 - The tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images. To use the tool, researchers must first extract dynamic cellular parameters from their IVM datasets using other software including Imaris, which is expensive and therefore not available to all. Nonetheless, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking), which will help ensure its accessibility to a broader range of researchers.

 - The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. The authors acknowledge this, however, and conclusions are appropriately tempered in the absence of additional experiments and controls.
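
 As an illustration of the kinetic parameters named in the first weakness, the sketch below derives path length, net displacement, mean speed, and a directionality ratio from one track's positions. It is not part of BEHAV3D-TP, Imaris, or any Fiji plugin: the function name `track_kinetics`, the input layout (a list of x, y, z positions sampled at a fixed interval `dt`), and the choice of directionality as displacement divided by path length are assumptions made here for illustration.

```python
import math


def track_kinetics(points, dt):
    """Summarise one cell track (hypothetical helper, not a BEHAV3D-TP function).

    points: list of (x, y, z) positions sampled at a fixed interval.
    dt: time between consecutive positions, in the user's own time unit.
    """
    if len(points) < 2:
        raise ValueError("a track needs at least two positions")
    if dt <= 0:
        raise ValueError("dt must be positive")

    # Step lengths between consecutive positions.
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    path_length = sum(steps)

    # Straight-line distance from the first to the last position.
    displacement = math.dist(points[0], points[-1])

    duration = dt * (len(points) - 1)
    mean_speed = path_length / duration

    # Assumed definition: 1.0 for a perfectly straight track,
    # 0.0 for a track that ends where it started (or never moves).
    directionality = displacement / path_length if path_length > 0 else 0.0

    return {
        "path_length": path_length,
        "displacement": displacement,
        "mean_speed": mean_speed,
        "directionality": directionality,
    }


# Example: a track that moves 3 units along x, then 4 units along y.
example = track_kinetics([(0, 0, 0), (3, 0, 0), (3, 4, 0)], dt=1.0)
```

 Per-track summaries of this kind are what a downstream analysis tool would take as input; real tracking software typically exports them directly.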
 
 Review 1
3. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors produce a new tool, BEHAV3D, to analyse tracking data and to integrate these analyses with large and small scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data; however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.
 
 Strengths:
 
 - The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers.
 
 - The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used.
 
 Weaknesses:
 
 - The revision has dealt with many concerns; however, the statistics generated by the process are still flawed. While the statistics have been clarified within the legends, and this is a great improvement in terms of clarity, the underlying assumptions of the tests used are violated. The problem is that individual imaging positions or tracks are treated as independent and then analysed by ANOVA. As separate imaging positions within the same mouse are not independent, nor are individual cells within a single mouse, this makes the statistical analyses inappropriate. For a deeper analysis of this than is feasible within a review, please see Lord, Samuel J., et al. "SuperPlots: Communicating reproducibility and variability in cell biology." The Journal of cell biology 219.6 (2020): e202001064. Ultimately, while this is a neat piece of software facilitating the analysis of complex data, the fact that it will produce flawed statistical analysis is a major problem. This problem is compounded by the fact that much imaging analysis has been analysed in this inappropriate manner in the past, leading to issues of interpretation and ultimately reproducibility.
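
 One way to respect the independence assumption raised above is to collapse cell-level measurements to one value per mouse before testing, so that the mouse is the unit of analysis. The sketch below is a minimal illustration of that idea in plain Python, not the BEHAV3D-TP implementation: the record layout `(group, mouse_id, value)`, the helper names, and the toy numbers are all hypothetical, and the function returns only the F statistic and degrees of freedom (a p-value would need an F-distribution routine from a statistics library).

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(records):
    """Collapse cell-level values to one mean per mouse.

    records: iterable of (group, mouse_id, value) tuples (hypothetical layout).
    Returns {group: [mean for each mouse in that group]}.
    """
    by_mouse = defaultdict(list)
    for group, mouse_id, value in records:
        by_mouse[(group, mouse_id)].append(value)

    by_group = defaultdict(list)
    for (group, _mouse_id), values in sorted(by_mouse.items()):
        by_group[group].append(mean(values))
    return dict(by_group)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for {group: [values]}.

    Returns (F, df_between, df_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: two regions, two mice each, two tracked cells per mouse.
cells = [
    ("region_A", "m1", 1.0), ("region_A", "m1", 3.0),
    ("region_A", "m2", 3.0), ("region_A", "m2", 5.0),
    ("region_B", "m3", 5.0), ("region_B", "m3", 7.0),
    ("region_B", "m4", 7.0), ("region_B", "m4", 9.0),
]
mouse_level = per_mouse_means(cells)          # n = 2 mice per region
f_stat, df_b, df_w = one_way_anova_f(mouse_level)
```

 With this aggregation the within-group degrees of freedom reflect the number of mice rather than the number of cells, which is the point made in the SuperPlots reference. Mixed-effects models are an alternative when per-cell information needs to be retained.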
 
 Review 2
4. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 The manuscript by Rios-Jimenez et al. describes a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users.
 
 Strengths:
 
 An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment.
 
 Weaknesses:
 
 Motility is only one tumor cell feature and is probably not sufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important non-tumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on only motility feature analysis, and cannot be applied to analyze other tumor cell dynamic features or cell-cell interaction dynamics.
 
 Review 3
5. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 We thank the reviewers for their positive and constructive comments on the manuscript. In the revised manuscript we addressed these comments, which we believe have improved the quality of our work.
 
 In summary:
 
 (1) We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work, which is to provide an analytical framework for IVM data after segmentation and tracking. Developing open-source segmentation and tracking tools represents a substantial undertaking in its own right, which has been comprehensively explored in other studies (e.g. https://doi.org/10.4049/jimmunol.2100811; https://doi.org/10.7554/eLife.60547; https://doi.org/10.1016/j.media.2022.102358; https://doi.org/10.1038/s41592-024-02295-6 - now cited in our revised manuscript).
 
 In our analyses, we used data processed with Imaris, a commercial software that, despite its limitations, is widely used by the intravital microscopy community due to its user-friendly platform for 3D image visualization and analysis. Nevertheless, recognizing the need for compatibility with tracking data from various pipelines, we have modified our tool to accept other data formats, such as those generated by open-source Fiji plugins like TrackMate, MTrackJ, ManualTracking (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). These updates are available in our GitHub repository and are described in the revised manuscript.
 
 (2) We appreciate reviewer #3's suggestion to incorporate additional features into our analytical pipeline. In response, we have already updated the GitHub repository to allow users to input and select which features (dynamic, morphological, or spatial) they wish to include in the analysis (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#feature-selection). In the revised manuscript, we highlighted this new functionality and provided examples using alternative datasets to demonstrate the application of these features.
 
 (3) We appreciate the constructive feedback of reviewers #1 and #2 regarding the statistical analysis and interpretation of the data presented in Figures 3 and 4. We understand the importance of clarity and rigor in data analysis and presentation, and we addressed the concerns raised in the revised version of the manuscript.
 
 (4) We appreciate reviewer #1's suggestion regarding the inclusion of demo data, as we believe it would greatly enhance the usability of our pipeline. We acknowledge that this was an oversight on our part. To address this, we have now added demos to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the revised manuscript, we referenced this addition and present new figures with examples of these demos processing different IVM datasets (2D/3D, different tumors and healthy tissues). Additionally, we have provided processed DMG IVM movie samples in an imaging repository.
 
 (5) Finally, we made some small changes to the manuscript based on the reviewers’ feedback.
 
 Below we provide a point-by-point response to the reviewers’ comments
 
 Reviewer #1 (Public review):
 
 Comment #1: A key limitation of the pipeline is that it does not overcome the main challenges and bottlenecks associated with processing and extracting quantitative cellular data from timelapse and longitudinal intravital images. This includes correcting breathing-induced movement artifacts, automated registration of longitudinal images taken over days/weeks, and accurate, automated segmentation and tracking of individual cells over time. Indeed, there are currently no standardised computational methods available for IVM data processing and analysis, with most laboratories relying on custom-built solutions or manual methods. This isn't made explicit in the manuscript early on (described below), and the researchers rely on expensive software packages such as IMARIS for image processing and data extraction to feed the required parameters into their pipeline. This limitation unfortunately reduces the likely impact of BEHAV3D-TP on the IVM field.
 
 As highlighted above, the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence, and displacement) from intravital images. Indeed, to use the tool researchers must first extract dynamic cellular parameters from their IVM datasets, requiring access to expensive software (e.g. IMARIS as used here) and/or above-average computational expertise to develop and use custom-made open-source solutions. This limitation is not made explicit or discussed in the text.
 
 We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work and represent a substantial undertaking in their own right. Several studies (e.g., Diego Ulisse Pizzagalli et al., J Immunol (2022); Aby Joseph et al., eLife (2020); Molina-Moreno et al., Medical Image Analysis (2022); Hidalgo-Cenalmor et al., Nat Methods (2024); Ershov et al., Nat Methods (2022)) have comprehensively addressed these topics, and we now reference them in the revised manuscript to provide readers with relevant background.
 
 The objective of our manuscript is not to develop a complete segmentation or tracking pipeline but rather to introduce an analytical framework capable of extracting enhanced insights from the data generated by existing tools. This goal arises from our observations of the field: despite significant investment in image processing, researchers often rely on simplistic approaches, such as averaging single parameters across conditions, which can obscure tumor heterogeneity and spatial behavioral dynamics within the tumor microenvironment.
 
 Our current tool focuses on providing this much-needed analytical capability. For our analysis we used Imaris, a widely utilized software in the intravital microscopy (IVM) community, known for its intuitive 3D visualization and analysis platform despite certain limitations.
 
 In our own literature search of recent IVM studies published by leading laboratories in high-impact journals, we found that close to half used Imaris, while the remainder primarily relied on manual workflows with Fiji plugins. Thus, we consider it valuable to offer a pipeline compatible with such commonly used software, given its prevalence in the field.
 
 However, following the suggestion of the reviewer, and to enhance the tool’s flexibility and compatibility, we have expanded the pipeline to accept data formats generated by open-source Fiji plugins, such as TrackMate, MTrackJ, and ManualTracking. These updates are detailed in the revised manuscript and are implemented in our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input), where we also provide several demos using TrackMate and Imaris processed data. This addition demonstrates our tool's capability to integrate with segmented and tracked datasets from diverse platforms, increasing its applicability to a broader range of researchers using both commercial and open-source pipelines.
 
 Comment #2: The number of cells (e.g. per behavioural cluster), and the number of independent mice, represented in each result figure, is not included in the figure legends and are difficult to ascertain from the methods.
 
 We appreciate the reviewer's constructive feedback regarding the clarity of the number and type of replicates used in our analyses. In the revised manuscript, we have included detailed information in the figure legends, including the number of independent mice represented in each figure, to ensure transparency. Regarding the number of cells, we have indicated the total number of processed cells in the Figure 2b legend (953 cells). Additionally, we have now included figures (Sup Fig 4c, Sup Fig 5e-g, Fig 5c,e, Sup Fig 6c,d) for each cluster, where individual dots represent the individual cell tracks, with color indicating the position and shape indicating individual mice.
 
 Comment #3: The data used to test the pipeline in this manuscript is currently not available, making it difficult to assess its usability. It would be important to include this for researchers to use as a 'training dataset'.
 
 As stated above, we acknowledge that this was an oversight on our part and thank the reviewer for pointing this out. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/main/demo_datasets). In the revised manuscript we have referenced this addition in the Data availability section. Since we now also include processing with Fiji, we provide 4 demo datasets: one processed with Imaris in 3D; one with CellPose2.0 and TrackMate in 2D; one processed with µSAM and TrackMate in 3D; and one manually processed with MTrackJ in 2D. Moreover, we now provide Imaris-processed DMG IVM movie samples in an open-source repository.
 
 Comment #4: Precisely how the BEHAV3D-TP large-scale phenotyping module can map large-scale spatial phenotyping data generated using LSR-3D imaging data and Cytomap to 3D intravital imaging movies is unclear. Further details in the text and methods would be beneficial to aid understanding.
 
 We appreciate the reviewer’s comment and in the revised manuscript we have now provided details in the methods section “Tumor large-scale spatial phenotyping with Cytomap” to clarify how the BEHAV3D-TP module maps LSR-3D and Cytomap data to 3D intravital imaging movies:
 
 “To map the assigned regions onto IVM movies, a 3D image of the cluster distribution within the tumor was generated and exported for each sample (Figure Supplement 5a). Next, regions within the IVM movies were visually matched to the corresponding regions identified by the Large-Scale Phenotyping module of Cytomap (Figure 3c). For each mouse, at least one or two representative positions per matched region type were selected, cropped, and analyzed to assess tumor cell behavior, following the previously described cell tracking methodology (Imaris Cell tracking).”
 
 Moreover, we updated Figure 3c to further clarify these steps.
 
 Comment #5: The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. Conclusions should therefore be tempered in the absence of additional experiments and controls.
 
 We appreciate the reviewer’s comment and acknowledge that our conclusions should be tempered due to the preliminary nature of our evidence. In the revised version of the manuscript we have revised our conclusions accordingly and emphasize the necessity for additional experiments and controls to further validate our findings on DMG cell migratory behaviors and their relationship with the tumor microenvironment.
 
 In discussion: “While our findings suggest that microenvironmental factors may influence tumor cell migration, further studies will be necessary to establish causal relationships. Additional experimental validation, such as macrophage ablation experiments, could help clarify the specific contributions of these factors.”
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) To test the ability of the pipeline to identify relevant patterns of migratory behaviours additional 'control' experiments would be helpful e.g. comparing non-invasive vs invasive tumour cell lines, artificially controlling migratory behaviours of cells such as implanting beads soaked in factors that would attract/repel cells?
 
 (2) Does the pipeline work well for a variety of cell types/contexts? e.g. can it identify and cluster more subtle migratory behaviours such as non-tumour cells during tissue development or regeneration conditions?
 
 We appreciate the reviewer’s valuable suggestions. In the revised manuscript, we have included additional examples demonstrating the capability of our pipeline to investigate heterogeneous cell behavior across two additional experimental setups:
 
 (1) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from breast cancer cell lines with varying migratory capacities (DOI: 10.1016/j.yexcr.2019.04.009). In these datasets, our pipeline extends beyond predefined characteristics based solely on speed, enabling the identification of distinct cell populations. Notably, our analysis reveals that the breast cancer lines exhibit different proportions of different migratory behaviors such as Fast, Intermediate, Very slow and Static (Supplementary Fig 1).
 
 (2) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from healthy breast epithelial cells (DOI: 10.1016/j.celrep.2024.115073), where we identify distinct morphodynamic epithelial cell populations in the terminal end bud of the mammary gland that are distributed differently among hormone receptor (HR)+ and HR- terminal end bud cells.
 
 (3) To support biological conclusions could the authors show that ablating tumour-associated macrophages or vasculature alters the migratory patterns of nearby tumour cells?
 
 We appreciate the reviewer's suggestion regarding the potential effects of ablating tumor-associated macrophages or vasculature on the migratory patterns of nearby tumor cells. While these experiments would functionally validate the observations made by our method, we would like to clarify that the primary focus of our study was the development and application of computational tools for behavioral analysis; we therefore consider a deeper investigation of the biology behind our observations to be outside the scope of the current study. However, as mentioned previously, we have carefully tempered our conclusions to acknowledge the limitations of our current study. In the revised manuscript, we explicitly highlight that experiments involving the ablation of tumor-associated macrophages or vasculature would be crucial for further understanding the biological relevance of our findings.
 
 Minor corrections to text:
 
 (4) Line 63 - are references formatted correctly?
 
 Thank you for pointing out this error. We have corrected it in the revised manuscript.
 
 (5) Lines 161 -162 - 'intravitally imaged' used twice in a sentence.
 
 Thank you for pointing out the typo. We have corrected it in the revised manuscript.
 
 Reviewer #2 (Public review):
 
 Comment#1: The strength of democratizing this kind of analysis is undercut by the reliance upon Imaris for segmentation, so it would be nice if this was changed to an open-source option for track generation.
 
 As noted in our previous response to Reviewer #1, we would like to point out that although Imaris is a commercial software, it is widely used in the intravital microscopy community due to its user-friendly interface. We conducted a literature review to evaluate this aspect and below we include references from leading laboratories in the IVM field that utilize Imaris. One of its key advantages, which we also utilized, is semi-automated data tracking that allows for manual corrections in 3D—a process that can be more challenging in other open-source software with less effective data visualization.
 
 However, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support 2D and 3D data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ). In the revised manuscript, we describe the new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module across various IVM datasets, processed in both 2D and 3D with different processing pipelines (Supplementary Fig 1-3). This includes CellPose 2.0 and the novel 'Segment Anything' model, followed by TrackMate tracking, applied to both tumor and healthy IVM data. Moreover, we have developed a new web application that integrates morphological and tracking information from Segment Anything segmentation and TrackMate tracking, depicted in Supplementary Fig 3a (https://morphotrack-merger.streamlit.app/ ). Additionally, we have updated the introduction to better clarify the scope of our study and include references to existing image processing solutions.
 
 Comment#2: The main issue is with the interpretation of the biological data in Figure 3 where ANOVA was used to analyse the proportional distribution of different clusters. Firstly the n is not listed so it is unclear if this represents an n of 3 where each mouse is an individual or whether each track is being treated as a test unit. If the latter this is seriously flawed as these tracks can't be treated as independent. Also, a more appropriate test would be something like a Chi-squared test or Fisher's exact test. Also, no error bars are included on the stacked bar graphs making interpretation impossible. Ultimately this is severely flawed and also appears to show very small differences which may be statistically different but may not represent biologically important findings. This would need further study.
 
 We appreciate the reviewer’s insightful comments regarding the interpretation of the biological data in Figure 3.
 
 To clarify, each imaged position is considered an independent biological replicate (n = 18 from a total of 6 mice). We acknowledge that the description of the statistical methods and the experimental units was not sufficiently clear in the previous version. In our original submission, we used an ANOVA to test whether the proportion of each behavioral cluster differed across the tumor microenvironment regions. Post hoc pairwise comparisons were performed using Tukey’s test, with the results shown in Supplementary Figure 2d (currently Fig 3d). However, we agree with the reviewer that this approach may be misleading when paired with stacked bar plots that lack error bars, as it can obscure individual variability and does not explicitly represent statistical uncertainty.
 
 In the revised manuscript, we present the data as boxplots with individual data points, where each dot represents an imaged position and the shape corresponds to a specific mouse. In Figure 3d, the y-axis displays the normalized percentage of each cluster across TME regions, expressed as z-scores. This normalization corrects for inter-mouse variability and facilitates comparison of the relative distribution of clusters across TME regions, independent of overall abundance differences between mice. We performed an ANOVA with Tukey's post hoc test for each individual behavioral cluster to assess differences across TME regions. Additionally, for transparency, we provide the raw percentage values in Supplementary Figure 5d. The legends provide the number of positions and mice included in the analysis.
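 For readers who want to see the shape of such an analysis, the following is a minimal sketch and not the BEHAV3D-TP code. It assumes that percentages for one behavioral cluster are z-scored within each mouse (using the sample standard deviation) and that the z-scores are then compared across TME regions with a one-way ANOVA. The function names, the record layout, and the choice of standard deviation are illustrative assumptions; p-values and Tukey's post hoc comparisons would come from a statistics package.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_mouse(records):
    """Z-score percentages within each mouse.

    records: list of (mouse_id, region, percentage), one per imaged position.
    The layout and the use of the sample SD are assumptions for illustration.
    """
    by_mouse = defaultdict(list)
    for mouse, _region, pct in records:
        by_mouse[mouse].append(pct)

    normalized = []
    for mouse, region, pct in records:
        values = by_mouse[mouse]
        sd = stdev(values) if len(values) > 1 else 0.0
        z = (pct - mean(values)) / sd if sd > 0 else 0.0
        normalized.append((mouse, region, z))
    return normalized


def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups: {region: [values]} with at least two regions and more
    observations than regions. Returns (F, df_between, df_within).
    """
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)
    ss_between = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    ss_within = sum(
        (v - mean(vals)) ** 2 for vals in groups.values() for v in vals
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # No within-group variance: F is undefined, reported here as infinity.
    f_stat = ms_between / ms_within if ms_within > 0 else float("inf")
    return f_stat, df_between, df_within
```

 In this sketch the experimental unit is the imaged position, so each value passed to the ANOVA is one position, not one cell track.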
 
 Comment#3: Figure 4 has similar statistical issues in that the n is not listed and, again, it is unclear whether they are treating each cell track as independent which, again, would be inappropriate. The best practice for this type of data would be the use of super plots as outlined in Lord et al. (2020) JCI - SuperPlots: Communicating reproducibility and variability in cell biology.
 
 We appreciate the reviewer’s comments and suggestions regarding Figure 4. In this case, because we are comparing the overall features of the behavioral clusters, each individual cell is treated as a unit. In the revised manuscript, we have clarified this point in the figure legend and incorporated plots in Figure 4c and 4e, indicating the mouse and imaging position each data point originates from. This enhances the visualization of reproducibility and variability in our data, demonstrating that the results are consistent across multiple mice and positions and are not driven by a single mouse or imaging position.
 
 Comment#4: The main issue that this raises is that the large-scale phenotyping module and the heterogeneity module appear designed to produce these statistical analyses that are used in these figures and, if they are based on the assumption that each track is independent, then this will produce inappropriate analyses as a default.
 
 We appreciate the reviewer’s comment, although we are unclear about the specific concern being raised. To clarify, in our large-scale phenotyping analysis, each position is assigned to a TME niche based on the CytoMAP analysis and the workflow outlined in Figure 3c. Multiple positions are imaged per mouse. For each position, we measure the proportion of tumor cells exhibiting a specific behavioral phenotype, and these proportions are subsequently used for statistical analysis (Figure 3 d).
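 To make the unit of analysis concrete, the per-position proportions described above can be sketched as follows. This is an illustration only and not the BEHAV3D-TP implementation; the function name, the input layout (one `(position_id, cluster_label)` pair per cell track), and the example labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_proportions(tracks):
    """Fraction of tracks in each behavioral cluster, per imaged position.

    tracks: iterable of (position_id, cluster_label), one entry per cell track.
    Returns {position_id: {cluster_label: fraction}}.
    """
    counts = defaultdict(Counter)
    for position, cluster in tracks:
        counts[position][cluster] += 1

    proportions = {}
    for position, counter in counts.items():
        total = sum(counter.values())
        proportions[position] = {c: n / total for c, n in counter.items()}
    return proportions
```

 Each position then contributes a single proportion per cluster to the downstream test, regardless of how many tracks it contains.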
 
 In contrast, in Supplementary Fig. 5e-g, we treat each cell track as an individual unit, grouping them by their assigned large-scale region. Here, we assess whether differences between regions can be detected using a conventional single-feature analysis—a more traditional approach. However, we find that this method loses important behavioral patterns and distinctions that BEHAV3D-TP captures.
 
 We hope that this explanation, along with the modifications made to the figures and figure legends, provides greater clarity.
 
 Reviewer #3 (Public review):
 
 Comment #1: The most challenging task of analyzing 3D time-lapse imaging data is to accurately segment and track the individual cells in 3D over a long time duration. BEHAV3D Tumor Profiler did not provide any new advancement in this regard, and instead relies on commercial software, Imaris, for this critical step. Imaris is known to have a very high error rate when used for analyzing 3D time-lapse data. In the Methods section, the authors themselves stated that "Tumor cell tracks were manually corrected to ensure accurate tracking". Based on our own experience of using Imaris, such manual correction is tedious and often required for every time step of the movie. Therefore, Imaris is not a satisfactory tool for analyzing 3D time-lapse data. Moreover, Imaris is expensive and many research labs probably can't afford to buy it. The fact that BEHAV3D Tumor Profiler critically depends on the faulty ImarisTrack module makes it unclear whether the BEHAV3D tool or the results are reliable.
 
 If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.
 
 We appreciate the reviewer’s comments on the challenges of segmenting and tracking individual cells in 3D time-lapse imaging data. As mentioned previously (please refer to comment #1 to reviewer #1), our primary focus is to develop an analytical tool for comprehensive data analysis rather than tools for image processing. However, to enhance accessibility, we have updated our tool to support data formats from open-source Fiji plugins, such as TrackMate, which will benefit users without access to commercial software (https://github.com/imAIgeneDream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ). In Supplementary Figures 1, 2, and 3, we present IVM data from different sources, processed using three distinct methods: MTrackJ (Supplementary Fig. 1), Cellpose + TrackMate (Supplementary Fig. 2), and µSAM + TrackMate (Supplementary Fig. 3). The latter two represent state-of-the-art deep learning approaches.
 
 On the other hand, while we recognize the limitations of Imaris, it remains widely used in the intravital microscopy community due to its user-friendly interface for 3D visualization and semi-automated segmentation capabilities. Since no perfect tracking method currently exists, we initially utilized Imaris for its ability to allow manual correction of faulty tracks, ensuring the reliability of our results. This approach is not only widely used (see above) but was also the best available option when we began our analysis, allowing us to obtain accurate results efficiently.
 
 In the revised manuscript, we clarify the scope of our study and provide information on both Imaris and alternative processing options to strengthen the reliability of our findings:
 
 In introduction: “While significant efforts have been made to develop open-source segmentation and tracking tools for live imaging data, including IVM[22–27], fewer tools exist for the unbiased analysis of tumor dynamics. One major barrier is that implementing such analytical methods often requires substantial computational expertise, limiting accessibility for many biomedical researchers conducting IVM experiments. To bridge this gap, we present BEHAV3D Tumor Profiler (BEHAV3D-TP) by providing a robust, user-friendly tool that allows researchers to extract meaningful insights from dynamic cellular behaviors without requiring advanced programming skills.”
 
 In the Methods, we now describe not only the Imaris processing pipeline but also the µSAM segmentation pipelines, and we reference CellPose IVM processing; both are combined with TrackMate for tracking. Additionally, to integrate morphological information from µSAM with tracking data from TrackMate, we developed a web tool to merge the outputs from both processing steps: https://morphotrack-merger.streamlit.app/
 
 Comment #2: The authors developed a "Heterogeneity module" to extract distinctive tumor migratory phenotypes from the cell tracks quantified by Imaris. The cell tracks of the individual tumor cells are all quite short, indicating relatively low motility of the tumor cells. It's unclear whether such short migratory tracks are sufficient to warrant the PCA analysis to identify the 7 distinctive migratory phenotypes shown in Figure 2d. It's also unclear whether these 7 migratory phenotypes correspond to unique functional phenotypes.
 
 For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.
 
 While some tumor cells exhibit limited motility, indicated by short tracks, others demonstrate significant migratory capabilities (Figure 2 Invading and Retreating cells). This variability in tumor cell behavior is a central focus of our analysis, and our tool is specifically designed to identify and distinguish these differences. Our PCA analysis effectively captures this variability, as illustrated in Figure 2 d-f. It differentiates between cells exhibiting varying degrees of migratory behavior, including both highly and less migratory phenotypes, as well as their directionality relative to the tumor core and the persistence of their movements. Thus, we believe that our approach provides valuable insights into the distinct migratory phenotypes within the tumor microenvironment.
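 As a generic illustration of how PCA separates tracks by behavior, the sketch below standardizes a small table of per-track features and extracts the first principal component by power iteration. It is not the BEHAV3D-TP implementation: the function names are hypothetical, the features (for example speed, displacement relative to the tumor core, persistence) are assumed for illustration, and a real analysis would typically use a numerical library and more than one component.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Column-wise z-scoring of a list of equal-length feature rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Constant columns get a divisor of 1.0 so they map to zeros.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: one list of numeric features per cell track (hypothetical layout).
    Uses power iteration on the covariance matrix of standardized features.
    """
    z = standardize(rows)
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]

    # Asymmetric start vector, to reduce the chance of starting
    # orthogonal to the dominant eigenvector.
    vec = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in nxt]

    scores = [sum(a * b for a, b in zip(row, vec)) for row in z]
    return vec, scores
```

 Tracks with similar feature profiles receive similar scores, which is what allows highly and weakly migratory cells to be separated along the leading components before clustering.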
 
 While our current manuscript does not provide explicit evidence linking each motility cluster to functional differences among the tumor cells, it is important to note that the state of the field supports the idea that cell dynamics can predict cell states and phenotypes. Research conducted by ourselves (Dekkers, Alieva et al., Nat Biotech, 2023) and others, such as Crainiciuc et al. (Nature, 2022) and Freckmann et al. (Nat Comm, 2022), has shown that variations in cell motility patterns are indicative of underlying functional characteristics. For instance, cell morphodynamic features have been shown to reflect differences in cell types, T cell targeting states (Dekkers, Alieva et al., Nat Biotech, 2023), immune cell types (Crainiciuc et al., Nature, 2022), tumor metastatic potential, and drug resistance states (Freckmann et al., Nat Comm, 2022). In the revised manuscript, we have referenced relevant studies to underscore the biological significance of these behaviors. By doing so, we hope to clarify the potential implications of our findings and strengthen the overall narrative of our research:
 
 In discussion: “While our current study does not provide direct functional validation of the distinct motility clusters identified, existing literature strongly supports the notion that cell dynamics can serve as a proxy for functional states and phenotypic heterogeneity. Prior work, including studies by our group[19,66] as well as Crainiciuc et al.[35] and Freckmann et al.[20], has demonstrated that variations in cell motility patterns can reflect underlying functional characteristics. Specifically, cell morpho-dynamic features have been shown to correlate with differences in cell type identity, T-cell engagement, metastatic potential, and drug resistance states. This growing body of evidence suggests that tumor cell behavior, as captured by BEHAV3D-TP, may serve as a predictive tool for deciphering functional tumor heterogeneity. Future studies integrating transcriptomic or proteomic profiling of motility-defined subpopulations could further elucidate the biological significance of these behavioral phenotypes.”
 
 Comment #3: Using only motility to classify tumor cell behaviours in the tumor microenvironment (TME) is probably not sufficient to capture the tumor cell difference. There are also other non-tumor cell types in the TME. If the authors aim to develop a computational tool that can elucidate tumor cell behaviors in the TME, they should consider other tumor cell features, e.g., morphology, proliferation state, and tumor cell interaction with other cell types, e.g., fibroblasts and distinct immune cells.
 
 The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.
 
 We believe that using dynamic features alone is sufficient to capture differences in tumor behavior, as demonstrated by our results in Figure 2. However, we appreciate the reviewer’s suggestion to consider additional features, such as cell morphology, to fine-tune our analyses. To this end, we have adapted our pipeline to be compatible with any dynamic, morphological or spatial features present in the data. In the revised manuscript, we showcase this new addition with the analyses of two new datasets: 2D IVM data from healthy epithelial breast cells (Supplementary Fig 2) and 3D IVM data from adult gliomas (Supplementary Fig 3). These analyses identified cells with specific morphodynamic characteristics, which exhibited distinct kinetic behaviors or spatial distributions.
 
 However, we would like to point out that not all features may provide informative insights and that a wide range of features can instead introduce biologically irrelevant noise, making interpretation more challenging. For instance, in 3D microscopy, the z-axis resolution is typically lower, which can lead to artifacts such as apparent elongation in that direction. Adding morphological features that capture this may skew the analysis. Therefore, we believe that incorporating additional features should be approached with caution. We clarify these considerations in the revised manuscript to better guide users in utilizing our computational tool effectively:
 
 In discussion: “In addition to motility-based classification, features such as tumor cell morphology, proliferation state, and interactions with the tumor microenvironment can further refine tumor phenotyping. BEHAV3D-TP allows for the selection of diverse feature types, supporting datasets that include both dynamic, morphological and spatial parameters. However, we recognize that expanding the feature set may introduce biologically irrelevant noise, particularly in 3D microscopy data where limited z-axis resolution can lead to morphological artifacts. This highlights the potential need in the future to include unbiased feature selection strategies, such as bootstrapping methods[67], to ensure the identification of meaningful and biologically relevant parameters. Careful consideration of these aspects is key to maximizing the interpretability and predictive value of analyses performed with BEHAV3D-TP.”
 
 Comment #4: The authors have already published two papers on BEHAV3D [Alieva M et al. Nat Protoc. 2024 Jul;19(7): 2052-2084; Dekkers JF, et al. Nat Biotechnol. 2023 Jan;41(1):60-69]. Although the previous two papers used BEHAV3D to analyze T cells, the basic pipeline and computational steps are similar, in particular regarding cell segmentation and tracking. The addition of a "Heterogeneity module" based on PCA analysis does not make a significant advancement in terms of image analysis and quantification.
 
 We want to emphasize that we have no intention of duplicating our previous publications. In this manuscript, we have consistently cited our foundational papers, where BEHAV3D was first developed for T cell migratory analysis in in vitro settings. In the introduction, we clearly state that our earlier work inspired us to adopt a similar approach for analyzing cell behavior in intravital microscopy (IVM) data, addressing the specific needs and complexities of analyzing tumor cell behaviors in the tumor microenvironment.
 
 Importantly, our new work provides several key advancements: 1) a pipeline specifically adapted for intravital microscopy (IVM) data; 2) integration of spatial characteristics from both large-scale and small-scale phenotyping; and 3) a zero-code approach designed to empower researchers without coding skills to effectively utilize the tool. We believe that these enhancements represent meaningful progress in the analysis of cell behaviors within the tumor microenvironment which will be valuable for the IVM community. We ensure that these points are clearly articulated in the revised manuscript:
 
 In introduction: “In line with this concept of characterizing cellular dynamic properties for cell classification, we have previously developed an analytical platform termed BEHAV3D[19,21] allowing to perform behavioral phenotyping of engineered T cells targeting cancer. While BEHAV3D was initially developed to analyze T cell migratory behavior under controlled in vitro conditions, we sought to expand its application to investigate tumor cell behaviors in IVM data, where the complexity of the TME presents distinct analytical challenges. This manuscript builds on our foundational work but represents a significant advancement by adapting the pipeline specifically for IVM datasets.”
 
 Reviewer #3 (Recommendations for the authors):
 
 (1) If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.
 
 We thank the reviewer for this recommendation and as stated above we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgeneDream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ). In the revised manuscript, we detail this new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module using an example dataset of glioma tumors.
 
 Additionally, we have updated the introduction to better clarify the scope of our study (see comment #1 from Reviewer #3) and include references to existing image processing solutions.
 
 (2) For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.
 
 As noted in the comment above, the revised manuscript now incorporates references to relevant literature that support our understanding that behavioral differences among cells are driven by their underlying functional differences (See comment #2 from Reviewer #3). Additionally, we would like to point to Figure 2d and Supplementary Fig 4 c that provide evidence of the functional distinctions between the identified clusters.
 
 (3) The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.
 
 We thank the reviewer for this valuable suggestion. In the revised manuscript, we have added the flexibility to incorporate a wide range of features, including morphological ones, and enabled users to select the specific features they wish to include in their analysis. To illustrate this functionality, we have included two example datasets analyzed using this approach (see comment #3 from Reviewer #3). Additionally, as indicated above, we emphasize the importance of careful selection and interpretation of features, as improper choices may lead to biologically irrelevant results. This clarification is intended to ensure that users apply the tool thoughtfully and derive meaningful insights.
 
 AuthorResponse

URL

biorxiv.org/content/10.1101/2024.08.23.609358v2
www.biorxiv.org www.biorxiv.org

Redistribution of fragmented mitochondria ensure symmetric organelle partitioning and faithful chromosome segregation in mitotic mouse zygotes

5
1. Public_Reviews 05 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This important study investigates the role of Drp1 in early embryo development. The authors have addressed most of the original comments and the work now presents convincing evidence on how this protein influences mitochondrial localization and partitioning during the first embryonic divisions. The research employs the Trim-Away technique to eliminate Drp1 in zygotes, revealing critical insights into mitochondrial clustering, spindle formation, and embryonic development.
 
 Summary
2. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Gekko, Nomura et al. show that Drp1 elimination in zygotes using the Trim-Away technique leads to mitochondrial clustering and uneven mitochondrial partitioning during the first embryonic cleavage, resulting in embryonic arrest. They monitor organellar localization and partitioning using specific targeted fluorophores. They also describe the effects of mitochondrial clustering on spindle formation and the detrimental effect of uneven mitochondrial partitioning to daughter cells.
 
 Strengths:
 
 The authors have gathered solid evidence for the uneven segregation of mitochondria upon Drp1 depletion through different means: mitochondrial labelling, ATP labelling and mtDNA copy number assessment in each daughter cell. The authors have also characterised the defects in cleavage mitotic spindles upon Drp1 loss.
 
 Weaknesses:
 
 This study convincingly describes the phenotype seen upon Drp1 loss. However, it remains descriptive. Further studies should be conducted to elucidate the mechanism by which Drp1 ensures even mitochondrial partitioning during the first embryonic cleavage.
 
 Review 1
3. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Gekko et al. investigate the impact of perturbing mitochondrial dynamics during early embryo development, through modulation of the mitochondrial fission protein Drp1 using Trim-Away technology. They aimed to validate a role for mitochondrial dynamics in modulating chromosomal segregation, mitochondrial inheritance and embryo development, and achieve this through the examination of mitochondrial and endoplasmic reticulum distribution, as well as actin filament involvement, using targeted plasmids, molecular probes and TEM in pronuclear stage embryos through the first cleavage divisions. Drp1 deletion perturbed mitochondrial distribution, leading to asymmetric partitioning of mitochondria to the 2-cell stage embryo, prevented appropriate chromosomal segregation and culminated in embryo arrest. Resultant 2-cell embryos displayed altered ATP, mtDNA and calcium levels. Microinjection of Drp1 mRNA partially rescued embryo development. A role for actin filaments in mitochondrial inheritance is described; however, the actin-based motor Myo19 does not appear to contribute.
 
 Overall, this study builds upon their previous work and provides further support for a role of mitochondrial dynamics in mediating chromosomal segregation and mitochondrial inheritance. In particular, Drp1 is required for redistribution of mitochondria to support symmetric partitioning and support ongoing development.
 
 Strengths: The study is well designed, the methods appropriate and the results clearly presented. The findings are nicely summarised in a schematic.
 
 The addition of further quantification, including mitochondrial cluster size, elongation/aspect ratio and ROS, as requested by the reviewers, has provided further evidence for the impact of Drp1 depletion on mitochondrial morphology and function.
 
 Understanding the role of mitochondria in binucleation and mitochondrial inheritance is of clinical relevance for patients undergoing infertility treatment, particularly those undergoing mitochondrial replacement therapy.
 
 Weaknesses (original manuscript): The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.
 
 It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).
 
 Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.
 
 The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.
 
 Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.
 
 In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.
 
 Weaknesses (revised manuscript):
 
 The only remaining weakness is that the authors have not undertaken additional experiments to clarify any role for mitochondrial transport following Drp1 depletion.
 
 Review 2
4. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Why are mitochondria finely maintained in the female germ cell (oocyte), zygotes, and preimplantation embryos? Mitochondrial fusion seems beneficial in somatic cells to compensate for unhealthy mitochondria, for example, mitochondria with mutated mtDNA that potentially defuel the respiratory activity if accumulated above a certain threshold. However, in germ cells, it may rather increase the risk of transmitting mutated mtDNA to the next generation. Finely maintained mitochondria would also be beneficial for efficient removal when damaged, as the authors briefly discussed. Due in part to the lack of suitable models, the physiological role of mitochondrial fission in embryos has been obscure. In this study, the authors demonstrated that mitochondrial fission prevents multiple adverse outcomes, notably the aberrant demixing of parental genomes (a clinical phenotype of human embryos) at the zygotic stage. Thus, this study is also of clinical importance, as it proposes a novel mechanism.
 
 After reading through the comments of the other reviewers, the ways in which the authors could improve their manuscript were largely summarized in the following three points.
 
 (1) The authors should clarify whether loss of Drp1 contributes to the chromosome segregation defects directly (e.g. checking SAC-like activity) or indirectly (aggregated mitochondria becoming a physical obstacle, perhaps partly involving the cytoskeleton).
 
 (2) Although the level of Myo19 may not be so high (given the low level of TRAK2 in oocytes: Lee et al. PNAS 2024, PMID 38917013), the authors should further clarify the effect of Myo19 Trim-Away with time-lapse imaging (e.g. EB3-GFP/Mt-DsRed) and EM analysis (detailed mitochondrial architecture).
 
 (3) The authors should clarify the phenotypic heterogeneity in the degree of alteration of mitochondrial morphology/architecture as a function of the level of Drp1 loss, with detailed quantification of EM images, to address why mitochondrial aggregation in Drp1-/- parthenotes (which are possibly more likely to be free of Drp1 protein) looks different from, and weaker than, that in Trim-Away embryos. Using parthenotes derived from Trim-Away-treated MII oocytes might also complement the discussion.
 
 The revised preprint has addressed all the points described above. The authors have also adequately indicated the limitations at each of the specific points. The revisions have consolidated the conclusions, and this remains an excellent study.
 
 Review 3
5. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Reviewer #1 (Public review):
 
 We thank reviewer 1 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.
 
 Weaknesses:
 
 While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.
 
 It would be difficult to answer from this study whether Drp1 plays a role beyond mitochondrial fission in zygotes. However, the reasons why Drp1 KO zygotes differ from the somatic Drp1 KO model can be discussed as follows.
 
 First, the reviewer mentioned that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images of Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al., Curr Biol. 2014, PMID: 25264261, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study. Mitochondria in oocytes/zygotes have the shape of a small sphere with irregular cristae located peripherally. These structural features may be the cause of insensitivity or resistance to inner membrane fusion and the resultant failure to form tubular mitochondria as seen in somatic cell models. Nonetheless, quantitative analysis of EM images in the revised version confirmed that the mitochondria of Drp1-depleted embryos were not only enlarged but also significantly elongated (Figure 2J-2M). Therefore, in Drp1-depleted embryos, significant structural and functional (e.g., asymmetry between daughters) changes in mitochondria were observed, and these are expected to lead to defects in embryonic development.
 
 As for mitochondrial transport, we do not fully understand the intent of this question, but we do not entirely rule out mitochondrial transport. At least clustered mitochondria did not disperse again, but how mitochondria behave through the cytoskeleton within clusters will require further study, as the reviewer pointed out.
 
 Reviewer #1 (Recommendations For The Authors):
 
 (1) The authors show no effect of Myo19 Trim-Away, yet it remains unclear whether myo19 is involved in the positioning of mitochondria around the spindle. Judging by their co-localization during that stage, it might be. Therefore, in the absence of myo19, mitochondria might remain evenly distributed throughout mitosis, thus passively resulting in equal partitioning to daughter cells, with no severe developmental defects. Could the authors show a video of the whole process and discuss it?
 
 We have newly performed live imaging of mitochondria and chromosomes in Myo19 Trim-Away zygotes (n=13). As shown in Figure 1-figure supplement 2 and Figure 1-Video 2, there were no obvious changes in mitochondrial (and chromosomal) dynamics throughout the first cleavage and no significant mitochondrial asymmetry was observed. Therefore, we conclude that depletion of Myo19 does not cause mitochondrial asymmetry during embryonic cleavage. These results are described in the revised manuscript (Line 218-221).
 
 (2) Mitochondrial aggregation upon Drp1 depletion should be characterized in more detail: for example, % of mitochondria free, % in small clusters (> X diameter), and % in big clusters (>Y diameter).
 
 In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). In control embryos, mitochondria were interspersed in a large number of small clusters, while in Drp1-depleted embryos, mitochondria became highly aggregated into a small number of large clusters, an effect that was reversed by expression of mCh-Drp1. These results are described in the revised manuscript (Line 242-245).
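
 To illustrate the kind of measurement described in this response (cluster number and cluster size), the following is a minimal sketch and not the authors' actual pipeline. It assumes the mitochondrial image has already been thresholded into a binary mask; the function names `label_clusters` and `summarize`, the 4-connectivity rule and the pixel-based area unit are all illustrative assumptions.

```python
from collections import deque


def label_clusters(mask):
    """Return the area (in pixels) of each connected cluster in a binary mask.

    mask: list of rows, each a list of 0/1 values, where 1 marks pixels whose
    mitochondrial signal exceeded a (hypothetical) threshold.
    Pixels are grouped using 4-connectivity.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill over one connected cluster.
                queue = deque([(r, c)])
                seen[r][c] = True
                area = 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas


def summarize(mask):
    """Report cluster count and mean cluster area for one embryo mask."""
    areas = label_clusters(mask)
    mean_area = sum(areas) / len(areas) if areas else 0.0
    return {"n_clusters": len(areas), "mean_area_px": mean_area}
```

 With a summary of this kind, many small clusters (as reported for control embryos) and few large clusters (as reported for Drp1-depleted embryos) separate along both returned values.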
 
 (3) The discrepancies with parthenogenetic embryos derived from Drp1 (-/-) parthenotes should be commented on. Quantification of the dimensions of the clusters would help establish the degree of similarity/difference. Could the authors comment on their hypothesis as to why the clusters are remarkably larger in Drp1 depleted zygotes?
 
 In the revised version, we have quantified the mitochondrial aggregation in Drp1 KO parthenotes (Figure 2-figure supplement 1; the data for Drp1 KO parthenotes have been moved to the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). The size of mitochondrial clusters in Drp1 KO parthenotes was significantly increased compared to controls, but as the reviewer noted, mitochondrial aggregation appears to be moderate compared to that in Drp1-depleted embryos. The phenotypic discrepancies between the two Drp1-deficient embryo models are discussed below.
 
 First, it is clear that the phenotypic severity of Drp1 KO oocytes is dependent on the age of the female. Indeed, oocytes collected from 8-week-old females arrested in meiosis after NEB, mainly due to marked mitochondrial aggregation (Udagawa et al., Curr Biol. 2014, PMID: 25264261), whereas oocytes from juvenile females completed meiosis (Adhikari et al., Sci Adv. 2022, PMID: 35704569), and thus Drp1 KO parthenotes were obtained from juvenile females in the present study. Comparison of mitochondrial morphology in Drp1 KO oocytes in both papers also suggests that mitochondrial aggregation in adult mice is more intense (Udagawa et al., Curr Biol. Fig. 2A) than in juvenile mice (Adhikari et al., Sci Adv. 2022: Fig. 1G, 1H), and appears to be similar to that in Drp1-depleted embryos in this study (Figure 2E). There may be differences in the level of Drp1 depletion in these Drp1-deficient oocytes/zygotes. Similar differences between juvenile and adult KO females have been reported in a previous paper (Yueh et al., Development 2021, PMID: 34935904): adult-derived Smc3Δ/Δ zygotes arrested at the 2-cell stage, whereas juvenile-derived Smc3Δ/Δ zygotes had developmental competence comparable to the wild type. Remarkably, the SMC3 protein level in juvenile Smc3Δ/Δ oocytes was also comparable to that in Smc3fl/fl oocytes. The authors surmised that the decline in maternal SMC3 between the juvenile stage and sexual maturity is probably due to the continuous induction of the promoter-Cre driver, suggesting that similar induction may also occur in Drp1 KO oocytes. In addition, we observed not only age differences but also batch differences in Drp1 KO oocytes (and resulting embryos), such that little mitochondrial aggregation was observed in oocytes collected from some juvenile KO colonies.
Therefore, for KO models showing age (sexual maturation)-dependent gradual phenotypic changes, Trim-Away may be an approach that provides more reproducible results as it induces acute degradation of maternal proteins.
 
 (4) Mitochondrial clusters in Drp1 trim-away zygotes resemble those seen when defects in mitochondrial positioning are obtained by TRAK2 induction (PMID: 38917013), pointing again to a role of actin in the clustering process. Could the authors explore the role of actin further?
 
 TRAK2 and microtubule-dependent mechanisms may also be involved in mitochondrial dynamics during the first cleavage division, possibly in association with migration of the two pronuclei. Although the mitochondrial aggregation induced by TRAK2 overexpression is similar to that in Drp1-depleted embryos, it is unlikely that changes at the EM level occurred as seen in Drp1-depleted embryos (enlarged mitochondria, etc.). In addition, in TRAK2-overexpressing embryos, rather than uneven partitioning of mitochondria, the daughter blastomeres themselves were uneven in size after cleavage, making it difficult to precisely assess the similarity between the two models.
 
 Regarding the role of F-actin, we show that the subcellular distribution of cytoplasmic actin overlaps with that of mitochondria throughout the first cleavage and seems to accumulate in aggregated mitochondria, particularly during the mitotic phase, as a higher correlation was observed (Figure 1E). Although we did not observe that actin and the Myo19 motor regulate mitochondrial partitioning, as reported in somatic cell-based studies, it is possible that actin accumulated at mitochondria is indirectly involved in mitochondrial dynamics via mitochondrial fission. For example, inverted formin 2 (INF2) enhances actin polymerization and is required for efficient mitochondrial fission, acting upstream of Drp1 (Korobova et al., Science 2013, PMID: 23349293). In the revised manuscript, we have added a description of this point (Line 452-456).
 
 (5) Electron microscopy images showed indeed aberrant morphology of the mitochondria, yet not a hyperfused morphology. Aspect ratio (long/short axis) quantification should be included, besides the current measurement, since mitochondria in Drp1 trim-away look bigger yet as round as in the control.
 
 In the revised version, detailed quantitative data on EM images have been added (Figure 2J-2M). In Drp1-depleted embryos, significant increases were observed in both the major and minor axes of mitochondria. Like the reviewer, we had assumed that mitochondria in depleted embryos were enlarged rather than elongated, but the quantification of aspect ratio shows that significant elongation occurred. These results have been described in the revised manuscript (Line 252-256).
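
 The distinction drawn here between enlargement and elongation can be made concrete with a small sketch. It is an illustration only: the function name `aspect_ratio` is hypothetical, and the axis lengths used below are invented numbers, not values from the manuscript.

```python
def aspect_ratio(axis_a, axis_b):
    """Aspect ratio of one mitochondrial profile: long axis / short axis.

    The two axis lengths may be given in any order and in any unit, as long
    as both use the same unit. A value of 1.0 corresponds to a round profile;
    larger values indicate elongation.
    """
    if axis_a <= 0 or axis_b <= 0:
        raise ValueError("axis lengths must be positive")
    return max(axis_a, axis_b) / min(axis_a, axis_b)


# Hypothetical example values in nanometres (not measured data).
control_profile = (500, 400)    # major, minor axis
depleted_profile = (1200, 600)  # both axes larger than in the control

control_ratio = aspect_ratio(*control_profile)
depleted_ratio = aspect_ratio(*depleted_profile)
```

 In this invented example both axes increase, yet the ratio also rises, which is the situation the response describes: a profile can be enlarged and elongated at the same time, so the two measurements answer different questions.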
 
 (6) Why are mitochondria in golgi-mcherry-expressing cells showing a different morphology of the clusters?
 
 As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.
 
 (7) Authors comment on ROS being enriched (highly accumulated) in mitochondria. However, while quantification is missing, it might seem that ROS are equally distributed in control or Drp1 Trim-Away embryos. Could the authors quantify ROS signal inside and outside of the mitochondria, perhaps using a mask drawn by mitotracker? Furthermore, it would make these data more convincing to artificially induce/deplete ROS to validate the sensitivity of the technique to variations. Also, why is ROS pattern referred to as ectopic?
 
 Thank you for your useful suggestions. In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E). The term ectopic was used to mean excessive accumulation of ROS in the mitochondria compared to normal embryos, but it has been deleted as it is not very accurate.
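
 The mask-based measurement described in this response can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' analysis: it assumes two registered single-channel images of equal size (a ROS channel and a mitochondrial channel), a fixed intensity threshold for building the binary mask, and a hypothetical function name `mean_inside_outside`.

```python
def mean_inside_outside(ros, mito, threshold):
    """Mean ROS intensity inside and outside a mitochondrial mask.

    ros, mito: 2D lists of pixel intensities with identical dimensions.
    threshold: mitochondrial-channel intensity at or above which a pixel is
               counted as "inside" the mask (an assumed, user-chosen value).
    Returns (mean_inside, mean_outside); a side with no pixels yields NaN.
    """
    inside, outside = [], []
    for ros_row, mito_row in zip(ros, mito):
        for ros_value, mito_value in zip(ros_row, mito_row):
            # The binary mask is applied pixel by pixel.
            if mito_value >= threshold:
                inside.append(ros_value)
            else:
                outside.append(ros_value)
    mean_in = sum(inside) / len(inside) if inside else float("nan")
    mean_out = sum(outside) / len(outside) if outside else float("nan")
    return mean_in, mean_out
```

 Comparing the two returned means, or their ratio, between control and Drp1-depleted embryos is one way to express whether the ROS signal is enriched within mitochondrial regions instead of being evenly distributed.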
 
 Minor comments:
 
 (A) Video 1: images at t=-00:20 and t=00:00 of the mtGFP are actually the same images as H2B-mCherry.
 
 Probably a faulty filter/shutter control failed to capture GFP fluorescence at these times. It appears that the autocontrast function detected a small amount of mCherry fluorescence leakage. It would be possible to replace it with another video, but as the relevant frames were unrelated to the analysis, the previous video was used as is. The same problem also occurs in the newly added Myo19-depleted zygote movie (Figure 1-Video 2, 03:15).
 
 (B) Could you calculate the degree of colocalization between mt-GFP and ER-mCherry in ctrl and Drp1 trim-away? While it is apparent that ER is somehow more associated with mitochondrial clusters, it would be informative to quantify it.
 
 Since the ER is partially confined to the mitochondrial aggregation site, it was difficult to calculate correlation coefficients from fluorescence images of mt-GFP and ER-mCherry to quantitatively assess colocalization. Instead, line scan analysis of whole mitochondrial clumps showed that the peak of the ER-mCherry signal overlaps with that of mt-GFP, but this is not the case for Golgi-mCherry or peroxisome-mCherry (Figure 2-figure supplement 2A-2C).
 
 (C) Regarding the developmental arrest: The quantification of the different stages at each developmental time could be more informative. For example, at E4.5 how many embryos are at each stage (2-cell, 4-cell, ... blastocyst)? Also, could the authors comment on the reduction in developmental competence in Figure 4C, regarding the blastocyst stage?
 
 Many arrested embryos do not maintain their morphologies and undergo a unique degenerative process over time, known as cell fragmentation. Therefore, it is difficult to accurately determine the number of each developmental stage at, for example, E4.5 days. In this study, the 2-cell stage was observed at E1.5, the 4-8 cell at E2.5-E3.0, morula at E3.5 and the blastocyst at E4.5.
 
 Although the rate of embryos reaching the blastocyst stage was reduced compared to that of normal embryos, the overexpression of mCh-Drp1 may explain the incomplete restoration of developmental competence, since embryos injected solely with mCh-Drp1 mRNA also showed reduced developmental competence. For rescue experiments, the comparison with internal controls is more important, and we have therefore described it as follows: this is a specific effect of Drp1 deletion because none of the internal control conditions increased arrest at the 2-cell stage and arrest was completely reversed by microinjecting Trim-away insensitive exogenous mCh-Drp1 mRNA (Line 337-340).
 
 (D) In lines 103 to 105, proliferation should be changed to division or development.
 
 In the revised version, proliferation has been changed to division (Line 103).
 
 (E) Could the authors reference the statement in lines 168-169?
 
 The following 3 references have been added (Hardy et al., 1993, PMID: 8410824; Meriano et al., 2004, PMID: 15588469; Seikkula et al., 2018, PMID: 29525505).
 
 (F) Line 448: "Cells lacking Drp1 have highly elongated mitochondria that cannot be divided into transportable units,..." This is clearly not the case for zygotes, so why are then these mitochondria still clustering and not transported elsewhere?
 
 Although it is difficult to answer this reviewer's question precisely, EM images of Drp1-depleted embryos suggest that individual mitochondria are not only enlarged but also have increased outer membrane attachment due to excessive aggregation. These large mitochondrial clumps may therefore be preventing transport.
 
 Reviewer #2 (Public review):
 
 We thank reviewer 2 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.
 
 Weaknesses:
 
 The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.
 
 In the revised version, the time after hCG has been indicated (Line 176-182). In subsequent Drp1 depletion experiments, the revised version notes that “no significant delay in cell cycle progression was observed following Drp1 depletion (data not shown) compared to control embryos (Figure 1A)” (Line 291-193). There was a slight discrepancy in the time post-hCG between live imaging and immunofluorescence analysis (Figure 1-figure supplement 1A), which may be due to manipulation of zygotes outside incubator during the microinjection of mRNA.
 
 It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).
 
 As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, due to the injection of various mRNAs to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 h of incubation for their fluorescent proteins to be sufficiently expressed. Therefore, for the Western blot analysis, samples were prepared according to the time of the start of the observation.
 
 Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.
 
 In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). We have also quantified the mitochondrial aggregation in Drp1fl/fl and Drp1Δ/Δ parthenotes (Figure 2-figure supplement 1; note that the data for Drp1 KO parthenotes have been moved to the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). Mitochondria appear to be slightly more aggregated in Drp1fl/fl embryos than in control, but no significant differences in cluster size or number were observed (data not shown). On the other hand, mitochondrial clusters in Drp1 Trim-Away embryos were remarkably larger than those in Drp1Δ/Δ parthenotes. Please refer to the response to reviewer 1's comment (3) for discussion of this discrepancy.
 
 As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.
 
 The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.
 
 In the revised version, the band intensities in Western blot analysis were quantified, which validated the previous results (Figure 1H for Myo19 depletion, Figure 2B for Drp1 expression during preimplantation development, Figure 2D for Drp1 depletion). The number of embryos analyzed is described in the figure legends (pooled samples of 20 to 100 embryos were used).
 
 Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.
 
 In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E).
 
 In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.
 
 In the revised manuscript, we have discussed this reference (Zhou et al., Nature Communications, PMID: 36513638) (Line 482-483).
 
 Reviewer #2 (Recommendations For The Authors):
 
 The authors report that disruption of F-actin organization led to asymmetry in mitochondrial inheritance, however depletion of Myo19 does not impact inheritance. The authors note in the discussion that loss of another mitochondrial motor protein, Miro, has been shown to affect mitochondrial inheritance. They suggest this may be due to reduced levels of Myo19, despite data from the present study suggesting a lack of involvement of Myo19. Given that Miro1 also interacts with microtubules, and crosstalk between actin filaments and microtubules has been reported, have the authors considered whether other motor proteins, such as KIF5, may be involved in mitochondrial movement in the zygote and therefore inheritance? Myo19 also plays a role in mitochondrial architecture. Were any differences noted at the EM level?
 
 During oocyte meiosis and early embryonic cleavage, kinesin-5 has been reported to be important for the formation of bipolar spindles (Fitzharris, Curr Biol., 2009, PMID: 19465601) and may have some involvement in mitochondrial dynamics. Given that the migration of the two pronuclei towards the zygotic centre occurs in a dynein-dependent manner (Scheffler Nat Commun. 2021, PMID: 33547291), dynein may also be involved in the process of mitochondrial accumulation around the pronuclei. Nevertheless, whether microtubule-dependent mechanisms regulate mitochondrial partitioning remains controversial. Mitochondria basically diverge from microtubules at the onset of mitosis, and indeed Miro1-deleted zygotes did not show asymmetric mitochondrial partitioning (Lee et al., Front Cell Dev Biol. 2022, PMID: 36325364). More recently, it was reported that overexpression of TRAK2 causes significant mitochondrial aggregation in embryos (Lee et al., Proc Natl Acad Sci U S A. 2024, PMID: 38917013), but since overexpression might disrupt a regulatory balance maintained by other motor/adaptor complexes, further investigation using TRAK2-deficient embryos is awaited.
 
 As noted by the reviewer, Myo19 seems to be important for the maintenance of mitochondrial cristae architecture and, consequently, for the regulation of mitochondrial function (Shi et al., Nat Commun. 2022, PMID: 35562374). We have not examined EM images of Myo19-depleted embryos, but we examined their membrane potential and ROS by TMRM and H2DCF staining, respectively, and confirmed that they were comparable to control embryos (data not shown). The loss of Myo19 in zygotes/embryos did not cause any functional changes in mitochondria, suggesting that mitochondrial architecture may not be substantially affected either.
 
 Transcriptomic analysis would be useful to identify alterations in cell cycle checkpoint regulators, as well as immunofluorescence to identify changes in spindle assembly checkpoint protein recruitment.
 
 The present results showed that the majority of Drp1-depleted embryos arrest at the G2 stage, possibly due to cell cycle checkpoint mechanisms. Transcriptome analysis would certainly be beneficial, but eventually more detailed analysis of proteins and their phosphorylation modifications, etc. is needed for accurate assessment. These studies will be the subject of future work.
 
 Minor comments:
 
 There are many instances where the English could be improved, particularly the overuse of the word 'the'.
 
 We have carefully checked the manuscript again and hope that it has been improved.
 
 Line 144: replace 'took' with 'take'.
 
 We have corrected this in the revised version (Line 140).
 
 Line 157: it is unclear what is meant by 'hinders the functional importance of Drp1 in mature oocytes and embryos'.
 
 This description has been corrected to “complicates the functional analysis of Drp1 in mature oocytes and embryos” (Line 152-153).
 
 Line 198: replace with 'displayed a mitochondrial distribution pattern closely associated with'
 
 We have corrected this in the revised version (Line 195-196).
 
 Line 200: provide a time to clarify when the cytoplasmic meshwork was 'subsequently reorganized'
 
 In the revised version, “at the metaphase” has been added (Line 198).
 
 Line 204: replace 'to' with 'for'
 
 We have corrected this in the revised version (Line 203).
 
 Lines 285-87: consider rearranging the text to improve the flow.
 
 To improve the flow of the text before and after, the following sentence has been added: “We postulated that this asymmetry was due to non-uniformity in the distribution of mitochondria around the spindle” (Line 295-297).
 
 Line 418: replace 'central' with 'centre'
 
 We have corrected this in the revised version (Line 430).
 
 Line 427: replace 'pertaining' with 'partitioning'
 
 We have corrected this in the revised version (Line 438).
 
 Line 574: clarify to what '1-5% of that of the oocytes' refers
 
 We have corrected it to “1-5% of the total volume of the zygote.” (Line 587-588).
 
 Line 619: indicate the dilution used
 
 We apologize for the previous incorrect description. We used a part of the extract as the template, not a dilution, and have corrected it to be accurate (Line 631-632).
 
 Line 634: replace 'on' with 'in' and detail in which medium embryos were mounted.
 
 We have corrected this in the revised version (Line 647).
 
 Please check all spelling in the figures.
 
 Figure 1J - inheritance is spelt incorrectly.
 
 Figure-Suppl 1, D: Interphase (PN) and (2-cell) is spelt incorrectly. G: inheritance is spelt incorrectly.
 
 Figure 5F - bottom section prior to cytokinesis, spindle is spelt 'spincle'
 
 Ensure consistency in abbreviation use (e.g. use of NEB and NEBD).
 
 Thank you for your careful correction of typographical errors. In the revised version, all points raised by the reviewers have been corrected.
 
 Reviewer #3 (Public review):
 
 We thank reviewer 3 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.
 
 Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.
 
 Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.
 
 In the revised manuscript, we have added the following comment: swollen or partially elongated mitochondria with lamellar cristae structures in the inner membrane were observed in Drp1-depleted embryos. In addition, the quantification of aspect ratio (long/short axis) shows that significant mitochondrial elongation occurred (Figure 2M). These results have been described in the revised manuscript (Line 251-256).
 
 - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.
 
 Thank you for your very useful comments. Although it would be interesting to investigate whether alterations in ATP levels occurred in localized areas (e.g., around the spindle), the present study used a conventional fluorescence microscope instead of a confocal laser microscope to observe ATeam fluorescence, in order to quantify the fluorescence intensity in the whole embryo (or whole blastomere). We therefore cannot currently provide the images that the reviewer expected. As shown in Figure-figure supplement 1C, the ATP levels tend to be higher at the cell periphery in control embryos and at the mitochondrial aggregation areas in Drp1-depleted embryos, but high-resolution confocal images would be needed to show this clearly.
 
 - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?
 
 Review of multiple videos shows that aggregated mitochondria were localized toward the cell center, but did not exhibit the behavior of preferentially concentrating near the female pronucleus.
 
 - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca2+ response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?
 
 We think that the reviewer's comments are mostly correct. It is clear that there is a bias in Ca2+ store levels between blastomeres of Drp1-depleted embryos. However, since mitochondria were not stained simultaneously in this experiment, we cannot draw detailed conclusions, such as that daughter blastomeres that inherit more mitochondria have higher Ca2+ stores, or that blastomeres with more aggregated mitochondria have lower Ca2+ stores.
 
 - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?
 
 The marked centration of mitochondrial clusters in Drp1-depleted embryos appears to be associated with migration of the pronuclei toward the cell center, which is unique to the first embryonic cleavage. Since the assembly of the male and female pronuclei at the cell center is also unique to the first cleavage, binucleation due to mitochondrial misplacement was observed only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is thought to occur, some abnormalities in embryonic development are likely to be observed.
 
 Reviewer #3 (Recommendations For The Authors):
 
 Specific comments
 
 - Line 262: "Since mitochondrial dynamics are spatially coordinated at the ER-mitochondria MCSs," adequate ref. would better be added.
 
 We have added an adequate reference to the revised manuscript (Friedman et al., 2011, PMID: 21885730).
 
 - Line 333-336: "...as assessed by the presence of the nuclear envelope." Do authors show the data? In Figure 4-figure supplement 1A, the difference of the phosphoH3-ser10 signal between control and Trim-Away group might be weak. For clarity, it would be helpful if authors indicate the different points to note in the figure.
 
 Although the data is not shown, nuclear staining of arrested 2-cell stage embryos exhibited clear nuclear membranes, similar to the DAPI image in Figure 4-figure supplement 1A. We have indicated that the data is not shown in the revised version (Line 345). Based on a report that phosphorylated histone H3 (Ser10) localizes in pericentromeric heterochromatin that can be visualized by DAPI staining in late G2 interphase cells (Hendzel et al., 1997, Chromosoma, PMID: 9362543), this study qualitatively estimated the G2 phase from the phosphorylated histone H3 signal and the DAPI counterstained images. We have noted this point in the revised figure legend (Line 1012-1014).
 
 Typos or points for reword/rephrase
 
 - Line 149: "molecular identification" may better be " molecular characteristics".
 
 We have corrected this in the revised version (Line 145).
 
 - Line 157: "hinders the functional importance" would be "implies the functional importance" or "complicates the functional analysis".
 
 We have corrected this in the revised version (Line 152-153).
 
 - Line 208: "Since the role of F-actin in many cellular events, such as cytokinesis, preclude them as targets for experimentally manipulating mitochondrial distribution, " may better be "Given many cellular roles, disruption of F-actin per se was unsuitable as a strategy for manipulating mitochondrial distribution", for example.
 
 We have corrected this in the revised version (Line 207-208).
 
 - Line 260: "with MCSs with the plasma.." may better be "with MCSs such as with the plasma..".
 
 We have corrected this in the revised version (Line 267-268).
 
 - Line 312: "distribution and segregation" may better be "distribution and the resulting segregation of the inter-organelle contacts".
 
 We have corrected this in the revised version (Line 324-325).
 
 - Line 427: "pertaining" might be "partitioning".
 
 We have corrected this in the revised version (Line 438).
 
 Line 463: "loss of Drp1 induced mitochondrial aggregation disturbs" may better be "mitochondrial aggregation induced by the loss of Drp1 disturbs".
 
 We have corrected this in the revised version (Line 478-479).
 
 - Line 752: "endoplasmic reticulum (pink) " would be " endoplasmic reticulum (aqua) ".
 
 We have corrected this in the revised version (Line 780).
 
 - Figure 5E: "(Noma 2-cell embryos)" would be "(Normal 2-cell embryos)".
 
 - Figure 5F: "Mitochondrial centration prevents dual spincle assembly" would be "Mitochondrial centration prevents dual spindle assembly".
 
 Thank you for your careful correction of typographical errors. We have corrected all the words/expressions the reviewer pointed out in the revised version.
 
URL

biorxiv.org/content/10.1101/2024.06.13.598818v2
www.biorxiv.org www.biorxiv.org

Old age variably impacts chimpanzee engagement and efficiency in stone tool use

4
1. Public_Reviews 05 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This valuable study provides a novel framework for leveraging longitudinal field observations to examine the effects of aging on stone tool use behaviour in wild chimpanzees. The methods and results are robust, providing solid evidence of the effects of old age on nut cracking behaviour at this field site. Despite the low sample size of five individuals, this study is of broad interest to ethologists, primatologists, archaeologists, and psychologists.
 
2. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Howard-Spink et al. investigated how older chimpanzees changed their behavior regarding stone tool use for nut-cracking over a period of 17 years, from late adulthood to old age. This behavior is cognitively demanding, and it is a good target for understanding aging in wild primates. To check whether older chimpanzees changed their behavior, they followed the aging process of five individuals using several measures, from attendance at the nut-cracking outdoor laboratory site to the time taken to select tools and efficiency in nut-cracking.
 
 Indeed, older chimpanzees reduced their visits to the outdoor lab, which was not observed in the younger adults. The authors discuss several reasons for that; the main ones being physiological changes, cognitive and physical constraints, and changes in social associations. Much of the discussion is hypothetical, but a good starting point, as there is not much information about senescence in wild chimpanzees.
 
 The efficiency for nut-cracking was variable, with some individuals taking a long time to crack nuts while others showed little variance. As this is not compared with the younger individuals and the sample is small (only five individuals), it is difficult to be sure if this is also partly a normal variance caused by other factors (ecology) or is only related to senescence.
 
 Strengths:
 
 (1) 17 years of longitudinal data in the same setting, following the same individuals.
 
 (2) Using stone tool use, a cognitively demanding behavior, to understand the aging process.
 
 Weaknesses:
 
 A lack of comparison of the stone tool use behavior with younger individuals in the same period, to check if the changes observed are only related to age or if it is an overall variance. The comparison with younger chimpanzees was only done for one of the variables (attendance).
 
 Comments on Revised Version (from BRE):
 
 The authors have now added to the manuscript that they did not have sufficient data to compare additional variables to younger chimpanzees, and therefore compared intra-individual variation across field seasons. They have also explained that nut hardness, although not measured, was largely controlled for due to the experimental nature of the 'outdoor laboratory' whereby only nuts of a suitable maturity (and hardness) are provided to the chimpanzees. The discussion now also includes mention of other ecological variables and their potential influence on the results.
 
3. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Primates are a particularly important and oft-applied model for understanding the evolution of, e.g., life history and senescence in humans. Although there is a growing body of work on aging in primates, there are three components of primate senescence research that have been underutilized or understudied: (1) longitudinal datasets, (2) wild populations, and (3) (stone) tool-use behaviors. Therefore, the goal of this study was to (1) use a 17-year longitudinal dataset (2) of wild chimpanzees in the Bossou forest, (3) visiting a site for field experiments on nut-cracking. They sampled and analyzed data from five field seasons for five chimpanzees of old age. From this sample, Howard-Spink and colleagues noted a decline in tool-use and tool-use efficiency in some individuals, but not in others. The authors then conclude that there is a measurable effect of senescence on chimpanzee behavior, but that it varies individually. The study has major intellectual value as a building block for future research, but there are several major caveats.
 
 Strengths:
 
 With this study, Howard-Spink and colleagues make a foray into a neglected topic of research: the impact of the physiological and cognitive changes due to senescence on stone tool use in chimpanzees. Based on novelty alone, this is a valuable study. The authors cleverly make use of a longitudinal record covering 17 years of field data, which provides a window into long-term changes in the behavior of wild chimpanzees, which I agree cannot be understood through cross-sectional comparisons.
 
 The metrics of 'efficiency' (see caveats below) are suitable for measuring changes in technological behavior over time, as specifically tailored to the nut-cracking (e.g., time, number of actions, number of strikes, tool changes). The ethogram and the coding protocol are also suitable for studying the target questions and objectives. I would recommend, however, the inclusion of further variables that will assist in improving the amount of valid data that can be extrapolated (see also below).
 
 With this pilot, Howard-Spink and colleagues have established a foundation upon which future research can be designed, including further investigation with the Bossou dataset and other existing video archives, but especially future targeted data collection, which can be designed to overcome some of the limits and confounds that can be identified in the current study.
 
 Weaknesses:
 
 Although I agree with the reasoning behind conducting this research and understand that, as the authors state, there are logistical considerations that have to be made when planning and executing such a study, there are a number of methodological and theoretical shortcomings that either need to be more explicitly stated by the authors or would require additional data collection and analysis.
 
 One of the main limitations of this study is the small sample size. There are only 5 of the old-aged individuals, which is not enough to draw any inferences about aging for chimpanzees more generally. Howard-Spink and colleagues also study data from only five of the 17 years of recorded data at Bossou. The selection of this subset of data requires clarification: why were these intervals chosen, why this number of data points, and how do we know that it provides a representative picture of the age-related changes of the full 17 years?
 
 With measuring and interpreting the 'efficiency' of behaviors, there are in-built assumptions about the goals of the agents and how we can define efficiency. First, it may be that efficiency is not an intentional goal for nut-cracking at all, but rather, e.g., productivity as far as the number of uncrushed kernels (cf. Putt 2015). Second, what is 'efficient' for the human observer might not be efficient for the chimpanzee who is performing the behavior. More instances of tool-switching may be considered inefficient, but it might also be a valid strategy for extracting more from the nuts, etc. Understanding the goals of chimpanzees may be a difficult proposition, but these are uncertainties that must be kept in mind when interpreting and discussing 'decline' or any change in technological behaviors over time.
 
 For the study of the physiological impact of senescence of tool use (i.e., on strength and coordination), the study would benefit from the inclusion of variables like grip type and (approximate) stone size (Neufuss et al., 2016). The size and shape of stones for nut-cracking have been shown to influence the efficacy and 'efficiency' of tool use (i.e., the same metrics of 'efficiency' implemented by Howard-Spink et al. in the current study), meaning raw material properties are a potential confound that the authors have not evaluated.
 
 Similarly, inter- and intraspecific variation in the properties of nuts being processed is another confound (Falótico et al., 2022; Proffitt et al., 2022). If oil palm nuts were varying year-to-year, for example, this would theoretically have an effect on the behavioral forms and strategies employed by the chimpanzees, and thus, any metric of efficiency being collected and analyzed. Further, it is perplexing that the authors analyze only one year where the coula nuts were provided at the test site, but these were provided during multiple field seasons. It would be more useful to compare data from a similar number of field seasons with both species if we are to study age-related changes in nut processing over time (one season of coula nut-cracking certainly does not achieve this).
 
 Both individual personality (especially neophilia versus neophobia; e.g., Forss & Willems, 2022) and motivation factors (Tennie & Call, 2023) are further confounds that can contribute to a more valid interpretation of the patterns found. To draw any conclusions about age-related changes in diet and food preferences, we would need to have data on the overall food intake/preferences of the individuals and the food availability in the home range. The authors refer briefly to this limitation, but the implications for the interpretation of the data are not sufficiently underlined (e.g., for the relevance of age-related decline in stone tool-use ability for individual survival).
 
 Generally speaking, there is a lack of consideration for temporal variation in ecological factors. As a control for these, Howard-Spink and colleagues have examined behavioral data for younger individuals from Bossou in the same years, to ostensibly show that patterns in older adults are different from patterns in younger adults, which is fair given the available data. Nonetheless, they seem to focus mostly on the start and end points and not patterns that occur in between. For example, there is a curious drop in attendance rate for all individuals in the 2008 season, the implications of which are not discussed by the authors.
 
 As for attendance, Howard-Spink and colleagues also discuss how this might be explained by changes in social standing in later life (i.e., chimpanzees move to the fringes of the social network and become less likely to visit gathering sites). This is not senescence in the sense of physiological and cognitive decline with older age. Instead, the reduced attendance due to changes in social standing seems to exacerbate signs of aging rather than be an indicator of it. The authors also mention a flu-like epidemic that caused the death of 5 individuals; the subsequent population decline and related changes in demography also warrant more discussion and characterization in the manuscript.
 
 Understandably, some of these issues cannot be evaluated or corrected with the presented dataset. Nonetheless, these undermine how certain and/or deterministic their conclusions can really be considered. Howard-Spink et al. have not strongly 'demonstrated' the validity of relationships between the variables of the study. If anything, their cursory observations provide us with methods to apply and hypotheses to test in future studies. It is likely that with higher-resolution datasets, the individual variability in age-related decline in tool-use abilities will be replicated. For now, this can be considered a starting point, which will hopefully inspire future attempts to research these questions.
 
 Falótico, T., Valença, T., Verderane, M. & Fogaça, M. D. Stone tools differences across three capuchin monkey populations: food's physical properties, ecology, and culture. Sci. Rep. 12, 14365 (2022).
 
 Forss, S. & Willems, E. The curious case of great ape curiosity and how it is shaped by sociality. Ethology 128, 552-563 (2022).
 
 Neufuss, J., Humle, T., Cremaschi, A. & Kivell, T. L. Nut-cracking behaviour in wild-born, rehabilitated bonobos (Pan paniscus): a comprehensive study of hand-preference, hand grips and efficiency. Am. J. Primatol. 79, e22589 (2016).
 
 Proffitt, T., Reeves, J. S., Pacome, S. S. & Luncz, L. V. Identifying functional and regional differences in chimpanzee stone tool technology. R. Soc. Open Sci. 9, 220826 (2022).
 
 Putt, S. S. The origins of stone tool reduction and the transition to knapping: An experimental approach. J. Archaeol. Sci.: Rep. 2, 51-60 (2015).
 
 Tennie, C. & Call, J. Unmotivated subjects cannot provide interpretable data and tasks with sensitive learning periods require appropriately aged subjects: A Commentary on Koops et al. (2022) "Field experiments find no evidence that chimpanzee nut cracking can be independently innovated". ABC 10, 89-94 (2023).
 
 Comments on Revised Version (from BRE):
 
 The authors have revised their methods to clarify why certain field seasons were chosen and have clarified aspects of their analysis relevant to this reviewer's concerns. The coula nut cracking data and results, which came from a single season, have now been moved to the Supplementary Materials. The revised discussion now includes a much more detailed limitations section covering both ecological factors and the effects of social aging. Stone tool size, grip and other factors are also acknowledged as potentially important for measuring efficiency, but the authors were unable to include them in this study due to the nature of the dataset.
 
4. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 The main criticisms levied by both reviewers can be traced to our use of a long-term video archive to assess the effects of aging on individual chimpanzees over extended time periods. Specifically, the reviewers raised several points surrounding whether we could exclude ecological variation over years, rather than aging itself, as the explanation for the changes we observed. Whilst we acknowledge there are limitations to our approach, we provide a comprehensive response to these points highlighting:
 
 (1) Where ecological variables have been accounted for using controls (including the behaviors of other individuals, or an aging individual’s behavior at younger ages).
 
 (2) Where ecological data may be missing, thus a potential limitation to our study, and further data would be beneficial.
 
 (3) Whether, in light of these limitations, interannual ecological variation offers a likely explanation for the behavioral changes we have identified. We provide an argument that whilst ecological data would be desirable for our study, interannual changes in ecology are unlikely to explain the trends in our data. Additionally, we explain why age-related changes, such as senescence, are more likely to underpin the patterns described in our manuscript.
 
 Across 1-3, we have made substantial changes to the reporting of our manuscript to ensure that our results are communicated transparently, and conclusions are made with appropriate care. We have also moved all discussion of coula-nut cracking to the supplementary materials, given the points raised by reviewers about the lack of data describing coula-nut cracking in earlier field seasons.
 
 We hope that these modifications will enhance both the editors’ and reviewers’ assessment of our manuscript, where we have aimed to make careful conclusions that are supported by our available data. Similarly, we have aimed to communicate the importance of our results across fields of research including primatology, evolutionary anthropology, and comparative gerontology, and hope that our research will be of use to further studies within these subfields.
 
 Reviewer 1 (Recommendations for the authors):
 
 (1) If possible, include results or a summary of the behaviour of younger adults using stone tools during the same period. It would be helpful to know if they had the same or different pattern to exclude other factors that may influence the tool use (harder nuts in a particular season, diseases, motivation for other foods, etc).
 
 We include data for other individuals when analyzing attendance. However, we did not collect comparable long-term efficiency data on younger adult individuals for this study. This is, in part, due to the time constraints imposed by long-term behavior coding. Additionally, only one adult was both present at Bossou throughout the 1999-2016 period and younger than the threshold for our old-age category across these years; a baseline of a single younger adult would not have been useful for characterizing the normal variation of younger adults over time. However, given the longitudinal data we present, we can use data from the earlier field seasons for each elderly focal individual as a personalized baseline control. Previous studies at Bossou find that across the majority of adulthood, efficiency varies between individuals, but is stable within individuals over time (e.g., Berdugo et al. 2024, cited). We detected similar stability in individuals’ efficiency over the first three field seasons sampled in our analysis, where there was very little intra-individual variation in tool-using efficiency. However, in later years, two individuals (Velu & Yo) began to exhibit relatively large reductions in efficiency.
 
 These results are unlikely to be explained by ecological variation. If a change in ecology underpinned our results, we would expect changes in ecology [1] to also introduce variation in earlier field seasons, and [2] to influence all individuals in our study similarly. As such, if the changes observed in later field seasons were due to ecological changes, they should have caused reduced efficiency across individuals, and to a similar degree. We did not observe this result: large reductions in efficiency were confined to two individuals. Moreover, for Yo (the individual who exhibited the largest reduction in efficiency) we found some additional evidence that changes in oil-palm-nut cracking efficiency extended beyond the period we sampled, i.e. they were evident even in 2018, reflecting a long-term, directional reduction in efficiency as compared to earlier years of her life. This consistent reduction in tool-using efficiency over multiple years adds further weight to the hypothesis that changes at the level of the individual were causing reduced tool-using efficiency, rather than our results being underpinned by interseasonal variation in ecology.
 
 Whilst we agree that our study is limited in the extent to which we can analytically assess ecological explanations for changes in nut-cracking efficiency, we believe that hypothetical ecological changes across field seasons do not predict our results. We now raise both sides of this debate in our discussion, where we outline our limitations (see lines 535-593).
 
 (2) The data from 2011 was scarce, with only one individual having 10 encounters. It would be better to be cautious with this season's results.
 
 We appreciate this limitation raised by the reviewer. Velu and Yo were only encountered a few times in 2011; however, both were encountered more frequently in 2016. For 2011, we did not collect oil-palm nut cracking data for either Yo or Velu. Thus, their change in efficiency was detected by models using data from all other years, regardless of the few encounters in 2011. This sparsity of data may still have influenced our metrics for the proportion of time chimpanzees spent engaging in different behaviors when present at the outdoor laboratory in 2011, particularly for Velu, who was one of the two individuals who exhibited a change in behavior in this year (along with Fana, N = 10 for 2011). We have therefore added a line in our results and discussion highlighting the sparsity of data for Velu when estimating these proportions for 2011 (see lines 255-256 & 410).
 
 Minor corrections
 
 (1) The last paragraph of the introduction presents many results, which should be in the results section.
 
 We would like to keep this section of the introduction. Our paper investigates the effect of aging on many different aspects of nut cracking, which could become confusing for readers unless laid out clearly. We believe that having a short summary early on in the paper assists readers with following the methods and arguments presented within our paper.
 
 (2) The first section (Sampled data) of the results contains much information that belongs in the methods section.
 
 We appreciate that there is some overlap between our methods and results sections. However, as the results section comes before the methods in our manuscript, we wanted to ensure that the results contain enough information to be interpreted clearly by readers, and that the methods used to generate these results are transparently communicated. For these reasons, we will leave this information in the results, as we believe it increases our paper’s readability.
 
 Reviewer 2 (Public review):
 
 One of the main limitations of this study is the small sample size. There are only 5 of the old-aged individuals, which is not enough to draw any inferences about aging for chimpanzees more generally. Howard-Spink and colleagues also study data from only five of the 17 years of recorded data at Bossou. The selection of this subset of data requires clarification: why were these intervals chosen, why this number of data points, and how do we know that it provides a representative picture of the age-related changes of the full 17 years?
 
 We note that our sample size is limited to 5 individuals. This is an inevitable constraint of analyzing aging longitudinally in long-lived species, as only a few individuals will live to old age. We argue that 17 years is a long enough period of study, as in the initially sampled field season (1999) focal individuals are reaching a mature age of adulthood (39-44 years) and begin to age progressively up to ages that are typically considered to be on the extreme side for chimpanzees’ lifespans in the wild (56-61 years). We raise in our methods that whilst it is difficult to determine precisely when chimpanzees become ‘old aged’, previous studies use the age of around 40 years, as from this age survivorship begins to decrease more rapidly (see Wood et al., Science 2023). Indeed, one focal individual (Tua) disappeared during the period of our study (presumed dead), and one other individual died in 2017 (Velu), the year after our final sampled field season. As of 2025, two other focal females have since died, and only one focal individual is still alive at Bossou (Jire, the individual exhibiting the least evidence for senescence over our study period). These observations suggest that we successfully captured data from chimpanzees during the oldest ages of their lives for most individuals in the community. Moreover, the period of 1999-2016 contains the majority of data available within the Bossou Archive, with years before and after this window containing comparatively less data. This information is included within our results and methods (see sections 2.1 and 4.1).
 
 For our earliest field season (1999), it is unlikely that senescence had already had an effect on stone-tool use, as we measured efficiency to be high across all efficiency metrics for all individuals. For example, in 1999, the median number of hammer strikes performed by focal chimpanzees ranged from 2-4 strikes, and this was comparable to the efficiency reported across all adults observed in previous studies at Bossou (Biro et al. 2003, Anim. Cog.). This finding suggests that senescence effects had not yet taken place, allowing us to evaluate whether aging affects efficiency over subsequent field seasons. This point is now included in the manuscript on lines 449-452.
 
 We sampled at 4-to-5-year intervals to balance the time-intensive nature of fine-scale behavior coding against the need to sample data across the extended 17-year time window available in our study. We limited the final year to 2016 as, in following years, data were collected using different sampling protocols (though, see limited data from 2018 in the supplementary materials). We aimed to keep the intervals between years as consistent as possible (approx. 4 years); however, for some years data were not collected at Bossou, due to disease outbreaks in the region. In these instances, we selected the closest field season where suitable data were available for study (always +/- 1 year). We have provided further clarification surrounding our sampling regime in the methods (see amendments in section 4.1).
 
 With measuring and interpreting the 'efficiency' of behaviors, there are in-built assumptions about the goals of the agents and how we can define efficiency. First, it may be that efficiency is not an intentional goal for nut-cracking at all, but rather, e.g., productivity as far as the number of uncrushed kernels (cf. Putt 2015). Second, what is 'efficient' for the human observer might not be efficient for the chimpanzee who is performing the behavior. More instances of tool-switching may be considered inefficient, but it might also be a valid strategy for extracting more from the nuts, etc. Understanding the goals of chimpanzees may be a difficult proposition, but these are uncertainties that must be kept in mind when interpreting and discussing 'decline' or any change in technological behaviors over time.
 
 We agree that knowing precisely how chimpanzees perceive their own efficiency during tool use is unlikely to be possible through observation alone. However, under optimal foraging theory, it is reasonable to assume that animals aim to economize foraging behaviors such that they maximize their rate of energy intake. Moreover, a wealth of studies demonstrate that adult chimpanzees acquire and refine tool-using skill efficiency throughout their lives. For example, during nut cracking, adults often select tools with specific properties that aid efficient nut cracking (Braun et al. 2025, J. Hum. Evol.; Carvalho et al. 2008, J. Hum. Evol.; Sirianni et al. 2015, Anim. Behav.); perform nut cracking using more streamlined combinations of actions than less experienced individuals (Howard-Spink et al. 2024, PeerJ; Inoue-Nakamura & Matsuzawa 1997, J. Comp. Psychol.); and as a result end up cracking nuts using fewer hammer strikes, indicating a higher level of skill (Biro et al. 2003, Anim. Cogn.; Boesch et al. 2019, Sci. Rep.). Ultimately, these factors suggest that across adulthood, experienced chimpanzees perform nut cracking with a level of efficiency which exceeds that of novice individuals, including across the whole behavioral sequence for tool use, even if they are not aware of, or intending, this. Previous studies at Bossou have also highlighted that there are stable inter-individual differences in efficiency over time (Berdugo et al. 2024, Nat. Hum. Behav.). This pattern of findings allows us to ask whether this acquired level of skill is stable across the oldest years of an individual’s life, or whether some individuals experience decreased efficiency with age. In addition, our selection of efficiency metrics is in keeping with a wealth of studies which examine the efficiency of stone-tool use in apes; thus, we argue that this is not problematic for our study.
 
 As we stated in our initial responses to reviewers, it is unlikely that tool switching is a valid strategy for tool use, as it is so rarely performed by proficient adult nut crackers (including earlier in life for our focal individuals). Nevertheless, we did not find a significant change in tool switching for oil-palm nut cracking, and this behavioral change was only observed when Yo was cracking coula nuts. As we have now moved discussion of coula nut cracking to the supplementary materials (and tempered discussion of coula nut cracking to emphasize the need for more data) this behavioral variable does not influence our reported results.
 
 In our discussion, we also highlight how seemingly less efficient actions may reflect a valid strategy for nut cracking. For example, a greater number of tool strikes may reflect a strategy of compensation for progressive tool wear. This would still reflect a reduced efficiency (e.g. in terms of the rate at which kernels can be consumed), but may perhaps be borne of the necessity to accommodate changes in an individual’s physical affordances with aging. Thus, we do take the Reviewer’s point into account, but by using an alternative, more likely, example given the available data. We have now emphasized this point in lines 521-527.
 
 We have also clarified these matters by adding more information into our methods (see lines 798-802 and 828-829), highlighting that we take a perspective on efficiency that reflects the speed of nut processing and kernel consumption, and the number of different behavioral elements required to do so. Our phrasing now explicitly avoids using language that assumes that individuals have some perception of their own efficiency during tool use.
 
 For the study of the physiological impact of senescence of tool use (i.e., on strength and coordination), the study would benefit from the inclusion of variables like grip type and (approximate) stone size (Neufuss et al., 2016). The size and shape of stones for nut-cracking have been shown to influence the efficacy and 'efficiency' of tool use (i.e., the same metrics of 'efficiency' implemented by Howard-Spink et al. in the current study), meaning raw material properties are a potential confound that the authors have not evaluated.
 
 We did not collect this data as part of our study. Whilst grip type could be a useful variable to measure for future studies, it is not necessary to demonstrate senescence per se. However, we agree that this could be a fruitful avenue to understand changes in behavior at greater granularity, and have added this as a recommendation for further study. We also now provide a discussion on stone dimensions and materials as part of our limitations (see lines 581-589 for both points).
 
 Similarly, inter- and intraspecific variation in the properties of nuts being processed is another confound (Falótico et al., 2022; Proffitt et al., 2022). If oil palm nuts were varying year-to-year, for example, this would theoretically have an effect on the behavioral forms and strategies employed by the chimpanzees, and thus, any metric of efficiency being collected and analyzed. Further, it is perplexing that the authors analyze only one year where the coula nuts were provided at the test site, but these were provided during multiple field seasons. It would be more useful to compare data from a similar number of field seasons with both species if we are to study age-related changes in nut processing over time (one season of coula nut-cracking certainly does not achieve this).
 
 We have moved all discussion of coula nuts to the supplementary materials so as to avoid any confusion with oil-palm nuts (see comments from Reviewer 2, and our response). Nut hardness may influence the difficulty with which nuts are cracked, with one of the most likely factors influencing nut hardness being its age: young nuts are relatively harder to crack, whereas older nuts, which are often worm-eaten or can be empty, crack more easily, yet are not worth cracking (Sakura & Matsuzawa, 1991; Ethology). We largely controlled for this in our study, as the nuts provided at outdoor laboratories were inspected to ensure that the majority of them were of suitable maturity for cracking, and we now clarify this control in our methods (see lines 678-680) and when discussing our study limitations (see lines 551-558). In these sections, we also highlight a previous study at Bossou that shows chimpanzees select nuts which can be readily cracked, based on their age (Sakura & Matsuzawa, 1991; Ethology).
 
 We acknowledge that we are limited in the extent to which we can control for interannual variation in ecology with our available data. However, we highlight why interannual variability is unlikely to fully explain our results (see lines 551-580 and response to comments from Reviewer 1). We also highlight in our limitations section that future studies should (where possible) aim to collect more ecological data to account for possible confounds more rigorously.
 
 Both individual personality (especially neophilia versus neophobia; e.g., Forss & Willems, 2022) and motivation factors (Tennie & Call, 2023) are further confounds that can contribute to a more valid interpretation of the patterns found. To draw any conclusions about age-related changes in diet and food preferences, we would need to have data on the overall food intake/preferences of the individuals and the food availability in the home range. The authors refer briefly to this limitation, but the implications for the interpretation of the data are not sufficiently underlined (e.g., for the relevance of age-related decline in stone tool-use ability for individual survival).
 
 In our discussion, we highlight that multiple aging factors may influence apes’ dietary preferences and motivations to attend experimental (and perhaps also naturally-occurring) nut cracking sites (see lines 397-443 and 542-550). We do not believe that neophobia is a likely driver underlying our results, given that the outdoor laboratory has been used to collect data for many decades, including over a decade prior to the first field season in which data were sampled for our study (now highlighted in lines 692-694). In addition, previous studies at Bossou have determined that the outdoor laboratory is visited with comparable frequency to naturally-occurring nut cracking sites, which makes any form of novelty bias unlikely (this information is now included in our methods, see lines 397-400, and also 687-689).
 
 We agree that further information is required about foraging behaviours across the home range to understand changes in attendance at the outdoor laboratory, and have now provided more clarity on this within the limitations section of our discussion (see lines 542-550). In our discussion of individual survivability, we state clearly that we cannot draw a conclusion about how changes in tool use influence survival with the available data, and assert that this would require data across the home range (see lines 627-638). We agree that future research is needed to assess whether changes in tool use would influence survivability, and also suggest that it may not be survival-relevant; instead, changes in tool use with aging may simply be a litmus test for detecting more generalized senescence.
 
 Generally speaking, there is a lack of consideration for temporal variation in ecological factors. As a control for these, Howard-Spink and colleagues have examined behavioral data for younger individuals from Bossou in the same years, to ostensibly show that patterns in older adults are different from patterns in younger adults, which is fair given the available data. Nonetheless, they seem to focus mostly on the start and end points and not patterns that occur in between. For example, there is a curious drop in attendance rate for all individuals in the 2008 season, the implications of which are not discussed by the authors.
 
 As the reviewer points out, when examining the attendance rates of older individuals over sampled field seasons, we used the attendance rates of younger individuals as a control. However, we do not run this analysis using start and end points only. Attendance rates were included in our model across the full range of sample field seasons. However, as the key result here is an interaction term between age cohort (old) and the field season (scaled about the mean), we supplement this significant statistical result with a digestible comparison of attendance rates between the first and last field season, to give a general sense of effect size. We have clarified that all data were used in our model (see line 229, and also the legend for Table 2), and in this section we also provide all key model outputs and signpost where the full model output can be found in the supplementary materials.
 
 As far as attendance, Howard-Spink and colleagues also discuss how this might be explained by changes in social standing in later life (i.e., chimpanzees move to the fringes of the social network and become less likely to visit gathering sites). This is not senescence in the sense of physiological and cognitive decline with older age. Instead, the reduced attendance due to changes in social standing seems rather to exacerbate signs of aging rather than be an indicator of it itself. The authors also mention a flu-like epidemic that caused the death of 5 individuals; the subsequent population decline and related changes in demography also warrant more discussion and characterization in the manuscript.
 
 We have adapted this part of the discussion to make it clear that social aging is not necessarily equivalent to physiological and cognitive aging. We have also clarified in this section the changes in demography at Bossou during our study, which may have further impacted social behaviors (see lines 423-443).
 
 Understandably, some of these issues cannot be evaluated or corrected with the presented dataset. Nonetheless, these undermine how certain and/or deterministic their conclusions can really be considered. Howard-Spink et al. have not strongly 'demonstrated' the validity of relationships between the variables of the study. If anything, their cursory observations provide us with methods to apply and hypotheses to test in future studies. It is likely that with higher-resolution datasets, the individual variability in age-related decline in tool-use abilities will be replicated. For now, this can be considered a starting point, which will hopefully inspire future attempts to research these questions.
 
 We thank the reviewer for their comments. We have adapted our manuscript to highlight that we agree that it serves as a starting point for answering these valuable questions; however, we do feel that we can contribute meaningful evidence that aging effects are likely to underlie the findings in our data (see responses above). We agree with the reviewer that further study is needed to understand these questions in more detail, and have tried to ensure that our conclusions are suitably tempered and that further research building on our findings is strongly encouraged.
 
 Falótico, T., Valença, T., Verderane, M. & Fogaça, M. D. Stone tools differences across three capuchin monkey populations: food's physical properties, ecology, and culture. Sci. Rep. 12, 14365 (2022).
 
 This has now been cited.
 
 Forss, S. & Willems, E. The curious case of great ape curiosity and how it is shaped by sociality. Ethology 128, 552-563 (2022).
 
 We do not cite this – see above.
 
 Neufuss, J., Humle, T., Cremaschi, A. & Kivell, T. L. Nut-cracking behaviour in wild-born, rehabilitated bonobos (Pan paniscus): a comprehensive study of hand-preference, hand grips and efficiency. Am. J. Primatol. 79, e22589 (2016).
 
 This has now been cited.
 
 Proffitt, T., Reeves, J. S., Pacome, S. S. & Luncz, L. V. Identifying functional and regional differences in chimpanzee stone tool technology. R. Soc. Open Sci. 9, 220826 (2022).
 
 This has now been cited.
 
 Putt, S. S. The origins of stone tool reduction and the transition to knapping: An experimental approach. J. Archaeol. Sci.: Rep. 2, 51-60 (2015).
 
 We do not cite this, as we instead cite studies which highlight chimpanzees’ ability to become more efficient in tool use with repeated practice (see above).
 
 Tennie, C. & Call, J. Unmotivated subjects cannot provide interpretable data and tasks with sensitive learning periods require appropriately aged subjects: A Commentary on Koops et al. (2022) "Field experiments find no evidence that chimpanzee nut cracking can be independently innovated". ABC 10, 89-94 (2023).
 
 We do not cite this – see above.
 
 Reviewer #2 (Recommendations for the authors):
 
 Minor Comments:
 
 (1) Line 494: Citation #53 is listed twice.
 
 This has been amended.
 
 (2) Line 501: The term 'culturally-dependent' as used here is, at best, controversial, and at worst, misapplied. I would recommend replacing it with simply the term 'cultural'.
 
 This has been changed to ‘cultural’.
 
 Major Comments:
 
 For the Introduction, in the paragraph starting on Line 91, and the Discussion, starting on Line 369, I would recommend some simple re-structuring of the argumentation. As mentioned in the Public Review, the changes in social standing according to age are not necessarily a case of senescence in the very sense of physiological or cognitive changes of the individual. This seems to have had an effect on attendance rates, which then could have been a driver of behavioral changes and even cognitive decline as ostensibly measured by the other variables. The social impact of aging should be mentioned in the Introduction (it is not currently) and the social and physiological/cognitive effects of aging should be separated in the Discussion. You can then discuss more clearly how the former via other behavioral changes can accelerate the latter (or not).
 
 We take the point raised about social aging. Integrating information about social aging into the introduction was challenging without disrupting the flow of the paper; however, we have included these valuable points in the discussion (see lines 423-443). We now structure this section to clearly distinguish social aging, and discuss how, in tandem with changes in demography at Bossou, it may have influenced rates of attendance to the outdoor laboratory over the years. We do not go into detail about how social aging may interact with physiological or cognitive effects of aging, as we cannot support this with the available data; however, we highlight at the end of this paragraph how all of these possible factors require further investigation.
 
 For the present study, it will either be impossible or impractical to gather data on the yearly ecological conditions, contextualized dietary preferences, individual personalities, etc., so I would not ask that you do so. It is important, however, to temper some of the claims being made in the manuscript about what you have 'determined' about the nature of senescence in chimpanzees and to be more transparent about the limitations and potential confounds when interpreting the data. To avoid repetition, the key points can be found in the Public Review under 'Weaknesses'.
 
 We appreciate the reviewer’s understanding of the limitations of our study. Some of these factors – such as individual personalities and dietary preferences – are addressed somewhat by our use of long-term data at the level of the individual, particularly in the analyses of efficiency, where modeling individuals’ behaviors against those in earlier years offers an individually bespoke control. However, there are other ecological variables of possible importance that we cannot evaluate. We now address several of these points raised by reviewers in the discussion, to ensure transparency of reporting (see limitations section of our discussion, our responses to the comments provided by Reviewer 1, and our responses to points raised in the Public Review). We have also tempered some of the phrasing surrounding our conclusions: where we say that this is the first evidence that aging can impact chimpanzee tool use, we also highlight the need for an assortment of further studies.
 
 Finally, the integration of the coula nut-cracking data is not well-executed as it stands. I would recommend that they collect and analyze equivalent behavioral data from the other years where coula nuts were provided. By examining only one season of coula nut-cracking, we cannot contextualize the data to past seasons; there is no sense in comparing one season of coula nut-cracking (i.e., in a sense of efficiency) to roughly contemporary seasons of palm-nut cracking due to, as you describe, differences in physical properties of the nuts. If you are not able to collect the additional data and carry out the requisite analysis, then I would recommend that the coula nut-related sections be removed from the manuscript, so that it does not detract from the logical flow of arguments and distract from the other data, which is more logically-attuned to your research questions.
 
 We have removed this from the main manuscript. We have decided to include the information surrounding coula nut cracking in the supplementary materials, as this information is still relevant to the findings of our study, and may interest some readers. However, we have phrased this information to make it clear that further data is needed to compare coula nut cracking across years.
 
 These criticisms do not subtract from the (potential) value or importance of the work for the field. This is, of course, an important contribution to an understudied topic. As such, I would gladly advocate for the manuscript, assuming the authors would reflect on the listed caveats and make changes in response to the 'Major Comments'.
 
 We thank the reviewer for their comments.
 
URL

biorxiv.org/content/10.1101/2024.11.25.625128v3
www.biorxiv.org www.biorxiv.org

A Commander-independent function of COMMD3 in endosomal trafficking

5
1. Public_Reviews 05 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This important study explores the mechanisms underlying the maintenance of cell surface protein levels. The authors present solid evidence to support their claims, though the addition of certain validation experiments could have further strengthened the conclusions. This work will be of particular interest to cell biologists focused on membrane trafficking.
 
2. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 G. Squiers et al. analyzed a previously reported CRISPR genetic screening dataset of engineered GLUT4 cell-surface presentation and identified the Commander complex subunit COMMD3 as being required for endosomal recycling of specific cargo protein, transferrin receptor (TfR), to the cell surface. Through comparison of COMMD3-KO and other Commander subunit-KO cells, they demonstrated that the role of COMMD3 in mediating TfR recycling is independent of the Commander complex. Structural analysis and co-immunoprecipitation followed by mass spectrometry revealed that TfR recycling by COMMD3 relies on ARF1. COMMD3 interacts with ARF1 through its N-terminal domain (NTD) to stabilize ARF1. A mutation in the NTD of COMMD3 failed to rescue cell surface TfR in COMMD3-KO cells. In conclusion, the authors assert that COMMD3 stabilizes ARF1 in a Commander complex-independent manner, which is essential for recycling specific cargo proteins from endosomes to the plasma membrane.
 
 The conclusions of this paper are generally supported by data, but some validation experiments should be included to strengthen the study.
 
 (1) Specific role of ARF1 to COMMD3: The authors don't think KO/KD of ARF1 is appropriate to address its specificity to COMMD3 cargo selection, so they focused on the COMMD3 NTD mutant. Though the mutant failed to rescue COMMD3 cargo TfR recycling, they did not examine the Commander cargo ITGA6. In addition, they cannot validate that the mutant interrupts the interaction between NTD and ARF1. These missing results and validation make their claim that ARF1 is specific to the COMMD3's Commander-independent function less convincing.
 
3. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The Commander complex is a key player in endosomal recycling which recruits cargo proteins and facilitates the formation of tubulo-vesicular carriers. Squiers et al found COMMD3, a subunit of the Commander complex, could interact directly with ARF1 and regulate endosomal recycling.
 
 Strengths:
 
 Overall, this is a nice study that provides some interesting knowledge on the function of the Commander complex.
 
 Comments on revisions:
 
 The authors have addressed all my previous concerns.
 
4. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The study by Squiers and colleagues reveals a novel, Commander-independent role for COMMD3 in endosomal recycling. Through unbiased genetic screens, the authors identified COMMD3 as a regulator of GLUT4-SPR trafficking and validated its function using knockout experiments, which demonstrated its impact on endosomal morphology and trafficking independent of the Commander complex. Importantly, they mapped the interaction between the N-terminal domain (NTD) of COMMD3 and the GTPase Arf1, and through structure-guided mutagenesis, established that this interaction is essential for COMMD3's Commander-independent activity. The manuscript provides compelling evidence supporting this newly identified function of COMMD3, and I find the authors' interpretations well-justified. This is an excellent and intriguing study.
 
 Comments on revisions:
 
 The authors addressed all comments. Congratulations on this exciting work.
 
5. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Reviewer 1 (Public reviews):
 
 (1) Commander-Independent Role of COMMD3: While the authors provided evidence to support the Commander-independent role of COMMD3, such as the absence of other Commander subunits in the CRISPR screen and the lack of decreased COMMD3 levels in other subunit-KO cells, direct evidence is lacking. The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question.
 
 The Reviewer raised an excellent point. We fully agree with the Reviewer that multiple lines of evidence are needed to support the novel Commander-independent function of COMMD3.
 
 Comparative genetic analyses in Figures 4 and 5 indicate that COMMD3 regulates endosomal retrieval independently of the Commander complex. In Figure 8 of the revised manuscript, we show that point mutations introduced into the COMMD3:ARF1 interface impair this Commander-independent function. Moreover, Figure 6 demonstrates that ARF1 upregulation fully rescues the KO phenotype of COMMD3. In addition, Figure S2 further supports that COMMD3 levels, but not those of other Commander subunits, correspond to its Commander-independent function in endosomal trafficking. We have also revised the Discussion section to elaborate on the implications of these findings. We appreciate the Reviewer’s advice.
 
 (2) Role of ARF1 in Cargo Selection: The Commander-independent function of COMMD3 appears cargo-dependent and relies on ARF1's role in cargo selection. The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR.
 
 The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations in the NTD that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). As these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this recycling pathway. We note that the discovery of a novel trafficking pathway inevitably opens many research directions. One such direction is to systematically identify cargoes that rely on COMMD3 but not the Commander complex for endosomal retrieval.
 
 (3) Impact on TfR Stability: Figure 7D suggests that TfR protein levels are reduced in COMMD3-KO cells, potentially due to degradation caused by disrupted recycling. This raises the question of whether the observed reduction in cell surface TfR is due to impaired endosomal recycling or decreased total protein levels. The authors should quantify the ratio of cell surface protein to total protein for TfR, GLUT4-SPR, and ITGA6 in COMMD3-KO cells.
 
 Based on the Reviewer's suggestion, we quantified both the total levels and the surface-to-total ratio of TfR, as shown in Figure S1 of the revised manuscript. These new data further support the conclusion that defects in TfR retrieval lead to its lysosomal degradation. The GLUT4-SPR data presented in the main figures represent the surface-to-total ratio of the GLUT4-SPR reporter. We thank the Reviewer for the important suggestion.
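 The surface-to-total quantification discussed here can be illustrated with a minimal sketch. It assumes one surface and one total signal per sample, and the function names, sample labels, and intensity values are all hypothetical placeholders, not data or methods from the manuscript.

```python
def surface_to_total_ratio(surface, total):
    """Return the surface signal divided by the total signal for one sample."""
    if total <= 0:
        raise ValueError("total signal must be positive")
    return surface / total


def normalize_to_control(ratios, control="WT"):
    """Express each sample's ratio relative to the control sample (control = 1.0)."""
    reference = ratios[control]
    return {name: value / reference for name, value in ratios.items()}


# Placeholder intensities in arbitrary units, invented purely for illustration.
samples = {
    "WT": {"surface": 80.0, "total": 100.0},
    "KO": {"surface": 24.0, "total": 60.0},
}

# Ratio per sample, then rescaled so that the control equals 1.0.
ratios = {
    name: surface_to_total_ratio(s["surface"], s["total"])
    for name, s in samples.items()
}
normalized = normalize_to_control(ratios)
```

 Reporting the total level alongside the ratio, as in this toy case, helps separate a drop in overall protein abundance from a drop in the fraction of protein that reaches the cell surface.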
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) Commander-Independent Role of COMMD3: The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question. The authors should evaluate whether the full-length mutant of COMMD3 can rescue decreased levels of CCDC93 and VPS35L, as well as cell surface ITGA6, TfR, and GLUT4 in COMMD3-KO cells.
 
 This is an excellent point. In our mechanistic experiments, we focused on the NTD of COMMD3 because this domain mediates its Commander-independent function and is not involved in forming the Commander holo-complex. This approach allowed us to draw unambiguous conclusions. Nevertheless, we anticipate that full-length COMMD3 carrying these point mutations would also be defective in regulating Commander-independent cargo.
 
 (2) Role of ARF1 in Cargo Selection: The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR. Was ARF1 identified in the initial CRISPR screen? If so, this should be explicitly noted. Alternatively, does ARF1 overexpression rescue ITGA6 levels in COMMD3-KO cells? Furthermore, does ARF1 overexpression rescue TfR levels in COMMD3 and CCDC93 double-KO cells?
 
 The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). Since these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this novel recycling pathway. Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, a key research direction we are currently pursuing is systematically determining how surface protein levels are affected by COMMD3 KO and ARF1 overexpression using surface proteomics.
 
 (3) Inconsistency in COMMD3 Rescue Levels (Figure 5A): Figure 5A shows comparable or higher levels of COMMD3 in rescued cells than in CCDC93-KO and VPS35L-KO cells. However, COMMD3 rescue did not increase cell surface TfR as much as in CCDC93-KO and VPS35L-KO cells. This inconsistency should be discussed or validated.
 
 To address the Reviewer’s inquiry, we quantified COMMD3 expression levels in these cell lines using multiple independent experiments. The new data are presented in Figure S2 of the revised manuscript. These expanded datasets allowed us to more accurately determine the relationship between COMMD3 expression and our genetic data. Since the Commander complex remains intact in the COMMD3 rescue cells, a significant portion of COMMD3 proteins are expected to be incorporated into the Commander complex, which does not regulate TfR recycling. In contrast, because the Commander complex is disrupted in Ccdc93 and Vps35l KO cells, all COMMD3 proteins are available to regulate TfR recycling in a Commander-independent manner. These findings are fully consistent with the similar surface TfR levels observed in Ccdc93/Vps35l KO cells and COMMD3 overexpressing cells. We thank the Reviewer for this excellent suggestion.
 
 (4) Significance of NTD in COMMD3 Function: The conclusion that "the NTD of COMMD3 mediates its Commander-independent function and interacts with ARF1" (Page 12) is not fully supported without a side-by-side comparison of NTD, CTD, and FL COMMD3 in the same experiment (e.g., Figures 6B and 6G). Additional data is needed to strengthen this claim.
 
 We conducted the experiment suggested by the Reviewer and included the data in Figure S3. Our results indicate that the COMMD3 CTD cannot mediate the Commander-independent function of COMMD3 in endosomal retrieval. We appreciate the Reviewer’s suggestion.
 
 (5) ARF1 Stabilization Experiments: To substantiate the claim that COMMD3 binds and stabilizes the GTP-form of ARF1, the authors should include a comparative experiment showing GTP-form, GDP-form, and wild-type ARF1 (e.g., Figures 6G and 7C).
 
 We fully agree with the Reviewer that it would be important to compare how the ARF1:COMMD3 interaction is influenced by the nucleotide-binding state. However, trapping ARF1 in its GDP-bound state remains unfeasible, and nucleotide-free small GTPases are inherently unstable. In addition, WT ARF1 likely exists as a mixture of GTP- and GDP-bound forms, further complicating the analysis. To address the Reviewer’s comment, we used AlphaFold3 predictions. Interestingly, we found that the ipTM score of GTP-ARF1:COMMD3 is significantly higher than that of GDP-ARF1:COMMD3 or apo-ARF1:COMMD3, supporting our conclusion that COMMD3 recognizes and stabilizes the active form of ARF1.
 
 (6) Validation of NTD Mutation (Figure 8): Co-immunoprecipitation or cellular co-localization experiments should be performed to confirm that the NTD mutation disrupts the interaction between COMMD3 and ARF1, as depicted in Figure 8.
 
 This is an important question, and the best approach to address it would be to measure the binding affinity of the WT and mutant proteins using ITC or SPR. However, this is currently unfeasible, as we have not yet obtained pure recombinant COMMD3 and GTP-ARF1 proteins. Co-IP, by nature, is a crude assay that often fails to detect changes in binding affinity. A previous study on other proteins showed that mutations in protein-binding interfaces strongly reduced binding affinity as measured by SPR, but these changes would have been missed by co-IP assays (PMID: 25500532). In agreement with this limitation, our co-IP experiments did not yield conclusive results. Instead, we focused on structure-guided genetic experiments, which unequivocally demonstrated the effects of targeted mutations on the Commander-independent function of COMMD3.
 
 Reviewer #2 (Public review):
 
 (1) All existing data suggest that COMMD3 is a subunit of the Commander complex. Is there any evidence that COMMD3 can exist as a monomer?
 
 The Reviewer raised an intriguing point. Indeed, COMMD proteins, including COMMD3, can exist outside the Commander holo-complex and form homo- or hetero-oligomers, as monomeric COMMD proteins are likely unstable. These observations align well with the Commander-independent function identified in this study. We have revised the Discussion section of the manuscript to further elaborate on this point and thank the Reviewer for the suggestion.
 
 (2) In Figure 9, the author emphasizes COMMD3-dependent cargo and Commander-dependent cargo. Can the authors speculate what distinguishes these two types of cargo? Do they contain sequence-specific motifs?
 
 This is another important question. Our data clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holocomplex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point and thank the Reviewer for this important suggestion.
 
 (3) What could be the possible mechanism underlying the observation that the knockout of COMMD3 results in larger early endosomes? How is the disruption of cargo retrieval related to the increase in endosome size?
 
 The endosomal retrieval process is critical for recycling membrane proteins and lipids back to the plasma membrane or the trans-Golgi network. When this process is disrupted, cargo that should be recycled accumulates within endosomes, leading to their enlargement. For example, defects in retromer function can cause endosomal swelling due to cargo accumulation (PMID: 33380435). We added this citation to the revised manuscript and thank the Reviewer for the advice.
 
 Reviewer #3 (Recommendations for the authors):
 
 (1) Figure 4: How do the authors define Commander-dependent vs. Commander-independent cargos?
 
 In Figure 4, the surface expression of ITGA6 is reduced to approximately 0.75 across all knockouts. However, there is a similar level of reduction for GLUT4-SPR in the commd5 knockout and for LAMP1 in the commd5 and commd1 knockouts. Are GLUT4-SPR and LAMP1 Commander-dependent or Commander-independent cargos? Additionally, how does COMMD3 specifically identify/distinguish these cargos?
 
 This is an excellent point. Our data suggest that TfR is a COMMD3-dependent but Commander-independent cargo, whereas ITGA6 is a Commander-dependent cargo that does not involve COMMD3-specific functions. The other two cargoes we examined—GLUT-SPR and LAMP1—primarily rely on COMMD3, with the Commander complex playing a minor role. Together, these observations clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holo-complex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point. We thank the Reviewer for this important suggestion.
 
 (2) There is an increase in the surface expression of GLUT4-SPR in the commd1 knockout. Is this increase significant? The figure suggests a significant increase, but the text states it remains unchanged. Clarification is needed.
 
 We found that surface levels of GLUT-SPR were slightly increased in Commd1 KO cells, in stark contrast to the strong reduction observed in Commd3 KO cells (Fig. 4B). This finding is consistent with our conclusion that COMMD3 has a distinct role from other Commander subunits. We have revised the Results section to more clearly describe these data and thank the Reviewer for the advice.
 
 (3) Figure 5A: To support the claim that COMMD3 is upregulated in the vps35l KO/Ccdc93 KO, the authors should quantify COMMD3 expression. Also, why is there a Vps35l band present in the Vps35l knockout cells?
 
 Based on the Reviewer’s suggestion, we quantified the total levels of COMMD3 and included these new data in Figure S2. In this study, gene deletion was achieved through the simultaneous introduction of two independent gRNAs. Based on our previous experience, this strategy typically results in the complete loss of gene expression. We posit that the residual band observed in Vps35l KO cells originates from background signals, such as nonspecific staining by the antibody.
 
 (4) Figure 7: It is intriguing that COMMD3 stabilizes Arf1-GTP and can compensate for COMMD3 in knockout cells. However, is this stabilization specific to TfR cargo only? The authors should test additional Commander-dependent and Commander-independent cargos to clarify this point.
 
 Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, an important direction we are pursuing is the use of surface proteomics to systematically determine how surface protein levels are affected by COMMD3 KO and ARF1 overexpression.
 
 (5) Is Arf1 interaction specific to COMMD3? The authors should investigate the effects of Arf1 knockout on COMMD3 expression and test its role in regulating Commander-dependent and Commander-independent cargos.
 
 The Reviewer raised an excellent point. Since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would interfere with multiple trafficking routes and the data would be difficult to interpret. Thus, in this work, we focused on the function and mechanism of the COMMD3:ARF1 complex on the endosome. Based on the suggestion of the Reviewer, we used AlphaFold3 to predict ARF1 binding to COMMD proteins. Interestingly, the complex with the highest predicted ipTM score is COMMD3:ARF1, while other COMMD proteins have much lower predicted binding scores. These results are consistent with the results of our unbiased CRISPR screens and targeted gene KO, and further support the conclusion that the COMMD3:ARF1 binding is specific and physiologically important in endosomal trafficking.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.12.628173v2
www.biorxiv.org www.biorxiv.org

Harnessing AlphaFold to reveal hERG channel conformational state secrets

4
1. Public_Reviews 05 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This valuable study uses AlphaFold2 to guide the structural modelling of different states of the human voltage-gated potassium channel KV11.1, a key pharmacological drug target. Follow-up molecular dynamics and drug-docking simulations, combined with experimental characterization, offer convincing evidence supporting the models. The work shows potential for improving drug potency predictions in ion channel pharmacology.
 
 Summary
2. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Ngo et al. use several computational methods to determine and characterize structures defining the three major states sampled by the human voltage-gated potassium channel hERG: the open, closed and inactivated states. Specifically, they use AlphaFold and Rosetta to generate conformations that likely represent key features of the open, closed and inactivated states of this channel. Molecular dynamics simulations confirm ion conduction for structure models of the open but not the inactivated state. Moreover, in silico drug docking experiments show differential binding of drugs to the conformations of the three states, with the inactivated one being preferentially bound by many of them. Docking results are then combined with a Markov model to get state-weighted binding free energies that are compared with experimentally measured ones.
 
 Strengths:
 
 The study uses state-of-the-art modeling methods to provide detailed insights into the structure-function relationship of an important human potassium channel. AlphaFold modeling, MD simulations and Markov modeling are nicely combined to investigate the impact of structural changes in the hERG channel on potassium conduction and drug binding.
 
 Weaknesses:
 
 (1) Selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their initial selection of the "most likely" inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the "Streetlight effect". It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that more accurately represent the inactivated state. In addition, I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.
 
 (2) The comparison of predicted and experimentally measured binding affinities lacks appropriate controls. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Importantly, labels for open, closed and inactivated state should be randomized to check robustness of the findings. Such a control would strengthen the overall findings significantly.
 
 (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g. Figure 3d).
 
 The authors have successfully achieved their goal of providing new insights into the structural details of the three major conformational states sampled by the human voltage-gated potassium channel hERG, and linking these states to changes in drug-binding affinities. However, the study would benefit from more robust controls and orthogonal validation. Additionally, the generalizability of the approach remains to be demonstrated.
 
 Review 1
3. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Ngo et al. use AlphaFold2 and Rosetta to model closed, open, and inactive states of the human ion channel hERG. Subsequent MD simulations and comparisons with experiment support the plausibility of their models.
 
 Strengths:
 
 Ngo et al. employ various computational methods to enhance AlphaFold2's prediction capabilities for the human voltage-gated potassium channel hERG. They guide AlphaFold2 to explore different protein conformations and states, including its open, closed, and inactivated forms, using targeted templates. Additionally, they applied the Rosetta FastRelax protocol with an implicit membrane to refine the conformation of each residue in the predictions and address steric clashes, along with molecular dynamics (MD) simulations to account for membrane-pore flexibility. The methodology is well-described, and the figures are clear and descriptive.
 
 The authors have addressed some of the concerns raised during the first round of reviews. For instance, to mitigate potential bias in selecting the inactivated conformation, they evaluated conformational variability via backbone dihedral angles at specific residues in the selectivity filter and the drug binding sites. They also evaluated the top representative model from inactivated-state-sampling Cluster 3 (termed "AF ic3"), which was initially excluded. This model is now included in the revised manuscript as Figure S9a, b. MD simulations confirmed that this state could be a potential alternative open-state conformation. The authors also acknowledged the limitation of their study by not incorporating other enhanced sampling methods and AF3.
 
 In the revised manuscript, the authors provided more extensive explanations of their methods. For example, they explained that their approach to template selection was guided by their experience that providing AlphaFold2 with larger templates often overly constrains predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. They also noted that pLDDT scores are not always reliable for selecting new or alternative conformations, citing proper references. To illustrate this further, they included a model from cluster 3 of the inactivated-state sampling process, which exhibited lower pLDDT scores.
 
 Another point raised by the reviewers was the exclusion of the N-terminal PAS domain due to GPU memory limitations and its impact on the study. This omission may overlook the PAS domain's potential roles in gating kinetics and allosteric effects on drug binding. The authors acknowledged these limitations in the main text and highlighted the need for future studies to explore these regions in greater detail. They also alluded to potential future research to address these points. Additionally, they have made some of their analysis scripts and tools available on GitHub as a community resource.
 
 Weakness:
 
 The primary issue with the study is the lack of a general pipeline or strategy that can be universally applied to any system, even if limited to ion channels or membrane proteins. A related paper assessed the conformational variability in voltage-sensing domains (VSDs) by applying both the default MSA depth and a range of reduced MSA depths to enhance conformational diversity (please see https://doi.org/10.1101/2025.03.12.642934). They generated 600 models for 32 members of the voltage-gated cation channel superfamily and demonstrated that AlphaFold2 can predict a range of diverse structures of the VSDs, representing activated, deactivated, and intermediate conformations, with more diversity observed for some VSDs compared to others.
 
 The authors have addressed one of the reviewer's concerns about generalizability by including an example in Figure S14 of the modified text, showing how their approach can be applied to model another ion channel system. However, some outstanding questions remain: Is this method better suited for ion channels or membrane proteins with already solved structures and extensive research available? Can this pipeline be applied to other systems as well? Additionally, how does this method compare to other methods using MSA subsampling and other enhanced AF-based techniques to generate alternative conformations of proteins?
 
 Review 2
4. Public_Reviews 05 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Reviewer #1 (Public review):
 
 Weaknesses:
 
 (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the “most likely” inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the “Streetlight effect”. It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.
 
 We sincerely thank the reviewer for their perceptive critique highlighting potential bias in selecting the inactivated conformation. We recognize that over-relying on preconceived traits could limit exploration of diverse inactivated states, and we appreciate the opportunity to address this concern.
 
 Although we selected the model with the flipped V625 in the selectivity filter (SF) from the first round of inactivated-state sampling as the template for the second round, the resulting models still exhibited substantial diversity in their SF conformations. This selection primarily served to steer predictions away from the open-state configuration observed in the PDB 5VA2 SF, and we have clarified this rationale in the Methodology section. To assess conformational variability, we examined backbone dihedral angles (phi φ and psi ψ) at key residues in the selectivity filter (S624 – G628) and the drug-binding region on the pore-lining S6 segment (Y652, F656) of all 100 models sampled in the subsequent inactivated-state-sampling attempt. By overlaying the φ and ψ dihedral angles from different models, including the open state (PDB 5VA2-based), the closed state, and representative models from AlphaFold inactivated-state-sampling Cluster 2 and Cluster 3, we found that these conformations consistently fall within or near high-probability regions of the dihedral angle distributions. This indicates that these structural states are well represented within the ensemble of conformations sampled by AlphaFold within the scope of this study, particularly at functionally critical positions.
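
 As a minimal sketch of the dihedral analysis described above, the following computes φ and ψ for one residue from backbone atom coordinates. It is an illustration only, not the authors' analysis script: the function names (`dihedral_deg`, `phi_psi`) and the toy coordinates are hypothetical, and the only thing assumed from structural biology is the standard definition of φ as the C(i-1), N, CA, C torsion and ψ as the N, CA, C, N(i+1) torsion.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four consecutive atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone phi and psi of one residue (hypothetical helper)."""
    phi = dihedral_deg(prev_c, n, ca, c)      # C(i-1), N, CA, C
    psi = dihedral_deg(n, ca, c, next_n)      # N, CA, C, N(i+1)
    return phi, psi

# Toy coordinates (not taken from any hERG model), chosen so the
# geometry is easy to check by hand.
phi, psi = phi_psi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0))
print(phi, psi)
```

 Applied to every model in an ensemble, the resulting (φ, ψ) pairs per residue could be overlaid on one plot to see whether a given model falls inside the densely sampled regions.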
 
 Following the analysis above and consistent with the reviewer’s suggestion, we evaluated the top representative model from inactivated-state-sampling Cluster 3 (named “AF ic3”), which we had initially excluded. This model demonstrated SF residue G626 carbonyl oxygen flipped away from the conduction pathway, hinting at potential impact on ion conduction, yet its pore region structurally resembled the open state (Figure S9a, b). To test this objectively, we ran molecular dynamics (MD) simulations (2 runs, 1 μs long each, with applied 750 mV voltage) with varied initial ion/water configurations in the SF, finding it consistently open and conducting throughout (Figure S9c, d), consistent with our previous observations in Figure S11 that ion conduction can still occur when the upper SF is dilated. Drug docking (Figure S12) further revealed that the model exhibited binding affinities similar to those for the PDB 5VA2-based open-state structure. These findings combined led us to classify it as a possible alternative open-state conformation.
 
 Models from Cluster 4 were not tested due to extensive steric clashes, where residues in the SF overlapped with neighboring residues from adjacent subunits. The remaining models displayed SF conformations that combined features from earlier clusters. However, due to subunit-to-subunit variability, where individual subunits adopted differing conformations, they were classified as outliers. This combination of features may be valuable to investigate further in a follow-up study.
 
 We acknowledge that our approach is just one of many ways to sample different states, and alternative strategies, such as generating more models, varying multiple sequence alignment (MSA) subsampling, or testing different templates, might reveal improved models. Given that hERG channel inactivation likely spans a spectrum of conformations, our resource limitations may have restricted us to exploring and validating only part of this diversity. Nevertheless, the putative inactivated (AlphaFold Cluster 2) model’s non-conductivity and improved affinity for drugs targeting the inactivated state observed in our study suggest that this approach may be capturing relevant features of the inactivated-state conformation. We look forward to investigating other possibilities in greater depth in a future study and are grateful for the reviewer’s feedback.
 
 (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.
 
 We appreciate the reviewer’s insightful suggestion. To address this, we extended our analysis by incorporating an alternative AlphaFold2-predicted model from inactivated-state-sampling cluster 3 as a structural control. The analysis described in our response to comment #1 established that this model is open and conducting, so we call it Open (AF ic3) to differentiate it from Open (PDB 5VA2). We evaluated this new model in single-state and multi-state contexts alongside our original open-state model based on the experimental PDB 5VA2 structure. Additionally, we expanded the drug docking procedure to explore a broader region around the putative drug binding site by increasing the sampling space, and we adopted an improved approach for selecting representative docking poses to better capture relevant binding modes.
 
 Shown in Figure 7 are comparisons of experimental drug potencies with the binding affinities from the molecular docking calculations under the following conditions:
 
 (a) Single-state docking using the experimentally derived open-state structure (PDB 5VA2)
 
 (b) Multi-state docking incorporating open (PDB 5VA2), inactivated, and closed-state conformations weighted by experimentally observed state distributions
 
 (c) Single-state docking using an alternative AlphaFold-predicted open-state (inactivated-state-sampling cluster 3, AF ic3)
 
 (d) Multi-state docking combining the AlphaFold-predicted open-state (inactivated-state-sampling cluster 3, AF ic3) with the inactivated and closed-state conformations
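
 To make the idea of state-weighted docking in conditions (b) and (d) concrete, here is a minimal sketch. It assumes a simple occupancy-weighted average of per-state docking scores; the exact weighting formula used by the authors is not given in this response, and the function name, score values, and occupancies below are hypothetical placeholders rather than data from the study.

```python
def state_weighted_score(scores, occupancy):
    """Occupancy-weighted average of per-state docking scores.

    scores:    dict mapping state name -> docking score (arbitrary units)
    occupancy: dict mapping state name -> fraction of channels in that state
    """
    if set(scores) != set(occupancy):
        raise ValueError("scores and occupancy must cover the same states")
    total = sum(occupancy.values())
    if total <= 0:
        raise ValueError("occupancies must sum to a positive number")
    # Normalize by the total so the weights need not sum exactly to 1.
    return sum(scores[s] * occupancy[s] for s in scores) / total

# Hypothetical numbers for one drug (assumptions, not measured values).
example_scores = {"open": -8.0, "inactivated": -10.0, "closed": -6.0}
example_occupancy = {"open": 0.2, "inactivated": 0.7, "closed": 0.1}

weighted = state_weighted_score(example_scores, example_occupancy)
print(weighted)
```

 With these placeholder inputs the weighted score is pulled toward the inactivated-state value because that state carries most of the occupancy, which is the qualitative behaviour a multi-state comparison relies on.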
 
 Using only the open-state model (PDB 5VA2) yielded a moderate correlation with experimental data (R² = 0.43, r = 0.66, Figure 7a). Incorporating multi-state binding (weighted by their experimental distributions) improved the correlation substantially (R² = 0.63, r = 0.79, Figure 7b), boosting predictive power by 47% and underscoring the value of multi-state modeling. Importantly, this improvement was achieved without considering potential drug-induced allosteric effects on the hERG channel conformation and gating, which will be addressed in future work.
 
 Next, we substituted the PDB 5VA2-based open-state model with the AF ic3 open-state model. Docking to this alternative model alone produced similar performance (R² = 0.44, r = 0.66, Figure 7c), and incorporating it into the multi-state ensemble further improved the correlation with experiments (R² = 0.64, r = 0.80, Figure 7d), representing a 45% gain in R² and matching the performance of multi-state docking results based on the PDB 5VA2-derived model.
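
 The quoted percentage gains follow from the reported coefficients of determination if "gain" is read as the relative increase over the single-state value. The short check below makes that arithmetic explicit; the helper name is hypothetical, and the interpretation of "gain" as a relative increase is an assumption that happens to reproduce both figures.

```python
def relative_gain_percent(baseline, improved):
    """Relative increase of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline

# Coefficients of determination quoted in the response (Figure 7a-d).
gain_pdb = relative_gain_percent(0.43, 0.63)   # PDB 5VA2: single vs multi-state
gain_af = relative_gain_percent(0.44, 0.64)    # AF ic3: single vs multi-state

print(round(gain_pdb), round(gain_af))
```

 Rounded to whole percentages, these values are 47 and 45, matching the numbers stated in the text.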
 
 These findings suggest that the predictive power of computational drug docking is enhanced not merely by the accuracy of individual models, but by the structural diversity and complementarity provided by an ensemble of protein conformations. Rather than relying solely on a single experimentally determined protein structure, the ensemble benefits from incorporating AlphaFold-predicted models that capture alternative conformations identified through our state-specific sampling approach. These diverse protein models reflect different structural features, which together offer a more comprehensive representation of the ion channel’s binding landscape and enhance the predictive performance of computational drug docking. Overall, these results reinforce that multi-state modeling offers a more realistic and predictive framework for understanding drug – ion channel interactions than traditional single-state approaches, emphasizing the value of both individual model evaluation and their collective integration. We are grateful for the reviewer’s suggestion.
 
 (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g. Figure 3d).
 
 We appreciate the reviewer’s comment on the statistical significance assessment in Figure 3d. To clarify, the comparisons shown in the subpanels are based on three selected representative models for each state, rather than a broader population sample (similarly for Figure 3b). In the closed-state predicted models, the strong convergence of the voltage-sensing domain (VSD), with an all-atom RMSD of 0.36 Å between cluster 1 and 2 closed-state sampling models and 0.95 Å to the outlier cluster, indicates minimal structural variation. The RMSD values reported in the manuscript text demonstrate good convergence and by themselves serve as an assessment of statistical significance for those models. This trend extends to open-state and inactivated-state AlphaFold models with similarly limited differences in the VSD regions among them. This convergence suggests that population-based statistical analysis may not reveal meaningful deviations, as the low variability among models limits the insights beyond those obtained from comparing representative structures.
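
 For readers unfamiliar with the RMSD metric cited above, a minimal sketch of the calculation is given here. It assumes the two models have already been superimposed and that their atoms are listed in the same order; the superposition step and the software actually used are not described in this response, and the function name and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Toy example: every atom displaced by 0.5 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
model_2 = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5)]
print(rmsd(model_1, model_2))
```

 A uniform 0.5 Å shift gives an RMSD of 0.5 Å, which offers a sense of scale for the 0.36 Å and 0.95 Å values quoted for the VSD.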
 
 Nonetheless, we acknowledge this limitation. In future studies, we plan to explore alternative modeling approaches to introduce greater variability, enabling a more robust statistical evaluation of state-specific trends in the predictions.
 
 (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.
 
 We have explored these conformational state dynamics through MD simulations for the Open (5VA2-based), Inactivated (AlphaFold Cluster 2), and Closed-state models, as presented in Figures S7, S8, S10, S11. These figures provide detailed insights: Figures S7-S8 analyze SF and pore conformation dynamics, including averaged pore radii with and without voltage and superimposed conformational ensembles; Figure S10 tracks cross-subunit distances between protein backbone carbonyl oxygens, revealing sequential SF dilation steps near residues F627 and G628; and Figure S11 illustrates this SF dilation process over time, highlighting residue F627 carbonyl flipping and SF expansion. We appreciate the opportunity to clarify our approach.
 
 Reviewer #2 (Recommendations for the authors):
 
 Major concerns:
 
 (1) Protein fragments are used to model the closed and inactivated states of hERG, but the choices of fragments are not well justified. For instance, in Figure 1a, helices from 8EP1 (deactivated voltage-sensing domain) and a helix+loop from 5VA2 (selectivity filter) are used. Why just the selectivity filter and not the cytosolic domain, for instance? Why not some parts of the helices attached to the selectivity filter, or the whole membrane inserted domain of 8EP1? Same for the inactivated conformation in Figure 1c: why the cytosolic domain only?
 
 We thank the reviewer for their thoughtful questions regarding our choice of protein fragments for modeling the closed and inactivated states of hERG in Figures 1a and 1c, and we appreciate the opportunity to justify these selections more clearly. Our approach to template selection was guided by our experience that providing AlphaFold2 with larger templates often leads it to overly constrain predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. We have provided a more detailed discussion of this in the methodology section.
 
 For the closed state (Figure 1a), we chose the deactivated voltage-sensing domain (VSD) from the rat EAG channel (PDB 8EP1) to inspire AlphaFold2 to predict a similarly deactivated VSD conformation characteristic of hERG channel closure, as this domain’s downward shift is a hallmark of potassium channel closure. We paired this with the selectivity filter (SF) and adjacent residues from the open-state hERG structure (PDB 5VA2) to maintain its conductive conformation, as it is generally understood that K+ channel closure primarily involves the intracellular gate rather than significant SF distortion. Including additional helices (e.g., S5–S6) or the entire membrane domain from PDB 8EP1 risked biasing the model toward the EAG channel’s pore structure, which differs from hERG’s, while omitting the cytosolic domain ensured focus on the VSD-driven closure without over-constraining cytoplasmic domain interactions.
 
 For the inactivated state (Figure 1c), we initially used only the cytosolic domain from PDB 5VA2 to anchor the prediction while allowing AlphaFold2 to freely sample transmembrane domain conformations, particularly the SF, where inactivation occurs via its distortion. Excluding the SF or attached helices at this stage avoided locking the model into the open-state SF, and the cytosolic domain alone provided a minimal scaffold to maintain hERG’s intracellular architecture without dictating pore dynamics. Following the initial prediction, we initiated more extensive sampling by using one of the predicted SFs that differs from the open-state SF (PDB 5VA2) as a structural seed, aiming to guide predictions away from the open-state configuration. The VSD and cytosolic domain were also included in this state to discourage pore closure during prediction. Using larger fragments, like the full membrane-spanning domains or additional cytosolic regions from the open-state structure, might reduce AlphaFold2’s ability to deviate from the open-state conformation, undermining our goal of capturing more diverse, state-specific features.
 
 It is worth noting that multiple strategies could potentially achieve the predicted models in our study, and here we only present examples of the paths we took and validated. It is likely that many of the steps may be unnecessary and could be skipped, and future work building on our approach can further explore and streamline this process. A consistent theme underlies our choices: for the closed state, we know the VSD should adopt a deactivated (“down”) conformation, so we provide AlphaFold2 with a specific fragment to guide this outcome; for the inactivated state, we recognize that the SF must change to a non-conductive conformation, so we grant AlphaFold2 flexibility to explore diverse conformations by minimizing initial constraints on the transmembrane region.
 
 With greater sampling and computational resources, it is possible we could identify additional plausible, non-conductive conformations that might better represent an inactivated state, as hERG inactivation may encompass a spectrum of states. In this study, due to resource limitations, we focused on generating and validating a subset of conformations. Still, we acknowledge that broader exploration could further refine these models, which could be pursued in future studies. We updated the Methods and Discussion sections to reflect this perspective, and we are grateful for the reviewer’s input, which encourages us to clarify our rationale and highlight the adaptability of our approach.
 
 To demonstrate the broader feasibility of this approach, we applied it to another ion channel system, voltage-gated sodium channel NaV1.5, as illustrated in Figure S14. In this example, a deactivated VSD II from the cryo-EM structure of a homologous ion channel NaV1.7 (PDB 6N4R) (DOI: 10.1016/j.cell.2018.12.018), which was trapped in a deactivated state by a bound toxin, was used as a structural template. This guided AlphaFold to generate a NaV1.5 model in which all four voltage sensor domains (VSD I–IV) exhibit S4 helices in varying degrees of deactivation. Compared to the cryo-EM open-state NaV1.5 structure (PDB 6LQA) (DOI: 10.1002/anie.202102196), the predicted model displays a visibly narrower pore, representing a plausible closed state. This example underscores the versatility of our strategy in modeling alternative conformational states across diverse ion channels.
 
 (2) While the authors rely on AF2 (ColabFold) for the closed and inactivated states, they use Rosetta to model loops of the open state. Why not just supply 5VA2 as a template to ColabFold and rebuild the loops that way? Without clear explanations, these sorts of choices give the impression that the authors were looking for specific answers that they knew from their extensive knowledge of the hERG system. While the modeling done in this paper is very nice, its generalizability is not obvious.
 
 We appreciate the reviewer’s question about our use of Rosetta to model loops in the open-state hERG channel (PDB 5VA2) rather than rebuilding it entirely with ColabFold. In the study, we conducted a control experiment supplying parts of PDB 5VA2 to ColabFold to rebuild the loops, generating 100 models (Figure 2a: predicted open state). The top-ranked model (by pLDDT) differed from our Rosetta-modelled structure by only 0.5 Å RMSD, primarily due to the flexible extracellular loops as expected, with the pore and selectivity filter (our areas of focus) remaining nearly identical. We chose the Rosetta-refined cryo-EM structure because this structure and approach have been widely used as an open-state reference in our other hERG channel studies, such as by Miranda et al. (DOI: 10.1073/pnas.1909196117) and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404), which makes our results more directly comparable to prior work in the field. Nonetheless, as both models (with loops modeled by Rosetta or AlphaFold) were virtually identical, we would expect no significant differences if either were used to represent the open state in our study. We have incorporated this clarification into the main text.
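
 To make the RMSD comparison above concrete, the following is a minimal sketch of how a root-mean-square deviation between two models can be computed. It is an illustration only, not the authors' pipeline: it assumes the two coordinate lists contain the same atoms in the same order and have already been superimposed, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists.

    Each list holds (x, y, z) tuples for the same atoms in the same order.
    No superposition is performed here (assumption: already aligned).
    The result is in the same length unit as the input (e.g., angstroms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: every atom is displaced by 0.5 along z.
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]
    print(rmsd(model_a, model_b))
```

 In practice, the value depends on which atoms are included (for example, backbone only versus all heavy atoms) and on how the structures are superimposed, so those choices should be reported alongside the number.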
 
 (3) pLDDT scores were used as a measure of reliable and accurate predictions, but pLDDT is not always reliable for selecting new/alternative conformations (see https://doi.org/10.1038/s41467-024-51507-2 and https://www.nature.com/articles/s41467-024-51801-z).
 
 We acknowledge that while pLDDT is a valuable indicator of structural confidence in AlphaFold2 predictions, its limitations warrant consideration. In our revision, we mitigated this by not relying solely on pLDDT: we also performed protein backbone dihedral angle analysis of the regions of focus in all predicted models to ensure comprehensive coverage of conformational variations. From our AlphaFold modeling results, we tested a model from cluster 3 of the inactivated-state sampling process, which exhibited lower pLDDT scores, and included these results in our revised analysis. We included a note in the revised manuscript’s Discussion section: “As noted in recent studies, pLDDT scores are not reliable indicators for selecting alternative conformations (DOI: 10.1038/s41467-024-51507-2 and DOI: 10.1038/s41467-024-51801-z). To address this, we performed a protein backbone dihedral angle analysis in the regions of interest to ensure that our evaluation captured a representative range of sampled conformations.”
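
 As a hedged illustration of the backbone dihedral angle analysis mentioned above, the sketch below computes a single dihedral angle from four atom positions using the standard projection formula. It is not taken from the authors' scripts; the function name `dihedral_deg` and the helper names are hypothetical, and the choice of which four backbone atoms define phi or psi is left to the caller.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    For a backbone phi angle the points would be C(i-1), N(i), CA(i), C(i);
    for psi they would be N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy geometry: the fourth point is rotated a quarter turn about the z-axis.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

 Because dihedral angles are periodic, comparing them across many models generally calls for circular statistics or clustering on sine and cosine components rather than plain averaging.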
 
 (4) Extensive work has been done using AF2 to model alternative protein conformations (https://www.biorxiv.org/content/10.1101/2024.05.28.596195v1.abstract, along with some references the authors cite, such as work by McHaourab); another group recently modeled the ion channel GLIC (https://www.biorxiv.org/content/10.1101/2024.09.05.611464v1.abstract). Therefore, this work, though generally solid and thorough, seems more like a variation on a theme than a groundbreaking new methodology, especially because of the generalizability issues mentioned above.
 
 We sincerely thank the reviewer for acknowledging the solidity of our study and for drawing our attention to the impressive recent efforts using AlphaFold2 to explore alternative protein conformations. These studies are valuable contributions that highlight the versatility of AlphaFold2, and we are grateful for their context in evaluating our work.
 
 Building on these efforts, our approach not only enhances the prediction of conformational diversity but also introduces a twist by incorporating structural templates to guide AlphaFold2 toward specific functional protein states. More significantly, our study advances beyond mere structural modeling by rigorously validating these conformations: multiple simulation results are tested against experimental data to show that AlphaFold-predicted conformations can align with distinct physiological ion channel states. A key finding is that drug binding predictions using AlphaFold-derived hERG channel states substantially improve correlation with experimental data, which is a longstanding challenge in computational screening of multi-state proteins like the hERG channel, for which previous structural models have been mostly limited to the open state based on the cryo-EM structures. Our approach not only captures this critical state dependence but also reveals potential molecular determinants underlying enhanced drug binding during hERG channel inactivation, a phenomenon observed experimentally but poorly understood. These insights advance drug safety assessment by improving predictive screening for hERG-related cardiotoxicity, a major cause of drug attrition and withdrawal.
 
 We view our methodology as a natural evolution of the advancements cited by the reviewer, offering an approach that predicts diverse hERG channel conformational states and links them to meaningful functional and pharmacological outcomes. To address the reviewer’s concern about generalizability, we have expanded the methodology section to make it easier to follow and include additional details. As an example, we show how our approach can be applied to model another ion channel system, NaV1.5, in Figure S14.
 
 Furthermore, to enhance the applicability of our methodology, we have uploaded the scripts for analyzing AlphaFold-predicted models to GitHub (https://github.com/k-ngo/AlphaFold_Analysis), ensuring they are adaptable for a wide range of scenarios with extensive documentation. This enables users, even those not focused on ion channels, to effectively apply our tools to analyze AlphaFold predictions for their own projects and produce publication-ready figures.
 
 While it is likely that multiple modeling approaches could lead AlphaFold to model alternative protein conformations, the key challenge lies in validating the physiological relevance of those predicted states. This study is intended to support other researchers in applying our template-guided approach to different protein systems and, more importantly, in rigorous in silico testing and validation of the biological significance of the conformation-specific structural models they generate.
 
 Minor concerns:
 
 (1) The authors mention in the Introduction section that capturing conformational states, especially for membrane proteins that may be significant as drug targets, is crucial. It would be helpful to relate their work to the NMR studies of domains of the hERG channel, particularly the N-terminal “eag” domain, which is crucial for channel function and can provide insights into conformational changes associated with different channel states (https://doi.org/10.1016/j.bbrc.2010.10.132).
 
 We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on drug binding and state transitions.
 
 The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. The omission was primarily due to hardware-imposed constraints, as including these additional regions would exceed the memory capacity of our current graphics processing unit (GPU) card, leading to failures during the prediction step.
 
 The PAS domain, even if not serving as a conventional direct drug-binding site, can influence the gating kinetics of hERG channels. By altering the probability and duration with which channels occupy specific states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts hERG channel gating so that more channels enter (and remain in) the inactivated state as was shown previously (e.g., DOI: 10.1085/jgp.201210870), drugs with a higher affinity for that state would appear to bind more potently, as observed in previous electrophysiological experiments (e.g., DOI: 10.1111/j.1476-5381.2011.01378.x). It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the hERG channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.
 
 With access to more powerful computational resources, it would be valuable to explore the full-length hERG channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We incorporated a discussion of these points into the main text, acknowledging the limitations of our current models and highlighting the need for future studies to explore these regions in greater detail. The addition reads: “…Our models excluded the N-terminal PAS domain due to GPU memory limitations, despite its inclusion in initial templates. This omission may overlook its potential roles in gating kinetics and allosteric effects on drug binding (e.g., PMID: 21449979, PMID: 23319729, PMID: 29706893, PMID: 30826123, DOI:10.4103/jpp.JPP_158_17). Future research will explore the full-length hERG channel with enhanced computational resources to assess these regions’ contributions to conformational state transitions and pharmacology.”
 
 (2) In the second-to-last paragraph of the Introduction, the authors describe how AlphaFold2 works. They write, “AlphaFold2 primarily requires the amino acid sequence of a protein as its input, but the method utilizes other key elements: in addition to the amino acid sequence, AlphaFold2 can also utilize multiple sequence alignments (MSAs) of similar sequences from different species, templates of related protein structures when available, and/or homologous proteins (Jumper et al., 2021a). Evolutionarily conserved regions over multiple isoforms and species indicated that the sequence is crucial for structural integrity”. The last sentence is confusing; if the authors mean that all information required to fold the protein into its 3D structure is present in its primary sequence, that has been the paradigm. It is unclear from this paragraph what the authors wanted to convey.
 
 We apologize for any confusion caused by this phrasing. Our intent was not to restate the well-established paradigm that a protein’s primary sequence contains the information needed for its 3D structure, but rather to emphasize how AlphaFold2 leverages evolutionary conservation, via multiple sequence alignments (MSAs), to infer structural constraints beyond what a single sequence alone might reveal. Specifically, we aimed to highlight that conserved regions across species and isoforms provide additional context that AlphaFold2 uses to enhance the accuracy of its predictions, complementing the use of templates and homologous structures as described in Jumper et al. (2021). To clarify this, we revised the sentence in the manuscript to read: “AlphaFold2 primarily requires a protein's amino acid sequence as input, but it also leverages other critical data sources. In addition to the sequence, it incorporates multiple sequence alignments (MSAs) of related proteins from different species, available structural templates, and information on homologous proteins. While the primary sequence encodes the 3D structure, AlphaFold2 harnesses evolutionary conservation from MSAs to reveal structural insights that extend beyond what a single sequence can provide.” We thank the reviewer for pointing out this ambiguity.
 
 (3) In the Results section, the authors state that the predictions generated by their method are evaluated by standard accuracy metrics, please elaborate - what standard metrics were used to judge the predictions and why (some references would be a nice addition). Further, on Page 6, the sentence “There are fewer differences between the open- and closed-state models (Figure S2b, d)” is confusing, fewer differences than what? or there are a few differences between the two states/models? Please clarify.
 
 The original sentence referring to “standard accuracy metrics” is somewhat misplaced, as our intent was not to apply any conventional “benchmarking” to judge the predictions, but rather to evaluate functional and structural relevance in a physiologically meaningful context. Specifically, we assessed drug binding affinities from molecular docking simulations (in Rosetta Energy Units, R.E.U.) against experimental drug potency data (e.g., IC50 values converted to free energies in kcal/mol, Figure 7), analyzed differences in interaction networks across states in relation to known mutations affecting hERG inactivation (Figure 4, Table 2), validated ion conduction properties through MD simulations with the applied voltage against expected state-dependent hERG channel behavior (Figure 5), and compared predicted structural models to available experimental cryo-EM structures (Figure 3). We clarified in the text that our assessment emphasized the physiological plausibility of the generated conformations, drawing on evidence from existing computational and experimental studies at each step of the analysis above.
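
 For readers unfamiliar with the conversion of IC50 values to free energies mentioned above, the following is a minimal sketch of one common convention. The response does not state the exact formula used, so this is an assumption: the IC50 is treated as an approximate dissociation constant relative to a 1 M standard state, and the temperature of 298.15 K and the function name `ic50_to_dg` are illustrative choices.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ic50_to_dg(ic50_molar, temperature_k=298.15):
    """Approximate binding free energy (kcal/mol) from an IC50 in mol/L.

    Assumes IC50 ~ Kd and a 1 M standard state: dG = R * T * ln(IC50).
    More potent drugs (smaller IC50) give more negative values.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(ic50_molar)


if __name__ == "__main__":
    # Hypothetical potencies, not data from the study.
    for label, ic50 in [("1 uM blocker", 1e-6), ("10 nM blocker", 1e-8)]:
        print(label, round(ic50_to_dg(ic50), 2), "kcal/mol")
```

 Under this convention, each tenfold improvement in potency corresponds to roughly 1.4 kcal/mol at room temperature. Docking scores in Rosetta Energy Units are not on the same absolute scale, so they are typically compared with such free energies through correlation rather than direct equality.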
 
 As for the sentence on page 6, “There are fewer differences between the open- and closed-state models,” we apologize for the ambiguity; we meant that the hydrogen bond networks in the selectivity filter region exhibit fewer differences between the open and closed states compared to the more pronounced variations seen between the open and inactivated states. We revised this sentence to read: “The open- and closed-state models show fewer differences in their selectivity filter hydrogen bond networks compared to those between the open and inactivated states,” to enhance readability.
 
 (4) In the Discussion, the authors reiterate that this methodology can be extended to sample multiple protein conformations, and their system of choice was hERG potassium channel. I think this methodology can be applied to a system when there is enough knowledge of static structures, and some information on dynamics (through simulations) and mutagenesis analysis available. A well-studied system can benefit from such a protocol to gauge other conformational states.
 
 We agree that this approach is well-suited to systems with sufficient static structures, dynamic insights from simulations, and mutagenesis data, as seen with the hERG channel. We appreciate the reviewer’s implicit concern about generalizability to less-characterized systems and addressed this in the Discussion as a limitation, noting that the method’s effectiveness may depend on prior knowledge. Future studies can explore whether the advent of AlphaFold3 and other deep learning approaches can enhance its applicability to systems with more limited data. We have added this comment to the Discussion: “…A limitation of our methodology is its reliance on well-characterized systems with ample static structures, molecular dynamics simulation data, and mutagenesis insights, as demonstrated with the hERG channel, which may limit its applicability to less-studied proteins.”
 
 (5) The Methods section must be broken down into steps to make it easier to follow for the reader (if they want to implement these steps for themselves on their system of choice).
 
 a. Is it possible to share example scripts and code used to piece templates together for AF2? Also, since the AF3 code is now available, the authors may comment on how their protocol can be applicable there or whether they have plans to implement their protocol using AF3 (which is designed to work better for binding small molecules). Please see https://github.com/google-deepmind/alphafold3 for the recently released code for AF3.
 
 We appreciate the reviewer’s suggestion to improve the Methods section and their comments on scripts and AlphaFold3 (AF3). We revised the Methods to separate it into clear steps (e.g., template preparation, AF2 setup, clustering, and refinement) for better readability and reproducibility, and uploaded the sample scripts along with the instructions to GitHub (https://github.com/k-ngo/AlphaFold_Analysis).
 
 Regarding AF3’s recent code release, we plan to explore the applicability of our methodology to AF3 in a follow-up study, leveraging its advanced features to refine conformational predictions and state-specific drug docking, and added a brief comment to the Discussion to reflect this future direction: “…Following the recent release of AlphaFold3’s source code, we plan to explore the applicability of our template-guided methodology in a follow-up study, leveraging AF3’s advanced diffusion-based architecture to enhance protein conformational state predictions and state-specific drug docking, particularly given its improved capabilities for modeling small molecule – protein interactions…”
 
 b. The authors modified the hERG protein by removing a segment, the N-terminal PAS domain (residues M1 - R397) because of graphics card memory limitation. Would the removal of the PAS domain affect the structure and function of the channel protein? HERG and other members of the “eag K+ channel” family contain a PAS domain on their cytoplasmic N terminus. Removal of this domain alters a physiologically important gating transition in HERG, and the addition of the isolated domain to the cytoplasm of cells expressing truncated HERG reconstitutes wild-type gating. (see https://doi.org/10.1371/journal.pone.0059265). Please elaborate on this.
 
 We thank the reviewer for raising an important point about the removal of the N-terminal PAS domain and for highlighting its physiological role in hERG channel gating transitions. In our study, unlike experimental settings where PAS removal alters gating, we believe this omission has minimal impact on our key analyses.
 
 The drug docking procedure focuses on optimizing drug binding poses with minor protein structural refinement around the putative drug binding site, which in our case is the hERG channel pore region, where hERG-blocking drugs predominantly bind. The cytoplasmic PAS domain, located distally from this site, remains outside the protein structure refinement zone during drug docking simulations. However, one aspect we have not yet considered is the potential effect of drug modulation of the hERG channel gating and vice versa particularly given the PAS domain’s role in gating. This interplay could be significant but requires investigation beyond our current drug docking framework. We plan to explore this in future studies using alternative simulation methodologies, such as extended MD simulations or enhanced sampling techniques, to comprehensively capture these dynamic protein - ligand interactions.
 
 Similarly, in our 1 μs long MD simulations assessing ion conductivity (Figure 4), the timescale is too short for PAS-mediated gating changes to propagate through the protein and meaningfully influence ion conduction and channel activation dynamics, which occur on a millisecond time scale (see e.g., DOI: 10.3389/fphys.2018.00207). To fully address this limitation, we plan to explore the inclusion of the PAS domain in a follow-up study with enhanced computational resources, allowing us to investigate its structural and functional contributions more comprehensively.
 
 (6) The first paragraph of the Methods reads as though AF2 has layers that recycle structures. We doubt that the authors meant it that way. Please update the language to clarify that recycling is an iterative process in which the pairwise representation, MSA, and predicted structures are passed (“recycled”) through the model multiple times to improve predictions.
 
 We agree that the phrasing might suggest physical layers recycling structures, which was not our intent. Instead, we meant to describe AlphaFold2’s iterative refinement process, where intermediate outputs, such as the pairwise residue representations, multiple sequence alignments (MSAs), and predicted structures, are iteratively passed (or “recycled”) through the model to enhance prediction accuracy. To clarify this, we revised the relevant sentence to read: “A critical feature of AlphaFold2 is its iterative refinement, where pairwise residue representations, MSAs, and initial structural predictions are recycled through the model multiple times, improving accuracy with each iteration.”
 
 Reviewer #3 (Recommendations for the authors):
 
 The authors should integrate the very recently published CryoEM experimental data of hERG inhibition by several drugs (Miyashita et al., Structure, 2024; DOI: 10.1016/j.str.2024.08.021).
 
 We thank the reviewer for the suggestion. Here, we compare drug binding in our open-state (PDB 5VA2-derived and an additional AlphaFold-predicted model from Cluster 3 of inactivated-state-sampling attempt named “AF ic3”) and inactivated-state models, using the cationic forms of astemizole and E-4031, with the corresponding experimental structures (Figure S13). Drug binding in the closed state is excluded as the pore architecture deviates too much from those in the cryo-EM structures. Experimental data (DOI: 10.1124/mol.108.049056) indicate that both astemizole and E-4031 bind more potently to the inactivated state of the hERG channel.
 
 Astemizole (Figure S13a):
 
 - In the PDB 5VA2-derived open-state model, astemizole binds centrally within the pore cavity, adopting a bent conformation that allows both aromatic ends of the molecule to engage in π–π stacking with the side chains of Y652 from two opposing subunits. Hydrophobic contacts are observed with S649 and F656 residues.
 
 - In the AF ic3 open-state model, the ligand is stabilized through multiple π–π stacking interactions with Y652 residues from three subunits, forming a tight aromatic cage around its triazine and benzimidazole rings. Hydrophobic interactions are observed with hERG residues T623, S624, Y652, F656, and S660.
 
 - In the inactivated-state model, astemizole adopts a compact, horizontally oriented pose deeper in the channel pore, forming the most extensive interaction network among all the states. The ligand is tightly stabilized by multiple π–π stacking interactions with Y652 residues across three subunits, and forms hydrogen bonds with residues S624 and Y652. Additional hydrophobic contacts are observed with residues F557, L622, S649, and Y652.
 
 - Consistent with our findings, an electrophysiology study by Saxena et al. identified hERG residues F557 and Y652 as crucial for astemizole binding, as determined through mutagenesis (DOI: 10.1038/srep24182).
 
 - In the cryo-EM structure (PDB 8ZYO) (DOI: 10.1016/j.str.2024.08.021), astemizole is stabilized by π–π stacking with Y652 residues. However, no hydrogen bonds are detected, which may reflect limitations in cryo-EM resolution rather than a true absence of contacts. Additional hydrophobic interactions are observed with L622 and G648 residues.
 
 E-4031 (Figure S13b):
 
 - In the PDB 5VA2-derived open-state model, E-4031 binds within the central cavity primarily through polar interactions. It forms a π–π stacking interaction with residue Y652, anchoring one end of the molecule. Polar interactions are observed with residues A653 and S660. Additional hydrophobic contacts are observed with residues A652 and Y652.
 
 - In the AF ic3 open-state model, E-4031 adopts a slightly deeper pose within the central cavity stabilized by dual π–π stacking interactions between its aromatic rings and hERG residue Y652. Additional hydrogen bonds are observed with residues S624 and Y652, and hydrophobic contacts are observed with residues T623 and S624.
 
 - In the inactivated-state model, E-4031 adopts its deepest and most stabilized binding pose, consistent with its experimentally observed preference for this state. The ligand is stabilized by multiple π–π stacking interactions between its aromatic rings and hERG residues Y652 from opposing subunits. The sulfonamide nitrogen engages in hydrogen bonding with residue S649, while the piperidine nitrogen hydrogen bonds with residue Y652. Hydrophobic contacts with residues S624, Y652, and F656 further reinforce the binding, enclosing the ligand in a densely packed aromatic and polar environment.
 
 - A previous mutagenesis study showed that mutations involving hERG residues F557, T623, S624, Y652, and F656 affect E-4031 binding (DOI: 10.3390/ph16091204).
 
 - In the cryo-EM structure (PDB 8ZYP) (DOI: 10.1016/j.str.2024.08.021), E-4031 engages in a single π–π stacking interaction with hERG residue Y652, anchoring one end of the molecule. The remainder of the ligand is stabilized predominantly through hydrophobic contacts involving residues S621, L622, T623, S624, M645, G648, S649, and additional Y652 side chains, forming a largely nonpolar environment around the binding pocket.
 
 In both cryo-EM structures, astemizole and E-4031 adopt binding poses that closely resemble the inactivated-state model in our docking study, consistent with experimental evidence that these drugs preferentially bind to the inactivated state (DOI: 10.1124/mol.108.049056). This raises the possibility that the cryo-EM structures may capture an inactivated-like channel state. However, closer examination of the SF reveals that the cryo-EM conformations more closely resemble the open-state PDB 5VA2 structure (DOI: 10.1016/j.cell.2017.03.048), which has been shown to be conductive here and in previous studies (DOI: 10.1073/pnas.1909196117, 10.1161/CIRCRESAHA.119.316404).
 
 The conformational differences between the cryo-EM and open-state docking results may reflect limitations of the docking protocol itself, as GALigandDock assumes a rigid protein backbone and cannot account for large ligand-induced conformational changes. In our open-state models, the hydrophobic pocket beneath the selectivity filter is too small to accommodate bulky ligands (Figure 3a, b), whereas the cryo-EM structures show a slight outward shift in the S6 helix that expands this space (Figure S13). These allosteric rearrangements, though small, fall outside the scope of the current docking protocol, which lacks the flexibility to capture these local, ligand-induced adjustments (DOI: 10.3389/fphar.2024.1411428).
 
 In contrast, docking to the AlphaFold-predicted inactivated-state model reveals a reorganization beneath the selectivity filter that creates a larger cavity, allowing deeper ligand insertion. Notably, neither our inactivated-state docking nor the available cryo-EM structures show strong interactions with F656 residues. However, in the AlphaFold-predicted inactivated model, the more extensive protrusion of F656 into the central cavity may further occlude the drug’s egress pathway, potentially trapping the ligand more effectively. This could explain why mutation of F656 significantly reduces the binding affinity of E-4031 (DOI: 10.3390/ph16091204). These findings suggest that inactivation may trigger a series of modular structural rearrangements that influence drug access and binding affinity, with different aspects potentially captured in various computational and experimental studies, rather than resulting from a single, uniform conformational change.
 
 Discussion of the original Wang and MacKinnon finding (DOI: 10.1016/j.cell.2017.03.048) regarding C-inactivation, pore mutation S631A and F627 rearrangement is likely warranted. Since hERG inactivation is present at 0 mV in WT channels (the likely voltage for the CryoEM study), please discuss how this might affect interpretations of starting with this structure as a template for models presented here, perhaps as part of Figure S1.
 
 We sincerely thank the reviewer for bringing up the insightful findings from Wang and MacKinnon regarding hERG C-type inactivation as well as the voltage context of their cryo-EM structure (PDB 5VA2). We recognize that WT hERG exhibits inactivation at 0 mV, likely the condition of the cryo-EM study, raising the possibility that PDB 5VA2, while classified as an open state, might subtly reflect features of inactivation. Notably, PDB 5VA2 has been widely adopted in numerous studies and consistently found to represent a conducting state, such as in Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) and Miranda et al. (DOI: 10.1073/pnas.1909196117). Our MD simulations further support this, showing K+ conduction in the 5VA2-based open-state model (Figure 4a, c), consistent with its selectivity filter conformation (Figure S1a). Although we used PDB 5VA2 as a starting template for predicting inactivated and closed states, our AlphaFold2 predictions did not rigidly adhere to this structure, as evidenced by distinct differences in hydrogen bond networks, drug binding affinities, pore radii, and ion conductivity between our state-specific hERG channel models (Figures S2, 5, 3b, 4). Nevertheless, this does not preclude the possibility that potential inactivated-like traits of PDB 5VA2 at 0 mV could subtly influence our predictions elsewhere in the model, which warrants further exploration in future studies. In our revised analysis, we also tested an alternative AlphaFold-predicted conformation, referred to as Open (AlphaFold cluster 3), which, while sharing some similarities with PDB 5VA2, exhibits subtle differences in the selectivity filter and pore conformations. This structure was also found to conduct ions and showed a drug binding profile similar to that of the PDB 5VA2-based open-state model. We greatly appreciate this feedback, which helped us refine and strengthen our analysis.
 
 Page 8, the significance of 750 and 500 mV in terms of physiological role?
 
 We appreciate this opportunity to clarify the methodological rationale. Although these voltages significantly exceed typical physiological membrane potentials, their use in MD simulations is a well-established practice to accelerate ion conduction events. This approach helps overcome the inherent timescale limitations of conventional MD simulations, as demonstrated in previous studies of hERG and other ion channels. For instance, Miranda et al. (DOI: 10.1073/pnas.1909196117), Lau et al. (DOI: 10.1038/s41467-024-51208-w), and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) applied similarly high voltages (500~750 mV) to study hERG K+ conductance, which is notably small under physiological conditions at ~2 pS (DOI: 10.1161/01.CIR.94.10.2572), necessitating amplification to observe meaningful permeation within nanosecond-to-microsecond timescales. Likewise, studies of other K+ ion channels, such as Woltz et al. (DOI: 10.1073/pnas.2318900121) on small-conductance calcium-activated K+ channel SK2 and Wood et al. (DOI: 10.1021/acs.jpcb.6b12639) on Shaker K+ channel, have used elevated voltages (250~750 mV) to probe ion conduction mechanisms via MD simulations. In addition, the typical timescale of these simulations (1 μs) is too short to capture major structural effects, such as those leading to inactivation or deactivation, which occur over milliseconds in physiological conditions.
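
 The rationale for elevated voltages can be illustrated with a back-of-the-envelope estimate. The sketch below assumes a simple ohmic channel (current equals conductance times voltage), which is an idealization, and uses the ~2 pS conductance and 1 μs simulation length quoted above; the function name `expected_ion_crossings` and the 100 mV comparison value are hypothetical choices for illustration.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one monovalent ion, in coulombs


def expected_ion_crossings(conductance_s, voltage_v, duration_s):
    """Expected number of monovalent ion permeation events.

    Assumes an ohmic channel: current (A) = conductance (S) * voltage (V).
    Real channels can deviate from this linear relationship.
    """
    current_a = conductance_s * voltage_v
    return current_a * duration_s / ELEMENTARY_CHARGE_C


if __name__ == "__main__":
    conductance = 2e-12   # ~2 pS, as quoted for hERG
    duration = 1e-6       # 1 microsecond of simulation
    for millivolts in (100, 500, 750):
        n = expected_ion_crossings(conductance, millivolts / 1000.0, duration)
        print(f"{millivolts} mV: about {n:.1f} crossings per microsecond")
```

 Under these assumptions, a 2 pS channel at 500 mV would pass only about six ions per microsecond, and proportionally fewer at lower voltages, which is why conduction events are difficult to sample at near-physiological potentials.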
 
 The abstract could be edited a bit to more clearly state the novel findings in this study.
 
 We thank the reviewer for their suggestion. We have revised the abstract to read: “To design safe, selective, and effective new therapies, there must be a deep understanding of the structure and function of the drug target. One of the most difficult problems to solve has been resolution of discrete conformational states of transmembrane ion channel proteins. An example is KV11.1 (hERG), comprising the primary cardiac repolarizing current, Ikr. hERG is a notorious drug antitarget against which all promising drugs are screened to determine potential for arrhythmia. Drug interactions with the hERG inactivated state are linked to elevated arrhythmia risk, and drugs may become trapped during channel closure. While prior studies have applied AlphaFold to predict alternative protein conformations, we show that the inclusion of carefully chosen structural templates can guide these predictions toward distinct functional states. This targeted modeling approach is validated through comparisons with experimental data, including proposed state-dependent structural features, drug interactions from molecular docking, and ion conduction properties from molecular dynamics simulations. Remarkably, AlphaFold not only predicts inactivation mechanisms of the hERG channel that prevent ion conduction but also uncovers novel molecular features explaining enhanced drug binding observed during inactivation, offering a deeper understanding of hERG channel function and pharmacology. Furthermore, leveraging AlphaFold-derived states enhances computational screening by significantly improving agreement with experimental drug affinities, an important advance for hERG as a key drug safety target where traditional single-state models miss critical state-dependent effects. By mapping protein residue interaction networks across closed, open, and inactivated states, we identified critical residues driving state transitions validated by prior mutagenesis studies. 
This innovative methodology sets a new benchmark for integrating deep learning-based protein structure prediction with experimental validation. It also offers a broadly applicable approach using AlphaFold to predict discrete protein conformations, reconcile disparate data, and uncover novel structure-function relationships, ultimately advancing drug safety screening and enabling the design of safer therapeutics.”
 
 Many of the Supplemental figures would fit in better in the main text, if possible, in my opinion. For instance, the network analysis (Fig. S2) appears to be novel and is mentioned in the abstract so may fit better in the main text. The discussion section could be focused a bit more, perhaps with headers to highlight the key points.
 
 Yes, we agree with the reviewer and made the suggested changes. We moved Figure S2 into the main text as a new figure.
 
 Additionally, we revised the Discussion section to improve focus and clarity.
 
 AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.01.27.577468v3
www.biorxiv.org

Secreted exosomes induce filopodia formation

3
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  In this important manuscript, the authors reveal novel findings on the role of exosomes in regulating filopodia formation. Filopodia are crucial for various cellular processes, including migration, polarization, directional sensing, and the formation of neuronal synapses. The authors convincingly demonstrate that exosomes, particularly those enriched with the protein THSD7A, play a significant role in promoting filopodia formation in both cancer cells and neurons.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Joint Public Review:
  
  Summary:
  
  The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present on filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells and/or primary rat neurons, they find that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is down regulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.
  
  Strengths:
  
  Through proteomic analysis, the authors revealed that endoglin is an important player in the effective trafficking of THSD7A within exosomes. This study offers interesting insights into the dynamic interplay between exosome-mediated protein trafficking and essential cellular processes, emphasizing its significant relevance in both cancer progression and neural function. The authors communicated their findings clearly and effectively.
  
  (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel.
  
  (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function.
  
  (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes.
  
  (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia.
  
  (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings.
  
  Weaknesses:
  
  While the authors showed the important role of the exosomal cargo protein THSD7A in neurons, it would be interesting to conduct in vivo studies to determine whether THSD7A similarly promotes filopodia and synapse formation. Some of the reviewers' comments were not fully addressed, such as the request for rigorous analysis and quantification by live-cell TIRF microscopy tracking labeled THSD7A and filopodia formation, which would clarify the timing and strengthen the causality of this relationship. The authors need to consider fully characterizing the role of Cdc42. If the authors would prefer to elaborate on the role of Cdc42 in another manuscript, it would be better not to mention the role of Cdc42 in filopodia formation in this paper at all.
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development.
  
  Weaknesses:
  
  The conclusions of this study are in most cases supported by data, but some aspects of the data analysis need to be clarified and elaborated. Some conclusions need to be stated more clearly and in accordance with the observed data.
  
  We appreciate the reviewer's recognition of the impact of our study. We will address the concerns about data analysis and the statement of our conclusions in our full response to reviewers.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42.
  
  Strengths:
  
  The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance.
  
  Weaknesses:
  
  The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.
  
  We appreciate the reviewer's recognition of the impact of our study. Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A. We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.
  
  Strengths:
  
  (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel.
  
  (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function.
  
  (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes.
  
  (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia.
  
  (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings.
  
  Weaknesses:
  
  (1) A better characterization of the nature of the small EV population is missing:
  
  It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations.
  
  We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a colloidal Coomassie-stained gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent four bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.
  
  In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.
  
  (2) Data analysis and quantification should be performed with increased rigor:
  
  a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Intervals of 2-5 seconds should be used to validate this. It would also be important to quantify the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy.
  
  We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be between 10-30 seconds, based on our frame rate. Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.
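  The 10-30 second range quoted above follows from frame-interval arithmetic. The following is a minimal sketch, assuming each event (secretion or filopodium formation) is scored at the first frame in which it is visible, so that its true time lies within one frame interval before the scored frame. That scoring rule and the function name `lag_bounds` are illustrative assumptions, not details taken from the manuscript.

```python
def lag_bounds(measured_lag_s, frame_interval_s):
    """Bound the true lag between two events scored on a fixed frame grid.

    Assumption (hypothetical scoring rule): each event is assigned to the
    first frame in which it is visible, so its true time is at most one
    frame interval earlier than the scored time. The lag between two such
    events is therefore uncertain by up to one frame interval either way.
    """
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    measured = float(measured_lag_s)
    dt = float(frame_interval_s)
    lower = max(0.0, measured - dt)  # a lag cannot be negative here
    upper = measured + dt
    return lower, upper


# 20 s measured median lag at one frame every 10 s
print(lag_bounds(20, 10))
# Same measured lag at a hypothetical 2 s frame interval
print(lag_bounds(20, 2))
```

  Under these assumptions, shortening the frame interval narrows the bounds proportionally, which is the practical effect of the faster acquisition the reviewer requests.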
  
  To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy and a faster frame rate to observe all the MVB fusion events and get an accurate calculation of this number. The timing of the acquisition was based on the typical timing of filopodia formation, which is slow relative to MVB fusion. Thus, with the current dataset, we could miss secretion events taking place between the frames acquired at 10-second intervals. Therefore, to address this question, we would need to acquire a new dataset with a much more rapid frame acquisition (multiple frames per second rather than one frame every ten seconds). Regardless, for the secretion events that we visualized with the current dataset, we always observed subsequent filopodia formation.
  
  No TIRF imaging was used in this manuscript. A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging. This is stated in the methods.
  
  b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful.
  
  Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence. This would possibly involve more proteomics analysis to identify candidate exosomal cargoes involved in this process.
  
  With regard to the way we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 µm of dendrite distance. Since the neurons cannot be imaged as a whole cell, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave similar results as filopodia per cell area. To demonstrate that this quantification gives similar results, we have now plotted the filopodia per cell area data from Fig 2 as filopodia per cell and placed these new plots in Supp Fig 2.
  
  c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats.
  
  Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, as well as the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were unable to detect THSD7A using the same (reducing) conditions for the mouse melanoma B16F1 samples but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns.
  
  With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands. If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant. Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.
  
  (3) The study lacks data on the cellular distribution of endoglin and THSD7A:
  
  a) Figure 6 - Is THSD7A expected to be present in the nucleus, as shown in panel D (label D is missing in the Figure)? It is not clear if this is observed in neurons. A Western blot of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.
  
  The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet. In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images. For the cellular distribution of endoglin, we agree that this is an important future direction to understand how endoglin regulates THSD7A trafficking. We have added the lack of these data to the “Limitations” section at the end of the manuscript.
  
  b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?
  
  The images for Figure 7E were taken with high resolution on a confocal microscope. Insets for Figure 7E were digitally zoomed so that readers could see the tiny structures. Zoom 1 in Figure 7E shows areas of extracellular deposition, whereas Zoom 2 shows THSD7A colocalization with CD63 in MVE. In the extracellular areas (Zoom 1), we observe small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more secretion of THSD7A in small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet, and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.
  
  Quantification of internal THSD7A localization is much more straightforward in this experimental regime. Indeed, in Figure 7F, we quantitated internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.
  
  With regard to whether the extracellular deposits are migrasomes, we have no reason to believe that they would be migrasomes. The preponderance of our evidence points to exosomes as carrying THSD7A and inducing filopodia. Furthermore, CD63 is an exosome marker (Sung et al., Nat Comm, 2020) and does not induce migrasomes, unlike many other tetraspanins (Huang et al., Nat Cell Bio, 2019).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The authors need to clarify the figure labeling and descriptions, and the conclusions should be drawn more closely from the findings. Some figures need to be clearer, e.g. Figure 1E needs information on what the green and red fluorescent proteins are. Do all panels in Figure 1A have the same scale bar or different ones? Figure 3A lacks a scale bar. In Figure 3, the GFP signal is in yellow; does it represent a merge or is it just the GFP alone? Figure 6D is missing a D. Figure 4D needs to be better explained. Additionally, since Figures 8B and 8C both represent a model based on all the findings of the study, they would be better presented as a separate figure from Figure 8A.
  
  The figure legend for figure 1E notes that green corresponds to GFP-Rab27b and the red corresponds to mCherry filler. In addition, the labels are marked to the right of the figure. For Figure 1A, we have now indicated in the legend that all scale bars = 10 µm. In figure 3, neurons were co-transfected with GFP or GFP-Rab27b. Thus, the yellow signal in these images is the merge of the mCherry filler with either GFP (expression throughout the neuron body and dendrites) or GFP-Rab27b (punctate colocalization). We have added a scale bar to Fig 3A. Figure 6D has been corrected, with a “D” label added. Figure 4D shows representative images of cells with filopodia under the various conditions, including add-back of control or endoglin-KD EVs. We have clarified the conditions in the figure legend for 4D. For Figure 8, we have now split it into 2 figures: one with data (Fig 8) and one with the model (Fig 9).
  
  Reviewer #2 (Recommendations for the authors):
  
  For the most part, this story is strong and well-presented. The findings are interesting and will significantly advance our understanding of how EVs affect various processes such as cancer metastasis. However, the Cdc42 work is not great. They only indirectly implicate Cdc42 with a somewhat iffy inhibitor (ML141) and a constitutively active form transfected into cells. Both approaches have drawbacks such as off-target effects in the case of the inhibitor and possible cross-talk to other GTPases in the case of the active mutant. The activation of Cdc42 should be demonstrated by an activity assay. Several commercial kits are available. Inhibition of Cdc42 should be tested by knockdown in addition to the inhibitor.
  
  We appreciate the reviewer’s recognition of our work. To address the limitations of our study, particularly the Cdc42 mechanistic work, we have now added a “Limitations of the study” section at the end of the text. Here, we address our experimental limitations and future directions.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Since the purified small EVs contain canonical exosomal markers and originate from MVEs, the authors should consider a more consistent use of the term "exosome" to avoid confusion.
  
  We acknowledge that the usage of both “exosomes” and “small extracellular vesicles” can seem confusing to many readers. Typically in the EV field, we use the term “exosome” when we can reliably determine that the EVs originate from the endocytic pathway. Thus, we use this term when we have specifically perturbed this pathway by targeting Hrs or Rab27. We use the term “small extracellular vesicles” or SEVs when referring to a purified heterogeneous population of SEVs from unknown or a variety of origins. Thus, when referring to vesicles isolated from the conditioned media, we call them SEVs because we cannot determine their origin. Clarification of this terminology has been added to the introduction of the paper.
  
  (2) 1st results section - expressing mCherry as a "filler" is confusing, clarify that this is meant to identify cellular background.
  
  This has now been clarified in the paper.
  
  (3) Figure 3 - Although Rab27a and Rab27b play a role in exosome secretion, Rab27b does not have redundant functions with Rab27a in every cellular context. The authors should mention the specific roles of Rab27a and Rab27b in promoting MVE fusion with the PM and in regulating the anterograde movement of MVEs to the PM, respectively (Ostrowski et al. 2010, Citation 52 in the ms). Although Rab27a is not highly expressed in neurons, it is not currently clear whether Rab27b has a redundant function with Rab27a or whether there is another unknown factor that plays this role. As neurons also do not express endoglin, the mechanisms that mediate how EVs regulate filopodia formation in these cells are most probably different than in cancer cells. This should be highlighted in the discussion.
  
  We have now added a couple of clarifying sentences about the roles of Rab27a and Rab27b to the results section, including the Ostrowski reference and another reference suggesting possible redundancy of Rab27a and Rab27b. With regard to endoglin not being expressed by neurons, that is one reason why we carried out the proteomics with control and endoglin-KD EVs to find a universal cargo that would directly induce filopodia formation. Indeed, THSD7A seems to be such a universal cargo, expressed in both cancer cell and neuron EVs and inducing filopodia in both cell types. This point, along with the requirement for regulation of THSD7A by other molecules in neurons, is discussed in the results and discussion sections.
  
  (4) As the authors note, the mechanistic link between endoglin-sorted, exosomal THSD7A and Cdc42-mediated filopodia formation remains unclear. While the findings on Cdc42 are clear, they are not surprising. What is the role of mDia/ENA/VASP or BAR proteins in this? The authors should also consider an assay to determine whether exosomal THSD7A binds to the PM to cause the signaling or if the cargo is first internalized before performing its function. Since this process is both autocrine and paracrine, the authors could co-culture THSD7A-mScarlet cells with vector control cells and observe how THSD7A-mScarlet is localized in the non-expressing cells.
  
  As other reviewers also noted, the Cdc42 mechanistic data at the end of the paper have clear limitations that are now addressed within the manuscript in a “Limitations of the Study” section. Here we discuss our experimental troubleshooting and approach to assaying Cdc42 involvement in this process. We acknowledge there are many rigorous experiments that could be pursued in the future to strengthen our mechanism and proposed model.
  
  We also agree that elucidating how THSD7A specifically interacts with target cells would be very informative and insightful. This would be most effectively assayed using a cell line that is stably expressing THSD7A-mScarlet and could be a future direction of this project. However, it is out of the scope of this current publication.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.20.604139v2
arxiv.org

Modeling flexible behavior with remapping-based hippocampal sequence learning

5
1. Public_Reviews 05 Jun 2025
  
  in eLife (unscoped)
  
  eLife Assessment
  
  This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife (unscoped)
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.
  
  Strengths:
  
  This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.
  
  Weaknesses:
  
  The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.
  
  More specifically:
  
  (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the posthoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.
  
  (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.
  
  (3) The literature review can be improved (laid out in the specific recommendations).
  
  (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.
  
  (5) As a minor point, the hippocampus is pretty much treated as a premotor network. Also, a Discussion paragraph would be helpful.
  
  Review 2
3. Public_Reviews 05 Jun 2025
  
  in eLife (unscoped)
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting modules: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions for expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.
  
  Strengths:
  
  The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.
  
  Weaknesses:
  
  The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.
  
  Review 1
4. Public_Reviews 05 Jun 2025
  
  in eLife (unscoped)
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.
  
  Strengths:
  
  Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.
  
  Weaknesses:
  
  The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.
  
  Review 3
5. Public_Reviews 05 Jun 2025
  
  in eLife (unscoped)
  
  Author Response:
  
  We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches that require processing at each state transition, our model uses sequential activity (i.e., a transition model) to predict environmental changes in the long term by leveraging episode-like representations.
  
  While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior, reminiscent of that observed in biological systems, can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing dynamic separation of prediction errors due to sensory noise versus those due to contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms in the context of psychiatric disorder-like behaviors, as illustrated in Figure 5.
  
  Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model, namely the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H), where X sends its estimate of the current context to H, and H returns to X a future prediction based on the episode. This interaction forms a loop that enables estimation of the current context and its reselection.
  
  The key advantage of this architecture is its ability to flexibly adjust the temporal span of episodes used for inference and control, providing a potential solution to the challenge of credit assignment over variable time scales in MBRL. Because our model forms and stores episodes of variable length depending on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting contexts enables rapid switching between these variable timescales. This flexibility addresses a challenge in MBRL, the assignment of credit across variable time scales, without requiring explicit optimization. To better illustrate this important feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.
  
  Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.
  
  We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.
  
  AuthorResponse

Tags

Summary

AuthorResponse

Review 1

Review 2

Review 3

Annotators

Public_Reviews

URL

arxiv.org/abs/2407.14708
www.biorxiv.org

The triad interaction of ULK1, ATG13, and FIP200 is required for ULK complex formation and autophagy

5
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  Building on previous structural studies, this work provides valuable new insights into the architecture of the autophagy initiation complex, comprising ULK1, ATG13, and FIP200. The authors present their findings with solid supporting evidence, making this study a significant contribution to the autophagy field.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  In this study, Hama et al. investigated the molecular regulatory mechanisms underlying the formation of the ULK1 complex in mammalian cells. Their results showed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro experiments, providing deeper insight into the molecular basis of ULK1 complex assembly in mammalian cells.
  
  The revised manuscript has addressed the majority of my concerns, and I have no further questions. Overall, this is a solid and impactful study that significantly advances our understanding of how the ULK1 complex is formed.
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This is important work that helps to uncover how the process of autophagy is initiated - via structural analyses of the initiating ULK1 complex. High-resolution structural details and mechanistic insight into this complex have been lacking, and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13, and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.
  
  Strengths:
  
  The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in the analyses of the consequences of disrupting ATG13's interactions in cells. The authors make CRISPR KI mutations of the binding interface point mutants. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells, the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.
  
  Weaknesses:
  
  I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably, this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the very least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not least is the kinase activity of ULK1 - how is this altered by disrupting its interactions with ATG13 and/or FIP200?
  
  Update:
  
  I feel the authors have addressed my concerns in their revised manuscript.
  
  Review 2
4. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex. The experimental data presented by the authors are of high quality and convincing. The revised manuscript offers enhanced details about the prediction procedure and results, along with additional experimental findings, significantly increasing the scientific value of this paper.
  
  Review 3
5. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  In this study, Hama et al. explored the molecular regulatory mechanisms underlying the formation of the ULK1 complex. By employing the AlphaFold structural prediction tool, they showed notable differences in the complex formation mechanisms between ULK1 in mammalian cells and Atg1 in yeast cells. Their findings revealed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro assays, enhancing our understanding of the molecular mechanisms governing ULK1 complex formation in mammalian cells. Importantly, they identified a direct interaction between ULK1 and FIP200, which is crucial for autophagy. However, some aspects of this manuscript require further clarification, validation, and correction by the authors.
  
  Thank you for your thorough evaluation of our manuscript. We have carefully revised the manuscript to address your concerns by performing extra experiments and providing additional clarifications, validations, and corrections as written below.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This is important work that helps to uncover how the process of autophagy is initiated - via structural analyses of the initiating ULK1 complex. High-resolution structural details and mechanistic insight into this complex have been lacking, and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13, and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.
  
  Strengths:
  
  The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in the analyses of the consequences of disrupting ATG13's interactions in cells. The authors make CRISPR KI mutations of the binding interface point mutants. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells, the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.
  
  Thank you for your thoughtful review and for highlighting the importance of our approach.
  
  Weaknesses:
  
  I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably, this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the very least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not least is the kinase activity of ULK1 - how is this altered by disrupting its interactions with ATG13 and/or FIP200?
  
  Thank you for this valuable feedback. In response, we performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model. We have summarized both the similarities and differences in newly included figures (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text. Furthermore, to address the downstream consequences of ULK1 complex disruption, we have investigated the impact on ULK1 kinase activity, specifically examining how mutations affecting ATG13 or FIP200 interaction alter ULK1’s phosphorylation of a key substrate ATG14. In addition, we analyzed the effect on ATG9 vesicle recruitment. We provide the corresponding data as Figure S3C-E and detailed discussions in the revised manuscript.
  
  Reviewer #3 (Public review):
  
  In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. Previous attempts at resolving its structure have failed to obtain high-resolution structures that can reveal atomic details of the interactions within the complex. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex.
  
  We are grateful for your high evaluation of our work.
  
  The experimental data presented by the authors are of high quality and convincing. However, given the core importance of the AlphaFold-Multimer prediction for this study, I recommend the authors improve the presentation and documentation related to the prediction, including the following:
  
  (1) I suggest the authors consider depositing the predicted structure to a database (e.g. ModelArchive) so that it can be accessed by the readers.
  
  We have deposited the AlphaFold model to ModelArchive with the accession code ma-jz53c, which is indicated in the revised manuscript.
  
  (2) I suggest the authors provide more details on the prediction, including explaining why they chose to use the 1:1:2 stoichiometry for ULK1-ATG13-FIP200 and whether they have tried other stoichiometries, and explaining why they chose to use the specific fragments of the three proteins and whether they have used other fragments.
  
  We appreciate your suggestion. As we noted in the original manuscript, previous studies have shown that the C-terminal region of ULK1 and the C-terminal intrinsically disordered region of ATG13 bind to the N-terminal region of the FIP200 homodimer (Alers, Loffler et al., 2011; Ganley, Lam du et al., 2009; Hieke, Loffler et al., 2015; Hosokawa, Hara et al., 2009; Jung, Jun et al., 2009; Papinski and Kraft, 2016; Wallot-Hieke, Verma et al., 2018). We relied on these findings when determining the specific regions to include in our complex prediction and when selecting a 1:1:2 stoichiometry for ULK1–ATG13–FIP200 which was reported previously (Shi et al., 2020). We also used AlphaFold2 to predict the structures of the full-length ULK1–ATG13 complex and the complex of the FIP200N dimer with full-length ATG13, confirming that there were no issues with our choice of regions (revised Figure S1A-C). In the revised manuscript, we have provided a more detailed explanation of our rationale based on the previous reports and additional AlphaFold predictions.
  
  (3) I suggest the authors present the PAE plot generated by AlphaFold-Multimer in Figure S1. The PAE plot provides valuable information on the prediction.
  
  We provided the PAE plot in the revised Figure S1C.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) In Figure 1D, the labels for the input and IP of ATG13-FLAG should be corrected to ATG13-FLAG FIP3A.
  
  We thank the reviewer for pointing out these labeling mistakes. We revised the labels based on the suggestions.
  
  (2) In the discussion section, the authors should address why ATG13-FLAG ULK1 2A in Fig. 2D leads to a significantly lower expression of ULK1 and provide possible explanations for this observation.
  
  ATG13 and ATG101, both core components of the ULK1 complex, are known to stabilize each other through their mutual interaction. Loss or reduction of one protein typically leads to the destabilization of the other. In this context, ULK1 is similarly stabilized by binding to ATG13. Therefore, the ATG13-FLAG ULK2A mutant, which has reduced binding to ULK1, likely loses this stabilizing activity, so that ULK1 becomes destabilized and its expression level falls. We have added this discussion to the revised manuscript.
  
  (3) In Figure 4B, the authors should explain why Atg13-FLAG KI significantly affects the expression of endogenous ULK1. Could Atg13-FLAG KI be interfering with its binding to ULK1? Experimental evidence should be provided to support this. Additionally, does Atg13-FLAG KI affect autophagy? Wild-type HeLa cells should be included as a control in Figure 4C and 4D to address this question.
  
  Thank you for your constructive suggestion. We found a technical error in the ULK1 blot of Figure 4B. Therefore, we repeated the experiment. The results show that ULK1 expression did not significantly change in the ATG13-FLAG KI. These findings are consistent with Figure S3A. We have replaced Figure 4B with this new data.
  
  We agree that including wild-type HeLa cells as a control is essential to determine whether ATG13-FLAG KI affects autophagy. We performed the same experiments in wild-type HeLa cells and found that ATG13-FLAG KI does not significantly impact autophagic flux. Accordingly, we have replaced Figures 4D and 4E with these new data.
  
  (4) In Figure 3C, the authors used an in vitro GST pulldown assay to detect a direct interaction between ULK1 and FIP200, which was also confirmed in Figure 3E. However, since FLAG-ULK1 FIP2A affects its binding with ATG13 (Fig. 3E), it is possible that ULK1 FIP2A inhibits autophagy by disrupting this interaction. The authors should therefore use an in vitro GST pulldown assay to determine whether GST-ULK1 FIP2A affects its binding with ATG13. Additionally, the authors should investigate whether the interaction between ULK1 and FIP200 in cells requires the involvement of ATG13 by using ATG13 knockout cells to confirm if the ULK1-FIP200 interaction is affected in the absence of ATG13.
  
  Thank you for the valuable suggestion. We examined the effect of the FIP2A mutation on the ULK1–ATG13 interaction using isothermal titration calorimetry (ITC) to obtain quantitative binding data. The results showed that the FIP2A mutation does not markedly alter the affinity between ULK1 and ATG13 (revised Figure S2B), suggesting that FIP2A mainly weakens the ULK1–FIP200 interaction. Regarding experiments in ATG13 knockout cells, ULK1 becomes destabilized in the absence of ATG13, making it technically difficult to assess how the ULK1–FIP200 interaction is affected under those conditions.
  
  Reviewer #2 (Recommendations for the authors):
  
  I feel the manuscript would benefit from a more detailed comparison with the Hurley lab paper - are the structural binding interfaces the same, or are there differences?
  
  We appreciate the suggestion to compare our results more closely with the work from the Hurley lab. We performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text.
  
  As mentioned, what happens downstream of disrupting the ULK1 complex? How is ULK1 activity changed, both in vitro and in cells? Does disruption of the ULK1 complex binding sites impair VPS34 activity in cells (for example by looking at PtdIns3P levels/staining)?
  
  Thank you for your insightful comments. We focused on elucidating how disrupting the ULK1 complex leads to impaired autophagy. To assess ULK1 activity, we measured ULK1-dependent phosphorylation of ATG14 at Ser29 (PMID: 27046250; PMID: 27938392). In FIP3A and FU5A knock-in cells, ATG14 phosphorylation was significantly reduced, indicating decreased ULK1 activity (revised Figure S3D, E). This observation is consistent with previous work showing that FIP200 recruits the PI3K complex. Notably, in ATG13 knockout cells, ATG14 phosphorylation became almost undetectable, though the underlying mechanism remains to be fully investigated. Altogether, these data point to reduced ULK1 activity as a key factor explaining the autophagy deficiency observed in FU5A knock-in cells.
  
  We also explored possible downstream mechanisms. One well-established function of ATG13 is to recruit ATG9 vesicles (PMID: 36791199). These vesicles serve as an upstream platform for the PI3K complex, providing the substrate for phosphoinositide generation (PMID: 38342428). To clarify how our mutations impact this step, we starved ATG13-FLAG knock-in cells and observed ATG9 localization. Unexpectedly, even in FU5A knock-in cells where ATG13 is almost completely dissociated from the ULK1 complex, ATG9A still colocalized with FIP200 (revised Figure S3C). These puncta also overlapped with p62, likely because p62 bodies recruit both FIP200 and ATG9 vesicles. Although we suspect that ATG9 recruitment is nonetheless impaired under these conditions, we were unable to definitively demonstrate this experimentally and consider it an important avenue for future study.
  
  Reviewer #3 (Recommendations for the authors):
  
  Here are some additional minor suggestions:
  
  (1) The UBL domains are only mentioned in the abstract but not anywhere else in the manuscript. I suggest the authors add descriptions related to the UBL domains in the Results section.
  
  We thank the reviewer for pointing out the lack of description of the UBL domains, which we have added to the Results section of the revised manuscript.
  
  (2) The authors may want to consider adding a diagram in Figure 1A to show the domain organization of the three full-length proteins and the ranges of the three fragments in the predicted structure.
  
  We have added a proposed diagram as Figure 1A.
  
  (3) I suggest the authors consider highlighting in Figure 1A the positions of the binding sites shown in Figure 1B, for example, by adding arrows in Figure 1A.
  
  We have added arrows in the revised Figure 1B (which was Figure 1A in the original submission).
  
  (4) In Figure 1D, "Atg13-FLAG" should be "Atg13-FLAG FIP3A".
  
  We have revised the labeling in Figure 1D.
  
  (5) "the binding of ATG13 and ULK1 to the FIP200 dimer one by one" may need to be re-phrased. "One by one" conveys a meaning of "sequential", which is probably not what the authors meant to say.
  
  We have revised the sentence as “the binding of one molecule each of ATG13 and ULK1 to the FIP200 dimer”.
  
  (6) In "Wide interactions were predicted between the four molecules", I suggest changing "wide" to "extensive".
  
  We have changed “wide” to “extensive” in the revised manuscript.
  
  (7) In "which revealed that the tandem two microtubule-interacting and transport (MIT) domains in Atg1 bind to the tandem two MIT interacting motifs (MIMs) of ATG13", I suggest changing the two occurrences of "tandem two" to "two tandem" or simply "tandem".
  
  We simply used "tandem" in the revised manuscript.
  
  AuthorResponse

URL

biorxiv.org/content/10.1101/2024.08.02.606296v3
www.biorxiv.org www.biorxiv.org

Integration of parallel pathways for flight control in a hawkmoth reflects prevalence and relevance of natural visual cues

5
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This important study investigates how hummingbird hawkmoths integrate stimuli from across their visual field to guide flight behavior. Cue conflict experiments provide solid evidence for an integration hierarchy within the visual field: hawkmoths prioritize the avoidance of dorsal visual stimuli, potentially to avoid crashing into foliage, while they use ventrolateral optic flow to guide flight control. The paper will be of broad interest to enthusiasts of visual neuroscience and flight behavior.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike in other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and, if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment, and that the integration of the two pathways reflects a prioritization of behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.
  
  Strengths:
  
  This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary
  
  Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response: instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight. The authors linked their behavioral results to visual scene statistics in the hawkmoths' natural environment. The partition of ventral and dorsal visuomotor pathways is well in line with differences in visual cue frequencies. The response hierarchy, however, seems to be dominated by dorsal features that are less frequent but presumably highly relevant for the animals' flight safety.
  
  Strengths
  
  The data are very interesting and unique. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.
  
  Weaknesses
  
  While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?
  
  I find the majority of the data, which are also the data supporting the main claims of the paper, compelling. However, the measurements of flight height are less solid than the rest and I think these data should be interpreted more carefully.
  
  Review 2
4. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  The authors have significantly improved the paper in revision, making its contributions distinct from their prior paper. They have also responded to my concerns about quantification and parameter dependency of the integration conclusion. While I think there is still more that could be done in this capacity, especially in terms of the temporal statistics and quantification of the conflict responses, they have made a case for the conclusions as stated. The paper still stands as an important paper with solid evidence, a bit limited by these concerns.
  
  Review 3
5. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike in other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and, if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment, and that the integration of the two pathways reflects a prioritization of behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.
  
  Strengths:
  
  This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.
  
  Weaknesses:
  
  The work would be further clarified and strengthened by additional explanation included in the main text, figure legends, and methods that would permit the reader to draw their own conclusions more feasibly. It would be helpful to have all figure panels referenced in the text and referenced in order, as they are currently not. In addition, it seems that sometimes the incorrect figure panel is referenced in the text, Figure S2 is mislabeled with D-E instead of A-C and Table S1 is not referenced in the main text at all. Table S1 is extremely important for understanding the figures in the main text and eliminating acronyms here would support reader comprehension, especially as there is no legend provided for Table S1. For example, a reader that does not specialize in vision may not know that OF stands for optic flow. Further detail in figure legends would also support the reader in drawing their own conclusions. For example, dashed red lines in Figures 3 and 4 A and B are not described and the letters representing statistical significance could be further explained either in the figure legend or materials to help the reader draw their own conclusions.
  
  We appreciate the suggestions to improve the clarity of the manuscript. We have extensively restructured the entire manuscript. Among other changes, we now reference all figure panels in the text in the order in which they appear. To do so, we combined the optic flow and contrast measurements of our setup with the methods description of the behavioural experiments (formerly Figs. 5 and 2, respectively). This new Fig. 2 now introduces the methods of the study, while the remainder of the former Fig. 2, which presented the experiments that investigated the ventrolateral and dorsal response in more detail, is now a separate figure (Fig. 3). This arrangement also better balances the amount of information contained in each figure.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response: instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight.
  
  Strengths:
  
  The data are very interesting, unique, and compelling. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.
  
  Weaknesses:
  
  While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?
  
  We thank the reviewer for the feedback, and the suggestions for improvement of the manuscript (our implementations are detailed below). We fully agree that this study raises several intriguing questions regarding the dorsal visual response, including how the animals perceive and respond to rotational optic flow in their dorsal visual field, particularly since rotational optic flow may be processed separately from translational optic flow.
  
  In our free-flight setup, it was not possible to generate rotational optic flow in a controlled manner. To explore this aspect more systematically, a tethered-flight setup would be ideal, or alternatively, a free-flight setup integrated with virtual reality. This would be a compelling direction for a follow-up study.
  
  Reviewer #3 (Public review):
  
  The central goal of this paper as I understand it is to extract the "integration hierarchy" of stimulus in the dorsal and ventrolateral visual fields. The segregation of these responses is different from what is thought to occur in bees and flies and was established in the authors' prior work. Showing how the stimuli combine and are prioritized goes beyond the authors' prior conclusions that separated the response into two visual regions. The data presented do indeed support the hierarchy reported in Figure 5 and that is a nice summary of the authors' work. The moths respond to combinations of dorsal and lateral cues in a mixed way but also seem to strongly prioritize avoiding dorsal optic flow which the authors interpret as a closed and potentially dangerous ecological context for these animals. The authors use clever combinations of stimuli to put cues into conflict to reveal the response hierarchy.
  
  My most significant concern is that this hierarchy of stimulus responses might be limited to the specific parameters chosen in this study. Presumably, there are parameters of these stimuli that modulate the response (spatial frequency, different amounts of optic flow, contrast, color, etc). While I agree that the hierarchy in Figure 5 is consistent for the particular stimuli given, this may not extend to other parameter combinations of the same cues. For example, as the contrast of the dorsal stimuli is reduced, the inequality may shift. This does not preclude the authors' conclusions but it does mean that they may not generalize, even within this species. For example, other cue conflict studies have quantified the responses to ranges of the parameters (e.g. frequency) and shown that one cue might be prioritized or up-weighted in one frequency band but not in others. I could imagine ecological signatures of dorsal clutter and translational positioning cues could depend on the dynamic range of the optic flow, or even having spatial-temporal frequency-dependent integration independent of net optic flow.
  
  We absolutely agree that, in principle, an observed integration hierarchy is only valid for the stimuli tested. Yet we do believe that we provide good evidence that our key observations are also robust for stimuli related to the ones tested:
  
  Most importantly, we found that both pathways act in parallel (and are not mutually exclusive, or winner-takes-all, for example), when the animals can enact the locomotion induced by the dorsal and ventrolateral pathway. We tested this with the same dorsal cue (the line switching direction), but different behavioural paradigms (centring vs unilateral avoidance), and different ventrolateral stimuli (red gratings of one spatial frequency, and 100% nominal contrast black-and-white checkerboard stimuli which comprised a range of spatial frequencies) – and found the same integration strategy.
  
  Certainly, if the contrast of the visual cues was reduced to the point that the dorsal or ventrolateral responses became weaker, we would expect this to be visible in the combined responses, with the respective reduction in response strength for either pathway, to the same degree as they would be reduced when stimuli were shown independently in the dorsal and ventrolateral visual field.
  
  To test whether the animals would show a weighting of responses when it was not possible to enact locomotion to both pathways, we felt it was important to use similar external stimuli, so that we could compare the responses and confidently interpret them in terms of integration. Indeed, how this is translated to responses in the two pathways depends a) on the spatiotemporal tuning, contrast sensitivity and exact receptive fields of the two systems, b) on the geometry of the setup and stimulus coverage, and therefore the ability of the animals to enact responses to both pathways independently, and c) on the integration weights.
  
  It would indeed be fascinating to obtain this tuning and the receptive fields, and having these, test a large array of combinations of stimuli and presentation geometries, so that one could extract integration weights for different presentation scenarios from the resulting flight responses in a future study.
  
  We also expanded the respective discussion section to reflect these points: l. 391-417. We also updated the former Fig. 5, now Fig. 6 to reflect this discussion.
  
  The second part of this concern is that there seems to be a missed opportunity to quantify the integration, especially when the optic flow magnitude is already calculated. The discussion even highlights that an advantage of the conflict paradigm is that the weights of the integration hierarchy can be compared. But these weights, which I would interpret as stimulus-responses gains, are not reported. What is the ratio of moth response to optic flow in the different regions? When the moth balances responses in the dorsal and ventrolateral region, is it a simple weighted average of the two? When it prioritizes one over the other is the response gain unchanged? This plays into the first concern because such gain responses could strongly depend on the specific stimulus parameters rather than being constant.
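  
  As an illustration of what the reviewer's "simple weighted average" would mean, the following minimal sketch solves combined = w * dorsal + (1 - w) * ventrolateral for w. The function name, the arguments, and the numbers in the usage note are hypothetical and are not taken from the manuscript; a single scalar weight per response feature is itself an assumption.

```python
def integration_weight(dorsal_only, ventrolateral_only, combined):
    """Weight w on the dorsal response under a simple weighted-average model.

    Assumed model (hypothetical, for illustration only):
        combined = w * dorsal_only + (1 - w) * ventrolateral_only
    All three arguments are the same response measure (for example, a
    median lateral position) recorded under the three stimulus conditions.
    """
    span = dorsal_only - ventrolateral_only
    if span == 0:
        # Identical single-cue responses carry no information about w.
        raise ValueError("single-cue responses are identical; w is undefined")
    return (combined - ventrolateral_only) / span
```

  With made-up values, integration_weight(10.0, 2.0, 8.0) gives 0.75, that is, the combined response lies three quarters of the way from the ventrolateral-only response towards the dorsal-only response. A value outside the range 0 to 1 would indicate that a weighted average does not describe the data.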
  
  Indeed, we set up stimuli that are comparable: they are all in the visual domain, and we can calculate their external optic flow and contrast magnitudes to control for imbalances in stimulus presentation, which is important for the interpretation of the resulting data.
  
  As we discussed above, we are confident that we are observing general principles of the integration of the two parallel pathways. However, we refrained from calculating integration weights, because these might be misleading for several reasons:
  
  (1) In situations where the animals can enact responses to both pathways, we show that they do so at the full original magnitudes. So there are no “weights” of the hierarchy in this case.
  
  (2) Only when responses to both systems are not possible in parallel, do we see a hierarchy. However, combined with point (1), this hierarchy likely depends on the geometry of the moths’ environment: it will be more pronounced the less both systems can be enacted in parallel.
  
  (3) The hierarchy also does not affect all features of the dorsal or ventrolateral pathway equally. The hawkmoths still regulate their perpendicular distance to ventral gratings when dorsal gratings are present, to the same degree as with only ventral gratings, because perpendicular distance regulation is not a feature of the dorsal response. And while the hawkmoths show a significant reduction in their position adjustment to dorsal contrast when it is in conflict with lateral gratings (Fig. 4C), they show exactly the same amount of lateral movement and speed adjustment as for dorsal gratings alone, when not combined with lateral ones (Fig. 4D and Fig. S3A). So even for one particular setup geometry and stimulus combination, there clearly is not one integration weight for all features of the responses.
  
  We extended the discussion section to clarify these points: “The benefit of our study system is that the same cues activate different control pathways in different regions of the visual field, so that the resulting behaviour can directly be interpreted in terms of integration weights” (l. 448-451).
  
  See also l. 391-417; we also updated the former Fig. 5, now Fig. 6, to reflect this discussion.
  
  The authors do explain the choice of specific stimuli in the context of their very nice natural scene analysis in Fig. 1 and there is an excellent discussion of the ecological context for the behaviors. However, I struggled to directly map the results from the natural scenes to the conclusions of the paper. How do they directly inform the methods and conclusions for the laboratory experiments? Most important is the discussion in the middle paragraph of page 12, which suggests a relationship with Figure 1B that seems provocative but lacks quantification with respect to the laboratory stimuli.
  
  We show that contrast cues and translational optic flow are not homogeneously distributed in the natural environments of hawkmoths. This directly relates to our laboratory findings on the responses to these stimuli in different parts of the visual field. In order to interpret the results of these behavioural experiments with respect to the visual stimuli, we performed measurements of translational optic flow and contrast cues in the laboratory setup. As a result, we make several predictions about the animals’ use of translational optic flow and contrast cues in natural settings:
  
  a) Hawkmoths in the lab responded strongest to ventral optic flow, even though it was not stronger in magnitude, given our measurements, than lateral optic flow. Thus, we propose that the stronger response to ventral optic flow might be an evolutionary adaptation to the natural distribution of translational optic flow cues.
  
  b) In the natural habitats of hawkmoths, dorsal coverage is much less frequent than ventrolateral structures generating translational optic flow, yet the hawkmoths responded with a much higher weight to the former. Moreover, in our flight tunnel experiments, the animals responded with the same or higher weights to dorsal cues, which had a lower magnitude of translational optic flow and contrast than the same cues in the ventrolateral visual field. So, by combining behavioural experiments and stimulus measurements in the lab, we showed that the weighting of dorsal and ventrolateral cues did not follow their stimulus magnitude in the lab. Moreover, comparing to the natural cue distributions, we suggest that the integration weights also did not evolve to match the prevalence of these cues in natural habitats.
  
  We integrated the measurements of natural visual scene statistics in the new Fig. 6, to relate the behavioural findings to the natural context also in the figure structure, and sequence logic of the text, as they are discussed here.
  
  The central conclusion of the first section of the results is that there are likely two different pathways mediating the dorsal and the ventrolateral response. This seems reasonable given the data, however, this was also the message that I got from the authors' prior paper (ref 11). There are certainly more comparisons being done here than in that paper and it is perfectly reasonable to reinforce the conclusion from that study but I think what is new about these results needs to be highlighted in this section and differentiated from prior results. Perhaps one way to help would be to be more explicit with the open hypotheses that remain from that prior paper.
  
  We appreciate the suggestion to highlight more clearly what the open questions that are addressed in this study are. As a result, we have entirely restructured the introduction, added sections to the discussion and fundamentally changed the graphical result summary in Fig. 6, to reflect the following new findings (and differences to the previous paper):
  
  The previous paper demonstrated that there are two different pathways in hummingbird hawkmoths that mediate visual flight guidance, and newly described one of them, the dorsal response. This established flight guidance in hummingbird hawkmoths as a model for the questions asked in the current study, which are very different in nature from the previous paper.
  
  The main question addressed in the current study is how these two flight guidance pathways interact to generate consistent behaviour. Throughout the literature on parallel sensory and motor pathways guiding behaviour, there are different solutions, from winner-takes-all to equally mixed responses. We tested this fundamental question using the hummingbird hawkmoth flight guidance systems as a model.
  
  This is the main question addressed in the various conflict experiments in this study, and we show that indeed, the two systems operate in parallel. As long as the animals can enact both dorsal and optic-flow responses, they do so at the original strengths of the responses. Only when this is not possible, hierarchies become visible. We carefully measured the optic flow and contrast cues generated by the different stimuli to ensure that the hierarchies we observed were not generated by imbalances of the external stimuli.
  
  - Does the interaction hierarchy of the two pathways follow the statistics of natural environments? We previously showed qualitatively how optic flow and contrast cues are distributed across the visual field in natural habitats of the hummingbird hawkmoth. In this study, we quantitatively analysed the natural image data, including a new analysis for the contrast edges, and statistically compared the results across conditions. This quantitative analysis supported the previous qualitative assessment that the prevalence of translational optic flow was highest in the ventral and lowest in the dorsal visual field in all natural habitat types. The distribution of contrast edges across the visual field depended on habitat type much more strongly than was visible in the qualitative analysis in the previous paper. When compared to the magnitude of the behavioural responses, and considering that the hummingbird hawkmoth is predominantly found in open and semi-open habitats, the natural distributions of optic flow and contrast edges did not align with the response hierarchy observed in our laboratory experiments. Dorsal cues elicited much stronger responses relative to ventrolateral optic flow responses than would be expected.
  
  To provide a more complete picture of the dorsal pathway, which will be important for understanding its nature and for comparison with other species, we conducted additional experiments that were specifically set up to test for response features known from the translational optic flow response, in order to compare and contrast the two systems. These experiments allowed us to show that the dorsal response is not simply a translational optic flow reduction response that creates much stronger output than the ventrolateral optic flow response. In particular, we show that the dorsal response lacked the perpendicular distance regulation of the optic flow response, while it did provide alignment with prominent contrasts (possibly to reduce the perceived translational optic flow), which is not observed in the ventrolateral optic flow response. The strong avoidance of any dorsal contrast cues, not just those inducing translational optic flow, is another feature not found in the ventrolateral pathway.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Many comparisons between visual conditions are made and it was confusing at times to know which conditions the authors were comparing. Thinking of a way to label each condition with a letter or number so that the authors could specify which conditions are specifically being compared would greatly enhance comprehension and readability.
  
  We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.
  
  Consider adding in descriptive words to the y-axis labels for the position graphs that would help the reader quickly understand what a positive or negative value means with respect to the visual condition.
  
  We have now changed the viewpoint of the example tracks in Figs. 2-5 to a virtual view from the top, rather than the view from below recorded by the camera, which required some mental rotation to reconcile the left and right sides. Moreover, we noticed that the example track axes were labelled in mm, while the axes for the plots showing median position in the tunnel were labelled in cm. We reconciled the units as well. This will make it easier to relate the axes (as well as positive and negative values) of the example tracks in those figures to the median positions and the cross-index.
  
  There are no line numbers provided so it is a bit challenging to provide feedback on specific sentences but there are a handful of typos in the manuscript, a few examples:
  
  (1) Cue conflict section, first paragraph: "When both cues were presented to in combination, ..." (remove to)
  
  (2) The ecological relevance section, first paragraph, first sentence: "would is not to fly"
  
  (3) Figure S3 legend: explanation for C is labeled as B and B is not included with A
  
  We apologise for the missing line numbers. We added these and resolved the issues 1-3.
  
  Reviewer #2 (Recommendations for the authors):
  
  - The pictograms in Fig. 1a were at first glance not clear to me, maybe adding l, r, d, v to the first pictogram could make the figure more immediately accessible.
  
  We added these labels to make it more accessible.
  
  - I would suggest noting in the main text that the red patterns were chosen for technical reasons (see Methods), if this is correct.
  
  We added this information and a reference to the methods in the main text (lines 100-102).
  
  - "Thus, hawkmoths are currently the only insect species for which a partitioning of the visual field has been demonstrated in terms of optic-flow-based flight control [33-35]." I think that is a bit too strong and maybe it would be more interesting to connect the current data to connected data in other insects to perhaps discuss important similarities. Ref 32 for example shows that fruit flies weigh ventral translational optic flow considerably more than dorsal translational optic flow. Reichardt 1983 (Naturwissenschaften) showed that stripe fixation in large flies (a behaviour relying in part on the motion pathway) is confined to the ventral visual field, etc...
  
  We have changed this sentence to acknowledge partitioning in other insects and to motivate the use of our model species for this study: “While fruit flies weight ventral translational optic flow stronger than dorsal optic flow, the most extreme partitioning of the visual field in terms of optic-flow-based flight control has been observed in hawkmoths [33-35].” (lines 60-62)
  
  - I think the statistically significant differences in group means could be described in more detail, at least in Fig. 2 (to me the description was not immediately clear, in particular with the double letters).
  
  We added an explanation of the letter nomenclature to all respective figure legends:
  
  Black letters show statistically significant differences in group means or median, depending on the normality of the test residuals (see Methods, significance level: 5%). The red letters represent statistically significant differences in group variance from pairwise Brown–Forsythe tests (significance level 5%). Conditions with different letters were significantly different from each other. The white boxplots depict the median and 25% to 75% range, the whiskers represent the data exceeding the box by more than 1.5 interquartile ranges, and the violin plots indicate the distribution of the individual data points shown in black.
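  
  For readers unfamiliar with the Brown–Forsythe test named in this legend, the sketch below computes its F statistic in plain Python: a one-way ANOVA on the absolute deviations of each observation from its group median. The function name and the sample values are hypothetical, this is not the authors' analysis code, and the p-value (which requires the F distribution with k - 1 and N - k degrees of freedom) is deliberately left out.

```python
from statistics import mean, median


def brown_forsythe_f(*groups):
    """Brown-Forsythe F statistic for equality of group variances."""
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two values each")

    # Absolute deviation of every observation from its own group median.
    deviations = []
    for g in groups:
        centre = median(g)
        deviations.append([abs(x - centre) for x in g])

    k = len(deviations)                      # number of groups
    n_total = sum(len(z) for z in deviations)
    group_means = [mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    if within == 0:
        raise ValueError("no within-group spread in the deviations")

    return (n_total - k) / (k - 1) * between / within
```

  Two groups with the same spread around different medians give a statistic of zero, whereas a group with a clearly wider spread gives a large value. Where SciPy is available, scipy.stats.levene with center='median' computes the same statistic together with a p-value.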
  
  - "When translational optic flow was presented laterally" I would use a more wordy description, since it is the hawkmoth that is controlling the optic flow and in addition to translational optic flow, there might also be rotational components, retinal expansion etc.
  
  We extended the description to explain that the moths were generating the optic flow percept based on stationary gratings in different orientations, by way of their flight through the tunnel. Lines 127-129
  
  - While it is clearly stated that the measure of the perpendicular distance from the ventral and dorsal pattern via the size of the insect as seen by the camera is indirect, I would suggest to determine the measurement uncertainty of distance estimate.
  
  - Connected to above - is the hawkmoth area averaged over the entire flight and is the variance across frames similar in all the stimuli conditions? Is it, in principle, conceivable that the hawkmoths' pitch (up or down) is different across conditions, e.g. with moths rising and falling more frequently in a certain condition, which could influence the area in addition to distance?
  
  There are a number of sources that generate variance in the distance estimate (which was based on the size of the moth in each video frame, after background subtraction): the size of the animal, the contrast with which the animal was filmed (which also depended on the type of pattern in the tunnel; it was lower with ventral or dorsal patterns as a background than with lateral ones), and the speed of the animal, as motion blur could impact the moth's image on the video. The latter is hard to calibrate, but the uncertainty related to animal size and pattern types could theoretically be estimated. However, since we moved between finishing the data acquisition for this study and publishing the paper, the original setup has been dismantled. We could attempt to recreate it as faithfully as possible, but would worry about introducing further noise. We therefore decided not to attempt to characterise the uncertainty, so as not to give a false impression of quantifiability of this measure. For the purpose of this study, it will have to remain a qualitative, rather than a quantitative, measure. If we use a similar measure again, we will make sure to quantify all sources of uncertainty that we have access to.
  
  The variance in area is different between conditions. Most likely, the animals vary their flight height differently for different dorsal and ventral patterns, as they vary their lateral flight straightness with different lateral visual input. For the reasons mentioned above, we cannot disentangle the effects of variations in flight height and other sources of uncertainty relating to animal size in the video frames. We therefore averaged the extracted area across the entire flight, to obtain a coarse measure of their flight height. Future studies focusing specifically on the vertical component or filming in 3D will be required to determine the exact amount of vertical flight variation.
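  
  To illustrate why the imaged area can only serve as a coarse proxy for height, here is a minimal sketch assuming an ideal pinhole camera, where apparent area scales with the inverse square of distance. The function names and the reference area are hypothetical, and the response does not state that the authors converted area to distance in this way.

```python
from math import sqrt
from statistics import mean


def relative_distance(area_px, reference_area_px):
    """Distance relative to the reference distance, assuming area ~ 1 / distance**2.

    Assumption: ideal pinhole camera and constant animal size and posture.
    Roll, pitch, contrast and motion blur all violate this in practice.
    """
    if area_px <= 0 or reference_area_px <= 0:
        raise ValueError("areas must be positive")
    return sqrt(reference_area_px / area_px)


def mean_relative_distance(frame_areas_px, reference_area_px):
    """Average the area across all frames of a flight first, then convert once."""
    return relative_distance(mean(frame_areas_px), reference_area_px)


# A moth imaged at a quarter of the reference area is about twice as far away.
print(relative_distance(25.0, 100.0))
```

  Because the conversion is nonlinear, any unmodelled change in apparent size (for example from pitch) maps directly into the distance estimate, which is consistent with treating the measure as qualitative.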
  
  - Results second paragraph, suggestion: pattern wavelength or spatial frequency instead of spatial resolution.
  
  - Same paragraph, suggestion: For an optimal wavelength/spatial frequency of XX
  
  We corrected these to spatial frequency.
  
  - Above Fig 3- "this strongly suggests a different visual pathway". In my opinion it would be better to say sensory-motor /visuomotor pathway or to more clearly define visual pathway? Could one in principle imagine a uniform set of local motion sensitive neurons across the entire visual field that connect differentially to descending/motor neurons?
  
  We appreciate this point and changed this, and further instances in the manuscript, to visuomotor pathway.
  
  - If I understood correctly, you calculated the magnitude of optic flow in the different tunnel conditions based on the image of a fisheye camera moving centrally in the tunnel, equidistant from all walls. I did not understand why the magnitude of optic flow should differ between the four quadrants showing the same squarewave patterns. Apologies if I missed something, but maybe it is worth explaining this in more detail in the manuscript.
  
  We recognize that this point may not have been immediately clear and have therefore provided additional clarification in the Methods and results section (lines 106-111, 543-549). We anticipated differences in the magnitude of optic flow due to potential contrast variations arising from the way the stimuli were generated—being mounted on the inner surfaces of different tunnel walls while the light source was positioned above. On the dorsal wall, light from the overhead lamps passed through the red material. For laterally mounted patterns, the animals perceived mainly reflected light, as these tunnel walls were not transparent.
  
  A similar principle applied to the background, which consisted of a white diffuser allowing light to pass through dorsally, but white non-transmissive paper laterally, with 5% contrast random checkerboard patterns. The ventral side presented a more complex scenario, as it needed to be partially transparent for the ventrally mounted camera. Consequently, the animals perceived a combination of light reflections from the red patterns and the white gauze covering the ventral tunnel side, against the much darker background of the surrounding room.
  
  To ensure that the observed flight responses were not artifacts of deviations in visual stimulation from an ideal homogeneous environment, we used the camera to quantify the magnitude of optic flow and contrast patterns under these real experimental conditions. This approach also allowed us to directly relate the optic flow measurements taken indoors to those recorded outdoors, as we employed the same camera and analytical procedures for both datasets.
  
  Reviewer #3 (Recommendations for the authors):
  
  In addition to the considerations above I had a few minor points:
  
  There are so many different directions of stimuli and response that it is quite challenging to parse the results. Can this be made a little easier for the reader?
  
  We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.
  
  One suggestion (only a suggestion): I found myself continuously rotating the violin plots in my head so that the lateral position axis lined up with the lateral position of the tunnel icons below. Consider if rotating the plots 90 degs would help interpretability. It was challenging to keep track of which side was side.
  
  We did discuss this with a number of test-readers, and tried multiple configurations. They all have advantages and drawbacks, but the configuration preferred by the majority of testers was the current one. To help the mental transformations from the example flight tracks in the figures, we now present the example flight tracks in Figs. 2-5 in the same reference frame as the figures showing median position (so positive and negative values on those axes correspond directly), and changed the view from a below-the-tunnel to an above-the-tunnel view, as this is the more typical depiction. We hope that this enhances readability.
  
  Are height measurements sensitive to the roll and pitch of the animal? I suspect this is likely small but worth acknowledging.
  
  They are indeed. These effects are likely small but contribute to the overall inaccuracy, which we could not quantify in this particular setup (see also response to reviewer 2 on that point), which is why the height measurements have to be considered a qualitative approximation rather than a quantification of flight height. We added text to acknowledge the effects of roll and pitch specifically (lines 657-658)
  
  The Brown-Forsythe test was reported as paired but this seems odd because the same moths were not used in each condition. Maybe the authors meant something different by "paired" than a paired statistical design?
  
  Indeed, the data was not paired in the sense that we could attribute individual datapoints to individual moths across conditions. We applied the Brown-Forsythe test in a pairwise manner, comparing the variance of each condition with that of each other condition, to test if the variance in position differed across conditions. We did phrase this misleadingly, and have corrected it to "The variance in the median lateral position (in other words, the spread of the median flight position) was statistically compared between the groups using the pairwise Brown–Forsythe tests" (l. 187-188).
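  
  The distinction between "pairwise" and "paired" can be sketched as follows. This is a minimal, standard-library illustration of the Brown–Forsythe statistic (a one-way ANOVA F computed on absolute deviations from each group's median), applied to every pair of conditions. The condition labels and data are hypothetical, only the statistic is computed (no p-value), and this is not the authors' analysis code.

```python
from itertools import combinations
from statistics import mean, median


def brown_forsythe_stat(*groups):
    """Brown-Forsythe W: ANOVA F statistic on |x - group median|.

    Raises ZeroDivisionError if every deviation within every group is identical.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's own median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) * between / ((k - 1) * within)


def pairwise_brown_forsythe(conditions):
    """Compare the spread of every pair of independent conditions."""
    return {
        (a, b): brown_forsythe_stat(conditions[a], conditions[b])
        for a, b in combinations(conditions, 2)
    }


# Hypothetical median lateral positions per flight, for three stimulus conditions.
example = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [11, 12, 13, 14, 15],
}
for pair, w in pairwise_brown_forsythe(example).items():
    print(pair, round(w, 3))
```

  Each group is treated as an independent sample, so no datapoint needs to be matched to an individual animal across conditions.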
  
  There is some concern about individual moth preferences and bias due to repeated measures. I appreciate that the individual moth's identity was not likely known in most cases, but can the authors provide an approximate breakdown of how many individual moths provided the N sample trajectories?
  
  This is a very valid concern, and indeed one we did investigate in a previous study with this setup. We confirmed that the majority of animals (70%, 68% and 53% out of 40 hawkmoths, measured on three consecutive days) crossed the tunnel within a randomly picked window of 3h (Stöckl et al. 2019). We now state this explicitly in the methods section (lines 594-597). Thus, for the sample sizes in our study, statistically, each moth would have contributed a small number of tracks compared to the overall number of tracks sampled.
  
  The statistics section of the methods said that both Tukey-Kramer (post-hoc corrected means) and Kruskal-Wallis (non-parametric medians) were done. It is sometimes not clear which test was done for which figure, and where the Kruskal-Wallis test was done there does not seem to be a corrected statistical significance threshold for the many multiple comparisons (Fig. 2). It is quite possible I am just missing the details and they need to be clarified. I think there also needs to be a correction for the Brown-Forsythe tests but I don't know this method well.
  
  We first performed an ANOVA, and if the test residuals were not normally distributed, we used a Kruskal-Wallis test instead. For the post-hoc tests of both we used Tukey-Kramer to correct for multiple comparisons. The figure legends did indeed miss this information. We added it to clarify our statistical analysis strategy and refer to the methods section for more details (i.e. l. 185-186). All statistical results, including the type of statistical test used, have been uploaded to the data repository as well.
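  
  For readers unfamiliar with the rank-based fallback mentioned here, the Kruskal–Wallis H statistic can be sketched with the standard library alone. This is an illustrative sketch with tie-averaged ranks and the usual tie correction; the residual-normality check that selects between ANOVA and Kruskal–Wallis, the post-hoc comparisons, and the p-value are deliberately left out, and the function name and data are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with average ranks for ties and the standard tie correction.

    Raises ZeroDivisionError if all pooled values are identical.
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values; they share the mean of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2
        for idx in range(i, j):
            rank_sums[pooled[idx][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


# Three fully separated hypothetical groups.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

  The statistic depends only on ranks, which is why it is used when the ANOVA residuals are not normally distributed.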
  
  The connection to stimulus reliability in the discussion seems to conflate reliability with prevalence or magnitude.
  
  We have rephrased the respective discussion sections to clearly separate the prevalence and magnitude of stimuli, which was measured, from an implied or hypothesized reliability (lines 510-511).
  
  Line numbers would be helpful for future review.
  
  We apologize for missing the line numbers and have added them to the revised manuscript.
  
  AuthorResponse
Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.24.609346v3
www.biorxiv.org

Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output

5
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  Using a unique cerebellar disruption approach in non-human primates, this study provides valuable new insight into how cerebellar inputs to the motor cortex contribute to reaching. The findings convincingly demonstrate that reaching movements following cerebellar disruption slow down because of both an acute deficit in producing muscle activity as well as a progressive decline in compensating for limb dynamics. This work will be of interest to neuroscientists and clinicians interested in cerebellar function and pathology.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In a previous work Prut and colleagues had shown that during reaching, high frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report they extend their previous work by addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joint. More interestingly, the experiment revealed evidence for decomposition of the reaching movement, as well as an increase in the variance of the trajectory.
  
  Strengths:
  
  This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.
  
  Weaknesses:
  
  None
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, data were clear, convincing and novel. The key strengths are differentiating acute from sub-acute (within session but not immediate) kinematic consequences of cerebellar block.
  
  Review 2
4. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In their revised manuscript, Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement related phenotypes in patients with cerebellar lesion or injury, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they find a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.
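  
  As background for the inverse-dynamics argument summarized above, the following is a minimal sketch of the textbook rigid-body equations for a planar two-link arm (shoulder and elbow, no gravity). All parameter names and values are hypothetical, the split of shoulder torque into a "self" and an "interaction" part is one common convention among several, and nothing here is taken from the authors' model.

```python
from dataclasses import dataclass
from math import cos, sin


@dataclass
class Arm:
    l1: float  # upper-arm length (m)
    r1: float  # shoulder to upper-arm centre of mass (m)
    r2: float  # elbow to forearm centre of mass (m)
    m1: float  # upper-arm mass (kg)
    m2: float  # forearm mass (kg)
    i1: float  # upper-arm moment of inertia about its centre of mass
    i2: float  # forearm moment of inertia about its centre of mass


def joint_torques(arm, q2, dq1, dq2, ddq1, ddq2):
    """Return (shoulder torque, elbow torque, shoulder interaction torque).

    q2 is the elbow angle relative to the upper arm; dq and ddq are joint
    angular velocities and accelerations (1 = shoulder, 2 = elbow).
    """
    c2, s2 = cos(q2), sin(q2)
    m11 = arm.i1 + arm.i2 + arm.m1 * arm.r1 ** 2 + arm.m2 * (
        arm.l1 ** 2 + arm.r2 ** 2 + 2 * arm.l1 * arm.r2 * c2
    )
    m12 = arm.i2 + arm.m2 * (arm.r2 ** 2 + arm.l1 * arm.r2 * c2)
    m22 = arm.i2 + arm.m2 * arm.r2 ** 2
    h = arm.m2 * arm.l1 * arm.r2 * s2
    # Shoulder terms that arise from elbow motion (assumed partition).
    shoulder_interaction = m12 * ddq2 - h * (2 * dq1 * dq2 + dq2 ** 2)
    tau1 = m11 * ddq1 + shoulder_interaction
    tau2 = m12 * ddq1 + m22 * ddq2 + h * dq1 ** 2
    return tau1, tau2, shoulder_interaction


# Pure elbow acceleration still demands a shoulder torque.
arm = Arm(l1=1.0, r1=0.5, r2=0.5, m1=1.0, m2=1.0, i1=0.0, i2=0.0)
print(joint_torques(arm, q2=0.0, dq1=0.0, dq2=0.0, ddq1=0.0, ddq2=1.0))
```

  In this sketch a movement that accelerates only the elbow still requires a nonzero shoulder torque, which is the kind of coupling that multi-joint reaches must compensate for.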
  
  Strengths:
  
  Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.
  
  The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption in the monkey, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).
  
  In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.
  
  Remaining comments:
  
  The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on joint torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas. While this experimental design was not implemented here, it seems like a good opportunity for future work using these approaches.
  
  The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. While it is still not entirely clear why disruption of movement during the adaptive phase is not seen for inward targets, despite the fact that many of the inward movements also exhibit large interaction torques, the authors do raise potential explanations in the Discussion.
  
  Review 3
5. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In a previous work Prut and colleagues had shown that during reaching, high frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report they extend their previous work by addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joint. More interestingly, the experiment revealed evidence for decomposition of the reaching movement, as well as an increase in the variance of the trajectory.
  
  Strengths:
  
  This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.
  
  Weaknesses:
  
  None
  
  Reviewer #1 (Recommendations for the authors):
  
  The authors have answered my questions adequately and I have no further comments.
  
  Reviewer #2 (Public review):
  
  This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, data were clear, convincing and novel. The key strengths are differentiating acute from sub-acute (within session but not immediate) kinematic consequences of cerebellar block.
  
  Reviewer #2 (Recommendations for the authors):
  
  I think the manuscript is good as is. That said, it would have been nice to see more of the behavioral outcomes in Figure 5 (e.g. decomposition and trajectory variability) analyzed longitudinally like the velocity measurements in Fig. 4. This would clearly strengthen the insight into acute and compensatory components of cerebellar motor deficits.
  
  The two behavioral measures of motor noise used in our study are movement decomposition and trajectory variability (Figure 5). Since trajectory variability is measured across trials, we could not analyze this measure longitudinally as a function of trial number. However, following the reviewer's advice, we examined movement decomposition for successive trials in control vs. cerebellar block for movements to targets 2-4, similar to the analysis of hand velocity in Figure 4. We found no interaction effect between trial sequence x cerebellar block on movement decomposition. This result is consistent with our conclusion that noisy joint activation occurs independently of adaptive slowing of multi-joint movements. We have updated our main text (lines 293-299) and supplementary information (supplementary figure S5 and supplementary table S8) to include this result.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In their revised manuscript, Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement related phenotypes in patients with cerebellar lesion or injury, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they find a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.
  
  Strengths:
  
  Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.
  
  The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption in the monkey, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).
  
  In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.
  
  In this revised version of the manuscript, the authors have provided additional analyses and clarification that address several of the comments from the original submission.
  
  Remaining comments:
  
  The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on joint torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas. While this experimental design was not implemented here, it seems like a good opportunity for future work using these approaches.
  
  We agree with the reviewer that examining the effect of the cerebellar block on immediate post-block washout trials in future studies will be insightful.
  
  The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. While it is still not entirely clear why disruption of movement during the adaptive phase is not seen for inward targets, despite the fact that many of the inward movements also exhibit large interaction torques, the authors do raise potential explanations in the Discussion.
  
  The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study. In the revised manuscript, the authors do provide additional anatomical and evolutionary context and discuss potential limitations in the selectivity of HFS in the Materials and Methods. However, I feel that at least a brief mention of these caveats in the Introduction, where it is stated, "we then reversibly blocked cerebellar output to the motor cortex", would benefit the reader.
  
  Following the advice of the reviewer, we have now revised the introduction section of our manuscript in the following way (lines 61-67):
  
  “…We then reversibly disrupted cerebellar communication with other neural structures using high-frequency stimulation (HFS) of the superior cerebellar peduncle, assessing the impact of this perturbation on subsequent movements. Although our approach primarily affects cerebellar output to the motor cortex, it also disrupts fibers carrying input signals (e.g., spinocerebellar) and pathways to various subcortical targets (e.g., cerebellorubrospinal). Thus, our manipulation broadly interferes with cerebellar communication…”
  
  Reviewer #3 (Recommendations for the authors):
  
  Typo on line 102; "subs-sessions"
  
  We have corrected this typographical error in our revised manuscript (line 106).
  
  AuthorResponse
Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.21.595172v6
www.biorxiv.org

Expanding Automated Multiconformer Ligand Modeling to Macrocycles and Fragments

4
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  The work presents a valuable extension of qFit-ligand, a computational method for modeling conformational heterogeneity of ligands in X-ray crystallography and cryo-EM density maps. The authors provide solid evidence of improved capabilities through careful validation against the previous version, particularly in expanding ligand sampling within conformational space. Such improvements suggest practical utility for challenging applications, including macrocyclic compound modeling and crystallographic drug fragment screening.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Flowers et al. describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multiconformer models, essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.
  
  The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multiconformer modeling of macrocyclic compounds.
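  
  The real space correlation mentioned here is, in its usual form, a Pearson correlation between observed and model-calculated density sampled at the same grid points around the ligand. A minimal sketch under that assumption follows; the function name and the density values are hypothetical, and this is not qFit-ligand's implementation.

```python
from math import sqrt


def real_space_cc(observed, calculated):
    """Pearson correlation between two equally sampled density arrays."""
    if len(observed) != len(calculated) or len(observed) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0 or var_c == 0:
        raise ValueError("correlation is undefined for constant density")
    return cov / sqrt(var_o * var_c)


# Hypothetical density values at four grid points near a ligand.
obs = [0.2, 0.8, 1.1, 0.4]
calc_single = [0.1, 0.9, 0.7, 0.6]   # single-conformer model
calc_multi = [0.2, 0.8, 1.0, 0.4]    # multiconformer model
print(real_space_cc(obs, calc_single), real_space_cc(obs, calc_multi))
```

  The score is insensitive to the overall scale and offset of the density, so it measures agreement in shape rather than absolute agreement.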
  
  Strengths:
  
  The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.
  
  Weaknesses:
  
  Weaknesses were addressed during review. Overall, the demonstrated performance gains are modest.
  
  Specific comments:
  
  (1) The accuracy of initial placement may be critical. At the same time, in my experience ambiguous cases are quite common, for example with flat ligands with a few substituents sticking out or with ligands with highly mobile tails. There remain some questions regarding sensitivity to initial ligand placement, which individual users should check for.
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving fitting and producing more energetically favorable conformations.
  
  Strengths:
  
  The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.
  
  Weaknesses:
  
  Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits limitations in low-resolution electron density maps (lower than 2.0 Å) and low-occupancy scenarios. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.
  
  Review 2
4. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Flowers et al. describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multi-conformer models - essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.
  
  The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.
  
  Strengths:
  
  The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.
  
  Weaknesses:
  
  There are several points where the manuscript needs clarification in order to better understand the merits of the described work. Overall the demonstrated performance gains are modest (although the theoretical ceiling on gains in model fit and strain energy is not clear!).
  
  We thank the reviewer for their thoughtful review. To address comments, we have added clarifying statements and discussion points around the extent of performance gains, our choice of benchmarking metrics, and the “standards” in the field for significance. We expanded our analysis to highlight how to use qFit-ligand in “discovery” mode, which is aimed at supporting individual modeling efforts. As we now write in the discussion:
  
  “It is advisable to employ qFit-ligand selectively, focusing on cases with a moderate correlation between your input model and the experimental data, strong visual density in the binding pocket, high map resolution, or when your single-conformer ligand model is strained.”
  
  Additionally, we note in the discussion:
  
  “qFit-ligand primarily serves as a “thought partner” for manual modeling. Modelers still must resolve many ambiguities, including initial ligand placement, to fully take advantage of qFit capabilities. In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving the fit and yielding more energetically favorable conformations.
  
  Strengths:
  
  The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.
  
  Weaknesses:
  
  Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits clear limitations in low-resolution electron density maps (resolution > 2.0 Å) and low-occupancy scenarios, significantly restricting its applicability. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.
  
  We thank Reviewer #2 for their comments on the role of conformational flexibility and how our tool addresses the complexity involved in modeling alternative conformations. We agree that there are limitations at low resolution, limiting the application of our algorithm. That is the case with all structural biology tools. Automatically finding alternative conformations of ligands in high-resolution structures is an enhancement to the toolbox of ligand fitting. Expanding the algorithm to work with fragment screening data is important in this realm, as almost all of this data fits in the high-resolution range where qFit-ligand works best.
  
  The reported changes in real-space correlation coefficients (RSCC) are not substantial, especially considering a cutoff of 0.1. Furthermore, the significance of improvements in the strain metric remains unclear. A comprehensive analysis of the distribution of this metric across the Protein Data Bank (PDB) would provide valuable insights.
  
  We agree that the changes are small, partially because the baseline (manually modeled ligands) is very high. To provide additional evidence, we added evaluations using EDIAm, which is a more sensitive metric. In Figure 2 (page 10), representing the development dataset, we see more improvements above 0.1. With this being said, it is unclear what constitutes a ‘substantial’ improvement for either of these metrics, especially considering alternative conformations may only change the coordinates of a subset of ligands, just slightly improving the fit to density.
  
  We agree that looking across the PDB on strain would provide valuable insight. To explore this, we looked to see how qFit-ligand could improve the fitting of deposited ligands with high strain (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, Page 15). While only a subset of these structures had alternative conformers placed (24.6%), we observed that in this subset, the ligands often improved the RSCC and strain. This figure also demonstrates that while RSCC may not change much numerically, the alternative conformers explain previously unexplained density with lower energy conformers than what is currently deposited.
  
  To mitigate the risk of introducing bias by avoiding real strained ligand conformations, the authors should demonstrate the effectiveness of the new procedure by testing it on known examples of strained ligand-substrate complexes.
  
  See above.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  A - Specific comments:
  
  (1) It appears necessary to provide qFit-ligand with an initial model with the ligand already placed. This is not clear from the start of the introduction on page 3. It appears that ligand position is only weakly adjusted fairly late in the process, in step F of Figure 1. It seems, therefore, that the accuracy of initial placement is rather critical (see the example discussed on page 21). At the same time, in my experience, ambiguous cases are quite common, for example with flat ligands with a few substituents sticking out or with ligands with highly mobile tails. It would be helpful for the authors to comment on the sensitivity to initial ligand placement, either in the discussion or, better yet, in the form of an analysis in which the starting model position is randomly perturbed.
  
  In our revised version, we have modified the introduction to clarify the necessity of including an initial ligand model (page 4).
  
  “The qFit-ligand algorithm takes as input a crystal or cryo-EM structure of an initial protein-ligand complex with a single conformer ligand in PDBx/mmCIF format, a density map or structure factors (encoded by a ccp4 formatted map or an MTZ), and a SMILES string for the ligand.”
  
  We also describe our sampling algorithm more clearly (see: Biasing Conformer Generation, page 6). Steps A-E generate many conformations (using RDKit), which are then selected/fit into experimental density (using quadratic programming). To help with additional shifting issues in the input ligand, after the first selection, we do additional rotation/translation of the generated conformers that are kept. We then do another round of fitting to the density (quadratic programming followed by mixed integer quadratic programming).
  
  Given this sampling, we have not elected to do an additional computational experiment to test the “radius of convergence” or dependence on initial conditions. However, we outline the fundamental procedure here so that someone can build on the work and test the idea:
  
  - Create single conformer models as we currently do
  
  - Randomly perturb the coordinates of the ligand by 0.1-0.3 Å
  
  - Refine to convergence, creating a series of “perturbed, modified true positives” for each dataset
  
  - Run qFit-ligand
  
  - Evaluate the variability in the resulting multi-conformer models
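  
  The perturbation step in the procedure above could be sketched roughly as follows. This is only an illustration and not part of qFit-ligand: the function name `perturb_ligand` is hypothetical, and it assumes that each atom is displaced independently in a random direction by a distance drawn uniformly from 0.1-0.3 Å (the response does not say whether the shift is per atom or rigid-body).

```python
import math
import random


def perturb_ligand(coords, min_shift=0.1, max_shift=0.3, rng=None):
    """Displace every atom by a random vector of length min_shift..max_shift (in Å).

    Hypothetical helper: per-atom, independent displacements are an assumption.
    """
    if rng is None:
        rng = random.Random()
    perturbed = []
    for x, y, z in coords:
        # A normalized Gaussian vector points in a uniformly random direction.
        while True:
            dx, dy, dz = (rng.gauss(0.0, 1.0) for _ in range(3))
            norm = math.sqrt(dx * dx + dy * dy + dz * dz)
            if norm > 1e-12:
                break
        scale = rng.uniform(min_shift, max_shift) / norm
        perturbed.append((x + dx * scale, y + dy * scale, z + dz * scale))
    return perturbed


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
    for atom in perturb_ligand(ligand, rng=random.Random(0)):
        print(tuple(round(c, 3) for c in atom))
```

  Each perturbed model would then be refined and passed to qFit-ligand, and the spread of the resulting multi-conformer models would indicate the sensitivity to initial placement.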
  
  (2) Top of page 6 ("Biasing Conformer Generation"): the authors say "as we only want to generate ligands that physically fit within the protein binding pocket, we bias conformation generation towards structures more likely to fit well within the receptor's binding site". Apart from the odd redundancy of this sentence, I am confused: at the stage that seems to be referred to here (A-C in Figure 1) is the fit to the electron density already taken into account, or does this only happen later (after step E)?
  
  Thank you for pointing this out. We have edited the statement to clarify it:
  
  “To guide the conformation generation from the Chem.rdDistGeom based on the ligand type and protein pocket, we developed a suite of specialized sampling functions to bias the conformational search towards structures more likely to fit well into the receptor’s binding site.”
  
  We do not consider the electron density during conformer generation (only selection from the generated conformers). The sampling is additionally biased by the type of ligand and the size of the binding pocket.
  
  (3) qFit-ligand appears to be quite slow. Are there prospects for speedup? Can the code take advantage of GPUs or multi-CPU environments?
  
  We agree with this. We have made some algorithmic improvements, most notably removing duplicate conformers based on root mean squared distance. This, along with parallelization, decreased the average runtime from ~19 minutes to ~8 minutes (see additional details: qFit-ligand runtime, page 8). We do not currently take advantage of GPU specific code.
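  
  Removing duplicate conformers by root mean squared distance can be illustrated with a minimal sketch. The function names and the 0.2 Å threshold are hypothetical; the sketch assumes all conformers list the same atoms in the same order and already share a coordinate frame, so no superposition is performed. It is not the qFit-ligand implementation.

```python
import math


def rmsd(conf_a, conf_b):
    """Root mean squared distance between two conformers with matching atom order."""
    if len(conf_a) != len(conf_b) or not conf_a:
        raise ValueError("conformers must be non-empty and have the same atom count")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def remove_duplicate_conformers(conformers, threshold=0.2):
    """Keep a conformer only if it is at least `threshold` Å from every kept one.

    The threshold value is an assumption chosen for illustration.
    """
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) >= threshold for other in kept):
            kept.append(conf)
    return kept
```

  Because each new conformer is compared only against those already kept, fewer candidates reach the later density-fitting stages, which is where a runtime saving would come from.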
  
  (4) Section: Detection of experimental true positive multi-conformer ligands:
  
  a) Why are carbohydrate ligands excluded? This seems like an important class of ligands that one would like qFit to be able to treat! Which brings me to a related question: can covalently attached groups (e.g., glycosylation sites!) be modeled using qFit-ligand, or is qFit-ligand restricted to non-covalently bound groups?
  
  Currently, qFit-ligand does not support covalently bound ligands, but this is an area of interest we are hoping to expand into. In the revised version, we added the non-covalently attached carbohydrates back into the true positive dataset. In Figure 4 (page 14), we show that qFit-ligand is able to improve fit to the experimental density in around 80% of structures, while also often reducing torsion strain (see additional details: qFit-ligand applied to unbiased dataset of experimental true positives, page 14).
  
  b) "as well as 758 cases where the ligand model's deposited alternate conformations (altlocs) were not bound in the same chain and residue number" - I do not understand what this means, or why it leads to the exclusion of so many structures. Likewise, a number of additional exclusions are described in Figure S3. Some more background on why these all happened would be helpful. Are you just left with the "easy" cases?
  
  Sometimes modelers will list the multiple conformations of a bound ligand as a separate residue within the PDB file, rather than as a single multiconformer model. For example, rather than writing a multiconformer LIG bound at A, 201 with altlocs ‘A’ and ‘B’, a modeler might write this instead as LIG, A, 201 and LIG A, 301. We initially excluded these kinds of structures. However, we agree that this choice resulted in the removal of many potentially valid true positives. We have since updated our data processing pipeline to include these cases, and they are examined in the updated manuscript.
  
  c) I do not follow the argument made at the end of this section (last two paragraphs on page 9): "when using a single average conformation to describe density from multiple conformations, the true low-energy states may be ignored". I get that, but the conformations in the "modified true positives" dataset derive directly from models in which two conformations were modeled, so this cannot be the explanation for why qFit-ligand models result in somewhat lower average strain. It would seem that the paper could be served by providing examples where single conformations were modeled in deposited structures, but qFit detects multiple conformations.
  
  We agree with this comment that the strain obtained from the modified true positives is likely higher than the deposited models. However, the modified structure is refined with a single conformation, and therefore changed from the deposited “A” conformation. Thus, the reduced strain observed in our qFit-ligand models relative to the modified true positives is not unexpected.
  
  To expand our dataset, we also looked at deposited structures with high strain, all of which were modeled as single conformers. Here, we saw a decrease in strain when alternative conformers were placed (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, page 15). Further, we provide an example from the XGen macrocycle dataset where a ligand initially modeled as a single conformer exhibited relatively high strain. After qFit‐ligand modeled a second conformation, the overall strain was reduced (Figure 6C, page 19; Figure 6—figure supplement 1C, page 59).
  
  (5) Section: qFit-ligand applied to an unbiased dataset of experimental true positives Bottom of page 14: The paragraph starting with "qFit-ligand shows particular strength in scenarios with strong evidence..." is enigmatic: there's no illustration (unless it directly relates to the findings in Figure 4, in which case this should be more explicit). Since this points out when the reader will and will not benefit from using qFit-ligand, it should be clear what the authors are talking about.
  
  This claim considers all the evidence presented in the manuscript, not necessarily one particular aspect of it. We advise using qFit-ligand when there is a moderate correlation between the input model and the experimental data, strong visual density in the binding pocket, high map resolution, and/or when your single conformer ligand model is strained. We have made all of these points clearer in the updated manuscript.
  
  B - Section: qFit-ligand can automatically detect and model multiple conformations of macrocycles:
  
  This is an exciting extension of qFit-ligand, but some aspects of the analysis strike me as worrisome. Of the initial dataset of 150 structures, fewer than half make it all the way through analysis. It's hard to believe that this is a fully representative subset. Why, for example, could 29 structures not be refined against the deposited structure factors? Why does strain calculation (in RDKit?) fail on 30 ligands? What about the other 18 cases--why did these fail (in PHENIX?).
  
  We agree that this is a striking number of failures; however, we note that they are not specific shortcomings of qFit-ligand (in fact, most arise because standard structural biology and/or cheminformatics software fails on many PDB depositions). Therefore, these failures reflect broader limitations in standard bioinformatics and refinement restraint files when handling macrocycles. The strain calculator we used was not built for macrocycles, and after consulting with many experts in the field, the consensus was that no method works well with macrocycles. We discuss these issues in additional detail in the discussion (page 27):
  
  “Additionally, our algorithm’s placement within the larger refinement and ligand modeling ecosystem highlighted other areas that need improvement. We note that macrocycles, due to their complicated and interconnected degrees of freedom, suffer acutely from the refinement issues, as demonstrated by the failure of approximately one-third of datasets in our standard preparation or post-refinement pipelines due to ligand parameterization issues. Many of these stemmed from problematic ligand restraint files, highlighting the difficulty of encoding the geometric constraints of macrocycles using standard restraint libraries. Improved force-field or restraints for macrocycles are desperately needed to improve their modeling.”
  
  C - Minor issues:
  
  (1) "Fragment-soaked event maps" - this is a semantically strange section title!
  
  We have updated the section title in our revised manuscript. The new title is ‘qFit-ligand recovers heterogeneity in fragment-soaked event maps’.
  
  (2) Too many digits! All over the manuscript, percentages are displayed with 0.01% precision, while these mostly refer to datasets with ~150 structures. Shifting just one structure from one category to another changes these percentages by nearly 1%.
  
  We have updated the number of significant figures in our revised manuscript.
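  
  The reviewer's arithmetic can be checked with a short sketch. The function names and the rule for choosing the number of decimals are hypothetical and serve only to illustrate the point: moving a single structure between categories changes a percentage by 100/n, so reporting finer precision than that conveys no information.

```python
def percent_step(n_structures):
    """Change in a percentage caused by moving one structure between categories."""
    return 100.0 / n_structures


def report(count, total):
    """Format a percentage with a precision that matches the dataset size.

    The decimal rule below is an illustrative assumption, not a field standard.
    """
    step = percent_step(total)
    if step >= 0.5:
        decimals = 0
    elif step >= 0.05:
        decimals = 1
    else:
        decimals = 2
    return f"{100.0 * count / total:.{decimals}f}% (n={count})"


if __name__ == "__main__":
    print(round(percent_step(150), 2))  # one structure out of 150
    print(report(116, 135))
```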
  
  (3) The authors are keen to classify decreases in RSCC as significant only when these changes exceed 0.1, but do not apply the same standard for increases. For instance, in Figure 4B, if we were to classify improvements as significant if ΔRSCC > 0.1, there would be fewer significant improvements than decreases in performance (although it is visually clear that for most datasets things get better). Similarly, in Figure 5A, if we were to classify improvements as significant if ΔRSCC > 0.1, qFit-ligand would only yield significant improvements for two out of 73 cases, which is not a lot.
  
  We agree with the reviewer that there needs to be more consistency in our analysis of improvements/deteriorations. However, we note that operationally, when the decreases in model quality are observed, the modeler would simply reject the new model in favor of the input model. We have added to the discussion:
  
  “In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”
  
  There is generally no consensus in the field as to what might indicate a ‘significant’ change in RSCC, and any threshold we choose would be arbitrary. We note that in our manuscript, we had previously characterized a decrease in RSCC to be ‘significant’ if it exceeded 0.1. However, as there is no real scientific justification for this cutoff, or any cutoff, we moved away from this framing in the revised manuscript. Therefore, we just classify if we improve RSCC. For example, on page 9:
  
  “qFit-ligand modeled an alternative conformation in 72.5% (n=98) of structures. Compared with the modified true positive models, 83.7% (n=113) of qFit-ligand models have a better RSCC and 77.0% (n=104) structures saw an improvement in EDIAm, representing an improved fit to experimental data in the vast majority of structures.”
  
  In addition, we have conducted additional experiments using more sensitive metrics (EDIAm) to further illustrate qFit-ligand’s performance.
  
  (4) Small peptides are not discussed as a class of ligands, although these are quite common.
  
  Canonical peptides can be modeled with standard qFit. Non-canonical peptides present failure modes similar to the macrocycles discussed above, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons we have not included an analysis outside of the macrocycle section. We have noted this caveat in the discussion:
  
  “We note that even linear non-canonical peptides present similar failure modes to macrocycles, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons, we did not include analysis on small peptide ligands; however, canonical peptides can be modeled with standard qFit [8].”
  
  (5) Top of page 10: "while refinement improves": what kind of refinement does this refer to?
  
  This refers to refinement with Phenix. We have updated this language to reflect this (page 8). “We refer to these altered structures as our ‘modified true positives’, which we use as input to qFit-ligand, and subsequent refinement using Phenix.”
  
  (6) Bottom of page 11: "they often did" -> "it often did"
  
  We have made this change in the revised version.
  
  (7) Top of page 14: RMSDs and B factors do have units.
  
  We have added the units in our revision.
  
  (8) Top of page 24. In the generation of a composite omit map, why are new Rfree flags being generated? Did I misunderstand that?
  
  The setting r_free_flags.generate=True only creates R-free flags if they are not present in the input file, as is the case for many (especially older) PDB depositions.
  
  (9) Bottom of page 27: how large is the mask? Presumably when alt confs of the ligand are possible, it would be helpful for the mask to cover those?
  
  We agree that this mask should be updated. In our revision, we define the mask around the coordinates of the full qFit-ligand ensemble. The same mask is used to calculate the RSCC of the input (single conformer) model versus the qFit-ligand model.
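  
  To make the role of the shared mask concrete, here is a minimal sketch of a real-space correlation coefficient computed as the Pearson correlation between observed and calculated density values over the masked grid points. The function name and the flat-list representation of the maps are assumptions for illustration; this is not the code used in the manuscript.

```python
import math


def rscc(observed, calculated, mask):
    """Pearson correlation of observed vs. calculated density inside a mask.

    `observed`, `calculated`, and `mask` are equal-length flat sequences;
    using the same mask for both models keeps their scores comparable.
    """
    obs = [o for o, keep in zip(observed, mask) if keep]
    calc = [c for c, keep in zip(calculated, mask) if keep]
    if len(obs) < 2:
        raise ValueError("mask must select at least two grid points")
    mean_obs = sum(obs) / len(obs)
    mean_calc = sum(calc) / len(calc)
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(obs, calc))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_calc = sum((c - mean_calc) ** 2 for c in calc)
    if var_obs == 0.0 or var_calc == 0.0:
        raise ValueError("density is constant inside the mask")
    return cov / math.sqrt(var_obs * var_calc)
```

  Scoring the single-conformer input and the multi-conformer output over one mask that covers the whole ensemble avoids penalizing or rewarding either model simply because a different region was evaluated.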
  
  (10) Middle of page 29: "These structure factors are then used to compute synthetic electron density maps." - It is not clear whether the following three sentences are an explanation of the details of that statement or rather things that are done afterwards.
  
  We clarify this in the manuscript (page 36).
  
  “These structure factors are then used to compute synthetic electron density maps. To each of these maps, we generate and add random Gaussian noise values scaled proportionally to the resolution. This scaling reflects the escalation of experimental noise as resolution deteriorates, a common occurrence in real-life crystallographic data.”
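  
  The noise model described in the quoted passage might look like the following sketch. The function name and the proportionality constant `noise_per_angstrom` are hypothetical (the response does not state the scale factor), and the map is represented as a flat list of density values for simplicity.

```python
import random


def add_resolution_scaled_noise(density, resolution, noise_per_angstrom=0.05, rng=None):
    """Add zero-mean Gaussian noise whose standard deviation grows with resolution.

    `resolution` is in Å (larger means worse); `noise_per_angstrom` is an
    assumed constant, not a value taken from the manuscript.
    """
    if rng is None:
        rng = random.Random()
    sigma = noise_per_angstrom * resolution
    return [value + rng.gauss(0.0, sigma) for value in density]


if __name__ == "__main__":
    clean_map = [0.0, 0.8, 1.6, 0.8, 0.0]
    for res in (1.0, 2.0, 3.0):
        noisy = add_resolution_scaled_noise(clean_map, res, rng=random.Random(42))
        print(res, [round(v, 3) for v in noisy])
```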
  
  (11) Chemical synthesis: I am not qualified to assess this and am surprised to see so much detail here rather than in some other manuscript. Are the corresponding structures deposited anywhere?
  
  All of the structures we discuss in this manuscript are deposited in the PDB and listed in Supplementary Table 5.
  
  Reviewer #2 (Recommendations for the authors):
  
  The data should consistently present the number of structures that exhibit improvements or deterioration in particular metrics, like RSCC and strain, using a cutoff that should be significant. For instance, stating that "85.93% (n=116) of structures having a better RSCC in the qFit-ligand models compared to the modified true positive models" without clarifying the magnitude of improvement (e.g., a marginal increase of 0.01 in RSCC) lacks meaningful context. The figures should clearly indicate the specific cutoff values used for each metric. The accompanying text should provide a detailed explanation for the selection of these cutoff values, justifying their significance in the context of the study.
  
  Currently, there is no established consensus within the field on what constitutes a 'significant' improvement in RSCC or strain values. As such, we chose not to impose an arbitrary cutoff and just look at which structures improve RSCC. We also removed all language stating significance, as there isn’t a good standard in the field to assess significance. This is especially important as only improvements would be considered in an active modeling project. In cases where qFit-ligand degrades the RSCC (or strain) to a large extent, the modeler would simply revert to the input model.
  
  In the first section of Results: "First, for all ligands, we perform an unconstrained search function allowing the generated conformers to only be constrained from the bounds matrix (Figure 1A). This is particularly advantageous for small ligands that benefit from less restriction to fully explore their conformational space. We then perform a fixed terminal atoms search function (Figure 1B)." It is unclear whether a fixed terminal atom search was conducted for each conformer generated in the initial step to further explore the conformational space. This aspect should be clarified to provide a more comprehensive understanding of the methodology.
  
  Each independent conformer generation function (A-E) is initialized with only the input ligand model and runs in parallel with the other functions. These functions do not build on each other, but rather perturb the input molecule independently of one another. In our updated manuscript, we have clarified the methodology (page 6).
  
  “First, in all cases, we perform an unconstrained search function (Figure 1A), a fixed terminal atoms search function (Figure 1B), and a blob search function (Figure 1C).”
  
  Phrase: "We randomly sampled 150 structures and, after manual inspection of the fit of alternative conformations, chose 135 crystal structures as a development set for improving qFit-ligand." The authors should explain why they filtered 10% of the structures.
  
  To develop qFit-ligand, we wanted to use a very high-quality dataset. We needed to know with some degree of certainty that if qFit-ligand failed to produce an alternate conformation (or generated conformations low in RSCC or high in strain), the failure was due to an algorithmic limitation rather than poor-quality input data. Therefore, after selection based on numerical metrics, we manually examined each ligand in Coot to observe if we believed the alternative conformers fit well into the density.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.20.613996v2
www.biorxiv.org www.biorxiv.org

Neural dynamics of visual working memory representation during sensory distraction

3
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This important study reports a reanalysis of one experiment of a previously-published report to characterize the dynamics of neural population codes during visual working memory in the presence of distracting information. This paper presents solid evidence that working memory representations are dynamic and distinct from sensory representations of intervening distractions. This research will be of interest to cognitive neuroscientists working on the neural bases of visual perception and memory.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this study, the authors re-analyzed a public dataset (Rademaker et al, 2019, Nature Neuroscience) which includes fMRI and behavioral data recorded while participants held an oriented grating in visual working memory (WM) and performed a delayed recall task at the end of an extended delay period. In that experiment, participants were pre-cued on each trial as to whether there would be a distracting visual stimulus presented during the delay period (filtered noise or randomly-oriented grating). In this manuscript, the authors focused on identifying whether the neural code in retinotopic cortex for remembered orientation was 'stable' over the delay period, such that the format of the code remained the same, or whether the code was dynamic, such that information was present, but encoded in an alternative format. They identify some timepoints - especially towards the beginning/end of the delay - where the multivariate activation pattern fails to generalize to other timepoints, and interpret this as evidence for a dynamic code. Additionally, the authors compare the representational format of remembered orientation in the presence vs absence of a distracting stimulus, averaged over the delay period. This analysis suggested a 'rotation' of the representational subspace between distracting orientations and remembered orientations, which may help preserve simultaneous representations of both remembered and viewed stimuli. Intriguingly, this rotation was a bit smaller for Expt 2, in which the orientation distractor had a greater behavioral impact on the participants' behavioral working memory recall performance, suggesting that more separation between subspaces is critical for preserving intact working memory representations.
  
  Strengths:
  
  (1) Direct comparison of coding subspaces/manifolds between timepoints, task conditions, and experiments is an innovative and useful approach for understanding how neural representations are transformed to support cognition
  
  (2) Re-use of existing dataset substantially goes beyond the authors' previous findings by comparing geometry of representational spaces between conditions and timepoints, and by looking explicitly for dynamic neural representations
  
  (3) Simulations testing whether dynamic codes can be explained purely by changes in data SNR are an important contribution, as this rules out a category of explanations for the dynamic coding results observed
  
  Weaknesses:
  
  (1) Primary evidence for 'dynamic coding', especially in early visual cortex, appears to be related to the transition between encoding/maintenance and maintenance/recall, but the delay period representations seem overall stable, consistent with some previous findings. However, given the simulation results, the general result that representations may change in their format appears solid, though the contribution of different trial phases remains important for considering the overall result.
  
  (2) Converting a continuous decoding metric (angular error) to "% decoding accuracy" serves to obfuscate the units of the actual results. Decoding precision (e.g., sd of decoding error histogram) would be more interpretable and better related to both the previous study and behavioral measures of WM performance.
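  
  As an illustration of the precision measure suggested in point (2), the sketch below wraps signed decoding errors into the 180° orientation space and reports their standard deviation in degrees. The function names are hypothetical, and a plain (non-circular) standard deviation of the wrapped errors is an assumption; it is not the analysis from either study.

```python
import statistics


def wrap_orientation_error(error_deg):
    """Wrap an orientation error into [-90, 90), since orientation repeats every 180 degrees."""
    return ((error_deg + 90.0) % 180.0) - 90.0


def decoding_precision(errors_deg):
    """Standard deviation (degrees) of wrapped decoding errors; lower is more precise."""
    wrapped = [wrap_orientation_error(e) for e in errors_deg]
    return statistics.pstdev(wrapped)
```

  A value in degrees can be compared directly with behavioral recall error, which a percentage accuracy cannot.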
  
  Comments on revised version:
  
  The authors have addressed all my previous concerns.
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Reviewer #1:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) At several places in the reply to reviewers and the manuscript, when discussing the new simulations conducted, the authors mention they break the 180 trials into a train/test split of 108/108 - is this value correct? If so, how? (pg 19 of updated manuscript)
  
  Thank you for pointing this out; it was not clearly explained. We have now added the explanation to the Methods section:
  
  “For each iteration, we randomly selected 108 responses from the full set of 180 for training, and then independently sampled another 108 from the same full set for testing. This ensured that the same orientation could appear in both sets, consistent with the structure of the original experiment.”
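  The sampling scheme quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the seed argument, and the assumption that each set is drawn without replacement are hypothetical. Because the two draws are independent and 108 + 108 exceeds 180, the training and test sets must share at least 36 trials.

```python
import random


def sample_train_test(n_total=180, n_train=108, n_test=108, seed=None):
    """Draw train and test trial indices independently from the same pool.

    Assumption (not stated in the source): each set is sampled without
    replacement, so a trial appears at most once within a set but may
    appear in both sets.
    """
    rng = random.Random(seed)
    pool = range(n_total)
    train = rng.sample(pool, n_train)  # first independent draw
    test = rng.sample(pool, n_test)    # second independent draw, same pool
    return train, test


train_idx, test_idx = sample_train_test(seed=0)
shared = set(train_idx) & set(test_idx)
print(len(train_idx), len(test_idx), len(shared))
```

  With these numbers, the expected overlap is 108 × 108 / 180 = 64.8 trials per iteration, so some trials are reused between training and testing by construction.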
  
  (2) I appreciate the authors have added the variance explained of principal components to the axes of Fig. 3, though it took me a while to notice this, and this isn't described in the figure caption at all. It would likely help readers to directly explain what the % means on each axis of Fig. 3.
  
  Thank you, we have now added a description in both Fig. 2 and 3:
  
  “The axes represent the first two principal components, with labels indicating the percent of total explained variance.”
  
  (3) I believe there is a typo/missing word in the new paragraph on pg 15: "neural visual WM representations in the early visual cortices are [[biased]] towards distractors" (I think the bracketed word may be omitted as a typo)
  
  Thank you - fixed.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.04.12.589170v4
www.biorxiv.org www.biorxiv.org

A whole-animal phenotypic drug screen identifies suppressors of atherogenic lipoproteins

5
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  In this important study, the authors have performed a zebrafish drug screen to identify suppressors of atherogenic lipoproteins. They utilize a well-established LipoGlo assay to find molecules that modulate these lipoproteins, identifying 49 potential hits. They perform some validation experiments, including studies linking enoxolone to its likely inhibitory effect on a specific transcription factor, HNF4alpha. Overall, the results are convincing and robust, and will open up new areas of exploration for those investigators interested in in vivo lipid biology.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  A whole-organism drug screen was performed to identify molecules that decrease Apolipoprotein B (ApoB) as a target for agents to reduce atherosclerosis. Kelpsch et al. used a zebrafish reporter line, LipoGlo, which is a fusion of the Nano-luciferase protein to the ApoB protein as a proxy for the presence of ApoB-containing lipoproteins (B-lps) in larval stages. The LipoGlo line was screened against a well-characterized drug library and identified 49 hits from their primary screen. Follow-up studies further refined this list to 19 molecules that reproducibly reduced B-lps significantly. The authors focused their studies on enoxolone, a licorice root extract, and showed that larvae treated with this agent can reduce the production of B-lps. As enoxolone has been reported to suppress Hepatocyte Nuclear factor 4a (HNF4a), the authors investigated whether loss-of-hnf4a or pharmacological inhibition of hnf4a in zebrafish also produced similar phenotypes as enoxolone treatment. Their studies showed that this was the case. Transcriptomic studies after enoxolone treatment resulted in altered expression of genes involved in cholesterol biosynthesis and in glucose/insulin signaling pathways. This study highlights the utility of a zebrafish whole-organism chemical screen for modifiers of B-lps production and/or its clearance. A significant finding is that enoxolone inhibits hnf4a in zebrafish to reduce B-lps production and supports targeting HNF4a as a therapeutic means to reduce the emergence of atherosclerosis.
  
  Strengths:
  
  The authors performed a whole-organism chemical screen with over 3000 agents. Such screens are challenging, and the authors used strict criteria for determining hits. The conclusions of this study are well supported by the presented data.
  
  Weaknesses:
  
  There are areas within the study and writing that can be improved and extended, specifically within the gene expression studies.
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors aimed to develop a large-scale drug screen to identify B-lp modulators in a vertebrate whole-animal system. Using the zebrafish LipoGlo system that the authors had previously published and validated, the authors screened 2762 drug candidates to generate 49 hits and ultimately validated 19 drugs as genuine ApoB-lowering drugs. Using LipoGlo-Electrophoresis, the authors are able to obtain insights into the ApoB-lipoprotein size/subclass distribution. The authors further validate and study the mechanism of a strong hit, Enoxolone, also known as 18β-Glycyrrhetinic acid, which has previously been reported to modulate lipid metabolism. The authors also show that Enoxolone effects are mediated through HNF4⍺, which has been previously shown in the mouse system, but this is the first time it has been shown in the zebrafish.
  
  Strengths:
  
  The study was methodical and robust, using a published and well-validated zebrafish LipoGlo model. The authors validated the hits from the screen independently and considered the possibility that some drugs may have been detected as false positive results due to effects on the enzymatic activity of NanoLuciferase; only one hit, verteporfin, was shown to be a false positive. Using LipoGlo-Electrophoresis, the authors are able to obtain extra insights into the ApoB-lipoprotein size/subclass distribution. They showed that while enoxolone treatment reduces total B-lps, there are no overt changes in B-lp size distribution compared to vehicle-treated animals, other than a slight increase in the zero mobility (ZM) fraction, which contains very large particles and/or tissue aggregates. In contrast, the positive control, lomitapide, does show a change in B-lp size distribution compared to vehicle-treated animals - an increase in frequency of LDLs (low-density lipoprotein), but a decrease in VLDLs (very low-density lipoprotein). This study also assesses the LipoGlo-Electrophoresis profile of HNF4⍺ inhibitors. Work in the zebrafish larvae means that the effect on overall development and an entire vertebrate organism can also be assessed. Finally, the authors applied a thorough statistical measure to define a hit, using the Strictly Standardized Mean Difference (SSMD) method.
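  For readers unfamiliar with SSMD, a common form for two independent groups divides the difference in group means by the square root of the summed group variances. The sketch below assumes that form; the review does not state which SSMD estimator or hit threshold the authors used, and the function name and the readings are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def ssmd(treated, control):
    """Strictly standardized mean difference for two independent groups.

    Assumed estimator: (mean_t - mean_c) / sqrt(var_t + var_c), using
    sample variances. Negative values mean the treated group reads lower
    than the control group.
    """
    return (mean(treated) - mean(control)) / sqrt(
        variance(treated) + variance(control)
    )


# Hypothetical luminescence readings (arbitrary units), not data from the study.
drug_wells = [1.0, 2.0, 3.0]
vehicle_wells = [4.0, 5.0, 6.0]
print(round(ssmd(drug_wells, vehicle_wells), 2))  # -3 / sqrt(2), about -2.12
```

  Unlike a p-value, this score reflects effect size relative to variability, which is why it is often used to rank compounds in a screen.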
  
  Weaknesses:
  
  While the screen was thorough and well-validated, the authors missed a chance to add significance for a wide readership. The hits were thoroughly validated and displayed, but the authors could also have presented the LipoGlo-Electrophoresis for all validated hits, or at least a number of them. This would hugely increase the insights into these compounds. Also, the authors chose to validate and follow up a mechanism for Enoxolone, yet this hit was already known to modulate lipid metabolism through HNF4⍺, which hugely limits the impact of the paper. What the authors have shown that is novel adds only subtly to this: consistency in vertebrate models, RNA sequencing of pathways, further validation of the HNF4⍺ pathway, and a profile of the resulting B-lp size distribution. It seemed an easy way out to pick such a candidate, and they could have followed up by validating a completely novel drug more thoroughly. Also, the authors' prior paper showing the methodology depicted complementary EM and LipoGlo-microscopy approaches. The microscopy especially would have been an easy complementary add-on to the screen to give extra insights into B-lp metabolism in a whole organism for all candidates. This felt like a missed opportunity.
  
  Review 2
4. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In "A whole-animal phenotypic drug screen identifies suppressors of atherogenic lipoproteins", Kelpsch et al. seek to identify new, chemically targetable pathways that regulate ApoB function and could ultimately serve as treatments for elevated lipid disorders and/or cardiovascular disease. Given the interconnected nature of lipid regulation in the whole organism with interdependent organs and secreted components (i.e. lipoproteins), they use the vertebrate model zebrafish to screen a large library of ~3000 compounds for their ability to lower the important ApoB-containing lipoproteins. They find 49 hits with 19 compounds passing a higher level of scrutiny, and focus on the role of enoxolone in modulating B-lp levels at least partly through the HNF4alpha transcription factor and, putatively, through downstream cholesterol/lipid biosynthetic pathways.
  
  Strengths:
  
  The study uses a well-validated in vivo stain (LipoGlo) for measuring lipoproteins in the context of a developing whole organism with a quantitative read-out on a high-throughput platform, allowing for screening of thousands of compounds altering the complex metabolic/physiologic functions necessary for lipoprotein production.
  
  The use of genetic mutant HNF4alpha to assign the mechanism of action to the prime candidate compound studied (enoxolone) is a powerful approach for this challenging aspect of chemical genetics studies. See caveats in weaknesses.
  
  Weaknesses:
  
  As shown in Figure 5A, the HNF4alpha homozygous (-/-) mutant already lowers lipoproteins. Is it just that the lipoprotein level is already at a minimum in this homozygous mutant (and thus enoxolone cannot induce even lower lipoprotein levels), or is it true that the enoxolone molecule is primarily acting through this TF (i.e. the HNF4alpha homozygous mutant is truly epistatic to enoxolone function), as favored in the text?
  
  While it is definitely interesting to study enoxolone effects during whole embryo development, the link to HNF4alpha had previously been described in the literature, as pointed out by the authors. The generalizability of the approach to identify truly novel pathways remains to be fully realized, but sharing this available screen data to date will invite further inquiry and be very valuable to the community.
  
  Figure 5 - The same allele of HNF4alpha loss of function/hypomorph (rdu14) is used in both 5A and 5B, but labeled differently in each subpanel. This is explained in the figure legend, but could be updated to use the same nomenclature in both panels to clarify the Figure presentation.
  
  Review 3
5. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Author response:
  
  We would like to thank the editors and reviewers for their time and their helpful feedback. We largely agree with the reviewer recommendations and comments, which we will address for the next Version on Record of this manuscript. We plan to address reviewer comments in the following ways.
  
  Reviewers requested a more comprehensive analysis of our RNA-seq experiment comparing vehicle treatment to enoxolone treatment over time. We will improve our analysis by providing clear, accessible, and organized tables defining differentially expressed genes at each time point, gene set lists that comprise our gene ontology analysis, and the lists of shared differentially expressed genes from enoxolone treatment and HNF4⍺ knockout. While some of this data was provided in the supplementary files, we recognize that it should be more accessible for the reader. Furthermore, as suggested by the Reviewer, we will enhance our transcriptomic analysis by utilizing bioinformatic tools such as Enrichr.
  
  The Reviewers noted that we identified a number of lipoprotein-lowering compounds through our drug screen, but limited the impact of our manuscript by focusing on enoxolone, a known inhibitor of HNF4⍺ and modulator of lipid metabolism. While we understand the sentiment that other novel compounds would be interesting to study, we aimed to demonstrate proof of concept in this manuscript. We view the characterization of novel compounds as beyond the direct scope of this manuscript. We did not perform LipoGlo imaging and electrophoresis experiments on each drug because these experiments are low-throughput given the number of drugs and doses we examined. In light of the Reviewers’ comments, we will add some additional characterizations of our validated hits with LipoGlo imaging and electrophoresis studies.
  
  The reviewers also identified a number of typos in text and figures that will be addressed in the next Version on Record. We believe that the recommended changes will strengthen our manuscript and broaden its appeal. We are grateful for the opportunity to improve our work based on the reviewers’ valuable suggestions.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.11.14.623618v3
www.biorxiv.org www.biorxiv.org

Distal Gene Expression Governed by Lamins and Nesprins via Chromatin Conformation Change

4
1. Public_Reviews 05 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This study provides useful information on the impact of Lamin A/C knockdown on gene expression using RNA-Seq analysis. In addition, the impact of Lamin A/C knockdown on telomere dynamics is explored using live cell imaging. The conclusions, however, are inadequately supported by the data presented. Weaknesses include excessive reliance on gene ontology analysis without further validation of direct versus indirect effects; use of only one shRNA, which may have off-target effects; validation of knockdown only from gene expression rather than protein levels; and lack of discussion of previous studies showing the presence of Lamin A/C in the nuclear interior, among others.
  
  Summary
2. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  This manuscript reports a descriptive study of changes in gene expression after knockdown of the nuclear envelope proteins lamin A/C and Nesprin2/SYNE2 in human U2OS cells. The readout is RNA-seq, which is analyzed at the level of gene ontology and focused investigation of isoform variants and non-coding RNAs. In addition, the mobility of telomeres is studied after these knockdowns, although the rationale in relation to the RNA-seq analyses is rather unclear.
  
  RNA-seq after knockdown of lamin proteins has been reported many times, and the current study does not provide significant new insights that help us to understand how lamins control gene expression. This is particularly because the vast majority of the observed effects on gene expression appear to occur in regions that are not bound by lamin A. It seems likely that these effects are indirect. There is also virtually no overlap between genes affected by laminA/C and by SYNE2, which remains unexplained; for example, it would be good to know whether laminA/C and SYNE2 bind to different genomic regions. The claim in the Title and Abstract that LMNA governs gene expression / acts through chromatin organization appears to be based only on an enrichment of gene ontology terms "DNA conformation change" and "covalent chromatin conformation" in the RNA-seq data. This is a gross over-interpretation, as no experimental data on chromatin conformation are shown in this study. The analyses of transcript isoform switching and ncRNA expression are potentially interesting but lack a mechanistic rationale: why and how would these nuclear envelope proteins regulate these aspects of RNA expression? The effects of lamin A on telomere movements have been reported before; the effects of SYNE2 on telomere mobility are novel (to my knowledge), but should be discussed in the light of previously documented effects of SUN1/2 on the dynamics of dysfunctional telomeres (Lottersberger et al, Cell 2015).
  
  As indicated below, I have substantial concerns about the experimental design of the knockdown experiments.
  
  Altogether, the results presented here are primarily descriptive and do not offer a significant advance in our understanding of the roles of LaminA and SYNE2 in gene regulation or chromatin biology, because the results remain unexplained mechanistically and functionally. Furthermore, the RNAseq datasets should be interpreted with caution until off-target effects of the shRNAs can be ruled out.
  
  Specific comments:
  
  (1) Knockdowns were only monitored by qPCR. Efficiency at the protein level (e.g., Western blots) needs to be determined.
  
  (2) For each knockdown, only a single shRNA was used. shRNAs are infamous for off-target effects; therefore, multiple shRNAs for each protein, or an alternative method such as CRISPR deletion or degron technology, must be tested to rule out such off-target effects.
  
  (3) It is not clear whether the replicate experiments are true biological replicates (i.e., done on different days) or simply parallel dishes of cells done in a single experiment (= technical replicates). The extremely small standard deviations in the RT-qPCR data suggest the latter, which would not be adequate.
  
  Review 1
3. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study focused on the roles of the nuclear envelope proteins lamin A and C, as well as nesprin-2, encoded by the LMNA and SYNE2 genes, respectively, in gene expression and chromatin mobility. It is motivated by the established role of lamins in tethering heterochromatin to the nuclear periphery in lamina-associated domains (LADs) and modulating chromatin organization. The authors show that depletion of lamin A, lamin A and C, or nesprin-2 results in differential effects on mRNA and lncRNA expression, primarily affecting genes outside established LADs. In addition, the authors used fluorescent dCas9 labeling of telomeric genomic regions combined with live-cell imaging to demonstrate that depletion of either lamin A, lamin A/C, or nesprin-2 increased the mobility of chromatin, suggesting an important role of lamins and nesprin-2 in chromatin dynamics.
  
  Strengths:
  
  The major strength of this study is the detailed characterization of changes in transcript levels and isoforms resulting from depletion of either lamin A, lamin A/C, or nesprin-2 in human osteosarcoma (U2OS) cells. The authors use a variety of advanced tools to demonstrate the effect of protein depletion on specific gene isoforms and to compare the effects on mRNA and lncRNA levels.
  
  The TIRF imaging of dCas9-labeled telomeres allows for high-resolution tracking of multiple telomeres per cell, thus enabling the authors to obtain detailed measurements of the mobility of telomeres within living cells and the effect of lamin A/C or nesprin-2 depletion.
  
  Weaknesses:
  
  Although the findings presented by the authors overall confirm existing knowledge about the ability of lamins A/C and nesprin to broadly affect gene expression, chromatin organization, and chromatin dynamics, the specific interpretation and the conclusions drawn from the data presented in this manuscript are limited by several technical and conceptual challenges.
  
  One major limitation is that the authors only assess the knockdown of their target genes on the mRNA level, where they observe reductions of around 70%. Given that lamins A and C have long half-lives, the effect at the protein level might be even lower. This incomplete and poorly characterized depletion on the protein level makes interpretation of the results difficult. The description for the shRNA targeting the LMNA gene encoding lamins A and C given by the authors is at times difficult to follow and might confuse some readers, as the authors do not clearly indicate which regions of the gene are targeted by the shRNA, and they do not make it obvious that lamin A and C result from alternative splicing of the same LMNA gene. Based on the shRNA sequences provided in the manuscript, one can conclude that the shLaminA shRNA targets the 3' UTR region of the LMNA gene specific to prelamin A (which undergoes posttranslational processing in the cell to yield lamin A). In contrast, the shRNA described by the authors as 'shLMNA' targets a region within the coding sequence of the LMNA gene that is common to both lamin A and C, i.e., the region corresponding to amino acids 122-129 (KKEGDLIA) of lamin A and C. The authors confirm the isoform-specific effect of the shLaminA shRNA, although they seem somewhat surprised by it, but do not confirm the effect of the shLMNA construct. Assessing the effect of the knockdown on the protein level would provide more detailed information both on the extent of the actual protein depletion and the effect on specific lamin isoforms. Similarly, given that nesprin-2 has numerous isoforms resulting from alternative splicing and transcription initiation, it remains unclear in the current form of the manuscript which specific nesprin-2 isoforms were depleted, and to what extent (on the protein level).
  
  Another substantial limitation of the manuscript is that the current analysis, with the exception of the chromatin mobility measurements, is exclusively based on transcriptomic measurements by RNA-seq and qRT-PCR, without any experimental validation of the predicted protein levels or proposed functional consequences. As such, conclusions about the importance of lamin A/C on RNA synthesis and other functions are derived entirely from gene ontology terms and are not sufficiently supported by experimental data. Thus, the true functional consequences of lamin A/C or nesprin depletion remain unclear. Statements included in the manuscript such as "our findings reveal that lamin A is essential for RNA synthesis, ..." (Lines 79-80) are thus either inaccurate or misleading, as the current data do not show that lamin A is ESSENTIAL for RNA synthesis, and lamin A/C and lamin A deficient cells and mice are viable, suggesting that they are capable of RNA synthesis.
  
  Another substantial weakness is that the data and analysis presented in the manuscript raise some concerns about the robustness of the findings. Given that the 'shLMNA' construct is expected to deplete both lamin A and C, i.e., its effect encompasses the depletion of lamin A, which is achieved by the 'shLaminA' construct, one would expect a substantial overlap between the DEGs in the shLMNA and shLaminA conditions, with the shLMNA depletion producing a broader effect as it targets both lamin A and C. However, the Venn Diagram in Figure 4a, the genomic loci distribution in Figure 4b, and the correlation analysis in Supplementary Figure S2 show little overlap between the shLMNA and shLaminA conditions, which is quite surprising. In the mapping of the DEGs shown in Figure 4b, it is also surprising not to see the gene targeted by the shRNA, LMNA, found on chromosome 1, in the results for the shLMNA and shLamin A depletion.
  
  The correlation analysis in Supplementary Figure S2 raises further questions. The authors use dox-inducible shRNA constructs to target lamin A (shLaminA), lamin A/C (shLMNA), or nesprin-2 (shSYNE2). Thus, the no-dox control (Ctr) for each of these constructs would be expected to be very similar to the non-target scrambled controls (Ctrl.shScramble and Dox.shScramble). However, in the correlation matrix, each of the no-dox controls clusters more closely with the corresponding dox-induced shRNA condition than with the Ctrl.shScramble or Dox.shScramble conditions, suggesting either a very leaky dox-inducible system, strong effects from clonal selection, or substantial batch effects in the processing. Any of these scenarios could substantially affect the interpretation of the findings. For example, differences between different clonal cell lines used for the studies, independent of the targeted gene, could explain the limited overlap between the different shRNA constructs and result in apparent differences when comparing these clones to the scrambled controls, which were derived from different clones.
  
  The manuscript also contains several factually inaccurate or incorrect statements or depictions. For example, the depiction of the nuclear envelope in Figure 1 shows a single lipid bilayer, instead of the actual two lipid bilayers of the inner and outer nuclear membranes that enclose the nuclear lumen. The depiction further lacks SUN domain proteins, which, together with nesprins, form the LINC complex essential to transmit forces across the nuclear envelope. The statement in line 214 that "Linker of nucleoskeleton and cytoskeleton (LINC) complex component nesprin-2 locates in the nuclear envelope to link the actin cytoskeleton and the nuclear lamina" is not quite accurate, as nesprin-2 also links to microtubules via dynein and kinesin.
  
  The statement that "Our data show that Lamin A knockdown specifically reduced the usage of its primary isoform, suggesting a potential role in chromatin architecture regulation, while other LMNA isoforms remained unaffected, highlighting a selective effect" (lines 407-409) is confusing, as the 'shLaminA' shRNA specifically targets the 3' UTR of lamin A that is not present in the other isoforms. Thus, the observed effect is entirely consistent with the shRNA-mediated depletion, independent of any effects on chromatin architecture.
  
  The premise of the authors that lamins would only affect peripheral chromatin and genes at LADs neglects the fact that lamins A and C are also found in the nuclear interior, where they form stable structures and influence chromatin organization, and the fact that lamins A and C and nesprins additionally interact with numerous transcriptional regulators such as Rb, c-Fos, and beta-catenin, which could further modulate gene expression when lamins or nesprins are depleted.
  
  The comparison of the identified DEGs to genes contained in LADs might be confounded by the fact that the authors relied on the identification of LADs from a previous study (ref #28), which used a different human cell type (human skin fibroblasts) instead of the U2OS osteosarcoma cells used in the present study. As LADs are often highly cell-type specific, the use of the fibroblast data set could lead to substantial differences in LADs.
  
  Another limitation of the current manuscript is that, in the current form, some of the figures and results depicted in the figures are difficult to interpret for a reader not deeply familiar with the techniques, based in part on the insufficient labeling and figure legends. This applies, for example, to the isoform use analysis shown in Figure 3d or the GenometriCorr analysis quantifying spatial distance between LADs and DEGs shown in Figure 4c.
  
  Overall appraisal and context:
  
  Despite its limitations, the present study further illustrates the important roles the nuclear envelope proteins lamin A, lamin C, and nesprin-2 have in chromatin organization, dynamics, and gene expression. It thus confirms results previously reported for lamin A/C depletion (not always fully acknowledged in the current manuscript). For example, the effect of lamin A/C depletion on increasing mobility of chromatin had already been demonstrated by several other groups, such as Bronshtein et al. Nature Comm 2015 (PMID: 26299252) and Ranade et al. BMC Mol Cel Biol 2019 (PMID: 31117946). Additionally, the effect of lamin A/C depletion on gene and protein expression has already been extensively studied in a variety of other cell lines and model systems, including detailed proteomic studies (PMIDs 23990565 and 35896617).
  
  The finding that lamin A/C or nesprin depletion affects genes not only at the nuclear periphery but also in the nuclear interior is not particularly surprising given the previous studies and the fact that lamins A and C are also found within the nuclear interior, where they affect chromatin organization and dynamics, and that lamins A/C and nesprins directly interact with numerous transcriptional regulators that could further affect gene expression independent of their role in chromatin organization.
  
  The authors provide a detailed analysis of isoform switching in response to lamin A/C or nesprin depletion, but the underlying mechanism remains unclear. Similarly, their analysis of the genomic location of the observed DEGs shows the wide-ranging effects of lamin A/C or nesprin depletion, but leaves the reader wondering how these effects are mediated. A more in-depth analysis of predicted regulatory factors and their potential interaction with lamins A/C or nesprin would be beneficial in gaining more mechanistic insights.
  
  Review 2
4. Public_Reviews 05 Jun 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This manuscript describes DOX-inducible RNAi knockdown (KD) of Lamin A, of the LMNA-encoded isoforms as a group, and of the LINC component SYNE2. The authors report on differentially expressed genes, on differentially expressed isoforms, on the large numbers of differentially expressed genes that are in iLADs rather than LADs, and on telomere mobility changes induced by 2 of the 3 knockdowns.
  
  Strengths:
  
  Overall, the manuscript might be useful as a description for reference data sets that could be of value to the community.
  
  Weaknesses:
  
  The results are presented as a type of data description without formulation of models or explanations of the questions being asked and without follow-up. Thus, conceptually, the manuscript doesn't appear to break new ground.
  
  Not discussed is the previous extensive work by others on the nucleoplasmic forms of LMNA isoforms. Also not discussed are similar experiments, for instance, gene expression changes others have seen after lamin A knockdowns or knockouts, or the effect of lamina on chromatin mobility, including telomere mobility; see, for example, a review by Roland Foisner (doi.org/10.1242/jcs.203430) on nucleoplasmic lamina. The authors need to do a thorough search of the literature and compare their results as much as possible with previous work.
  
  The authors don't seem to make any attempt to explore the correlation of their findings with any of the previous data or correlate their observed differential gene expression with other epigenetic and chromatin features. There is no attempt to explore the direction of changes in gene expression with changes in nuclear positioning or to ask whether the genes affected are those that interact with nucleoplasmic pools of LMNA isoforms. The authors speculate that the DEG might be related to changing mechanical properties of the cells, but do not develop that further.
  
  The technical concerns include: 1) Use of only one shRNA per target. Use of additional shRNAs would have reduced concern about possible off-target knockdown of other genes; 2) Use of only one cell clone per inducible shRNA construct. Here, the concern is that some of the observed changes with shRNA KDs might show clonal effects, particularly given that the cell line used is aneuploid; 3) Use of a single, "scrambled" control shRNA rather than a true scrambled shRNA for each target shRNA.
  
  Review 3

Tags

Review 1

Review 2

Review 3

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.01.646570v1
www.biorxiv.org www.biorxiv.org

PRMT1-Mediated Metabolic Reprogramming Promotes Leukemogenesis

4
1. Public_Reviews 04 Jun 2025
  
  in eLife
  
  eLife Assessment
  
  This study reveals that PRMT1 overexpression drives tumorigenesis of acute megakaryocytic leukemia (AMKL) and that targeting PRMT1 is a viable approach for treating AMKL. After revision, both reviewers found that these findings are important and that the data supporting these findings are convincing. Furthermore, these findings likely have significant implications for the treatment of AMKL with PRMT1 overexpression in the future.
  
  Summary
2. Public_Reviews 04 Jun 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  PRMT1 overexpression is linked to poor survival in cancers, including acute megakaryocytic leukemia (AMKL). This manuscript describes the important role of PRMT1 in metabolic reprogramming in AMKL. In a PRMT1-driven AMKL model, only cells with high PRMT1 expression induced leukemia, which was effectively treated with the PRMT1 inhibitor MS023. PRMT1 increased glycolysis, leading to elevated glucose consumption, lactic acid accumulation, and lipid buildup while downregulating CPT1A, a key regulator of fatty acid oxidation. Treatment with 2-deoxy-glucose (2-DG) delayed leukemia progression and induced cell differentiation, while CPT1A overexpression rescued cell proliferation under glucose deprivation. Thus, PRMT1 enhances AMKL cell proliferation by promoting glycolysis and suppressing fatty acid oxidation.
  
  Strengths:
  
  This study highlights the clinical relevance of PRMT1 overexpression with AMKL, identifying it as a promising therapeutic target. A key novel finding is the discovery that only AMKL cells with high PRMT1 expression drive leukemogenesis, and this PRMT1-driven leukemia can be effectively treated with the PRMT1 inhibitor MS023. The work provides significant metabolic insights, showing that PRMT1 enhances glycolysis, suppresses fatty acid oxidation, downregulates CPT1A, and promotes lipid accumulation, which collectively drive leukemia cell proliferation. The successful use of the glucose analogue 2-deoxy-glucose (2-DG) to delay AMKL progression and induce cell differentiation underscores the therapeutic potential of targeting PRMT1-related metabolic pathways. Furthermore, the rescue experiment with ectopic Cpt1a expression strengthens the mechanistic link between PRMT1 and metabolic reprogramming. The study employs robust methodologies, including Seahorse analysis, metabolomics, FACS analysis, and in vivo transplantation models, providing comprehensive and well-supported findings. Overall, this work not only deepens our understanding of PRMT1's role in leukemia progression but also opens new avenues for targeting metabolic pathways in cancer therapy.
  
  Comments on revisions:
  
  The reviewer's questions were adequately addressed.
  
  Review 1
3. Public_Reviews 04 Jun 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript explores the role of PRMT1 in AMKL, highlighting its overexpression as a driver of metabolic reprogramming. PRMT1 overexpression enhances the glycolytic phenotype and extracellular acidification by increasing lactate production in AMKL cells. Treatment with the PRMT1 inhibitor MS023 significantly reduces AMKL cell viability and improves survival in tumor-bearing mice. Intriguingly, PRMT1 overexpression also increases mitochondrial number and mtDNA content. High PRMT1-expressing cells demonstrate the ability to utilize alternative energy sources dependent on mitochondrial energetics, in contrast to parental cells with lower PRMT1 levels.
  
  Strengths:
  
  This is a conceptually novel and important finding as PRMT1 has never been shown to enhance glycolysis in AMKL, and provides a novel point of therapeutic intervention for AMKL.
  
  Comments on revisions:
  
  The author has responded satisfactorily to the review comments and revised the manuscript accordingly.
  
  Review 2
4. Public_Reviews 04 Jun 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1:
  
  We thank the reviewer for highlighting the strengths of our manuscript, as quoted: “Overall, this work not only deepens our understanding of PRMT1's role in leukemia progression but also opens new avenues for targeting metabolic pathways in cancer therapy.”
  
  Weaknesses:
  
  (1) The findings rely heavily on a single AMKL cell line, with no validation in patient-derived samples to confirm clinical relevance or even another type of leukemia line. Adding the discussion of PRMT1's role in other leukemia types will increase the impact of this work.
  
  We mentioned in the introduction that PRMT1 is known to be a driver of leukemias carrying diverse types of mutations. In a related paper published in Cell Reports (Su et al. 2021), we demonstrated that PRMT1 is upregulated in myelodysplastic syndrome (MDS) patient samples and that inhibition of PRMT1 promotes megakaryocytic differentiation in a few MDS samples. AMKL is very rare. Via the Children’s Oncology Group consortium, we have obtained five AMKL samples with Down’s syndrome and AMKL with RBM15-MKL1 translocation out of 32 samples in the bank over the last 20 years. Interestingly, these patient samples also contain trisomy 19. As PRMT1 is localized on chromosome 19, we speculate that PRMT1 is a significant driver of AMKL, although we have very limited genetic evidence. However, these frozen human samples derived from peripheral blood cannot be grown in a cell culture system. Although we did not perform metabolic analysis on other AMKL cell lines, we did validate in our unpublished studies that PRMT1 drives down CPT1A expression in normal bone marrow cells and platelets in mice and in the human leukemia cell line MEG-01, which can be differentiated into megakaryocytes upon PMA (phorbol 12-myristate 13-acetate) treatment. Therefore, we expect that the PRMT1-mediated metabolic reprogramming described here should apply to other types of hematological malignancies.
  
  (2) The observed heterogeneity in Prmt1 expression is noted but not further investigated, leaving gaps in understanding its broader implications.
  
  The expression level of PRMT1 is heterogeneous within leukemia cell populations, making it intriguing to study. We can sort the cells based on high versus low PRMT1 expression using a fluorescent dye called E84. However, we have not conducted transcriptome analysis on these two populations, mainly due to resource constraints. Theoretically, the E84 high-expression population may transiently utilize glucose more efficiently, as these cells do not ectopically express PRMT1. Therefore, when nutrient levels decline, these cells might switch to the low PRMT1 expression population. It will be interesting to see whether endogenous leukemia cells transiently expressing high levels of PRMT1 take advantage of their efficient usage of glucose and thus adapt to the niche environment successfully, as we observed in Figure 1. We agree that this would be an interesting direction to pursue in the future.
  
  (3) Some figures and figure legends did not include important details or contained mismatched information.
  
  We would like to thank the reviewer for pointing out these mistakes. We have now corrected them.
  
  (4) Some wording is not accurate, such as line 80 "the elevated level of PRMT1 maintains the leukemic stem cells", the study is using the cell line, not leukemia stem cells.
  
  Leukemic stem cells are often referred to as cells that can initiate leukemia when transplanted into recipient mice, a concept first proposed by John Dick. In this study, we found that even the 6133 cell line displays heterogeneity in terms of PRMT1 expression levels. We identified a subgroup of 6133 cells as leukemia stem cells due to their ability to initiate leukemia.
  
  (5) In the disease model, histopathology of blood, spleen, and BM should be shown.
  
  We did not conduct histopathology analysis. Histopathology associated with 6133 cells has been published in Mercher et al. (JCI, 2009) and in a recent preprint by Diane Krause’s group.
  
  (6) Can MS023 treatment reverse the metabolic changes in PRMT1 overexpression AMKL cells?
  
  Yes. We demonstrated in the Seahorse assays in Figure 4 that the PRMT1 inhibitor can increase oxygen consumption.
  
  It would be helpful to provide a summary graph at the end of the manuscript.
  
  Yes, we now provide a graphic abstract.
  
  Reviewer #2 (Public review):
  
  We would like to thank the reviewer for finding the manuscript novel and important.
  
  Weaknesses:
  
  (1) The manuscript lacks detailed molecular mechanisms underlying PRMT1 overexpression, particularly its role in enhancing survival and metabolic reprogramming via upregulated glycolysis and diminished oxidative phosphorylation (OxPhos). The findings primarily report phenomena without exploring the reasons behind these changes.
  
  In the introduction, we highlighted that numerous studies have demonstrated how PRMT1 directly interacts with several key enzymes involved in glycolysis. These studies provide a mechanism for the observed upregulation of PRMT1 in leukemia. Additionally, our previous research published in eLife in 2015 (Zhang, 2015) demonstrated that PRMT1 methylates the RNA-binding protein RBM15, which can bind to the 3' UTR of mRNAs encoding various metabolic enzymes. Therefore, we propose that PRMT1 may also regulate metabolism indirectly through the RBM15 protein.
  
  (2) The article shows that PRMT1 overexpression leads to augmented glycolysis and low reliance on OxPhos. However, the manuscript also shows that PRMT1 overexpression leads to increased mitochondrial number and mitochondrial DNA content and an elevated NADPH/NAD+ ratio. Further, these overexpressing cells are better able to survive on alternative energy sources in the absence of glucose than low PRMT1-expressing parental cells. Surprisingly, the Seahorse assay in PRMT1-overexpressing cells showed no further enhancement in the ECAR after adding the mitochondrial uncoupler FCCP, indicating truncated mitochondrial energetics. These results are contradictory and need a more detailed explanation in the discussion.
  
  We have now explained the metabolic changes in more detail. Increasing mitochondrial number is not equivalent to increasing fatty acid oxidation and oxygen consumption, as mitochondria have many other functions. PRMT1 only downregulates CPT1A, which catalyzes a rate-limiting step in long-chain fatty acid oxidation. The data suggest that PRMT1 promotes the biogenesis of mitochondria, possibly via PGC1alpha, as published by Stallcup’s group. The Seahorse assays were performed in a high concentration of glucose rather than with alternative carbon sources. FCCP treatment under high-glucose conditions did not increase the ECAR and OCR, which is normal for leukemia cells, as shown in other publications (Sriskanthadevan, 2015; Kreitz, 2019). PRMT1 could dampen the activities of the TCA cycle and the electron transport chain, as suggested by our unpublished proteomic data and by published data (Fong, 2019). The elevated NADPH/NAD+ ratio is another indication that glycolysis and anabolism are enhanced by PRMT1.
  
  (3) How was disease penetrance established following the 6133/PRMT1 transplant before MS023 treatment?
  
  The data are in Figure 1f, demonstrating that the penetrance is 100%.
  
  (4) The 6133/PRMT1 cells show elevated glycolysis compared to parental 6133; why did the author choose the 6133 cells for treatment with the MS023 and ECAR assay (Fig.3 b)? The same is confusing with OCR after inhibitor treatment in 6133 cells; the figure legend and results section description are inconsistent.
  
  We apologize for the mistakes made while preparing the manuscript. We used 6133/PRMT1 cells for the MS023 treatment in Figure 4.
  
  (5) The discussion is too brief and incoherent and does not adequately address key findings. A comprehensive rewrite is necessary to improve coherence and depth.
  
  We agree with the reviewer. We have now added a comprehensive review of PRMT1-mediated metabolism. The PRMT1 homolog in yeast is called hmt1. In yeast, hmt1 is upregulated by glucose and enhances glycolysis. PRMT1-enhanced glycolysis is therefore a conserved pathway in eukaryotic cells.
  
  (6) The materials and methods section lacks a description of statistical analysis, and significance is not indicated in several figures (e.g., Figures 1C, D, F; Figures 2D, E, F, I). Statistical significance must be consistently indicated. The methods section requires more detailed descriptions to enable replication of the study's findings.
  
  We have added extra details on the methods and statistical analysis for the figures.
  
  (7) Figures are hazy and unclear. They should be replaced with high-resolution images, ensuring legible text and data.
  
  We have prepared separate figure files with high resolution.
  
  (8) Correct the labeling in Figure 2I by removing the redundant "D."
  
  We thank the reviewer and have fixed the figure.
  
  AuthorResponse

Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.12.628174v2
www.biorxiv.org

Pu.1/Spi1 dosage controls the turnover and maintenance of microglia in zebrafish and mammals

4
1. Public_Reviews 04 Jun 2025
 
 in eLife
 
 eLife Assessment
 
 This study presents valuable findings on the regulation of survival and maintenance of brain-resident immune cells called microglia. Using compelling and sophisticated genetic tools, the authors demonstrate a gene dosage-dependent mechanism by which microglia are eliminated. This research on cell competition and survival will be of broad interest to the cell biology community.
 
 Summary
2. Public_Reviews 04 Jun 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary: The article entitled "Pu.1/Spi1 dosage controls the turnover and maintenance of microglia in zebrafish and mammals" by Wu et al., identifies a role for the master myeloid developmental regulator Pu.1 in the maintenance of microglial populations in the adult. Using a non-homologous end joining knock-in strategy, the authors generated a pu.1 conditional allele in zebrafish, which reports wildtype expression of pu.1 with EGFP and truncated expression of pu.1 with DsRed after Cre mediated recombination. When crossed to existing pu.1 and spi-b mutants, this approach allowed the authors to target a single allele for recombination and induce homozygous loss-of-function microglia in adults. This identified that although there is no short-term consequence to loss of pu.1, microglia lacking any functional copy of pu.1 are depleted over the course of months, even when spi-b is fully functional. The authors go on to identify reduced proliferation, increased cell death, and higher expression of tp53 in the pu.1 deficient microglia, as compared to the wildtype EGFP+ microglia. To extend these findings to mammals, the authors generated a conditional Pu.1 allele in mice and performed similar analyses, finding that loss of a single copy of Pu.1 resulted in similar long-term loss of Pu.1-deficient microglia. The conclusions of this paper are overall well supported by the data.
 
 Strengths: The genetic approaches here for visualizing recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or non-existent without the point of comparison and competition with the wildtype cells.
 
 Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.
 
 The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.
 
 Weaknesses: This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.
 
 Recommendations for the authors: It would be useful to investigate the relationship between pu.1 and tp53. The data presented here show that pu.1 deficient cells have higher expression of tp53, but this could be an indirect effect. However, since pu.1 has known DNA binding motifs, it would be worthwhile to investigate if there are any direct interactions between pu.1 and the tp53 locus -- does pu.1 directly bind and repress tp53 expression? This could be directly investigated with Cut & Run or an EMSA.
 
 The paper would likely also benefit from a more in-depth discussion of the relationship of the zebrafish alleles and their relationship to mammalian Pu.1 -- as presented here, the authors are implicitly arguing that zebrafish pu.1 and spi-b are both more closely related to mammalian Pu.1 than to mammalian Spi-b. A clear argument, perhaps backed up by sequence alignment and homology matching, would help readers, especially those less familiar with zebrafish genome duplications.
 
 Comments on Revised Version (from BRE):
 
 The authors performed in silico analyses to support a regulatory relationship between Pu.1 and Tp53. They identified three putative Pu.1 binding sites within the zebrafish tp53 promoter region. Furthermore, they cite prior evidence demonstrating a similar interaction between PU.1 and members of the P53 family through direct DNA binding.
 
 Review 1
3. Public_Reviews 04 Jun 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary: In the presented work by Wu et al. the authors investigate the role of the transcription factor Pu.1 in the survival and maintenance of microglia, the tissue resident macrophage population in the brain. To this end they generated a sophisticated new conditional pu.1 allele in zebrafish using CRISPR mediated genome editing which allows visual detection of expression of the mutant allele through a switch from GFP to dsRed after Cre-mediated recombination. Using EdU pulse-chase labelling, they first estimate the daily turnover rate of microglia in the adult zebrafish brain which was found to be higher than rates previously estimated for mice and humans. After conditional deletion of pu.1 in coro1a positive cells, they do not find a difference in microglia number at 2 and 8 days or 1 month post injection of Tamoxifen. However, at 3 months post injection, a strong decrease in mutant microglia could be detected. While no change in microglia number was detected at 1mpi, an increase in apoptotic cells and decreased proliferation were observed. RNA-seq analysis of WT and mutant microglia revealed an upregulation of tp53, which was shown to play a role in the depletion of pu.1 mutant microglia as deletion in tp53-/- mutants did not lead to a decrease in microglia number at 3mpi. Through analysis of microglia number in pu.1 mutants, the authors further show that the depletion of microglia in the conditional mutants is dependent on the presence of WT microglia. To show that the phenomenon is conserved between species, similar experiments were also performed in mice.
 
 This work expands on previous in vitro studies using primary human microglia. The majority of conclusions are well supported by the data; the addition of controls and experimental details would strengthen the conclusions and rigor of the paper.
 
 Strengths:
 
 Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele. The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper. Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.
 
 Weaknesses:
 
 (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data were analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Fig. S7A).
 
 (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Fig.2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in requirement of pu.1 in embryonic and adult stages.
 
 Comments on Revised Version (from BRE):
 
 The authors have elaborated on the details of the RNA-Seq procedure and clarified the distinct phenotypes observed with global versus conditional pu.1 knockout. In addition, the authors' proposed collaborative relationship between Pu.1 and Spi-b has been expanded in the revised manuscript. The authors have addressed all the minor concerns raised by the reviewer.
 
 Review 2
4. Public_Reviews 04 Jun 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Reviewer #1 (Public review):
 
 Strengths:
 
 The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.
 
 Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.
 
 The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.
 
 Weaknesses:
 
 This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53 specifically. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.
 
 We agree with the reviewer’s assessment regarding the significance of the relationship between PU.1 and TP53. To investigate the potential interaction between Pu.1 and Tp53 in zebrafish, we analyzed the promoter region of zebrafish tp53. Indeed, we found three PU.1 binding sites (GAGGAA) in the tp53 promoter, located on the antisense strand at positions -1047 to -1042, -1098 to -1093 and -1423 to -1418 relative to the transcriptional start site (Fig. S10). These potential Pu.1 binding sites suggest a direct interaction between Pu.1 and the tp53 locus. Furthermore, a previous study by Tschan et al. (2008) elucidated the mechanism by which PU.1 attenuates the transcriptional activity of the P53 tumor suppressor family through direct binding to the DNA-binding and/or oligomerization domains of p53/p73 proteins. We have also cited this study (Line 399-401) and included all of the above information in the discussion of the revised manuscript (Line 399-405).
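 
 The in silico search described in this response can be illustrated with a simple motif scan. The Python sketch below is illustrative only and is not the authors' actual analysis: the function names and the toy sequence are hypothetical, and it assumes the input is the sense-strand sequence immediately upstream of the transcriptional start site, so that its last base sits at position -1. It reports each GAGGAA hit on either strand as (start, end, strand) coordinates relative to the start site.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif_sites(upstream, motif="GAGGAA"):
    """Find motif hits on both strands of an upstream (promoter) sequence.

    Assumption: `upstream` is the sense strand ending right before the
    transcriptional start site, so index i maps to position i - len(upstream).
    """
    upstream = upstream.upper()
    n = len(upstream)
    hits = []
    # A hit on the antisense strand appears as the reverse complement
    # of the motif when reading the sense strand.
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = upstream.find(pattern)
        while start != -1:
            hits.append((start - n, start + len(pattern) - 1 - n, strand))
            start = upstream.find(pattern, start + 1)
    return sorted(hits)


# Toy 22-bp sequence (hypothetical, not the zebrafish tp53 promoter).
toy_upstream = "AAAA" + "TTCCTC" + "AAAA" + "GAGGAA" + "AA"
for start, end, strand in find_motif_sites(toy_upstream):
    print(f"{start} to {end} ({strand} strand)")
```

 An antisense hit here means that the reverse complement of the motif (TTCCTC) is found when reading the sense strand, which is consistent with how the three sites are reported in the response. A sequence match of this kind only nominates candidate sites; binding itself would still need an assay such as the Cut & Run or EMSA experiments the reviewer proposed.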
 
 Reviewer #2 (Public review):
 
 Strengths:
 
 Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.
 
 The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.
 
 Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.
 
 Weaknesses:
 
 (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data were analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).
 
 We apologize for the lack of clarity regarding the RNA-seq procedures and have accordingly added details about the RNA-seq data analysis to the “Material and methods” section (Line 491-501). Briefly, reads were aligned to the zebrafish genome using the STAR package. Raw counts were calculated with the featureCounts package. Differentially expressed genes (DEGs) were identified with the DESeq2 package. Owing to the technical challenge of unambiguously distinguishing microglia from dendritic cells (DCs) in brain cell suspensions, we employed a strategy of isolating 3-5 cells per pool and quantifying the relative expression of the microglia-specific marker ccl34b.1 normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out. We have discussed this technical constraint in the revised manuscript to ensure methodological transparency (Line 498-501).
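 
 The inclusion criterion described in this response, a ccl34b.1/ccl19a.1 expression ratio above 5, amounts to a simple per-pool filter. The following Python sketch is a hypothetical illustration rather than the authors' code: the pool names and expression values are invented, the function name is an assumption, and the handling of pools with no detectable ccl19a.1 signal is a design choice made here, not something stated in the response.

```python
def passes_microglia_qc(microglia_marker, dc_marker, min_ratio=5.0):
    """Return True if a pool looks microglia-dominated.

    microglia_marker: relative expression of ccl34b.1 in the pool
    dc_marker:        relative expression of ccl19a.1 in the pool
    min_ratio:        threshold of 5 taken from the author response
    """
    if dc_marker == 0:
        # Assumption: with no DC signal, any microglia signal passes.
        return microglia_marker > 0
    return microglia_marker / dc_marker > min_ratio


# Invented example values: (ccl34b.1, ccl19a.1) per pool.
pools = {
    "pool_A": (120.0, 2.0),   # ratio 60, kept
    "pool_B": (8.0, 4.0),     # ratio 2, dropped
    "pool_C": (30.0, 0.0),    # no DC signal, kept
}
kept = [name for name, (mg, dc) in pools.items() if passes_microglia_qc(mg, dc)]
print(kept)
```

 A threshold of this kind limits dendritic cell contamination but does not remove it, which is why the response still flags residual contamination as a caveat.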
 
 (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.
 
 We apologize for the omission of data regarding conditional pu.1 knockout alone in embryos in our manuscript, which may have led to ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Fig. S2). Microglial death occurs, in both embryonic and adult brains, only when Pu.1 is disrupted in the spi-b mutant background. The blebbing morphology of some microglia after pu.1 conditional knockout in adult spi-b mutants indicated that microglia undergo apoptosis at both embryonic and adult stages (Fig. S4 and Fig. S5). The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Fig. 2) versus conditional pu.1 ablation (Fig. S2). Global knockout eliminates microglia during early development due to Pu.1’s essential role in myeloid lineage specification. We have included this clarification in the revised manuscript (Line 208-211).
 
 (3) The number of microglia after pu.1 knockout in zebrafish showed a significant decrease only 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.
 
 We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of Spi-b expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles, becoming absent in microglia. We have included the clarification in the revised manuscript (Line 302-305).
 
 (4) Data are represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend if SEM or SD is shown.
 
 We have represented our data as mean ± SD in the revised manuscript.
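 
 The distinction the reviewer draws between SEM and SD can be made concrete with a short calculation. The Python sketch below uses invented cell counts and a hypothetical helper name; it assumes the sample standard deviation (n - 1 denominator) and the usual definition SEM = SD / sqrt(n).

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample SD, SEM) for a list of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 denominator
    sem = sd / math.sqrt(n)         # shrinks as n grows
    return mean, sd, sem


# Invented microglia counts from eight hypothetical animals.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd, sem = summarize(counts)
print(f"mean ± SD  = {mean:.2f} ± {sd:.2f}")
print(f"mean ± SEM = {mean:.2f} ± {sem:.2f}")
```

 Because the SEM divides the SD by the square root of the sample size, it is always smaller than the SD for n > 1 and describes the precision of the mean rather than the spread of the data, which is the reviewer's reason for requesting SD where individual points are not plotted.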
 
 Recommendations for the authors:
 
 Reviewing Editor:
 
 To further strengthen the manuscript, we ask the authors to address the reviewers' comments through additional experiments where necessary. In cases where certain experiments may be challenging, we encourage the authors to address these concerns within the text, such as by referencing any prior evidence of pu.1 and tp53 interactions or incorporating in silico analyses that support such interaction.
 
 As suggested, we have performed an in silico analysis of Pu.1 binding sites in the zebrafish tp53 promoter and also cited a previous paper showing how PU.1 attenuates the transcriptional activity of the P53 tumor suppressor family (Line 399-405).
 
 Reviewer #1 (Recommendations for the authors):
 
 It would be useful to investigate the relationship between pu.1 and tp53. The data presented here show that pu.1 deficient cells have higher expression of tp53, but this could be an indirect effect. However, since pu.1 has known DNA binding motifs, it would be worthwhile to investigate if there are any direct interactions between pu.1 and the tp53 locus -- does pu.1 directly bind and repress tp53 expression? This could be directly investigated with Cut & Run or an EMSA.
 
 The interaction between Pu.1 and Tp53 has been discussed in the public review section.
 
 The paper would likely also benefit from a more in-depth discussion of the relationship of the zebrafish alleles and their relationship to mammalian Pu.1 -- as presented here, the authors are implicitly arguing that zebrafish pu.1 and spi-b are both more closely related to mammalian Pu.1 than to mammalian Spi-b. A clear argument, perhaps backed up by sequence alignment and homology matching, would help readers, especially those less familiar with zebrafish genome duplications.
 
 We have conducted detailed sequence alignment in our previous work (Yu et al., 2017, Blood) and found zebrafish Spi-b shares the highest similarity with the mammalian SPI-B among Ets family transcription factors in zebrafish. A unique P/S/T-rich region known to be essential for mammalian SPI-B transactivation activity is present in zebrafish Spi-b. Our data do not support the interpretation that Spi-b is more closely related to mammalian Pu.1 than to Spi-b. Instead, functional compensation between pu.1 and spi-b in microglia maintenance likely reflects their shared role as Ets-family transcriptional regulators, rather than ortholog-driven redundancy.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) The nomenclature of the genes in the SPI family in zebrafish is somewhat confusing as genes were renamed several times. It would make it easier for the reader to understand if in the abstract and the main text, spi-b would be referred to as the zebrafish orthologue of mouse SPI-B (as determined by the authors in previous work) rather than the paralogue of zebrafish pu.1. To clarify which genes were analyzed in both zebrafish and mouse, Gene accession numbers should be added.
 
 Thank you for the recommendations. We have changed “the paralogue of zebrafish pu.1” to “the orthologue of mouse Spi-b” in the abstract (Line 22) and added gene accession numbers for both the zebrafish and mouse genes (Line 105-106 and 301-302).
 
 (2) Methods RNA-seq: Details on how the aligned reads were analyzed to detect differentially expressed genes are missing and should be added. In addition, a table with read counts, fold changes and adjusted p values should be added.
 
 We have added details of the RNA-seq analysis to the Material and Methods section (Line 491-501). A table generated by DESeq2 has been included as a supplemental file to show read counts, fold changes and adjusted p values (Supplemental file 2).
 
 (3) Figure 2H: It would be helpful to the reader if the KO splicing would be shown in comparison to WT splicing.
 
 Thank you for your suggestion. We have added the sequence result between exon 3 and exon 4 of pu.1 from wildtype cDNA to show WT splicing in Figure 2H.
 
 (4) Legend Figure 5C. Relative expression should be replaced with transcripts per million (TPM).
 
 We have corrected it in the figure legend of Figure 5C (Line 786-787).
 
 (5) In Figure S3. the label on the y-axis in panel B is not visible.
 
 We apologize for the mistake made during figure assembly. We have corrected it, and the y-axis label is now visible.
 
 (6) In Figure S7B an explanation for the colors in the heat map is missing and should be added.
 
 Colors represent scaled TPM values. The red color represents high expression while the blue color represents low expression. We have added the information in the figure legend.
 
 (7) A justification for the use of male mice only should be added or additional experiments in female mice should be performed.
 
 Female mice were excluded to avoid variability associated with estrous cycle-dependent hormonal changes, which are known to influence microglial behavior (Habib and Beyer, 2015). We have added a justification in the revised manuscript (Lines 547-548).
 
 (8) The manuscript would benefit from some language editing. A few examples are listed below:
 
 a) line 97: the rostral blood (RBI) should read the rostral blood island.
 
 b) line 373 typo: nucleus translocation should read nuclear translocation.
 
 c) line 393 typo: pu.1-dificent should read pu.1-deficient.
 
 We apologize for the typos and grammatical mistakes in the manuscript. We have checked the manuscript thoroughly and corrected them.
 
 References:
 
 Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE (2008) PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene 27: 3489-93
 
 Yu T, Guo W, Tian Y, Xu J, Chen J, Li L, Wen Z (2017) Distinct regulatory networks control the development of macrophages of different origins in zebrafish. Blood 129: 509-519
 
 Habib P, Beyer C (2015) Regulation of brain microglia by female gonadal steroids. J Steroid Biochem Mol Biol 146: 3-14
 
 AuthorResponse
Tags

AuthorResponse

Review 1

Review 2

Summary

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.12.14.570333v3

Public_Reviews

Annotations: 10,000

Joined: March 17, 2021
